JP2018194810A

JP2018194810A - Device controlling method and electronic apparatus

Info

Publication number: JP2018194810A
Application number: JP2017157389A
Authority: JP
Inventors: Eon Joung Choi; Jee Hee Park
Original assignee: Line Corp; Naver Corp
Current assignee: Z Intermediate Global Corp; Naver Corp
Priority date: 2017-05-15
Filing date: 2017-08-17
Publication date: 2018-12-06
Anticipated expiration: 2037-08-17
Also published as: KR102025391B1; KR20180125241A; JP6731894B2

Abstract

To provide a technique for controlling devices according to a user's utterance position. SOLUTION: A device control method for an electronic device comprising a voice-based interface includes the steps of: receiving a command from a user's voice input through the voice-based interface; acquiring position information associated with the voice input when analysis of the command shows that position information is required; and using the position information to identify a device to be controlled from among the devices that the electronic device can control, and causing the device to be controlled to perform the operation corresponding to the command. SELECTED DRAWING: Figure 5

Description

The following description relates to technology for controlling devices according to a user's utterance position. More specifically, it relates to a device control method and system that control device operation or content provision by additionally determining the user's utterance position from a command corresponding to a voice input, to a computer program stored in a computer-readable recording medium that, in combination with a computer, causes the computer to execute the device control method, and to the recording medium itself.

A voice-based interface, such as an artificial intelligence speaker for a home network service, can receive a user's voice input through a microphone and control device operation or content provision based on the received voice input.

For example, Patent Literature 1 (published December 30, 2011) concerns a home media device and a home network system and method using the same. It discloses technology that can provide a home network service over a second communication network such as Wi-Fi in addition to a mobile communication network, and that can control multiple multimedia devices in the home in a multiplexed manner by voice command, without the user operating any buttons.

In such conventional technology, when there are multiple devices to be controlled, devices can be controlled by utterances that use pre-registered device names. However, when multiple devices play the same role, each device name must be registered separately, and the user must then remember all of the registered names. Moreover, because such an interface operates mainly on voice and does not use visual information, a user who has to choose among several options must structure the specific requirements and the form of the command and express them in words.

[Patent Literature 1] Korean Laid-Open Patent Publication No. 10-2011-0139797

Provided are a device control method and system that can recognize utterances expressed easily in colloquial form (for example, a casual conversational tone) and control the device that matches the user's intention, together with a computer program stored in a computer-readable recording medium that, in combination with a computer, causes the computer to execute the device control method, and the recording medium itself.

Provided are a device control method and system that can grasp the context of a conversation and control the device that matches the user's intention, together with a computer program stored in a computer-readable recording medium that, in combination with a computer, causes the computer to execute the device control method, and the recording medium itself.

Provided are a device control method and system that can control the device that matches the user's intention based on device usage patterns and history, together with a computer program stored in a computer-readable recording medium that, in combination with a computer, causes the computer to execute the device control method, and the recording medium itself.

Provided is a device control method for an electronic device that includes a voice-based interface, the method comprising: receiving a command from a user's voice input through the voice-based interface; acquiring position information associated with the voice input when analysis of the command shows that position information is required; and using the position information to identify a device to be controlled from among the devices that the electronic device can control, and causing the device to be controlled to perform the operation corresponding to the command.
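The three steps of the method can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the publication: the `Device` and `Command` types, the rule that position information is needed only when the command does not single out one device, and the `locate_speaker` callback that stands in for whatever mechanism supplies the position.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Device:
    name: str
    kind: str   # e.g. "light"
    room: str   # where the device is installed


@dataclass
class Command:
    kind: str             # kind of device the command refers to
    action: str           # operation to perform, e.g. "off"
    room: Optional[str]   # room named in the utterance, if any


def control_device(command: Command,
                   devices: List[Device],
                   locate_speaker: Callable[[], str]) -> Optional[str]:
    """Return a description of the action taken, or None if no single device matched."""
    candidates = [d for d in devices if d.kind == command.kind]
    room = command.room
    # Assumed rule: position information is required only when the command
    # itself does not single out one device.
    if room is None and len(candidates) > 1:
        room = locate_speaker()
    if room is not None:
        candidates = [d for d in candidates if d.room == room]
    if len(candidates) != 1:
        return None
    target = candidates[0]
    # A real implementation would send the operation to the device here.
    return f"{target.name}: {target.kind} {command.action}"
```

With two lights registered, a command that names no room falls back to the speaker's position, while a command that names a room never calls `locate_speaker`.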

Provided is a computer program stored in a computer-readable recording medium that, in combination with a computer, causes the computer to execute the device control method.

Provided is a computer-readable recording medium on which a program for causing a computer to execute the device control method is recorded.

Provided is an electronic device comprising a voice-based interface and at least one processor implemented to execute computer-readable instructions, wherein the at least one processor receives a command from a user's voice input through the voice-based interface, acquires position information associated with the voice input when analysis of the command shows that position information is required, uses the position information to identify a device to be controlled from among the devices that the electronic device can control, and causes the device to be controlled to perform the operation corresponding to the command.

The user's utterance position can be additionally determined according to the command corresponding to a voice input, and a device suited to the command can be selected.

When a command contains a demonstrative pronoun or the like, the device that matches the user's intention can be inferred from the user's position.

The user's position can be inferred from the conversational context, and the device that matches the user's intention can be inferred accordingly.

The device that matches the user's intention can be inferred from device usage patterns and history.

FIG. 1 is a diagram illustrating an example of a service environment that uses a voice-based interface according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating another example of a service environment that uses a voice-based interface according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a cloud artificial intelligence platform according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating the internal configuration of an electronic device and a server according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating an example of a device control method according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of determining the device to be controlled from a voice input that contains a demonstrative pronoun, according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of determining the device to be controlled from a voice input that contains a demonstrative pronoun, according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating an example of determining the device to be controlled from a voice input in which the position or the referent is omitted, according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating an example of determining the device to be controlled from a voice input in which the position or the referent is omitted, according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating an example of determining the device to be controlled based on learning of user behavior patterns, according to an embodiment of the present invention.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

A device control system according to embodiments of the present invention may be implemented by an electronic device that provides a voice-based interface. The electronic device may additionally determine the speaker's position from a command given by the speaker's voice input and control device operation or content provision accordingly. When the command contains a demonstrative pronoun or consists of an incomplete sentence, the electronic device may infer the device that matches the user's intention from the user's position. The electronic device may also infer the device that matches the user's intention by grasping the conversational context of the speaker's voice input, or by grasping device usage patterns and device control history.
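As a rough illustration of when a command might call for position information, the sketch below flags utterances that contain a demonstrative pronoun or that name no location at all. The word list, the `known_rooms` parameter, and the assumption that the utterance is already known to be a device control command in English text are all hypothetical; the publication does not specify how the analysis is carried out.

```python
import re
from typing import Set

# Hypothetical English word list; the publication does not enumerate one.
DEMONSTRATIVES = {"this", "that", "these", "those", "here", "there"}


def requires_position(utterance: str, known_rooms: Set[str]) -> bool:
    """Return True if the speaker's position is needed to resolve the command."""
    text = utterance.lower()
    words = set(re.findall(r"[a-z']+", text))
    # Case 1: a demonstrative pronoun points at something near the speaker.
    if words & DEMONSTRATIVES:
        return True
    # Case 2: the location is omitted, so the sentence is incomplete.
    return not any(room in text for room in known_rooms)
```

A production system would rely on the language analysis of the recognition platform rather than keyword matching; the sketch only makes the two triggering conditions concrete.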

A device control method according to embodiments of the present invention may be executed by the electronic device described above. A computer program according to an embodiment of the present invention may be installed and run on the electronic device, and the electronic device may execute the device control method according to an embodiment of the present invention under the control of the running computer program. The computer program may be stored in a computer-readable recording medium so that, in combination with the electronic device implemented by a computer, it causes the computer to execute the device control method.

FIG. 1 is a diagram illustrating an example of a service environment that uses a voice-based interface according to an embodiment of the present invention. The embodiment of FIG. 1 concerns technology, such as a smart home or home network service, that connects and controls devices in the home. In this example, an electronic device 100 that provides a voice-based interface receives, recognizes, and analyzes the voice input "turn off the light" uttered by a user 110, and controls the light power of a home lighting device 120 that is linked to the electronic device 100 through an internal network in the home.
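The FIG. 1 scenario can be mimicked with a toy Python sketch. The `LightingDevice` class and the keyword matching in `handle_voice_input` are assumptions for illustration only: the utterance is taken to be already recognized as English text, and the internal network is replaced by a direct method call.

```python
class LightingDevice:
    """Stand-in for the home lighting device 120."""

    def __init__(self) -> None:
        self.power_on = True

    def set_power(self, on: bool) -> None:
        # A real device would receive this over the internal network.
        self.power_on = on


def handle_voice_input(text: str, light: LightingDevice) -> bool:
    """Apply a recognized utterance to the light; return True if it was handled."""
    words = text.lower().split()
    if "light" not in words and "lights" not in words:
        return False
    if "off" in words:
        light.set_power(False)
        return True
    if "on" in words:
        light.set_power(True)
        return True
    return False
```

Utterances that do not mention the light are left unhandled, so the caller can route them elsewhere, for example to an external service.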

For example, in addition to the home lighting device 120 described above, the devices in the home may include a wide variety of devices that can be connected and controlled online: home appliances such as televisions, personal computers (PCs), peripheral devices, air conditioners, refrigerators, and robot vacuum cleaners; energy-consuming equipment such as water, electricity, and heating and cooling equipment; and security devices such as door locks and surveillance cameras. The internal network may use wired network technologies such as Ethernet (registered trademark), HomePNA, and IEEE 1394, or wireless network technologies such as Bluetooth (registered trademark), UWB (Ultra-Wideband), ZigBee, Wireless 1394, and HomeRF.

The electronic device 100 may be one of the devices in the home. For example, the electronic device 100 may be a device such as an artificial intelligence speaker or a robot vacuum cleaner installed in the home. The electronic device 100 may also be a mobile device of the user 110, such as a smartphone, a mobile phone, a notebook PC, a digital broadcasting terminal, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), or a tablet. The electronic device 100 is thus not particularly limited, as long as it is a device with functions for receiving the voice input of the user 110 and for connecting to the devices in the home in order to control them. Depending on the embodiment, the mobile device of the user 110 described above may itself be included among the devices in the home.

FIG. 2 is a diagram illustrating another example of a service environment that uses a voice-based interface according to an embodiment of the present invention. In FIG. 2, the electronic device 100 that provides a voice-based interface receives, recognizes, and analyzes the voice input "today's weather" uttered by the user 110, acquires information about today's weather from an external server 210 through an external network, and outputs the acquired information as speech, such as "Today's weather is ...".

For example, the external network may include any one or more of networks such as a PAN (personal area network), a LAN (local area network), a CAN (campus area network), a MAN (metropolitan area network), a WAN (wide area network), a BBN (broadband network), and the Internet.

In the embodiment of FIG. 2 as well, the electronic device 100 may be one of the devices in the home or one of the mobile devices of the user 110. It is not particularly limited, as long as it is a device with a function for receiving and processing the voice input of the user 110 and a function for connecting to the external server 210 through the external network and providing the user 110 with the services and content offered by the external server 210.

As described above, the electronic device 100 according to embodiments of the present invention is not particularly limited, as long as it is a device that can process user commands that include at least a voice input received through the voice-based interface from an utterance of the user 110. For example, the electronic device 100 may process a user command by directly recognizing and analyzing the user's voice input and performing the operation suited to that input. Depending on the embodiment, however, processing such as recognition of the user's voice input, analysis of the recognized voice input, and synthesis of the speech provided to the user may be performed on an external platform linked with the electronic device 100.

FIG. 3 is a diagram illustrating an example of a cloud artificial intelligence platform according to an embodiment of the present invention. FIG. 3 shows electronic devices 310, a cloud artificial intelligence platform 320, and content and services 330.

As an example, the electronic devices 310 may refer to devices installed in the home and may include at least the electronic device 100 described above. The electronic devices 310, or applications (hereinafter, apps) installed and run on the electronic devices 310, may be linked with the cloud artificial intelligence platform 320 through an interface connect 340. The interface connect 340 may provide developers with a software development kit (SDK) and/or development documentation for developing the electronic devices 310 or the apps installed and run on them. The interface connect 340 may also provide an application program interface (API) through which the electronic devices 310, or the apps installed and run on them, can use the functions provided by the cloud artificial intelligence platform 320. As a specific example, devices and apps that developers build using the SDK and/or development documentation provided by the interface connect 340 can use the functions of the cloud artificial intelligence platform 320 through the API provided by the interface connect 340.

The cloud artificial intelligence platform 320 may provide functions for delivering voice-based services. For example, the cloud artificial intelligence platform 320 may include various modules for providing voice-based services, such as a voice processing module 321 for recognizing received speech and synthesizing speech to be output, a vision processing module 322 for analyzing and processing received images and video, a conversation processing module 323 for determining an appropriate conversation so that suitable speech is output in response to received speech, a recommendation module 324 for recommending functions suited to received speech, and neural machine translation (NMT) 325, which helps artificial intelligence translate language sentence by sentence based on data learning.

For example, in the embodiments of FIGS. 1 and 2, the electronic device 100 may transmit the voice input of the user 110 to the cloud artificial intelligence platform 320 using the API provided by the interface connect 340. In this case, the cloud artificial intelligence platform 320 may recognize and analyze the received voice input using the modules 321 to 325 described above, and may synthesize and provide an appropriate response voice or recommend an appropriate operation according to the received voice input.

An extension kit 350 may provide a development kit with which third-party content developers or companies can implement new voice-based functions on top of the cloud artificial intelligence platform 320. For example, in the embodiment of FIG. 2, the electronic device 100 may transmit the received voice input of the user 110 to the external server 210, and the external server 210 may transmit the voice input to the cloud artificial intelligence platform 320 through an API provided by the extension kit 350. In this case, as described above, the cloud artificial intelligence platform 320 may recognize and analyze the received voice input and then synthesize and provide an appropriate response voice, or provide the external server 210 with recommendation information about the function to be processed according to the voice input. As an example, in FIG. 2, the external server 210 may transmit the voice input "today's weather" to the cloud artificial intelligence platform 320 and receive from it the keywords "today's" and "weather" extracted by recognizing that voice input. The external server 210 may then generate text information such as "Today's weather is ..." based on the keywords "today's" and "weather" and transmit the generated text information back to the cloud artificial intelligence platform 320. The cloud artificial intelligence platform 320 may synthesize the text information into speech and provide it to the external server 210. The external server 210 may transmit the synthesized speech to the electronic device 100, and the electronic device 100 may output the synthesized speech "Today's weather is ..." through its speaker, thereby processing the voice input "today's weather" received from the user 110.
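The round trip in the weather example (voice input to keywords, keywords to text, text to synthesized speech) can be traced with a small Python sketch. The classes, method names, and `forecast` field are hypothetical stand-ins, not the API of the extension kit 350: voice input is represented as text, and "synthesized speech" is simply the reply encoded as UTF-8 bytes.

```python
from typing import List


class CloudPlatform:
    """Stand-in for the cloud artificial intelligence platform 320."""

    def extract_keywords(self, voice_input: str) -> List[str]:
        # Placeholder for speech recognition: the "audio" is already text here.
        return voice_input.lower().split()

    def synthesize(self, text: str) -> bytes:
        # Placeholder for speech synthesis: returns the text as UTF-8 bytes.
        return text.encode("utf-8")


class ExternalServer:
    """Stand-in for the external server 210."""

    def __init__(self, platform: CloudPlatform, forecast: str) -> None:
        self.platform = platform
        self.forecast = forecast  # hypothetical weather data held by the server

    def handle(self, voice_input: str) -> bytes:
        # 1. Ask the platform for the keywords in the voice input.
        keywords = self.platform.extract_keywords(voice_input)
        # 2. Generate the reply text from the keywords.
        if "weather" in keywords:
            text = f"Today's weather is {self.forecast}"
        else:
            text = "Sorry, I did not understand."
        # 3. Send the text back to the platform for speech synthesis.
        return self.platform.synthesize(text)


def device_handle(server: ExternalServer, voice_input: str) -> str:
    """Stand-in for the electronic device 100 forwarding input and playing the reply."""
    audio = server.handle(voice_input)
    return audio.decode("utf-8")  # decoding stands in for speaker output
```

Keeping recognition and synthesis behind the platform object mirrors the division of labor described above: the external server only turns keywords into reply text.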

In this process, the electronic device 100 may execute the device control method according to embodiments of the present invention in order to perform the device operation or provide the content that corresponds to the voice input.

FIG. 4 is a block diagram illustrating the internal configuration of an electronic device and a server according to an embodiment of the present invention. The electronic device 410 of FIG. 4 may correspond to the electronic device 100 described above, and the server 420 may correspond to the external server 210 described above or to one of the computer devices that implement the cloud artificial intelligence platform 320.

The electronic device 410 and the server 420 may include memories 411 and 421, processors 412 and 422, communication modules 413 and 423, and input/output interfaces 414 and 424. The memories 411 and 421 are computer-readable recording media and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as ROM or a disk drive may also be included in the electronic device 410 or the server 420 as a separate permanent storage device distinct from the memories 411 and 421. The memories 411 and 421 may store an operating system (OS) and at least one program code (for example, code for an application that is installed on the electronic device 410 and runs there to provide a specific service). These software components may be loaded from a computer-readable recording medium separate from the memories 411 and 421, such as a floppy (registered trademark) drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card. In other embodiments, the software components may be loaded into the memories 411 and 421 through the communication modules 413 and 423 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memory 411 of the electronic device 410 based on a program (for example, the application described above) installed from files that developers, or a file distribution system that distributes application installation files, provide through a network 430.

The processors 412 and 422 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processors 412 and 422 by the memories 411 and 421 or by the communication modules 413 and 423. For example, the processors 412 and 422 may be configured to execute instructions received according to program code stored in a recording device such as the memories 411 and 421.

The communication modules 413 and 423 may provide a function for the electronic device 410 and the server 420 to communicate with each other through the network 430, and may provide a function for the electronic device 410 and/or the server 420 to communicate with other electronic devices or other servers. As an example, a request that the processor 412 of the electronic device 410 generates according to program code stored in a recording device such as the memory 411 may be transmitted to the server 420 through the network 430 under the control of the communication module 413. Conversely, control signals, instructions, content, files, and the like provided under the control of the processor 422 of the server 420 may pass through the communication module 423 and the network 430 and be received by the electronic device 410 through its communication module 413. For example, control signals, instructions, content, files, and the like of the server 420 received through the communication module 413 may be passed to the processor 412 or the memory 411, and content, files, and the like may be stored in a storage medium (the permanent storage device described above) that the electronic device 410 may further include.

The input/output interface 414 may be a means for interfacing with an input/output device 415. For example, input devices may include a microphone, a keyboard, or a mouse, and output devices may include a display or a speaker. As another example, the input/output interface 414 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 415 may be integrated with the electronic device 410 as a single device. The input/output interface 424 of the server 420 may likewise be a means for interfacing with an input or output device (not shown) that is connected to the server 420 or that the server 420 may include.

In other embodiments, the electronic device 410 and the server 420 may include more components than those shown in FIG. 4. However, most conventional components need not be explicitly illustrated. For example, the electronic device 410 may be implemented to include at least some of the input/output devices 415 described above, or may further include other components such as a transceiver, a GPS (Global Positioning System) module, a camera, various sensors, and a database. As a more specific example, when the electronic device 410 is a smartphone, it may be implemented to further include the various components that a smartphone generally includes, such as an acceleration sensor, a gyro sensor, a camera module, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration.

In the present embodiment, the electronic device 410 may basically include, as the input/output device 415, a microphone for receiving the user's voice input, and may further include, as the input/output device 415, a speaker for outputting sound such as a reply voice or audio content corresponding to the user's voice input.

FIG. 5 is a flowchart illustrating an example of a device control method according to an embodiment of the present invention. The device control method according to the embodiment may be executed by a computer device such as the electronic device 410 described above. Here, the processor 412 of the electronic device 410 may be implemented to execute control instructions according to the code of an operating system or the code of at least one program included in the memory 411. The processor 412 may control the electronic device 410 so that the electronic device 410 performs steps 510 to 540 of the device control method of FIG. 5 in accordance with the control instructions provided by the code stored in the electronic device 410.

In step 510, the electronic device 410 may receive a command by voice input from a speaker through the voice-based interface. For example, the electronic device 410 may receive the voice input of the user's utterance through a voice input device such as a microphone included in the electronic device 410 or a microphone linked to the electronic device 410. Techniques for receiving a speaker's voice input will be readily understood by those skilled in the art from well-known speech recognition techniques.

In step 520, when analysis of the command received through the speaker's voice input shows that position determination is necessary, the electronic device 410 may acquire position information of the speaker associated with the voice input. As an example, additional processing for position determination may be performed when the command contains a demonstrative pronoun (for example, ここ, そこ, あそこ, こちら, そちら, あちら, こっち, そっち, あっち, or その場所; roughly "here", "there", "over there", and "that place") or when the command is an incomplete sentence in which the location, the target, or the like is omitted. Accordingly, when the position or device to be controlled is difficult to identify from the voice command alone, the electronic device 410 may additionally acquire the speaker's position information based on the command. The position information of the speaker associated with the voice input may include at least one of: the relative position or direction of the speaker with respect to the electronic device 410, measured at a time or during a period associated with reception of the voice input; whether the relative position or direction has changed; and the direction of that change.
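The check described for step 520 can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the embodiment: the function name needs_position and the KNOWN_LOCATIONS list are hypothetical, and plain substring matching stands in for real language analysis.

```python
# Demonstrative pronouns listed in the description of step 520.
DEMONSTRATIVES = ("ここ", "そこ", "あそこ", "こちら", "そちら", "あちら",
                  "こっち", "そっち", "あっち", "その場所")

# Hypothetical room names; a real system would take these from its device registry.
KNOWN_LOCATIONS = ("リビング", "台所", "寝室")


def needs_position(command: str) -> bool:
    """Return True when the command alone does not pin down a location."""
    # Case 1: the command contains a demonstrative pronoun.
    if any(pronoun in command for pronoun in DEMONSTRATIVES):
        return True
    # Case 2 (assumption): a command naming no known location is treated
    # as an incomplete sentence with the location omitted.
    return not any(location in command for location in KNOWN_LOCATIONS)
```

With these assumptions, a command that names the living room explicitly does not trigger position acquisition, while a command containing ここ or a bare request to play a video does.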

For example, the electronic device 410 may acquire the position information associated with the voice input based on the phase shift of the voice input arriving at a plurality of microphones included in the voice-based interface. Techniques for measuring where a sound signal originates by using the phase shift of the same sound signal input to a plurality of microphones will be readily understood by those skilled in the art from well-known techniques such as beamforming. In this case, because the position information is measured from the speaker's voice input itself, the speaker need not face a particular direction, and the speaker's position is not restricted as long as it is within the distance at which the utterance can be recognized. In addition, the speaker's position information can be acquired without adding any device other than the voice-based interface to the electronic device 410.
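As an illustration of how a phase shift between microphones relates to direction, the following sketch applies the textbook far-field relation for a single pair of microphones. The description does not specify the algorithm used; the function name, the two-microphone geometry, and the single-frequency input are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius


def arrival_angle(phase_shift_rad: float, freq_hz: float, mic_spacing_m: float) -> float:
    """Angle of arrival in degrees, measured from the broadside of a microphone pair.

    Assumes a far-field source and a spacing below half a wavelength,
    so that the phase shift is not ambiguous.
    """
    # Convert the phase shift at this frequency into a time delay.
    delay_s = phase_shift_rad / (2.0 * math.pi * freq_hz)
    # The extra path length divided by the spacing gives sin(angle).
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))
```

A zero phase shift corresponds to a source directly in front of the pair. A practical system would combine several microphone pairs and frequencies, which is where beamforming techniques come in.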

Depending on the embodiment, the electronic device 410 may include additional equipment such as a camera or a sensor in order to acquire the speaker's position information, in which case the position information measured through the voice-based interface and the position information measured using the additional equipment may both be used. When a camera and/or a sensor is used, the electronic device 410 may acquire the position information associated with the voice input based on the output values of the camera and/or the sensor at the time the voice input is received.

As an example, the electronic device 410 may poll an optical device to visually identify the speaker's position. An optical device such as a camera may use recognition software (for example, face recognition or feature recognition) to identify the user. Here, polling may include requesting optical information from the optical device or receiving the information without requesting it. Besides an optical device, an audio device may also be polled to identify the speaker's position. An audio tool may be used to identify the speaker of a voice input based on a pre-recorded voice profile.

Furthermore, the electronic device 410 may acquire the speaker's position information associated with the voice input based on the conversational context of the speaker's voice inputs or the device control history. In other words, by examining the context of the speaker's previous conversation or the device control history, the electronic device 410 may acquire position information that includes at least one of the speaker's relative position or direction, whether that position or direction has changed, and the direction of the change. For example, if the recent operation history includes a question about a recipe or a cooking timer set through the voice-based interface, the electronic device 410 may estimate from the conversational context or device control history of those voice inputs that the speaker's current or most recent position is the kitchen.
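The kitchen example can be sketched as a lookup over recent history. This is only an illustration under assumptions: the ACTION_TO_ROOM mapping, the action labels, and the 600-second window are hypothetical and do not come from the description.

```python
from collections import Counter

# Hypothetical mapping from past request types to the room they suggest.
ACTION_TO_ROOM = {
    "recipe_question": "kitchen",
    "cooking_timer": "kitchen",
    "tv_on": "living_room",
}


def infer_location(history, now, window_s=600.0):
    """Guess the speaker's room from recent requests.

    history: list of (timestamp_seconds, action) tuples.
    Returns the room suggested most often within the window, or None.
    """
    rooms = [
        ACTION_TO_ROOM[action]
        for timestamp, action in history
        if now - timestamp <= window_s and action in ACTION_TO_ROOM
    ]
    if not rooms:
        return None
    return Counter(rooms).most_common(1)[0][0]
```

Returning None when nothing recent is available lets the caller fall back to another source of position information, such as the microphone-based measurement.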

In step 530, the electronic device 410 may identify the device to be controlled for the command received through the speaker's voice input, based on the speaker's position information. When the position or device to be controlled is difficult to identify from the voice command alone, the electronic device 410 may identify a device suited to the speaker's voice input by reference to the speaker's position information associated with the voice input.

As an example, when the command received through the speaker's voice input contains a demonstrative pronoun, the electronic device 410 may identify the device to be controlled based on the speaker's position information. For example, among the devices capable of performing the operation corresponding to the voice input, the electronic device 410 may identify as the device to be controlled the device closest to the speaker, the device farthest from the speaker, a device located in the speaker's direction, or a device located opposite to the speaker's direction. As an example, when the command contains a demonstrative pronoun with a meaning such as ここ, こちら, or こっち ("here"), the device closest to the speaker or a device located in the speaker's direction may be identified as the device to be controlled. As another example, when the command contains a demonstrative pronoun with a meaning such as あそこ, あちら, or あっち ("over there"), the device farthest from the speaker or a device located opposite to the speaker's direction may be identified as the device to be controlled. As yet another example, when the command contains a demonstrative pronoun with a meaning such as そこ or その場所 ("there", "that place"), a device at the previous position where the speaker was until recently may be identified as the device to be controlled.
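The pronoun-to-device rules above can be sketched as follows. This is a simplified illustration, not the embodiment itself: the function name pick_device, the two-dimensional coordinates, and the reading of "device at the previous position" as "device nearest to the previous position" are assumptions.

```python
import math

PROXIMAL = ("ここ", "こちら", "こっち")   # nearest device
DISTAL = ("あそこ", "あちら", "あっち")   # farthest device
MEDIAL = ("そこ", "その場所")            # device at the speaker's previous position


def pick_device(command, speaker_pos, previous_pos, devices):
    """Select a device name according to the demonstrative pronoun in the command.

    devices: non-empty dict of name -> (x, y), already filtered to the
    devices able to perform the requested operation.
    """
    def distance_from(point):
        return lambda name: math.dist(point, devices[name])

    # Check the distal group first because "あそこ" contains "そこ".
    if any(word in command for word in DISTAL):
        return max(devices, key=distance_from(speaker_pos))
    if any(word in command for word in PROXIMAL):
        return min(devices, key=distance_from(speaker_pos))
    if any(word in command for word in MEDIAL) and previous_pos is not None:
        return min(devices, key=distance_from(previous_pos))
    return None  # no demonstrative pronoun, or no previous position known
```

The order of the checks matters for substring matching; a real system would rely on morphological analysis rather than on the order of tests.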

As another example, when the command received through the speaker's voice input is an incomplete sentence in which the location, the target, or the like is omitted, the electronic device 410 may identify the device to be controlled based on the speaker's position information. That is, the electronic device 410 may infer the target omitted from the conversational context based on the speaker's position information. Here, the electronic device 410 may identify the device to be controlled based on speaker position information estimated from the context of previous conversation or the device control history. For example, for a command such as the voice input "Play a video", in which the location and the target are omitted, a device at the position where the speaker was until recently may be identified, from among the devices capable of performing the operation corresponding to the voice input, as the device to be controlled.

When two or more devices are identified as the device to be controlled for a single control operation corresponding to the command received through the speaker's voice input, the electronic device 410 may select the device that matches the speaker's intention based on device usage patterns or the device control history. When the position or device to be controlled is difficult to identify from the voice command alone (that is, when there are multiple candidates), the electronic device 410 identifies the device to be controlled based on the speaker's position information associated with the voice input. If multiple devices are still identified, a more suitable device may be selected from among them based on learning of the user's behavior patterns. For example, when a command such as the voice input "Play the animation character A video" is received, the electronic device 410 may examine the device history indicating where that content has mainly been played and select the device with the most such history as the device to be controlled.
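Choosing among several candidates by playback history can be sketched as a simple count. The log format and the names pick_by_history and content_tag are hypothetical; the description refers to learning of user behavior patterns without specifying a method, and a frequency count is only the simplest stand-in.

```python
from collections import Counter


def pick_by_history(candidates, playback_log, content_tag):
    """Return the candidate device that has played content_tag most often.

    candidates: non-empty list of device names.
    playback_log: iterable of (device_name, content_tag) records.
    """
    counts = Counter(
        device
        for device, tag in playback_log
        if tag == content_tag and device in candidates
    )
    # With a tie or no matching history, the earliest candidate is returned.
    return max(candidates, key=lambda device: counts[device])
```

Because unseen devices count as zero, the function still returns a device when there is no history for the requested content.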

Accordingly, when the device to be controlled is difficult to identify from the voice command alone, the electronic device 410 can infer the device that matches the speaker's intention based on the speaker's position information associated with the voice input. Furthermore, the electronic device 410 can select a more suitable device to be controlled by examining the conversational context of the speaker's voice inputs, the device usage pattern, or the device control history.

In step 540, the electronic device 410 may transmit the command to the device to be controlled identified in step 530, in order to control the operation corresponding to the command received through the speaker's voice input. The electronic device 410 may transmit the speaker's voice command to that device so that the operation matching the speaker's intention is performed on it. Before transmitting the command, the electronic device 410 may output by voice information for confirming or recommending the position or device to be controlled. For example, the electronic device 410 may recognize and analyze the voice input "Turn off the light", output from the speaker a confirmation or recommendation that specifies the position or target, such as "Shall I turn off the light in the living room?", and, upon receiving the voice input "Yes" (うん), transmit a command to the lighting device in the living room to turn it off.
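The confirm-then-transmit flow of step 540 can be sketched with two callbacks. The names confirm_and_dispatch, ask, and send are hypothetical, and the set of accepted replies beyond うん is an assumption added for illustration.

```python
# Replies treated as consent. Only "うん" appears in the description;
# the other entries are illustrative assumptions.
AFFIRMATIVE = {"うん", "はい", "yes"}


def confirm_and_dispatch(room, action, ask, send):
    """Ask for confirmation by voice, then transmit the command if accepted.

    ask(prompt) returns the user's reply as text.
    send(room, action) delivers the command to the device in that room.
    Returns True when the command was sent.
    """
    reply = ask(f"Shall I {action} in the {room}?")
    if reply.strip().lower() in AFFIRMATIVE:
        send(room, action)
        return True
    return False
```

Passing the speech output and the transport in as callbacks keeps the sketch independent of any particular speech or home-network API.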

Accordingly, by additionally determining the speaker's position for a voice input in which the position, the target, or the like is not specified, the electronic device 410 can control device operation or content provision in line with the speaker's intention. As a result, even for utterances that use a demonstrative pronoun or omit the position or target, selective control becomes possible, based on the speaker's position, over multiple devices with the same function in the home, such as multiple displays, lights, or speakers, or per-room temperature control devices.

FIGS. 6 and 7 illustrate examples of determining the device to be controlled according to a voice input containing a demonstrative pronoun, in an embodiment of the present invention.

FIG. 6 shows an example in which the electronic device 410 including the voice-based interface recognizes and analyzes the utterance "Turn on the TV here" of the user 110 and controls the power of an in-home TV 610 linked to the electronic device 410 via an internal network (that is, turns the power on). Because the voice input contains the demonstrative pronoun ここ ("here"), position determination is required. The position information of the user 110 is therefore additionally acquired, and the device to be controlled that the user 110 intends may be selected on that basis.

The positional relationship between the speaker and the device may be defined in advance for each demonstrative pronoun. For example, when a demonstrative pronoun with a meaning such as ここ, こちら, or こっち ("here") is included, the device closest to the speaker or a device located in the speaker's direction may be identified as the device to be controlled. When a demonstrative pronoun with a meaning such as あそこ, あちら, or あっち ("over there") is included, the device farthest from the speaker or a device located opposite to the speaker's direction may be identified. When a demonstrative pronoun with a meaning such as そこ or その場所 ("there", "that place") is included, a device at the previous position where the speaker was until recently may be identified.

The electronic device 410 may manage the positions of a plurality of different devices linked to it. The positions of devices in the home may be set in various ways. For example, a position may be entered and set in the electronic device 410 by the user 110 or an administrator, or may be measured using a well-known positioning technique, such as one based on the strength of the signals that the electronic device 410 transmits and receives to communicate with other electronic devices. For a mobile device, the position may be measured dynamically using a positioning technique. The electronic device 410 may then determine the device to be controlled by further using the distances between the user 110 and the other devices, computed from the position of the user 110 and the positions of those devices. As described above, the position of the user 110 is measured based on the utterance of the user 110.
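As one illustration of estimating a distance from signal strength, the widely used log-distance path-loss model can be sketched as follows. The description does not say which positioning technique is used; the function name, the reference power of -59 dBm at 1 m, and the path-loss exponent of 2.0 are assumptions chosen for illustration.

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate the distance in metres to a transmitter from a received signal strength.

    tx_power_dbm is the signal strength expected at 1 m (assumed value).
    path_loss_exponent is about 2 in free space and larger indoors.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))
```

Indoor measurements are noisy, so an estimate of this kind would normally be averaged over several readings or replaced by positions entered by the user.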

For example, as shown in FIG. 6, when the user 110 stands in the living room and says "Turn on the TV here", the position of the user 110 is estimated because the voice input contains the demonstrative pronoun ここ ("here"), and the TV 610 in the living room, which is closest to the position of the user 110 among the TVs in the home, may be selected as the device to be controlled.

As another example, FIG. 7 shows an example in which the electronic device 410 including the voice-based interface recognizes and analyzes the utterance "Turn on the air conditioner in the room over there" of the user 110 and controls the power of an in-home temperature control device 710 linked to the electronic device 410 via the internal network. When the user 110 stands in the living room and makes this utterance, the position of the user 110 is estimated because the voice input contains the demonstrative pronoun あっち ("over there"), and the temperature control device 710 in the room farthest from the position of the user 110 may be selected from among the in-home temperature control devices as the device to be controlled.

FIGS. 8 and 9 illustrate examples of determining the device to be controlled according to a voice input in which the position or target is omitted, in an embodiment of the present invention.

FIG. 8 shows an example in which the electronic device 410 including the voice-based interface recognizes and analyzes the utterance "Play a video" of the user 110 and controls an in-home display 810 linked to the electronic device 410 via the internal network (turns it on or plays a video). For example, when the user 110 says "Play a video" within a predetermined time after asking a question about a recipe or setting a cooking timer through the voice-based interface, the electronic device 410 may estimate from the conversational context of the previous voice inputs or the device control history that the user 110 is in the kitchen, and may deliver the content to a display 810 installed near the kitchen (for example, a TV or a refrigerator display). Alternatively, for an utterance such as "Play a bulgogi recipe video", the conversational context of the utterance itself may be analyzed to estimate that the user 110 is near the kitchen.

Referring to FIG. 9, when the user 110 makes a cooking-related request in the kitchen through the voice-based interface and then returns to the living room and says "Turn off the light there", the position of the user 110 is estimated because the voice input contains the demonstrative pronoun そこ ("there"), and the light 910 at the previous position where the user 110 was until recently may be selected from among the in-home lights as the device to be controlled.

FIG. 10 illustrates an example of determining the device to be controlled based on learning of user behavior patterns, in an embodiment of the present invention.

FIG. 10 shows an example in which the electronic device 410 including the voice-based interface recognizes and analyzes the utterance "Play the animation character A video" of the user 110 and controls an in-home display 1010 linked to the electronic device 410 via the internal network. Because the position, the target, and the like are omitted from this utterance, the electronic device 410 may additionally determine the position of the user 110 and select the device to be controlled by reference to that position. For example, when a TV 610 and a tablet 1010 are both present as video-capable devices in the living room where the user 110 is located, the electronic device 410 may examine the device history of where that content has mainly been played, select whichever of the TV 610 and the tablet 1010 has the most such history, and control content playback on it. Similarly, for the utterance "Play hip-hop", when multiple devices capable of audio output are present around the user 110, the electronic device 410 may examine the device history of where that music genre has mainly been played, select the device with the most such history as the device to be controlled, and control content playback on it.

As described above, according to embodiments of the present invention, the speaker's position is additionally determined according to the command in the speaker's voice input, after which a device suited to the command can be selected to control device operation or content provision. In particular, when the command contains a demonstrative pronoun, or is an incomplete sentence from which the position or target is omitted, a device suited to the user's intention can be inferred based on the user's position. At the same time, a device matching the user's intention can be inferred by examining the conversational context of the speaker's voice inputs, the device usage pattern, or the device control history.

The system or apparatus described above may be implemented by hardware components, software components, or a combination of hardware and software components. For example, the apparatuses and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an ALU (arithmetic logic unit), a digital signal processor, a microcomputer, an FPGA (field programmable gate array), a PLU (programmable logic unit), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will understand that the processing device may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual device, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over computer systems connected by a network and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy (registered trademark) disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites, servers, and the like that supply or distribute various other software. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from that of the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from that of the described method, or are substituted or replaced by other components or equivalents.

Accordingly, other embodiments also fall within the scope of the appended claims, provided they are equivalent to the claims.

310: Electronic device
320: Cloud artificial intelligence platform
330: Content and services
340: Interface connect
350: Expansion kit

Claims

A device control method executed by an electronic device including a voice-based interface, the method comprising:
receiving a command word from a user's voice input through the voice-based interface;
obtaining position information related to the voice input when position information is required as a result of analyzing the command word; and
specifying, using the position information, a control target device from among devices controllable by the electronic device, and causing the control target device to execute an operation corresponding to the command word.
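
As an illustration only, and not as part of the claim, the order of steps recited above could be sketched in Python as follows. The `Device` class, the `select_target` function, and the two callbacks (`parse_location`, `locate_speaker`) are hypothetical names introduced here; the sketch also assumes that each device carries a location label, which the source does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Device:
    name: str  # hypothetical identifier, e.g. "kitchen_light"
    room: str  # hypothetical location label (an assumption of this sketch)


def select_target(
    command: str,
    devices: Sequence[Device],
    parse_location: Callable[[str], Optional[str]],
    locate_speaker: Callable[[], str],
) -> Optional[Device]:
    """Return the device that should execute the command, or None."""
    # Step 1: analyze the command word. None means the command itself
    # does not say where the target is, so position information is required.
    location = parse_location(command)
    # Step 2: obtain position information only when it is required.
    if location is None:
        location = locate_speaker()
    # Step 3: specify the control target among the controllable devices.
    for device in devices:
        if device.room == location:
            return device  # the caller would now send the operation to it
    return None
```

Passing the analysis and the localization in as callbacks keeps the sketch independent of any particular speech recognizer or sensor.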

The device control method according to claim 1,
wherein the obtaining step includes acquiring the position information related to the voice input when the command word includes a demonstrative pronoun.

The device control method according to claim 1,
wherein the obtaining step includes acquiring the position information related to the voice input when the command word is an incomplete sentence in which the location and the instruction target are omitted.
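
The two conditions above (a demonstrative word in the command, or a command that omits both the location and the target) could be checked along the following lines. This is a minimal sketch: the English word lists and the function name `needs_position` are assumptions made for illustration, since the source does not enumerate any vocabulary.

```python
import re

# Hypothetical vocabularies; the source does not list them.
DEMONSTRATIVES = {"this", "that", "here", "there", "these", "those"}
KNOWN_LOCATIONS = {"kitchen", "bedroom", "living room"}
KNOWN_TARGETS = {"light", "tv", "fan", "speaker"}


def needs_position(command: str) -> bool:
    """Return True when the speaker's position is needed to resolve the command."""
    text = command.lower()
    words = set(re.findall(r"[a-z]+", text))
    has_demonstrative = bool(words & DEMONSTRATIVES)
    has_location = any(location in text for location in KNOWN_LOCATIONS)
    has_target = bool(words & KNOWN_TARGETS)
    # "Incomplete" here means both the location and the target are missing.
    incomplete = not has_location and not has_target
    return has_demonstrative or incomplete
```

With these word lists, "Turn on the light here" and "Turn it on" both require position information, while "Turn on the kitchen light" does not.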

The device control method according to claim 1,
wherein the position information related to the voice input includes at least one of: a relative position and direction of the user with respect to the electronic device, measured at a time or during a period associated with reception of the voice input; whether the relative position and direction have changed; and a direction in which the relative position and direction change.

The device control method according to claim 1,
wherein the obtaining step includes acquiring the position information related to the voice input based on a phase shift of the voice input received at a plurality of microphones included in the voice-based interface.
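
One common way to turn such an inter-microphone shift into a direction is sketched below for two microphones. The sketch treats the phase shift as an equivalent arrival-time difference found by cross-correlation, and assumes a far-field source and a known microphone spacing; none of these details, nor the function names, come from the source.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C


def estimate_delay_samples(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which b best matches a delayed copy of a.

    A positive result means the sound reached microphone b later than a.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += sample * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag_samples: int, sample_rate: float, mic_spacing_m: float) -> float:
    """Convert an inter-microphone delay to an arrival angle from broadside."""
    delay_s = lag_samples / sample_rate
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against noise pushing |ratio| above 1
    return math.degrees(math.asin(ratio))
```

An angle of 0 degrees means the speaker is directly in front of the microphone pair; a real device with more than two microphones could combine several such pairwise estimates.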

The device control method according to claim 1,
wherein the electronic device includes at least one of a camera and a sensor, and
the obtaining step includes acquiring the position information related to the voice input based on an output value of at least one of the camera and the sensor at the time the voice input is received.

The device control method according to claim 1,
wherein the obtaining step includes acquiring the position information related to the voice input by determining at least one of a conversational abbreviation corresponding to the voice input and the user's history of device control through the voice-based interface.

The device control method according to claim 2,
wherein the specifying step includes determining a positional relationship between the user and a device from the demonstrative pronoun, and specifying the control target device using the position information related to the voice input in accordance with the positional relationship.
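
A rough sketch of this resolution step follows. It assumes a deliberately simple rule in which "near" words select the device closest to the user and "far" words select the farthest one; that rule, the English word mapping, the 2-D coordinates, and the name `pick_device` are all hypothetical and are not taken from the source.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical mapping from demonstrative words to a positional relationship.
PRONOUN_RELATION = {"this": "near", "here": "near", "that": "far", "there": "far"}


def pick_device(pronoun: str, user_pos: Point, device_pos: Dict[str, Point]) -> Optional[str]:
    """Pick a device name from the user's position and a demonstrative word."""
    relation = PRONOUN_RELATION.get(pronoun.lower())
    if relation is None or not device_pos:
        return None  # not a known demonstrative, or nothing to control

    def distance(name: str) -> float:
        return math.dist(user_pos, device_pos[name])

    # Assumed rule: "near" picks the closest device, "far" the farthest.
    chooser = min if relation == "near" else max
    return chooser(device_pos, key=distance)
```

A practical system would likely also use the direction the user is facing or moving, which the position information described earlier may include.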

The device control method according to claim 1,
wherein the specifying step includes specifying the control target device using, together with the position information related to the voice input, the user's pattern of device usage through the voice-based interface.

The device control method according to claim 1,
wherein the specifying step includes, when a plurality of devices are capable of executing the operation corresponding to the command word, specifying as the control target device a device with a relatively high frequency of use among the plurality of devices, using the user's pattern of device usage through the voice-based interface.
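
The frequency-based tie-break above could look like the following minimal sketch. It assumes the usage pattern is available as a plain list of previously controlled device names; that representation and the name `most_used_candidate` are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def most_used_candidate(candidates: Sequence[str], usage_history: Iterable[str]) -> Optional[str]:
    """Among devices able to run the operation, return the most frequently used one."""
    if not candidates:
        return None
    counts = Counter(usage_history)  # devices never used count as 0
    # max() keeps the first candidate on ties, so list order is the tie-breaker.
    return max(candidates, key=lambda name: counts[name])
```

For example, with candidates `["fan", "air_conditioner"]` and a history in which the air conditioner appears twice and the fan once, the air conditioner is selected.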

A computer program that causes a computer to execute the method according to any one of claims 1 to 10.

A computer-readable recording medium on which a program for causing a computer to execute the method according to any one of claims 1 to 10 is recorded.

An electronic device comprising:
a voice-based interface; and
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor:
receives a command word from a user's voice input through the voice-based interface;
acquires position information related to the voice input when position information is required as a result of analyzing the command word; and
specifies, using the position information, a control target device from among devices controllable by the electronic device, and causes the control target device to execute an operation corresponding to the command word.

The electronic device according to claim 13,
wherein, to acquire the position information related to the voice input, the at least one processor acquires the position information when the command word includes a demonstrative pronoun or when the command word is an incomplete sentence in which a location or a command object is omitted.

The electronic device according to claim 13,
wherein the position information related to the voice input includes at least one of: a relative position and direction of the user with respect to the electronic device, measured at a time or during a period associated with reception of the voice input; whether the relative position and direction have changed; and a direction in which the relative position and direction change.

The electronic device according to claim 13,
wherein, to acquire the position information related to the voice input, the at least one processor acquires the position information based on a phase shift of the voice input received at a plurality of microphones included in the voice-based interface.

The electronic device according to claim 13,
wherein the electronic device includes at least one of a camera and a sensor, and
to acquire the position information related to the voice input, the at least one processor acquires the position information based on an output value of at least one of the camera and the sensor at the time the voice input is received.

The electronic device according to claim 13,
wherein, to acquire the position information related to the voice input, the at least one processor determines at least one of a conversational abbreviation corresponding to the voice input and the user's history of device control through the voice-based interface.

The electronic device according to claim 13,
wherein, to acquire the position information related to the voice input, the at least one processor acquires the position information when the command word includes a demonstrative pronoun, and
to specify the control target device, the at least one processor determines a positional relationship between the user and a device from the demonstrative pronoun and specifies the control target device using the position information related to the voice input in accordance with the positional relationship.

The electronic device according to claim 13,
wherein, to specify the control target device, the at least one processor uses, together with the position information related to the voice input, the user's pattern of device usage through the voice-based interface.