JP6752870B2

JP6752870B2 - Methods and systems for controlling artificial intelligence devices using multiple wake words

Info

Publication number: JP6752870B2
Application number: JP2018233017A
Authority: JP
Inventors: ウンシルイ; ジョンイルカン; ジュンヒョンパク; スンウォンチェー
Original assignee: Naver Corp
Current assignee: Naver Corp
Priority date: 2017-12-18
Filing date: 2018-12-13
Publication date: 2020-09-09
Anticipated expiration: 2038-12-13
Also published as: JP2019109510A

Description

The following description relates to an artificial intelligence dialogue system.

In general, artificial intelligence dialogue systems used in personal assistant systems, artificial intelligence (AI) speakers, chatbot platforms, and the like work by understanding the intent of a command uttered by a person and providing a corresponding reply.

Typically, when a person conveys a functional request, an artificial intelligence dialogue system has a machine provide an answer to that request: it receives the user's voice input through a microphone and controls device operation or content provision based on the received voice input.

For example, Patent Document 1 (published December 30, 2011) discloses a technology for home network services in which the service can be provided over a second communication network such as Wi-Fi in addition to a mobile communication network, and in which a plurality of multimedia devices in a home can be multi-controlled by voice commands without the user operating any buttons.

A typical artificial intelligence dialogue system uses a predetermined wake word (for example, the name of the device) as a dialogue activation trigger for activating (waking up) the device. The artificial intelligence device thus performs its voice recognition function on the basis of the wake word. For example, the device is activated when the user utters the device name, and then enters a standby mode for receiving the user's subsequent voice command (question).

Korean Laid-Open Patent Publication No. 10-2011-0139797

Provided are a method and a system that divide the wake words of an artificial intelligence device providing a voice-based interface into two or more wake words and control the operation of the artificial intelligence device differently according to each wake word.

Provided is an artificial intelligence dialogue method executed by a computer-implemented electronic device, the method including: activating a dialogue function when any one of a plurality of preset wake words is recognized through a voice interface of the electronic device; and controlling the electronic device so that, for a voice command input while the dialogue function is activated, a different operation is executed according to the recognized wake word.
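The two steps of the method (activation on a recognized wake word, then wake-word-dependent handling of the following voice command) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and does not come from the patent: the class `DialogueDevice`, its methods, the example wake words, and the simplification that utterances arrive as already-recognized text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# A handler turns a recognized voice command into some result (hypothetical type).
Handler = Callable[[str], str]


@dataclass
class DialogueDevice:
    """Toy model: utterances arrive as text; real speech recognition is out of scope."""
    handlers: Dict[str, Handler] = field(default_factory=dict)
    active_wake_word: Optional[str] = None

    def set_wake_word(self, wake_word: str, handler: Handler) -> None:
        # Register one of the preset wake words together with its operation.
        self.handlers[wake_word] = handler

    def hear(self, utterance: str) -> Optional[str]:
        if self.active_wake_word is None:
            # Dialogue function inactive: only a known wake word has any effect.
            if utterance in self.handlers:
                self.active_wake_word = utterance
            return None
        # Dialogue function active: the operation depends on the wake word used.
        handler = self.handlers[self.active_wake_word]
        self.active_wake_word = None  # return to the inactive state
        return handler(utterance)


device = DialogueDevice()
device.set_wake_word("speaker", lambda cmd: f"local:{cmd}")
device.set_wake_word("helper", lambda cmd: f"search:{cmd}")
```

In this sketch the same command text produces a different result depending on which wake word preceded it, and any utterance heard while the device is inactive is ignored unless it is a registered wake word.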

According to one aspect, the method may further include setting, for each operation executable by the electronic device, a wake word for specifying that operation.

According to another aspect, the wake words and the operation associated with each wake word may be personalized for the user of the electronic device.

According to another aspect, the method may further include pairing with another device connected to the same network as the electronic device. The wake words include a basic wake word and an additional wake word. In the controlling step, when the dialogue function has been activated by the basic wake word, the operation corresponding to the voice command is executed on the electronic device; when the dialogue function has been activated by the additional wake word, the voice command is transmitted to the other device so that the corresponding operation is executed on the other device.

According to another aspect, the pairing step may search for other devices connected to the network and pair with a device that responds to the search signal.

According to another aspect, when a plurality of other devices are paired with the electronic device, the method may further include setting a different additional wake word for each device.
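The pairing-related aspects above (a basic wake word for local execution, and a distinct additional wake word per paired device for forwarding) might be sketched as follows. This is only an illustration under stated assumptions: the names `BASIC_WAKE_WORD`, `assign_additional_wake_words`, and `route`, the example wake words, and the device identifiers are all hypothetical, and network discovery and pairing themselves are not modeled.

```python
from typing import Dict, List, Tuple

BASIC_WAKE_WORD = "speaker"  # hypothetical basic wake word


def assign_additional_wake_words(devices: List[str],
                                 candidates: List[str]) -> Dict[str, str]:
    """Give every paired device its own additional wake word."""
    # Drop duplicates and the basic wake word so that each mapping is unambiguous.
    unique = [w for w in dict.fromkeys(candidates) if w != BASIC_WAKE_WORD]
    if len(unique) < len(devices):
        raise ValueError("not enough distinct wake words for the paired devices")
    return dict(zip(unique, devices))


def route(wake_word: str, command: str,
          additional: Dict[str, str]) -> Tuple[str, str]:
    """Return (target, command); target 'self' means local execution."""
    if wake_word == BASIC_WAKE_WORD:
        return ("self", command)
    if wake_word in additional:
        # Activated by an additional wake word: forward to that paired device.
        return (additional[wake_word], command)
    raise ValueError(f"unknown wake word: {wake_word}")


table = assign_additional_wake_words(["tv", "lamp"], ["screen", "light"])
```

With this table, `route("speaker", ...)` keeps the command on the receiving device, while `route("light", ...)` names the paired lamp as the target that should execute it.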

According to another aspect, the controlling step may call a different engine according to the recognized wake word and output reply information corresponding to the voice command from that engine.

According to yet another aspect, the activating step may include displaying the activation state in a manner that distinguishes it according to the recognized wake word.
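One possible way to picture the two preceding aspects together is a per-wake-word profile that names both the engine to call and how the activation state is displayed. The sketch below is an assumption-laden illustration: the engines (`translation_engine`, `search_engine`), the wake words, and the use of an indicator color as the "display" are all hypothetical choices, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class WakeWordProfile:
    engine: Callable[[str], str]  # engine called for commands after this wake word
    indicator: str                # how the activation state is shown (e.g. LED color)


def translation_engine(command: str) -> str:
    # Stub engine: a real one would return translated text.
    return f"[translate] {command}"


def search_engine(command: str) -> str:
    # Stub engine: a real one would return search results.
    return f"[search] {command}"


# Hypothetical wake words and indicator colors.
PROFILES: Dict[str, WakeWordProfile] = {
    "translator": WakeWordProfile(translation_engine, "blue"),
    "searcher": WakeWordProfile(search_engine, "green"),
}


def activate(wake_word: str) -> str:
    """Return the indicator that distinguishes this activation state."""
    return PROFILES[wake_word].indicator


def respond(wake_word: str, command: str) -> str:
    """Call the engine selected by the recognized wake word."""
    return PROFILES[wake_word].engine(command)
```

Keeping the engine and the indicator in one record means that adding a wake word is a single table entry, which is one reasonable design among several.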

Provided is a computer program that is stored in a computer-readable recording medium and, in combination with a computer, causes the computer to execute the artificial intelligence dialogue method.

Provided is a computer-readable recording medium on which a program for causing a computer to execute the artificial intelligence dialogue method is recorded.

Provided is an artificial intelligence dialogue system of a computer-implemented electronic device, the system including at least one processor implemented to execute computer-readable instructions. The at least one processor includes: a setting unit that sets two or more wake words used as dialogue activation triggers for activating the dialogue function of the electronic device; an activation unit that activates the dialogue function when any one of the wake words is recognized through the voice interface of the electronic device; and an operation execution unit that controls the electronic device so that, for a voice command input while the dialogue function is activated, a different operation is executed according to the recognized wake word.

According to embodiments of the present invention, the wake words of an artificial intelligence device providing a voice-based interface can be divided into two or more wake words, and the operation of the artificial intelligence device can be controlled differently according to each wake word.

FIG. 1 is a diagram showing an example of a service environment utilizing a voice-based interface according to one embodiment of the present invention.
FIG. 2 is a diagram showing another example of a service environment utilizing a voice-based interface according to one embodiment of the present invention.
FIG. 3 is a diagram showing an example of a cloud artificial intelligence platform according to one embodiment of the present invention.
FIG. 4 is a block diagram for explaining the internal configuration of an electronic device and a server according to one embodiment of the present invention.
FIG. 5 is a block diagram showing an example of components that a processor of an electronic device may include according to one embodiment of the present invention.
FIG. 6 is a flowchart showing an example of an artificial intelligence dialogue method that an electronic device can execute according to one embodiment of the present invention.
FIG. 7 is a diagram showing an example of a plurality of wake words and the device operation for each wake word according to one embodiment of the present invention.
FIG. 8 is a diagram showing an example of a control environment using a plurality of wake words according to one embodiment of the present invention.
FIG. 9 is a flowchart showing an example of a process of controlling the operation of an electronic device using a plurality of wake words according to one embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

The artificial intelligence dialogue system according to embodiments of the present invention may be implemented by an electronic device that provides an interface operating on the basis of dialogue with a user. The artificial intelligence dialogue system may use two or more wake words to activate (wake up) the device and may distinguish the operation of the device according to each wake word.

The artificial intelligence dialogue method according to embodiments of the present invention may be executed by the electronic device described above. A computer program according to one embodiment of the present invention may be installed and run on the electronic device, and the electronic device may execute the artificial intelligence dialogue method according to one embodiment of the present invention under the control of the running computer program. The computer program may be stored in a computer-readable recording medium so that, in combination with the computer-implemented electronic device, it causes the computer to execute the artificial intelligence dialogue method.

FIG. 1 is a diagram showing an example of a service environment utilizing a voice-based interface according to one embodiment of the present invention. The embodiment of FIG. 1 concerns technology, such as smart home or home network services, that connects and controls devices in a home. It shows an example in which an electronic device 100 providing a voice-based interface recognizes and analyzes the voice input "Turn off the light" received from an utterance of a user 110, and controls the power of an in-home lighting device 120 connected to the electronic device 100 through an internal network in the home.

For example, in addition to the in-home lighting device 120 described above, the devices in the home may include a wide variety of devices that are connected and controlled online: home appliances such as televisions, PCs (personal computers), peripheral devices, air conditioners, refrigerators, and robot cleaners; energy-consuming equipment such as water, electricity, and heating and cooling equipment; and security devices such as door locks and surveillance cameras. The internal network may use wired network technologies such as Ethernet (registered trademark), HomePNA, and IEEE 1394, or wireless network technologies such as Bluetooth (registered trademark), UWB (ultra-wideband), ZigBee (registered trademark), Wireless 1394, and HomeRF.

The electronic device 100 may be one of the devices in the home. For example, the electronic device 100 may be one of devices such as an artificial intelligence speaker or a robot cleaner installed in the home. The electronic device 100 may also be a mobile device of the user 110, such as a smartphone, mobile phone, navigation device, notebook PC, digital broadcasting terminal, PDA (personal digital assistant), PMP (portable multimedia player), tablet, game console, wearable device, IoT (Internet of Things) device, VR (virtual reality) device, or AR (augmented reality) device. The electronic device 100 is thus not particularly limited, as long as it includes a function for connecting to the devices in the home in order to receive the voice input of the user 110 and control those devices. Depending on the embodiment, the mobile devices of the user 110 described above may themselves be included among the devices in the home.

FIG. 2 is a diagram showing another example of a service environment utilizing a voice-based interface according to one embodiment of the present invention. FIG. 2 shows an example in which the electronic device 100 providing a voice-based interface recognizes and analyzes the voice input "today's weather" received from an utterance of the user 110, acquires information on today's weather from an external server 210 through an external network, and outputs the acquired information by voice, as in "Today's weather is ...".

For example, the external network may include any one or more of networks such as a PAN (personal area network), LAN (local area network), CAN (campus area network), MAN (metropolitan area network), WAN (wide area network), BBN (broadband network), and the Internet.

In the embodiment of FIG. 2 as well, the electronic device 100 may be one of the devices in the home or one of the mobile devices of the user 110. It is not particularly limited, as long as it includes a function for receiving and processing the voice input of the user 110 and a function for connecting to the external server 210 through the external network and providing the user 110 with services or content offered by the external server 210.

The electronic device 100 according to embodiments of the present invention is thus not particularly limited, as long as it can process user commands that include voice input received through a voice-based interface from an utterance of the user 110. For example, the electronic device 100 may process a user command by directly recognizing and analyzing the user's voice input and executing an operation suited to it. Depending on the embodiment, however, processing such as recognizing the user's voice input, analyzing the recognized voice input, and synthesizing the voice to be provided to the user may be performed on an external platform linked with the electronic device 100.

FIG. 3 is a diagram showing an example of a cloud artificial intelligence platform according to one embodiment of the present invention. FIG. 3 shows electronic devices 310, a cloud artificial intelligence platform 320, and content/services 330.

As an example, the electronic devices 310 may refer to devices installed in the home and may include at least the electronic device 100 described above. The electronic devices 310, or applications installed and run on them (hereinafter, "apps"), can be linked with the cloud artificial intelligence platform 320 through an interface connect 340. The interface connect 340 may provide developers with an SDK (software development kit) and/or development documentation for developing the electronic devices 310 or the apps installed and run on them. The interface connect 340 may also provide an API (application program interface) that allows the electronic devices 310, or the apps installed and run on them, to use the functions provided by the cloud artificial intelligence platform 320. As a specific example, devices or apps that developers build with the SDK and/or development documentation provided by the interface connect 340 can use the functions provided by the cloud artificial intelligence platform 320 through the API provided by the interface connect 340.

The cloud artificial intelligence platform 320 may provide functions for offering voice-based services. For example, the cloud artificial intelligence platform 320 may include various modules for providing voice-based services, such as a voice processing module 321 for recognizing received voice and synthesizing the voice to be output, a vision processing module 322 for analyzing and processing received images and video, a dialogue processing module 323 for determining an appropriate dialogue in order to output a voice suited to the received voice, a recommendation module 324 for recommending a function suited to the received voice, and a neural machine translation (NMT) module 325 that supports artificial intelligence in translating language sentence by sentence based on data learning.

For example, in the embodiments of FIGS. 1 and 2, the electronic device 100 may transmit the voice input of the user 110 to the cloud artificial intelligence platform 320 using the API provided by the interface connect 340. In this case, the cloud artificial intelligence platform 320 may recognize and analyze the received voice input using the modules 321 to 325 described above, and may synthesize and provide an appropriate reply voice or recommend an appropriate operation according to the received voice input.

An extension kit 350 may provide a development kit that enables third-party content developers or companies to implement new voice-based functions on top of the cloud artificial intelligence platform 320. For example, in the embodiment of FIG. 2, the electronic device 100 may transmit the voice input of the user 110 to the external server 210, and the external server 210 may transmit the voice input to the cloud artificial intelligence platform 320 using an API provided by the extension kit 350. In this case, as described above, the cloud artificial intelligence platform 320 may recognize and analyze the received voice input and then synthesize and provide an appropriate reply voice, or may provide the external server 210 with recommendation information on the function to be processed in response to the voice input. As an example, suppose that in FIG. 2 the external server 210 transmits the voice input "today's weather" to the cloud artificial intelligence platform 320 and receives from it the keywords "today's" and "weather" extracted by recognizing that voice input. The external server 210 may then generate text information such as "Today's weather is ..." based on the keywords and send the generated text information back to the cloud artificial intelligence platform 320. The cloud artificial intelligence platform 320 may synthesize the text information into speech and provide it to the external server 210. The external server 210 may transmit the synthesized speech to the electronic device 100, and the electronic device 100 outputs the synthesized speech "Today's weather is ..." through its speaker, thereby processing the voice input "today's weather" received from the user 110. Here, the electronic device 100 performs device operations and content provision on the basis of dialogue with the user.
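The round trip described above (voice input to the external server, keyword extraction on the platform, reply text generated by the server, speech synthesis on the platform) can be traced with a small sketch. All class and method names here are hypothetical stand-ins, not the actual API of the extension kit 350; speech recognition is stubbed by treating the voice input as text, and synthesis is stubbed by encoding the reply text as bytes.

```python
from typing import List


class CloudAIPlatform:
    """Stand-in for the cloud AI platform 320 (recognition and synthesis are stubs)."""

    def extract_keywords(self, voice_input: str) -> List[str]:
        # Stub: the voice input is assumed to be already transcribed text.
        return voice_input.lower().split()

    def synthesize(self, text: str) -> bytes:
        # Stub: a real platform would return audio; bytes of the text stand in for it.
        return text.encode("utf-8")


class ExternalServer:
    """Stand-in for the external server 210."""

    def __init__(self, platform: CloudAIPlatform) -> None:
        self.platform = platform

    def compose_reply(self, keywords: List[str]) -> str:
        # The server, not the platform, turns keywords into reply text.
        if "today's" in keywords and "weather" in keywords:
            return "Today's weather is ..."
        return "Sorry, I did not understand."

    def handle(self, voice_input: str) -> bytes:
        keywords = self.platform.extract_keywords(voice_input)  # server -> platform
        text = self.compose_reply(keywords)                      # on the server
        return self.platform.synthesize(text)                    # server -> platform


server = ExternalServer(CloudAIPlatform())
speech = server.handle("today's weather")  # what the device would play back
```

The split of responsibilities follows the paragraph: the platform only recognizes and synthesizes, while the content of the reply is decided by the external server.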

FIG. 4 is a block diagram for explaining the internal configuration of an electronic device and a server according to one embodiment of the present invention. The electronic device 410 of FIG. 4 may correspond to the electronic device 100 described above, and the server 420 may correspond to the external server 210 described above or to one computer device implementing the cloud artificial intelligence platform 320.

The electronic device 410 and the server 420 may include memories 411 and 421, processors 412 and 422, communication modules 413 and 423, and input/output interfaces 414 and 424, respectively. The memories 411 and 421 are computer-readable recording media and may include RAM (random access memory) and permanent mass storage devices such as ROM (read-only memory), disk drives, SSDs (solid state drives), and flash memory. Permanent mass storage devices such as ROM, SSDs, flash memory, and disk drives may instead be included in the electronic device 410 or the server 420 as separate permanent storage devices distinct from the memories 411 and 421. The memories 411 and 421 may store an operating system and at least one program code (for example, code for an application installed on the electronic device 410 and run there to provide a specific service). Such software components may be loaded from a computer-readable recording medium separate from the memories 411 and 421. Such a separate computer-readable recording medium may include a floppy (registered trademark) drive, disk, tape, DVD/CD-ROM drive, memory card, or the like. In other embodiments, the software components may be loaded into the memories 411 and 421 through the communication modules 413 and 423 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memory 411 of the electronic device 410 based on a computer program (for example, the application described above) installed from files provided over a network 430 by developers or by a file distribution system that distributes application installation files.

The processors 412 and 422 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processors 412 and 422 by the memories 411 and 421 or by the communication modules 413 and 423. For example, the processors 412 and 422 may be configured to execute instructions received according to program code stored in a storage device such as the memories 411 and 421.

The communication modules 413 and 423 may provide a function that allows the electronic device 410 and the server 420 to communicate with each other through the network 430, and may also provide a function that allows the electronic device 410 and/or the server 420 to communicate with other electronic devices or other servers. As an example, a request generated by the processor 412 of the electronic device 410 according to program code stored in a storage device such as the memory 411 may be transmitted to the server 420 through the network 430 under the control of the communication module 413. Conversely, control signals, instructions, content, files, and the like provided under the control of the processor 422 of the server 420 may pass through the communication module 423 and the network 430 and be received by the electronic device 410 through its communication module 413. For example, control signals and instructions of the server 420 received through the communication module 413 may be passed to the processor 412 or the memory 411, and content and files may be stored in a recording medium (the permanent storage device described above) that the electronic device 410 may further include.

The input/output interface 414 may be a means for interfacing with an input/output device 415. For example, input devices may include a keyboard, mouse, microphone, and camera, and output devices may include a display, speaker, and haptic feedback device. As another example, the input/output interface 414 may be a means for interfacing with a device in which input and output functions are integrated into one, such as a touch screen. The input/output device 415 may be configured as a single device together with the electronic device 410. The input/output interface 424 of the server 420 may likewise be a means for interfacing with a device for input or output (not shown) that is connected to the server 420 or that the server 420 may include. As a more specific example, when the processor 412 of the electronic device 410 processes the instructions of a computer program loaded into the memory 411, a service screen or content composed using data provided by the server 420 or another electronic device may be shown on a display through the input/output interface 414.

In other embodiments, the electronic device 410 and the server 420 may include fewer or more components than those shown in FIG. 4. However, most conventional components need not be explicitly illustrated. For example, the electronic device 410 may be implemented to include at least some of the input/output devices 415 described above, and may further include other components such as a transceiver, a GPS (Global Positioning System) module, a camera, various sensors, and a database. As a more specific example, when the electronic device 410 is a smartphone, it may be implemented to further include the various components generally found in a smartphone, such as an acceleration sensor, a gyro sensor, a motion sensor, a camera module, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration.

In the present embodiment, the electronic device 410 may basically include, as an input/output device 415, a microphone for receiving the user's voice input, and may further include, as an input/output device 415, a speaker for outputting sound such as a response voice or audio content corresponding to the user's voice input.

FIG. 5 is a block diagram showing an example of components that the processor of the electronic device may include in one embodiment of the present invention, and FIG. 6 is a flowchart showing an example of an artificial intelligence dialogue method that the electronic device may execute in one embodiment of the present invention.

An artificial intelligence dialogue system can be configured in the electronic device 410 according to the present embodiment. The artificial intelligence dialogue system may be configured as a PC-based program or as an application dedicated to a mobile terminal. The artificial intelligence dialogue system in the present embodiment may be implemented as an independently operating program, or may be configured in an in-app form of a specific application so that it can operate on that application.

For example, the artificial intelligence dialogue system implemented in the electronic device 410 may execute the artificial intelligence dialogue method based on instructions provided by an application installed on the electronic device 410. To execute the artificial intelligence dialogue method of FIG. 6, the processor 412 of the electronic device 410 may include, as components, a setting unit 510, an activation unit 520, and an operation execution unit 530, as shown in FIG. 5. Depending on the embodiment, components of the processor 412 may be selectively included in or excluded from the processor 412. Also, depending on the embodiment, the components of the processor 412 may be separated or merged to represent the functions of the processor 412.

The processor 412 and its components can control the electronic device 410 to perform steps 610 to 640 included in the artificial intelligence dialogue method of FIG. 6. For example, the processor 412 and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program contained in the memory 411.

Here, the components of the processor 412 may be representations of different functions of the processor 412 that are performed by the processor 412 according to instructions provided by program code stored in the electronic device 410 (for example, instructions provided by an application running on the electronic device 410). For example, the setting unit 510 may be used as a functional representation of the processor 412 controlling the electronic device 410, according to the instructions described above, so that the electronic device 410 performs each setting process.

In step 610, the processor 412 can read the necessary instructions from the memory 411, into which instructions related to the control of the electronic device 410 have been loaded. In this case, the read instructions may include instructions for controlling the processor 412 to perform steps 620 to 640 described below.

In step 620, the setting unit 510 can set two or more wake words to be used as dialogue activation triggers for activating the dialogue function of the electronic device 410. In the present invention, a wake word is used as a dialogue activation trigger for activating the dialogue function of the electronic device 410 and, at the same time, may be used to specify the operation of the electronic device 410. The wake words may be predefined and set for each operation that the electronic device 410 can perform, or may be set arbitrarily by the user. As one example, the operations of the electronic device 410 may be divided into a main device role, in which the electronic device 410 directly performs the operation corresponding to a voice command, and a relay role, in which the electronic device 410 forwards the voice command so that the corresponding operation is performed on another device paired with the electronic device 410. For example, as shown in FIG. 7(A), suppose that "Clover" is defined as the wake word for the main device operation and "Friend" is defined as the wake word for the relay operation. When a plurality of devices are paired with the electronic device 410, a different wake word may be designated for each device, and the relay target may be identified according to the wake word used. As another example, the operations of the electronic device 410 may be divided according to the type of voice engine that outputs the response information corresponding to a voice command. For example, as shown in FIG. 7(B), "Siri" may be defined as the wake word for operation by a female character voice engine, and "Brown" may be defined as the wake word for operation by a male character voice engine. In addition, not only the wake words but also the operation of the electronic device 410 corresponding to each wake word can be personalized by the user designating them arbitrarily. The operations described above are merely examples, and any operation that the electronic device 410 can implement is applicable.
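The wake word setting of step 620 can be pictured as a small lookup table from wake word to operation. The following is a minimal sketch under stated assumptions, not the implementation described in this specification: the names `Action`, `WakeWordEntry`, `WakeWordRegistry`, and the device identifier `"paired-device-1"` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    MAIN_DEVICE = "main_device"  # execute the command on the device itself
    RELAY = "relay"              # forward the command to a paired device


@dataclass(frozen=True)
class WakeWordEntry:
    action: Action
    target_device: Optional[str] = None  # only meaningful for RELAY


class WakeWordRegistry:
    """Hypothetical holder for the wake words configured in step 620."""

    def __init__(self) -> None:
        self._entries: Dict[str, WakeWordEntry] = {}

    def register(self, word: str, action: Action,
                 target_device: Optional[str] = None) -> None:
        key = word.strip().lower()
        if key in self._entries:
            raise ValueError(f"wake word already registered: {word!r}")
        if action is Action.RELAY and target_device is None:
            raise ValueError("a relay wake word needs a target device")
        self._entries[key] = WakeWordEntry(action, target_device)

    def lookup(self, word: str) -> Optional[WakeWordEntry]:
        # Returns None when the word is not a configured wake word.
        return self._entries.get(word.strip().lower())


# Mirrors the FIG. 7(A) example: one main-device word, one relay word.
registry = WakeWordRegistry()
registry.register("Clover", Action.MAIN_DEVICE)
registry.register("Friend", Action.RELAY, target_device="paired-device-1")
```

Rejecting duplicate registrations reflects the requirement that each paired device be distinguishable by its own wake word; a user-defined wake word would go through the same `register` call.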

Referring again to FIG. 6, in step 630, the activation unit 520 can activate the dialogue function of the electronic device 410 when any one of the two or more wake words is recognized. When the dialogue function of the electronic device 410 is deactivated and a voice received through the voice interface (for example, a speaker) corresponds to any one of the plurality of wake words, the activation unit 520 can activate the dialogue function. In doing so, the activation unit 520 may first apply preprocessing, such as noise removal, to the voice received through the voice interface, and then determine whether the preprocessed voice corresponds to a predetermined wake word.
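The activation logic of step 630 can be sketched as follows. This is an illustrative sketch only: it assumes the audio has already been converted to text, and `preprocess` is a hypothetical stand-in for the noise removal mentioned above, not an actual signal-processing routine. The class name `Activator` is likewise an assumption.

```python
from typing import Iterable, Optional


def preprocess(utterance: str) -> str:
    """Stand-in for noise removal: normalise case and whitespace."""
    return " ".join(utterance.lower().split())


class Activator:
    """Hypothetical counterpart of the activation unit 520."""

    def __init__(self, wake_words: Iterable[str]) -> None:
        self.wake_words = {preprocess(w) for w in wake_words}
        # None means the dialogue function is deactivated.
        self.active_wake_word: Optional[str] = None

    def on_audio(self, utterance: str) -> bool:
        """Activate only if deactivated and the input is a configured wake word."""
        if self.active_wake_word is not None:
            return False  # already active; this input is not a trigger
        candidate = preprocess(utterance)
        if candidate in self.wake_words:
            # Remember which wake word was used so step 640 can branch on it.
            self.active_wake_word = candidate
            return True
        return False
```

Storing the wake word that triggered activation, rather than a plain boolean, is what lets the later step choose an operation per wake word.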

In step 640, for a voice input received through the voice interface while the dialogue function of the electronic device 410 is active, the operation execution unit 530 can control the electronic device 410 to perform the operation corresponding to the wake word that was used to activate the dialogue function. The operation execution unit 530 can perform different operations depending on the wake word used to activate the dialogue function. Referring to FIG. 7(A), when the two wake words "Clover" and "Friend" are installed in the electronic device 410, the operation execution unit 530 can, when the dialogue function is activated by the wake word "Clover", perform the operation corresponding to the voice command directly on the electronic device 410 as the main device operation; when the dialogue function is activated by the wake word "Friend", it can, as the relay operation, forward the voice command to another device paired with the electronic device 410 so that the corresponding operation is performed on that device. In other words, different kinds of wake words are installed in the electronic device 410, and one of them is used to forward commands input to the electronic device 410 to another device; in this case, the electronic device 410 serves as a relay as well as a microphone. As another example, the operation execution unit 530 may output the response information corresponding to a voice command from a different engine for each wake word. Referring to FIG. 7(B), when the wake words "Siri" and "Brown" are installed in the electronic device 410, the operation execution unit 530 can output the response information corresponding to the voice command from the female character engine when the dialogue function is activated by the wake word "Siri", and from the male character engine when the dialogue function is activated by the wake word "Brown". In other words, by installing different kinds of wake words in the electronic device 410, a function of calling a different voice engine according to the wake word can be implemented within a single device.
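The engine selection of FIG. 7(B) amounts to dispatching on the wake word that activated the device. The sketch below is hypothetical: the two engine functions merely tag the reply text so the choice is visible, whereas a real voice engine would synthesize audio, and the names `ENGINES` and `respond` are assumptions.

```python
from typing import Callable, Dict


def female_engine(text: str) -> str:
    # Placeholder for a female character voice engine.
    return f"[female voice] {text}"


def male_engine(text: str) -> str:
    # Placeholder for a male character voice engine.
    return f"[male voice] {text}"


# Wake word -> engine, following the FIG. 7(B) example.
ENGINES: Dict[str, Callable[[str], str]] = {
    "siri": female_engine,
    "brown": male_engine,
}


def respond(active_wake_word: str, reply_text: str) -> str:
    """Render the reply with the engine tied to the activating wake word."""
    key = active_wake_word.strip().lower()
    if key not in ENGINES:
        raise ValueError(f"no engine configured for wake word {active_wake_word!r}")
    return ENGINES[key](reply_text)
```

The same table-driven dispatch would serve the main device and relay split of FIG. 7(A) by mapping wake words to handler functions instead of engines.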

Therefore, in the present invention, the wake words for activating the dialogue function of the electronic device 410 can be divided into two or more, and the operation of the electronic device 410 can be controlled differently according to each wake word.

A specific scenario of the artificial intelligence dialogue system is described below by way of example.

FIG. 8 is a diagram showing an example of a control environment using a plurality of wake words in one embodiment of the present invention. In FIG. 8, reference numeral 410 denotes an artificial intelligence speaker as an example of an electronic device that provides a voice interface, and the figure shows the artificial intelligence speaker 410 being paired with an IPTV set-top box (STB) 800.

The artificial intelligence speaker 410 can be paired with a set-top box 800 that is connected to the same internal network (not shown) as the artificial intelligence speaker 410. For example, the artificial intelligence speaker 410 and the set-top box 800 are connected to the same Wi-Fi network through a Wi-Fi router (not shown) and can exchange data with each other. For pairing and data communication between devices on the internal network, the AllJoyn scheme or the like may be used, but the invention is not limited to this.

In addition to a basic wake word for the main device operation (for example, "Clover"), the artificial intelligence speaker 410 may be equipped with an additional wake word for the relay operation of forwarding commands to the set-top box 800 (for example, "TV Friend"). When issuing a command, the user uses the basic wake word if operation on the artificial intelligence speaker 410 is desired, and the additional wake word if operation on the IPTV is desired. When the wake word of the artificial intelligence speaker 410, that is, "Clover", is spoken, the user's voice command is processed directly by the artificial intelligence speaker 410 and the corresponding operation is performed on the artificial intelligence speaker 410. When the wake word of the set-top box 800, that is, "TV Friend", is spoken, the user's voice command is forwarded from the artificial intelligence speaker 410 to the set-top box 800 and the corresponding operation is performed on the set-top box 800. As a specific example, when the artificial intelligence speaker 410 is activated, it first checks which wake word activated it and then, according to that wake word, delivers the voice PCM (pulse code modulation) data containing the command spoken by the user to the target device for processing.

Thus, the artificial intelligence speaker 410 is equipped with a basic wake word and an additional wake word, and the additional wake word is used as the wake word for issuing commands to the set-top box 800; in this case, the artificial intelligence speaker 410 takes on the role of a microphone for the set-top box 800. The additional wake word may be enabled or disabled depending on the state of the set-top box 800. The enabled state may apply when the set-top box 800 is properly connected to the network during device setup (in which case the artificial intelligence speaker 410 and the set-top box 800 must be connected to the same network). The disabled state may apply when the set-top box 800 is not connected to the network during device setup, when the artificial intelligence speaker 410 and the set-top box 800 are not connected to the same network, when the user has cancelled the connection, and so on.
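The enable and disable conditions for the additional wake word can be expressed as a single predicate. The following is a sketch under assumptions: `LinkState` and its fields are hypothetical names, and a network is identified here by a plain string, which the specification does not prescribe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkState:
    speaker_network: Optional[str]  # network the AI speaker is on, None if offline
    stb_network: Optional[str]      # network the set-top box is on, None if offline
    user_cancelled: bool = False    # True if the user cancelled the connection


def additional_wake_word_enabled(state: LinkState) -> bool:
    """Return True only when every enabling condition described above holds."""
    if state.user_cancelled:
        return False
    if state.speaker_network is None or state.stb_network is None:
        return False  # one side is not connected to any network
    # Both devices must be on the same network.
    return state.speaker_network == state.stb_network
```

Keeping the check in one function makes it easy to re-evaluate whenever the set-top box state changes, so a disabled wake word is simply not treated as a trigger.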

FIG. 9 is a flowchart showing an example of a process of controlling the operation of the artificial intelligence speaker 410 using a plurality of wake words in one embodiment of the present invention.

Referring to FIG. 9, after the network setup is completed during the initial setup process, the artificial intelligence speaker 410 may search for a connectable set-top box 800 (Discovery Mode) (S901).

Once the artificial intelligence speaker 410 enters Discovery Mode, it waits, using the AllJoyn scheme, for a set-top box 800 to respond to the AllJoyn signal for a certain period of time (for example, 30 seconds) (S902).

The artificial intelligence speaker 410 can connect to (pair with) a set-top box 800 that responded to the AllJoyn signal within the set period of time (S903).

If only one set-top box exists on the internal network to which the artificial intelligence speaker 410 is connected, the artificial intelligence speaker 410 and the set-top box 800 are paired immediately. A pop-up containing a message such as "Connected to the AI speaker." is displayed on the IPTV screen.

On the other hand, if two or more set-top boxes exist on the internal network to which the artificial intelligence speaker 410 is connected, a pop-up such as "Connect to the AI speaker? Press the connect button." is displayed on every connectable IPTV screen, and the artificial intelligence speaker 410 is paired with the set-top box 800 on which the connect button is pressed first. The pop-up message displayed on the IPTV screen lets the user see which set-top boxes can be connected, and voice guidance during the process can minimize user confusion.

If no set-top box exists on the internal network to which the artificial intelligence speaker 410 is connected, Discovery Mode ends with a voice announcement that no connectable IPTV was found.
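The three discovery outcomes described above (no set-top box, exactly one, two or more) can be summarized in a small decision function. This is a sketch under assumptions: `choose_set_top_box` and its parameters are hypothetical, the list of responders is taken as already collected within the waiting period, and returning `None` when no connect button is pressed is an assumption about the timeout case rather than something the specification states.

```python
from typing import Optional, Sequence


def choose_set_top_box(responders: Sequence[str],
                       first_confirmed: Optional[str] = None) -> Optional[str]:
    """Pick the set-top box to pair with, or None if pairing does not happen.

    responders: identifiers of set-top boxes that answered during discovery.
    first_confirmed: identifier of the box whose connect button was pressed
        first, if any (only consulted when several boxes responded).
    """
    if not responders:
        # No connectable IPTV: announce it and end Discovery Mode.
        return None
    if len(responders) == 1:
        # A single candidate is paired immediately.
        return responders[0]
    # Several candidates: pair with the one the user confirmed first.
    if first_confirmed in responders:
        return first_confirmed
    return None  # assumed behaviour when nobody presses the connect button
```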

Pairing with the artificial intelligence speaker 410 must always be possible, not only when the set-top box 800 is in Active Mode but also when it is in Sleep Mode. Furthermore, the artificial intelligence speaker 410 starts up after pairing with the set-top box 800 is completed, but even if the pairing setup with the set-top box 800 fails, the artificial intelligence speaker 410 must still start up so that it remains usable.

Pairing between devices, as well as settings such as wake words, may also be performed in an app for network connection and configuration of the artificial intelligence speaker 410 (that is, a management app) that runs on a separate device (for example, the user's smartphone). The management app can search for the artificial intelligence speaker 410 and one or more other devices, for example the set-top box 800, that are connected to the same internal network, receive the relevant information, and perform the settings.

When pairing with the set-top box 800 is completed or times out, the artificial intelligence speaker 410 may send the corresponding information to the server 420. When the server 420 receives the set-top box registration, the management app displays a toast pop-up on its settings screen, such as "Connected to the set-top box." or "Failed to connect to the set-top box.", and then moves to the main screen. The settings screen of the management app shows the registration status of the set-top box; if a set-top box is registered, it is shown to the user in the form "set-top box model (MAC address)".

Until the necessary settings are completed, even if the user speaks a wake word, the artificial intelligence speaker 410 plays a local announcement such as "Connecting. Please wait a moment." and does not activate.

When the artificial intelligence speaker 410 is provisioned, if a pairing history with a set-top box 800 exists, the initial setup process described above may be skipped and the speaker may automatically reconnect to the most recently paired set-top box 800.

When the internal network is reconfigured using the management app, the pairing process must be performed again after the network setup. In this case, the pairing history stored in the artificial intelligence speaker 410 is deleted once the network has been reconfigured.
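The handling of pairing history during provisioning and network reconfiguration can be sketched as follows. All names here (`PairingHistory`, `provision`, `run_initial_setup`) are hypothetical, and the in-memory list is an assumption; the specification does not say how the history is stored.

```python
from typing import Callable, List, Optional


class PairingHistory:
    """Hypothetical store of previously paired set-top boxes, oldest first."""

    def __init__(self) -> None:
        self._history: List[str] = []

    def record(self, stb_id: str) -> None:
        # Move an existing entry to the end so the last item is the most recent.
        if stb_id in self._history:
            self._history.remove(stb_id)
        self._history.append(stb_id)

    def most_recent(self) -> Optional[str]:
        return self._history[-1] if self._history else None

    def on_network_reset(self) -> None:
        # Reconfiguring the internal network discards the stored history.
        self._history.clear()


def provision(history: PairingHistory,
              run_initial_setup: Callable[[], Optional[str]]) -> Optional[str]:
    """Reconnect to the most recent set-top box, or fall back to initial setup."""
    recent = history.most_recent()
    if recent is not None:
        return recent  # skip the initial setup process
    paired = run_initial_setup()
    if paired is not None:
        history.record(paired)
    return paired
```

Passing the setup routine in as a callable keeps the sketch independent of how discovery is actually carried out.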

Once the pairing setup with the set-top box 800 is finished, when the basic wake word "Clover" is spoken, the artificial intelligence speaker 410 can activate the dialogue function for the main device operation (S904).

On the other hand, once the pairing setup with the set-top box 800 is finished, when the additional wake word "TV Friend" is spoken, the artificial intelligence speaker 410 can activate the dialogue function for the relay operation (S906).

When the dialogue function is activated by a wake word, the artificial intelligence speaker 410 may indicate the activated state using a display means such as an LED. The activated state caused by the basic wake word and the activated state caused by the additional wake word may be displayed so that they can be distinguished. For example, an orange LED may light up when the speaker is activated by the basic wake word, and a purple LED may light up when it is activated by the additional wake word.

When the artificial intelligence speaker 410 is activated by the basic wake word "Clover", the operation corresponding to the user's voice command is performed directly on the electronic device 410; as an example, the response information corresponding to the voice command can be obtained from the server 420 and output through the voice interface of the artificial intelligence speaker 410 (S905).

When the artificial intelligence speaker 410 is activated by the additional wake word "TV Friend", it forwards the voice command to the set-top box 800 so that the operation corresponding to the user's voice command is performed on the set-top box 800 (S907). The set-top box 800 thus receives the user's voice command from the artificial intelligence speaker 410 and performs the corresponding operation; as an example, it can obtain the response information corresponding to the voice command from the server 420 and output it through the interface of the set-top box 800 (S908).

When the artificial intelligence speaker 410 is activated by the additional wake word while paired with the set-top box 800, it forwards the voice command input by the user to the set-top box 800. In this case, if the user requests a function that the IPTV does not support, the artificial intelligence speaker 410 can, based on its communication with the set-top box 800, output a guidance announcement such as "This function is not supported." In addition, the artificial intelligence speaker 410 may monitor the power state, connection state, and the like of the set-top box 800 and, when the user speaks, output a guidance announcement corresponding to the state of the set-top box 800.

In other words, of the wake words for the artificial intelligence speaker 410, when the basic wake word is spoken, the user's voice command is delivered to the artificial intelligence speaker 410 and the corresponding operation is performed on the artificial intelligence speaker 410, whereas when the additional wake word is spoken, the user's voice command is delivered through the artificial intelligence speaker 410 to the set-top box 800 and the corresponding operation is performed on the set-top box 800. For example, when the user says "Clover, what's the weather today?", the artificial intelligence speaker 410 obtains information about today's weather from the server 420 and outputs it as speaker audio, such as "Today's weather is ...". On the other hand, when the user says "TV Friend, what's the weather today?", the voice command is delivered through the artificial intelligence speaker 410 to the set-top box 800, and the set-top box 800 obtains information about today's weather from the server 420 and either outputs it as TV audio, such as "Today's weather is ...", or displays the weather information on the screen.
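The two weather utterances above can be traced with a short routing sketch. It is illustrative only: it works on text rather than on the PCM audio the speaker actually forwards, it assumes the wake word is separated from the command by a comma as in the written examples, and the names `ROUTES` and `route` are hypothetical.

```python
from typing import Dict, Tuple

# Wake word -> device that should execute the command (from the FIG. 8 scenario).
ROUTES: Dict[str, str] = {
    "clover": "ai_speaker",
    "tv friend": "set_top_box",
}


def route(utterance: str) -> Tuple[str, str]:
    """Split an utterance into (target device, command) based on its wake word."""
    head, sep, command = utterance.partition(",")
    wake_word = head.strip().lower()
    if not sep or wake_word not in ROUTES:
        raise ValueError("utterance does not start with a known wake word")
    return ROUTES[wake_word], command.strip()
```

With this sketch, "Clover, what's the weather today?" is routed to the speaker itself, while "TV Friend, what's the weather today?" yields the same command text with the set-top box as the target.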

Although FIGS. 8 and 9 describe the artificial intelligence speaker 410 being paired with one set-top box 800, the invention is not limited to this, and the speaker can also interwork with a plurality of set-top boxes or with a plurality of mutually different devices. When the artificial intelligence speaker 410 is paired with a plurality of devices, a different wake word may likewise be designated for each device. For example, when two set-top boxes are paired with the artificial intelligence speaker 410, three wake words may be designated and installed: the basic wake word, a wake word for set-top box 1, and a wake word for set-top box 2.

Predefined wake words may be applied as the basic wake word and the additional wake words, but the user can also designate them directly. For example, when two set-top boxes exist in the home, both may be connected to the artificial intelligence speaker 410, and the user may directly designate the wake word for set-top box 1 and the wake word for set-top box 2.

Therefore, when an additional wake word is installed in addition to the basic wake word and the artificial intelligence speaker 410 is activated using the additional wake word, the artificial intelligence speaker 410 serves to forward the user's voice command to the device corresponding to that additional wake word.

In addition, as described above, using a plurality of wake words makes it possible to vary the operation of a single artificial intelligence speaker 410 according to each wake word, or to support a function of calling a different engine for each wake word.
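Calling a different engine per wake word can be sketched as a dispatch table. The two engines below, their names, and the wake words bound to them are purely hypothetical stand-ins; the text does not specify which engines exist or how they produce responses.

```python
from typing import Callable, Dict

# An "engine" here is any callable that turns a voice command into response text.
Engine = Callable[[str], str]


def general_engine(command: str) -> str:
    """Hypothetical general-purpose assistant engine."""
    return f"[general] handling: {command}"


def translation_engine(command: str) -> str:
    """Hypothetical translation engine."""
    return f"[translation] translating: {command}"


# Each wake word is bound to exactly one engine (assumed example wake words).
ENGINES: Dict[str, Engine] = {
    "hello speaker": general_engine,
    "translator": translation_engine,
}


def respond(wake_word: str, command: str) -> str:
    """Call the engine bound to the recognized wake word and return its response."""
    engine = ENGINES.get(wake_word.strip().lower())
    if engine is None:
        raise KeyError(f"no engine registered for wake word: {wake_word!r}")
    return engine(command)
```

For example, `respond("Translator", "good morning")` is handled by `translation_engine`, while the same command after "Hello Speaker" goes to `general_engine`.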

As described above, according to embodiments of the present invention, the wake words of an artificial intelligence device that provides a voice-based interface are divided into two or more, and the operation of the artificial intelligence device can be controlled differently according to each wake word.

The apparatus described above may be implemented by hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but those skilled in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer recording medium, or apparatus in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over computer systems connected by a network and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may also be any of various recording or storage means in which a single piece of hardware or several pieces of hardware are combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy (registered trademark) disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples of the medium include recording or storage media managed by app stores that distribute applications, or by sites, servers, and the like that supply or distribute various other software.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other embodiments also fall within the scope of the appended claims, provided they are equivalent to the claims.

412: Processor
510: Setting unit
520: Activation unit
530: Operation execution unit

Claims

An artificial intelligence dialogue method executed in an electronic device implemented by a computer, the method comprising:
a step of pairing the electronic device with another device connected to a network;
a step of activating a dialogue function when any one of a plurality of preset wake words is recognized through a voice interface of the electronic device; and
a step of controlling such that different operations are performed, according to the recognized wake word, for a voice command input while the dialogue function is activated,
wherein the wake words include a basic wake word and an additional wake word, and
wherein the controlling step comprises:
executing, on the electronic device, the operation corresponding to the voice command when the dialogue function is activated by the basic wake word; and
transmitting the voice command to the other device so that the operation corresponding to the voice command is executed on the other device when the dialogue function is activated by the additional wake word.

The artificial intelligence dialogue method according to claim 1, further comprising a step of setting a wake word for identifying the corresponding operation for each operation that can be performed by the electronic device.

The artificial intelligence dialogue method according to claim 1, wherein the wake word and the operation of each wake word are personalized to the user of the electronic device.

The artificial intelligence dialogue method according to claim 1, wherein the pairing step searches for other devices connected to the network and pairs with a device that responds to the search signal.

The artificial intelligence dialogue method according to claim 1, further comprising a step of setting a different additional wake word for each device when a plurality of other devices are paired with the electronic device.

The artificial intelligence dialogue method according to claim 1, wherein the controlling step calls a different engine according to the recognized wake word and outputs, from that engine, response information corresponding to the voice command.

The artificial intelligence dialogue method according to claim 1, wherein the activating step includes a step of displaying the activated state distinguishably according to the recognized wake word.

A computer program that, in combination with a computer, causes the computer to execute the artificial intelligence dialogue method according to any one of claims 1 to 7.

A computer-readable recording medium on which is recorded a program for causing a computer to execute the artificial intelligence dialogue method according to any one of claims 1 to 7.

An artificial intelligence dialogue system of an electronic device implemented by a computer, the system comprising:
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor includes:
a setting unit that sets two or more wake words used as dialogue activation triggers for activating a dialogue function of the electronic device, and that pairs the electronic device with another device connected to a network;
an activation unit that activates the dialogue function when any one of the wake words is recognized through a voice interface of the electronic device; and
an operation execution unit that controls such that different operations are performed, according to the recognized wake word, for a voice command input while the dialogue function is activated,
wherein the wake words include a basic wake word and an additional wake word, and
wherein the operation execution unit:
executes, on the electronic device, the operation corresponding to the voice command when the dialogue function is activated by the basic wake word; and
transmits the voice command to the other device so that the operation corresponding to the voice command is executed on the other device when the dialogue function is activated by the additional wake word.

The artificial intelligence dialogue system according to claim 10, wherein the setting unit sets, for each operation that can be performed by the electronic device, a wake word for specifying the corresponding operation.

The artificial intelligence dialogue system according to claim 10, wherein the wake word and the operation of each wake word are personalized to the user of the electronic device.

The artificial intelligence dialogue system according to claim 10, wherein the setting unit searches for other devices connected to the network and pairs with a device that responds to the search signal.

The artificial intelligence dialogue system according to claim 10, wherein the setting unit sets a different additional wake word for each device when a plurality of other devices are paired with the electronic device.

The artificial intelligence dialogue system according to claim 10, wherein the operation execution unit calls a different engine according to the recognized wake word and outputs, from that engine, response information corresponding to the voice command.

The artificial intelligence dialogue system according to claim 10, wherein the activation unit displays the activated state distinguishably according to the recognized wake word.