JP2020038709A

JP2020038709A - Continuous conversation function with artificial intelligence device

Info

Publication number: JP2020038709A
Application number: JP2019206132A
Authority: JP
Inventors: Jieun Lee; Dong Yeoul Lee; Jinook Hong; Kyungyeon Kim
Original assignee: Line Corp; Naver Corp
Current assignee: Z Intermediate Global Corp; Naver Corp
Priority date: 2017-08-22
Filing date: 2019-11-14
Publication date: 2020-03-12
Anticipated expiration: 2038-08-13
Also published as: KR102098633B1; JP6619488B2; JP2019040602A; JP6920398B2; KR20190021012A

Abstract

To provide a technique for providing an artificial intelligence device with a continuous conversation function. SOLUTION: An artificial intelligence continuous conversation method comprises the steps of: activating a conversation function and switching to a voice command standby mode upon recognition of a conversation activation trigger while the conversation function is inactive; performing a task corresponding to a voice command upon receipt of the voice command in the voice command standby mode; and providing a continuous conversation function by immediately switching to the next voice command standby mode depending on the type of the task. SELECTED DRAWING: Figure 5

Description

The following description relates to an artificial intelligence conversation system.

In general, artificial intelligence conversation systems used in personal assistant systems, chatbot platforms, artificial intelligence (AI) speakers, and the like work by understanding the intent behind a human command and providing a corresponding response.

Such systems mainly operate in such a way that, when a human conveys a functional request, a machine provides a response to that request. They can receive a user's voice input through a microphone and control device operation or content provision based on the received voice input.

For example, Patent Literature 1 discloses a technique that, in a home network service, makes it possible to provide the service using a second communication network such as Wi-Fi (registered trademark) in addition to a mobile communication network (first communication network), and that allows a plurality of multimedia devices in a home to be multi-controlled by voice commands without the user operating any buttons.

In general, an artificial intelligence conversation system uses a predetermined keyword (that is, a wake-up word, for example the device name) as a conversation activation trigger for activating the device.

An artificial intelligence device performs its voice recognition function on the basis of a keyword call. For example, when the user calls the device name, the device is activated and enters a standby mode for collecting the user's voice commands.

Without a keyword call the device does not activate on its own, and a user who wants to issue another voice command within a short time must first call the keyword every time, which makes the device tiring to use.

Korean Published Patent Application No. 10-2011-0139797

Provided are an artificial intelligence continuous conversation method and system that allow a further voice command to be requested without calling the keyword (wake-up word) for activating the device.

Provided are an artificial intelligence continuous conversation method and system that can automatically switch to a voice command waiting state without a keyword call, depending on the task corresponding to the previous voice command.

Provided is a computer-implemented artificial intelligence continuous conversation method comprising: activating a conversation function and entering a voice command waiting state when a conversation activation trigger is recognized while the conversation function is inactive; performing a task corresponding to a voice command when the voice command is received in the voice command waiting state; and providing a continuous conversation function by immediately entering the next voice command waiting state based on the type of the task.
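The three steps above can be read as a small two-state machine. The following is a minimal sketch of that reading, not the patented implementation: the class name, the wake word "hey device", and the `keeps_active` policy callback are all hypothetical, and utterances are represented as text rather than audio.

```python
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    INACTIVE = auto()  # conversation function off; only the trigger is recognized
    WAITING = auto()   # voice command waiting state


class ContinuousConversation:
    """Toy model of the three steps: activate, perform the task, pick the next state."""

    def __init__(self, wake_word: str, keeps_active: Callable[[str], bool]):
        self.wake_word = wake_word        # hypothetical activation keyword
        self.keeps_active = keeps_active  # hypothetical policy based on the task type
        self.state = State.INACTIVE
        self.executed: List[str] = []

    def hear(self, utterance: str) -> None:
        text = utterance.strip().lower()
        if self.state is State.INACTIVE:
            # Step 1: while inactive, only the activation trigger has any effect.
            if text == self.wake_word:
                self.state = State.WAITING
            return
        # Step 2: perform the task corresponding to the voice command (recorded here).
        self.executed.append(text)
        # Step 3: depending on the task type, re-enter the waiting state or deactivate.
        self.state = State.WAITING if self.keeps_active(text) else State.INACTIVE


if __name__ == "__main__":
    device = ContinuousConversation("hey device", lambda task: task != "play music")
    for spoken in ["today's weather", "hey device", "today's weather",
                   "turn off the light", "play music", "volume up"]:
        device.hear(spoken)
    print(device.executed, device.state)
```

In this run the first utterance is ignored because the device is still inactive, the three commands after the wake word are executed without repeating it, and the final command is ignored because the assumed policy deactivates the device after "play music".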

According to one aspect, a keyword designated as a wake-up word may be used as the conversation activation trigger, and the providing of the continuous conversation function may include, depending on the task corresponding to the voice command, maintaining the active state of the conversation function and automatically switching to the next voice command waiting state even without the keyword being called.

According to another aspect, the providing of the continuous conversation function may include maintaining the active state of the conversation function and automatically switching to the next voice command waiting state when the task corresponding to the voice command is not a task that requires continuous operation.

According to still another aspect, tasks are divided into a first task that continues until an end command is input and a second task that ends at a predetermined point in time even without an end command, and the providing of the continuous conversation function may include maintaining the active state of the conversation function and automatically switching to the next voice command waiting state when the task corresponding to the voice command is the second task.

According to still another aspect, the providing of the continuous conversation function may include maintaining the active state of the conversation function and automatically switching to the next voice command waiting state when the task corresponding to the voice command includes a conversational action that requests a reply from the user.

According to still another aspect, the providing of the continuous conversation function may include maintaining the active state of the conversation function and automatically switching to the next voice command waiting state when, based on a result of learning execution patterns for each task, the task corresponding to the voice command is a task for which an additional voice command is predicted.
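The source does not say how execution patterns are learned. As one simple illustration only, the sketch below counts, per task, how often another voice command followed, and predicts a follow-up when the observed ratio reaches a threshold. The class name, the threshold of 0.5, and the minimum sample count are assumptions made for this example.

```python
from collections import defaultdict


class FollowUpPredictor:
    """Counts, per task, how often an additional voice command followed it."""

    def __init__(self, threshold: float = 0.5, min_samples: int = 3):
        self.threshold = threshold      # assumed cut-off, not taken from the source
        self.min_samples = min_samples  # assumed minimum history before predicting
        self.total = defaultdict(int)
        self.followed = defaultdict(int)

    def observe(self, task: str, had_follow_up: bool) -> None:
        """Record one execution of a task and whether a follow-up command came."""
        self.total[task] += 1
        if had_follow_up:
            self.followed[task] += 1

    def expects_follow_up(self, task: str) -> bool:
        """Predict a follow-up only when enough history supports it."""
        count = self.total[task]
        if count < self.min_samples:
            return False
        return self.followed[task] / count >= self.threshold
```

A device could keep the conversation function active after a task whenever `expects_follow_up` returns `True`; a frequency count is only the simplest stand-in for whatever learning method an actual system uses.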

According to still another aspect, the method may further include switching the conversation function to an inactive state and ending the voice command waiting state when the task corresponding to the voice command is a task that requires continuous operation.

According to yet another aspect, the ending of the voice command waiting state may include switching the conversation function to the inactive state and ending the voice command waiting state when the task corresponding to the voice command is a task related to the task that requires continuous operation.
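The aspects above can be combined into a single decision rule. The sketch below is one possible combination under stated assumptions: the `Task` fields and task names are hypothetical, and the precedence (a task awaiting a reply from the user keeps the device active even if it is otherwise persistent) is a choice made for this example, since the source presents the aspects as alternatives without ranking them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    persistent: bool = False             # "first task": runs until an end command
    related_to_persistent: bool = False  # tied to a task that requires continuous operation
    asks_user: bool = False              # conversational action that requests a reply


def next_state(task: Task) -> str:
    """Return "waiting" to keep the conversation function active, else "inactive"."""
    if task.asks_user:
        # A pending question to the user needs the next utterance immediately.
        return "waiting"
    if task.persistent or task.related_to_persistent:
        # Continuous operation: end the voice command waiting state.
        return "inactive"
    # "Second task": it ends on its own, so a new command may follow at once.
    return "waiting"
```

For instance, `next_state(Task("weather"))` yields `"waiting"`, while `next_state(Task("music", persistent=True))` yields `"inactive"`.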

Provided is a computer program for causing a computer to execute the artificial intelligence continuous conversation method.

Provided is a computer-readable recording medium on which the computer program is recorded.

Provided is a computer-implemented artificial intelligence continuous conversation system comprising at least one processor implemented to execute computer-readable instructions, wherein the at least one processor handles: a process of activating a conversation function and entering a voice command waiting state when a conversation activation trigger is recognized while the conversation function is inactive; a process of performing a task corresponding to a voice command when the voice command is received in the voice command waiting state; and a process of providing a continuous conversation function by immediately entering the next voice command waiting state based on the type of the task.

According to embodiments of the present invention, after a previous voice command has been issued, the device maintains its active state and automatically switches to the next voice command waiting state, depending on the task corresponding to the previous voice command. As a result, even when another voice command is attempted within a short time, the device can operate immediately without the keyword for activating the device having to be called, which improves user convenience and reduces the fatigue of use.

FIG. 1 is a diagram illustrating an example of a service environment utilizing a voice-based interface according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating another example of a service environment utilizing a voice-based interface according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a cloud artificial intelligence platform according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating an artificial intelligence continuous conversation method according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram illustrating a list of tasks after which a voice command can be requested again immediately without a keyword call, according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram illustrating a list of tasks after which a voice command can be requested again immediately without a keyword call, according to an embodiment of the present invention.
FIG. 8 is an exemplary diagram illustrating a list of tasks after which a voice command can be requested again immediately without a keyword call, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The artificial intelligence continuous conversation system according to embodiments of the present invention may be implemented by an electronic device that provides an interface operating on the basis of conversation with a user. The system provides a continuous conversation function that allows a further voice command to be requested immediately, without the keyword (wake-up word) for activating the device having to be called.

The artificial intelligence continuous conversation method according to embodiments of the present invention may be executed by the electronic device described above. A computer program according to an embodiment of the present invention may be installed and executed on the electronic device, and the electronic device may execute the artificial intelligence continuous conversation method according to an embodiment of the present invention under the control of the executed computer program. The computer program may be recorded on a computer-readable recording medium in order to cause a computer-implemented electronic device to execute the artificial intelligence continuous conversation method.

FIG. 1 is a diagram illustrating an example of a service environment utilizing a voice-based interface according to an embodiment of the present invention. The embodiment of FIG. 1 concerns technologies that connect and control in-home devices, such as a smart home or a home network service. It shows an example in which an electronic device 100 that provides a voice-based interface recognizes and analyzes the voice input "Turn off the light" received through an utterance of a user 110, and controls the light power of an in-home lighting device 120 that is linked to the electronic device 100 through an internal network in the home.
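The FIG. 1 scenario can be pictured as a mapping from a recognized utterance to a control action on a linked device. The sketch below is purely illustrative: the `Light` class stands in for the in-home lighting device 120, and the phrase table is a hypothetical replacement for real speech recognition and intent analysis, which the source does not detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Light:
    """Stand-in for the in-home lighting device 120."""
    power_on: bool = True


# Hypothetical phrase table; a real system would recognize and analyze speech instead.
PHRASES = {
    "turn off the light": False,
    "turn on the light": True,
}


def handle_utterance(text: str, light: Light) -> Optional[bool]:
    """Apply a recognized command to the light; return the new power state, or None."""
    target = PHRASES.get(text.strip().lower())
    if target is None:
        return None  # utterance not understood; the light is left unchanged
    light.power_on = target
    return light.power_on
```

Calling `handle_utterance("Turn off the light", Light())` switches the stand-in light off, while an unrecognized utterance returns `None` and changes nothing.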

For example, in addition to the in-home lighting device 120 described above, the in-home devices may include a variety of devices that can be connected and controlled online: home appliances such as a television, a personal computer (PC), peripheral devices, an air conditioner, a refrigerator, and a robot cleaner; energy consuming devices such as water, electricity, and cooling/heating equipment; and security devices such as door locks and surveillance cameras. As the internal network, wired network technologies such as Ethernet (registered trademark), HomePNA, and IEEE 1394, or wireless network technologies such as Bluetooth (registered trademark), ultra-wideband (UWB), ZigBee (registered trademark), Wireless 1394, and HomeRF may be used.

The electronic device 100 may be one of the in-home devices. For example, the electronic device 100 may be a device such as an artificial intelligence speaker or a robot cleaner installed in the home. The electronic device 100 may also be a mobile device of the user 110, such as a smartphone, a mobile phone, a notebook PC, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), or a tablet. In this way, the electronic device 100 is not particularly limited as long as it includes a function for connecting to the in-home devices in order to receive the voice input of the user 110 and control those devices. Depending on the embodiment, the mobile devices of the user 110 described above may also be included among the in-home devices.

FIG. 2 is a diagram illustrating another example of a service environment utilizing a voice-based interface according to an embodiment of the present invention. FIG. 2 shows an example in which the electronic device 100 that provides a voice-based interface recognizes and analyzes the voice input "Today's weather" received through an utterance of the user 110, obtains information about today's weather from an external server 210 through an external network, and outputs the obtained information by voice, such as "Today's weather is ...".

For example, the external network may include any one or more of networks such as a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet.

In the embodiment of FIG. 2 as well, the electronic device 100 may be one of the in-home devices or one of the mobile devices of the user 110. It is not particularly limited as long as it includes a function for receiving and processing the voice input of the user 110 and a function for connecting to the external server 210 through the external network and providing the user 110 with the services and content that the external server 210 provides.

As described above, the electronic device 100 according to embodiments of the present invention is not particularly limited as long as it can process, through a voice-based interface, user commands that include at least the voice input received through an utterance of the user 110. For example, the electronic device 100 may process a user command by directly recognizing and analyzing the user's voice input and performing an operation suited to it. Depending on the embodiment, however, processing such as recognition of the user's voice input, analysis of the recognized voice input, and synthesis of the voice provided to the user may be performed on an external platform linked to the electronic device 100.

FIG. 3 is a diagram illustrating an example of a cloud artificial intelligence platform according to an embodiment of the present invention. FIG. 3 shows electronic devices 310, a cloud artificial intelligence platform 320, and content and services 330.

As an example, the electronic devices 310 may mean devices installed in a home and may include at least the electronic device 100 described above. The electronic devices 310, or applications (hereinafter also called apps) installed and executed on the electronic devices 310, may be linked to the cloud artificial intelligence platform 320 through an interface connect 340. Here, the interface connect 340 may provide developers with a software development kit (SDK) and/or development documents for developing the electronic devices 310 or the apps installed and executed on them. The interface connect 340 may also provide an application program interface (API) through which the electronic devices 310, or the apps installed and executed on them, can make use of the functions provided by the cloud artificial intelligence platform 320. As a specific example, a device or app that a developer has developed using the SDK and/or development documents provided by the interface connect 340 can make use of the functions provided by the cloud artificial intelligence platform 320 by using the API provided by the interface connect 340.

Here, the cloud artificial intelligence platform 320 may provide functions for providing voice-based services. For example, the cloud artificial intelligence platform 320 may include various modules for providing voice-based services, such as a voice processing module 321 for recognizing received voice and synthesizing voice to be output, a vision processing module 322 for analyzing and processing received images and video, a conversation processing module 323 for determining a suitable conversation in order to output an appropriate voice in response to the received voice, a recommendation module 324 for recommending a function suited to the received voice, and a neural machine translation (NMT) module 325 that supports artificial intelligence in translating language sentence by sentence based on data learning.

For example, suppose that, in the embodiments of FIGS. 1 and 2, the electronic device 100 transmits the voice input of the user 110 to the cloud artificial intelligence platform 320 using the API provided by the interface connect 340. In this case, the cloud artificial intelligence platform 320 may use the modules 321 to 325 described above to recognize and analyze the received voice input, and may synthesize and provide a suitable reply voice or recommend a suitable operation according to the received voice input.

An extension kit 350 may provide a development kit with which third-party content developers or companies can implement new voice-based functions on the basis of the cloud artificial intelligence platform 320. For example, suppose that, in the embodiment of FIG. 2, the electronic device 100 transmits the voice input of the user 110 to the external server 210, and the external server 210 transmits the voice input to the cloud artificial intelligence platform 320 using an API provided through the extension kit 350. In this case, as described above, the cloud artificial intelligence platform 320 may recognize and analyze the received voice input and synthesize and provide an appropriate reply voice, or may provide the external server 210 with recommendation information about the function to be processed according to the voice input. As an example, suppose that, in FIG. 2, the external server 210 transmits the voice input "Today's weather" to the cloud artificial intelligence platform 320 and receives from it the keywords "today's" and "weather" extracted by recognizing that voice input. In this case, the external server 210 may generate text information such as "Today's weather is ..." based on the keywords "today's" and "weather", and then transmit the generated text information to the cloud artificial intelligence platform 320. The cloud artificial intelligence platform 320 may then synthesize the text information into speech and provide it to the external server 210. The external server 210 may transmit the synthesized speech to the electronic device 100, and the electronic device 100 outputs the synthesized speech "Today's weather is ..." through a speaker, whereby the voice input "Today's weather" received from the user 110 is processed.
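The round trip in this example (recognition on the platform, text generation on the external server, synthesis on the platform) can be sketched as three stub functions chained by the server. Everything below is a stand-in: the function names are hypothetical, the audio is represented by its transcript, and synthesis simply encodes the text, since the source does not specify the actual APIs.

```python
from typing import List


def platform_recognize(audio: str) -> List[str]:
    """Stub for recognition on the cloud platform: returns extracted keywords."""
    # The transcript stands in for audio; splitting on spaces stands in for analysis.
    return audio.lower().split()


def server_compose_text(keywords: List[str]) -> str:
    """Stub for the external server: builds reply text from the keywords."""
    if "weather" in keywords:
        return "Today's weather is ..."
    return "Sorry, I did not understand."


def platform_synthesize(text: str) -> bytes:
    """Stub for speech synthesis: encodes the text instead of producing audio."""
    return text.encode("utf-8")


def handle_voice_input(audio: str) -> bytes:
    """External server flow: ask the platform for keywords, compose text, ask for speech."""
    keywords = platform_recognize(audio)
    reply_text = server_compose_text(keywords)
    return platform_synthesize(reply_text)  # returned to the device for playback
```

With these stubs, the input "Today's weather" yields the keywords "today's" and "weather", and the device would receive the encoded reply "Today's weather is ...".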

In this case, the electronic device 100 may execute the artificial intelligence continuous conversation method according to embodiments of the present invention in order to carry out device operation and content provision on the basis of conversation with the user.

FIG. 4 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention. The electronic device 410 of FIG. 4 may correspond to the electronic device 100 described above, and the server 420 may correspond to the external server 210 described above or to one computer device that implements the cloud artificial intelligence platform 320.

The electronic device 410 and the server 420 may include memories 411 and 421, processors 412 and 422, communication modules 413 and 423, and input/output interfaces 414 and 424. The memories 411 and 421 are computer-readable recording media and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. Here, a permanent mass storage device such as ROM or a disk drive may also be included in the electronic device 410 or the server 420 as a separate permanent storage device distinct from the memories 411 and 421. The memories 411 and 421 may store an operating system and at least one program code (for example, code for an application that is installed on the electronic device 410 and executed there to provide a specific service). Such software components may be loaded from a computer-readable recording medium separate from the memories 411 and 421. Such a separate computer-readable recording medium may include a floppy (registered trademark) drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In other embodiments, the software components may be loaded into the memories 411 and 421 through the communication modules 413 and 423 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memory 411 of the electronic device 410 based on a program (for example, the application described above) that is installed from files provided through a network 430 by developers or by a file distribution system that distributes application installation files.

The processors 412 and 422 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processors 412 and 422 by the memories 411 and 421 or the communication modules 413 and 423. For example, the processors 412 and 422 may be configured to execute instructions received in accordance with program code recorded in a storage device such as the memories 411 and 421.

The communication modules 413 and 423 may provide a function for the electronic device 410 and the server 420 to communicate with each other through the network 430, and may also provide a function for the electronic device 410 and/or the server 420 to communicate with other electronic devices or other servers. As an example, a request that the processor 412 of the electronic device 410 generates in accordance with program code recorded in a storage device such as the memory 411 may be transmitted to the server 420 through the network 430 under the control of the communication module 413. Conversely, control signals, instructions, content, files, and the like provided under the control of the processor 422 of the server 420 may pass through the communication module 423 and the network 430 and be received by the electronic device 410 through its communication module 413. For example, control signals and instructions of the server 420 received through the communication module 413 may be transmitted to the processor 412 or the memory 411, and content and files may be recorded on a recording medium that the electronic device 410 may further include (the permanent storage device described above).
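The routing of received data described above (control signals and instructions toward the processor or memory, content and files toward permanent storage) can be illustrated with a small dispatcher. The JSON message format with `kind` and `payload` fields is invented for this sketch, and the two lists merely stand in for the processor and the storage device.

```python
import json
from typing import List


def route_incoming(raw: bytes, to_processor: List[str], to_storage: List[str]) -> None:
    """Dispatch one message received through the communication module."""
    message = json.loads(raw.decode("utf-8"))  # hypothetical wire format
    if message["kind"] in ("control", "command"):
        to_processor.append(message["payload"])  # handled by the processor or memory
    else:
        to_storage.append(message["payload"])    # content and files go to storage


if __name__ == "__main__":
    processor_queue: List[str] = []
    storage: List[str] = []
    for msg in ({"kind": "control", "payload": "enter waiting state"},
                {"kind": "content", "payload": "weather.mp3"}):
        route_incoming(json.dumps(msg).encode("utf-8"), processor_queue, storage)
    print(processor_queue, storage)
```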

The input/output interface 414 may be a means for interfacing with the input/output device 415. For example, an input device may include a microphone, a keyboard, or a mouse, and an output device may include a display or a speaker. As another example, the input/output interface 414 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 415 may be integrated with the electronic device 410 as a single device. Likewise, the input/output interface 424 of the server 420 may be a means for interfacing with an input or output device (not shown) that is connected to the server 420 or that the server 420 may include.

In other embodiments, the electronic device 410 and the server 420 may include fewer or more components than those shown in FIG. 4. However, most conventional components need not be shown explicitly in the figure. For example, the electronic device 410 may be implemented to include at least some of the input/output devices 415 described above, or may further include other components such as a transceiver, a GPS (Global Positioning System) module, a camera, various sensors, and a database. As a more specific example, when the electronic device 410 is a smartphone, it may be implemented to further include the various components a smartphone generally has, such as an acceleration sensor, a gyro sensor, a motion sensor, a camera module, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration.

In this embodiment, the electronic device 410 may basically include, as the input/output device 415, a microphone for receiving the user's voice input, and may further include, as the input/output device 415, a speaker for outputting sound such as a spoken reply to the user's voice input or audio content.

The artificial intelligence continuous conversation method according to embodiments of the present invention may be executed by a computer device such as the electronic device 410 described above. In this case, the processor 412 of the electronic device 410 may be implemented to execute control instructions according to the code of an operating system or the code of at least one program included in the memory 411. Here, the processor 412 may control the electronic device 410 so that the electronic device 410 performs the steps of the artificial intelligence continuous conversation method described below, according to control instructions provided by code recorded in the electronic device 410.

FIG. 5 is a flowchart illustrating an example of an artificial intelligence continuous conversation method according to an embodiment of the present invention.

In step 510, when a predetermined keyword (51) is recognized from the user's utterance through the conversation-based interface while in the inactive state (50), the electronic device 410 may activate the conversation function (keyword calling function). For example, the electronic device 410 may receive voice input from the user's utterance through a voice input device serving as the conversation-based interface, such as a microphone included in the electronic device 410 or a microphone linked with the electronic device 410. In other words, when the user utters the keyword 51 designated as a wake-up word while the electronic device 410 is in the inactive state (50), such as an idle state in which no task is being performed, the electronic device 410 may recognize the keyword 51 and activate the conversation function. When the conversation is activated, the electronic device 410 may make the user aware of the conversation-active state through an output device (for example, a PUI (physical user interface)) that is included in or can be linked with the electronic device 410, for example by lighting a display or outputting a sound effect. The keyword calling function can be executed not only while the electronic device 410 is idle but also while a task such as music playback is in progress.

Although this embodiment describes a specific keyword being used as the conversation activation trigger, the trigger is not limited to this; conversation activation triggers of various forms, such as a specific action, sound, or signal, may also be used.

In step 520, once the conversation function has been activated by calling the keyword (51), the electronic device 410 may wait for the user's voice command (voice command standby state). The activated electronic device 410 automatically maintains the standby state, waiting for the user's voice command, for a fixed time (for example, 7 seconds). After becoming aware of the conversation-active state of the electronic device 410, for example through the PUI, the user may input a voice command within the fixed time during which the electronic device 410 maintains the standby state.

In step 530, the electronic device 410 may count the time designated for the voice command standby state and determine whether the fixed time has elapsed since entering that state. If no voice command from the user is detected within the designated time, the electronic device 410 times out the voice command standby state and returns to the inactive state (50).
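The timeout behavior of steps 520 and 530 can be sketched as a bounded polling loop. The following Python sketch is illustrative only: `wait_for_command`, `poll`, and the injectable `clock` and `sleep` parameters are hypothetical names that do not appear in the description, and the 7-second default simply reuses the example value given for the standby time.

```python
import time
from typing import Callable, Optional

STANDBY_SECONDS = 7.0  # example standby time from the description; assumed configurable


def wait_for_command(
    poll: Callable[[], Optional[str]],
    timeout: float = STANDBY_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 0.05,
) -> Optional[str]:
    """Return the first command seen within `timeout` seconds, else None.

    `poll` is a hypothetical callback that returns a recognized voice
    command, or None when nothing has been heard yet.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        command = poll()
        if command is not None:
            return command      # step 540 would analyze this command
        sleep(interval)
    return None                 # timed out: caller returns to the inactive state
```

A return value of `None` corresponds to the timeout branch, after which the device would go back to the inactive state (50). Passing `clock` and `sleep` as parameters is a design choice made here so the loop can be exercised without real waiting.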

In step 540, when a voice command is received from the user within the fixed time in the voice command standby state, the electronic device 410 may analyze the received voice command (voice analysis function). The electronic device 410 may receive the user's voice input through the conversation-based interface in the voice command standby state. For example, the electronic device 410 may receive voice input from the user's utterance through a voice input device serving as the conversation-based interface, such as a microphone included in the electronic device 410 or a microphone linked with the electronic device 410, and may perform a natural language understanding (NLU) task that analyzes the meaning of the received voice command.

Although this embodiment describes the voice analysis function being executed on the electronic device 410, it is not limited to this. For example, a process is also possible in which the electronic device 410 receives the user's voice command in the voice command standby state and transmits it to the server 420, and the server 420 analyzes the user's voice command.

In step 550, the electronic device 410 may perform a task based on the semantic analysis result produced by the voice analysis function (task execution function). The electronic device 410 outputs information corresponding to the user's voice command and, where necessary, may perform the task based on the semantic analysis result together with TTS (text-to-speech) output for operation guidance. The information corresponding to the user's voice command may include at least one of audio information, video information, and motion information, as information that can be output through the conversation-based interface. As an example, the electronic device 410 may output audio information corresponding to the user's voice command from an audio output device such as a speaker included in the electronic device 410 or a speaker linked with the electronic device 410. The electronic device 410 may also output video information corresponding to the user's voice command from a video output device such as a display included in the electronic device 410 or a display linked with the electronic device 410. Further, the electronic device 410 may output motion information corresponding to the user's voice command from a motor-controlled motion device included in the electronic device 410 or linked with the electronic device 410. For example, when the electronic device 410 is a conversation robot, it may carry out the related motion according to the information corresponding to the user's voice command.

Although this embodiment describes the task execution function being executed on the electronic device 410, it is not limited to this. For example, a process is also possible in which the electronic device 410 receives information corresponding to the user's voice command from the server 420 and outputs the received information through the conversation-based interface.

In step 560, the electronic device 410 may determine, according to the task corresponding to the user's voice command, whether the continuous conversation function is required. As one example, the electronic device 410 may distinguish between a task that continues until the user issues an end command (hereinafter, the "first task") and a task that ends at a predetermined point in time even without an end command (hereinafter, the "second task"), that is, a task that does not require continuous operation, and may determine that the second task is a task requiring the continuous conversation function. Here, a task related to a task that continues until the user issues an end command (for example, raising the volume of music being played) may also be classified as a first task. As another example, the electronic device 410 may determine that the task requires the continuous conversation function when the task corresponding to the user's voice command includes a conversational operation, such as asking the user a question and requesting a reply. As yet another example, the electronic device 410 may learn the execution pattern of each task and, based on the learning result, determine that a task requires the continuous conversation function when it is predicted to lead to an additional voice command within a short time (for example, 7 seconds).
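The three example criteria of step 560 can be sketched as separate predicates, one per example. This Python sketch is a hedged illustration: the `Task` descriptor, its field names, the predicate names, and the 0.5 threshold are all assumptions introduced here, and the description does not say how a learned prediction would be represented or how the criteria would be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; every field name is an assumption."""
    name: str
    runs_until_stopped: bool = False        # "first task": continues until an end command
    controls_persistent_task: bool = False  # e.g. raising the volume of playing music
    asks_user_question: bool = False        # conversational operation expecting a reply
    followup_probability: float = 0.0       # assumed output of a learned usage model


def by_task_duration(task: Task) -> bool:
    """First example: only self-terminating ("second") tasks qualify."""
    return not (task.runs_until_stopped or task.controls_persistent_task)


def by_conversational_form(task: Task) -> bool:
    """Second example: the task asks the user something and expects a reply."""
    return task.asks_user_question


def by_learned_pattern(task: Task, threshold: float = 0.5) -> bool:
    """Third example: a follow-up command is predicted shortly afterwards."""
    return task.followup_probability >= threshold
```

For instance, `by_task_duration(Task("play music", runs_until_stopped=True))` is false, so under the first criterion the device would not enter the next standby state after starting playback.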

In particular, when the task corresponding to the user's voice command is determined to be a task requiring the continuous conversation function, the electronic device 410 may maintain the active state of the device and switch automatically to the next voice command standby state. In this case, rather than switching the device to the inactive state after the user's voice command has been input, the electronic device 410 keeps the device active and switches automatically to the next voice command standby state, so that it can immediately request another voice command from the user without the keyword for waking up the device being called. In the next voice command standby state, a fixed time is designated as the standby time just as in the previous voice command standby state, so if the user declines to give an additional command and no voice command is input, the state times out as soon as the standby time has elapsed.

In step 570, when the task corresponding to the user's voice command is not a task requiring the continuous conversation function, the electronic device 410 switches the device to the inactive state (50) and ends the voice command standby process. In other words, for a task that does not require the continuous conversation function, the electronic device 410 may end the voice command standby process, maintain the inactive state (50) of the device until the keyword is called, and activate the device again when the keyword is called.
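Taken together, steps 510 to 570 behave like a two-state machine. The following Python sketch is a simplified illustration under stated assumptions: the `WAKE` and `TIMEOUT` marker strings, the `run` function, and the `keeps_conversation_open` callback are hypothetical, and speech recognition, NLU, and task execution are reduced to appending the command to a list.

```python
from enum import Enum, auto
from typing import Callable, Iterable, List, Tuple


class State(Enum):
    INACTIVE = auto()   # inactive state (50)
    WAITING = auto()    # voice command standby state


WAKE = "<wake>"         # stands for a recognized conversation activation trigger
TIMEOUT = "<timeout>"   # stands for the standby time expiring with no command


def run(
    events: Iterable[str],
    keeps_conversation_open: Callable[[str], bool],
) -> Tuple[List[str], State]:
    """Feed events through the flow of FIG. 5; return executed commands and final state."""
    state = State.INACTIVE
    executed: List[str] = []
    for event in events:
        if state is State.INACTIVE:
            if event == WAKE:                  # step 510: activate on trigger
                state = State.WAITING          # step 520: enter standby
            # any other input is ignored while inactive
        elif event == TIMEOUT:                 # step 530: no command in time
            state = State.INACTIVE
        elif event != WAKE:
            executed.append(event)             # steps 540-550: analyze and execute
            if not keeps_conversation_open(event):   # step 560
                state = State.INACTIVE         # step 570
    return executed, state
```

With `keeps_conversation_open=lambda c: c != "play music"`, the sequence wake, "today's schedule", "weather" executes both commands from a single wake-up, whereas a command spoken after "play music" is ignored until the trigger is recognized again.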

If voice command analysis fails during the first voice command standby process immediately after the keyword is called, the voice command standby process may be ended after a TTS reply with predetermined content (for example, "I do not understand.") is provided. If voice command analysis fails during a voice command standby process that is provided as the continuous conversation function without the keyword being called, the voice command standby process may be ended without a TTS reply.
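The asymmetry in failure handling can be expressed as a one-line rule. In this Python sketch, `reply_on_analysis_failure` and `entered_by_keyword` are hypothetical names; in both cases the standby process ends, and only the spoken reply differs.

```python
from typing import Optional

FALLBACK_REPLY = "I do not understand."  # example wording from the description


def reply_on_analysis_failure(entered_by_keyword: bool) -> Optional[str]:
    """Return the TTS text to speak before ending standby, or None for silence.

    `entered_by_keyword` is True for the first standby after the keyword
    call and False for a standby entered through continuous conversation.
    """
    return FALLBACK_REPLY if entered_by_keyword else None
```

Staying silent in the continuous case plausibly avoids replying to background speech that was never addressed to the device, although the description does not state this rationale.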

When a task requires the device to keep operating until the user issues an end command, such as playing content like music, news, children's stories, or radio, the electronic device 410 may end the voice command standby process and continue the requested task. In contrast, when the task for the voice command recognized in the previous voice command standby process ends after the information has been conveyed even without an end command, such as a simple information request or a chat with the device, or when it is a task in the form of an ongoing conversation with the user, the electronic device 410 may maintain the active state of the device and immediately enter the next voice command standby state without the keyword being called.

FIG. 6 shows an example of a list of tasks belonging to media content.

The electronic device 410 may provide media content as an example of a task corresponding to the user's voice command. The media content may include, as an example, audio content 600 composed of audio information. Referring to FIG. 6, the audio content 600 may be classified into music, news, children's stories, radio, and the like.

The electronic device 410 analyzes the meaning of the voice command received from the user and performs a task based on the semantic analysis result. When "Turn on the news" is received as the user's voice command, it may play content classified as news from among the audio content 600.

Because playing the audio content 600 is a task that continues until the user issues an end command, the voice command standby process is ended immediately for such a task and the device is deactivated.

FIG. 7 shows an example of a list of tasks belonging to TTS replies.

The electronic device 410 may provide reply information 700 in TTS form as an example of a task corresponding to the user's voice command. For example, when "Tell me today's schedule" is received as the user's voice command, the schedule information for today's date may be selected from the user's schedule information and output as TTS.

A TTS reply 700 such as an information search finishes once the information has been conveyed, even without an end command. Given that the user is likely to request further information shortly after such a task, the continuous conversation function is provided by maintaining the active state of the device and immediately entering the next voice command standby state.

As described with reference to FIGS. 6 and 7, whether to switch to the next voice command standby state may be determined by command type, that is, according to the type of task corresponding to the user's voice command.

Referring to FIG. 8, for a task that plays audio content such as music, or a control task related to audio content playback (for example, adjusting the volume or changing the content being played), the voice command standby process is ended immediately and the device does not switch to the next voice command standby state. In contrast, for a task that provides a short-answer TTS reply, or a simple chat task in which the device exchanges conversation with the user, the device maintains its active state and switches immediately to the next voice command standby state.
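The per-type decision described for FIG. 8 amounts to a lookup table. The Python sketch below is illustrative: the `TaskType` members, the `ENTERS_NEXT_STANDBY` table, and `state_after` are hypothetical labels for the four task types named in the text, not identifiers from the patent.

```python
from enum import Enum


class TaskType(Enum):
    AUDIO_PLAYBACK = "audio playback"            # e.g. playing music
    PLAYBACK_CONTROL = "playback control"        # e.g. volume, changing content
    SHORT_TTS_REPLY = "short-answer TTS reply"
    SIMPLE_CHAT = "simple chat"


# True: stay active and enter the next voice command standby state.
# False: end the standby process; the trigger is needed again.
ENTERS_NEXT_STANDBY = {
    TaskType.AUDIO_PLAYBACK: False,
    TaskType.PLAYBACK_CONTROL: False,
    TaskType.SHORT_TTS_REPLY: True,
    TaskType.SIMPLE_CHAT: True,
}


def state_after(task_type: TaskType) -> str:
    """Name the state the device moves to once the task has been handled."""
    return "voice command standby" if ENTERS_NEXT_STANDBY[task_type] else "inactive"
```

Keeping the policy in a table rather than in branching code makes it easy to add task types or to change the decision for one type without touching the control flow.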

Accordingly, the electronic device 410 basically requires the keyword to be called first in order to enter the voice command standby state, but for some task types, namely tasks after which another voice command is likely to be attempted within a short time, it can enter the voice command standby state immediately without the keyword being called.

Thus, according to embodiments of the present invention, after a voice command has been given, the device maintains its active state and switches automatically to the next voice command standby state depending on the task corresponding to that command. As a result, even when another voice command is attempted within a short time, the user does not need to call the keyword for activating the device and the device can respond immediately, which improves user convenience and reduces usage fatigue.

The apparatus described above may be implemented by hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an ALU (arithmetic logic unit), a digital signal processor, a microcomputer, an FPGA (field programmable gate array), a PLU (programmable logic unit), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, record, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but a person skilled in the art will understand that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct a processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer recording medium, or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over computer systems connected by a network and may be recorded or executed in a distributed manner. Software and data may be recorded on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy (registered trademark) disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to record program instructions. Other examples of the medium include recording or storage media managed by application stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person skilled in the art will be able to make various modifications and variations based on the above description. For example, appropriate results can be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are substituted or replaced by other components or equivalents.

Accordingly, other embodiments also fall within the scope of the appended claims, provided that they are equivalent to the claims.

100: Electronic device
110: User
210: External server

Claims

An artificial intelligence continuous conversation method implemented by a computer, the method comprising:
activating a conversation function and entering a voice command standby state when a conversation activation trigger is recognized while the conversation function is inactive;
performing a task corresponding to a voice command when the voice command is received in the voice command standby state; and
after performing the task, based on the type of the task, either providing a continuous conversation function by immediately entering a next voice command standby state, or deactivating the conversation function by not entering the next voice command standby state, wherein the step of deactivating comprises deactivating the conversation function so as not to enter the next voice command standby state when the type of the task is a task requiring continuous operation.

The artificial intelligence continuous conversation method according to claim 1, wherein a keyword designated as a wake-up word is used as the conversation activation trigger, and
the step of providing the continuous conversation function comprises:
depending on the task corresponding to the voice command, automatically switching to the next voice command standby state while maintaining the active state of the conversation function, without the keyword being called.

The artificial intelligence continuous conversation method according to claim 1, wherein the step of providing the continuous conversation function comprises:
automatically switching to the next voice command standby state while maintaining the active state of the conversation function when the task corresponding to the voice command is not a task requiring continuous operation.

The artificial intelligence continuous conversation method according to claim 1, wherein the step of providing the continuous conversation function comprises:
automatically switching to the next voice command standby state while maintaining the active state of the conversation function when, of a first task that continues until an end command is input and a second task that ends at a predetermined point in time without an end command, the task corresponding to the voice command is the second task.

The artificial intelligence continuous conversation method according to claim 1, wherein the step of providing the continuous conversation function comprises:
automatically switching to the next voice command standby state while maintaining the active state of the conversation function when the task corresponding to the voice command includes a conversational operation that requests a reply from the user.

The artificial intelligence continuous conversation method according to claim 1, wherein the step of providing the continuous conversation function comprises:
automatically switching to the next voice command standby state while maintaining the active state of the conversation function when, based on a result of learning the execution pattern of each task, the task corresponding to the voice command is a task for which an additional voice command is predicted.

The artificial intelligence continuous conversation method according to claim 1, wherein the step of deactivating the conversation function comprises:
switching the conversation function to the inactive state and ending the voice command standby state when the task corresponding to the voice command is a task related to the task requiring continuous operation.

A computer program for causing a computer to execute the artificial intelligence continuous conversation method according to any one of claims 1 to 7.

A computer-readable recording medium recording the computer program according to claim 8.

An artificial intelligence continuous conversation system implemented by a computer, comprising:
At least one processor implemented to execute computer-readable instructions,
Wherein the at least one processor performs:
A step of activating a conversation function and entering a voice command waiting state when a conversation activation trigger is recognized while the conversation function is inactive;
A step of performing a task corresponding to a voice command when the voice command is received in the voice command waiting state; and
A step of, after performing the task and depending on the type of the task, either providing a continuous conversation function by immediately entering a next voice command waiting state, or deactivating the conversation function so as not to enter the next voice command waiting state, wherein, in the deactivating step, the conversation function is deactivated so as not to enter the next voice command waiting state when the type of the task is a task requiring a continuous operation.
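
The three steps of the preceding claim can be read as a two-state machine. The following is a minimal sketch under stated assumptions: the class `ConversationController`, the exact-match wake word, and the set of task labels treated as requiring continuous operation are all hypothetical, and speech recognition and task execution are reduced to string handling.

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()  # conversation function off; only the trigger is recognized
    WAITING = auto()   # conversation function on; waiting for a voice command


class ConversationController:
    """Hypothetical controller; names and task labels are illustrative only."""

    def __init__(self, wake_word, continuous_tasks):
        self.wake_word = wake_word
        self.continuous_tasks = set(continuous_tasks)
        self.state = State.INACTIVE
        self.performed = []  # stand-in for actually executing tasks

    def on_utterance(self, text: str) -> State:
        if self.state is State.INACTIVE:
            # Only the conversation activation trigger has any effect here.
            if text == self.wake_word:
                self.state = State.WAITING
            return self.state
        # WAITING: treat the utterance as a voice command and perform the task.
        self.performed.append(text)
        if text in self.continuous_tasks:
            # Task requiring continuous operation: deactivate the conversation.
            self.state = State.INACTIVE
        else:
            # Otherwise enter the next waiting state without a new trigger.
            self.state = State.WAITING
        return self.state
```

In this sketch a command such as a weather query leaves the controller in `State.WAITING`, so a second command needs no wake word, whereas a command listed in `continuous_tasks` returns it to `State.INACTIVE`.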

A keyword designated as a wake-up word is used as the conversation activation trigger,
The step of providing the continuous conversation function includes:
Depending on the task corresponding to the voice command, automatically switching to the next voice command waiting state while maintaining the active state of the conversation function even without the keyword being called. The artificial intelligence continuous conversation system according to claim 10.

The step of providing the continuous conversation function includes:
When the task corresponding to the voice command is not a task requiring the continuous operation, automatically switching to the next voice command waiting state while maintaining the active state of the conversation function. The artificial intelligence continuous conversation system according to claim 10.

The step of providing the continuous conversation function includes:
When, of a first task that continues until an end command is input and a second task that ends at a predetermined time without the end command, the task corresponding to the voice command is the second task, automatically switching to the next voice command waiting state while maintaining the active state of the conversation function. The artificial intelligence continuous conversation system according to claim 10.

The step of providing the continuous conversation function includes:
When the task corresponding to the voice command includes a conversational task that requests a response from the user, automatically switching to the next voice command waiting state while maintaining the active state of the conversation function. The artificial intelligence continuous conversation system according to claim 10.

The step of providing the continuous conversation function includes:
When, based on a result of learning an execution pattern for each task, the task corresponding to the voice command is a task for which an additional voice command is predicted, automatically switching to the next voice command waiting state while maintaining the active state of the conversation function. The artificial intelligence continuous conversation system according to claim 10.

The step of deactivating the conversation function includes:
When the task corresponding to the voice command is the task requiring the continuous operation, switching the conversation function to an inactive state to end the voice command waiting state. The artificial intelligence continuous conversation system according to claim 10.