JP7113047B2 - AI-based automatic response method and system

Info

Publication number: JP7113047B2
Application number: JP2020124156A
Authority: JP
Inventors: ギョンテト; サンウイ; ヘジキム; ヒョンフンチョン; ソンファンチョン
Original assignee: Naver Corp
Current assignee: Naver Corp
Priority date: 2019-07-24
Filing date: 2020-07-21
Publication date: 2022-08-04
Anticipated expiration: 2040-07-21
Also published as: JP2021022928A; KR102170088B1

Description

The following description relates to an artificial intelligence (AI)-based automatic response system (ARS).

Advances in information and communication technology have given rise to an information society, and the Internet is becoming an important and indispensable medium in every field, including society, culture, and the economy.

To make inquiries, reservations, delivery orders, and similar requests to businesses, there are offline methods in which a user calls the business directly or conveys the request through an agency.

Approaches that improve on the offline method have made it possible to handle such matters by connecting to a company's customer center through its website, enabled by advances in Internet technology; by using an ARS, enabled by advances in mobile technology such as smartphones; or by installing and running the customer center's application (app).

For example, Patent Literature 1 (published March 29, 2019) discloses a technique for providing customers with an on-screen customer center connection service based on artificial intelligence.

Patent Literature 1: Korean Patent Publication No. 10-2019-0033138

An object is to provide a method and system that can deliver a reply quickly by grasping the meaning of a user's utterance from its mid results (intermediate results) and preparing the reply in advance.

Another object is to provide a method and system that can interrupt the sending of a reply when a voice signal from the user's utterance is received while the reply is being sent.

Another object is to provide a method and system that can vary the response speed to match the speed of the user's speech.

Another object is to provide a method and system that can use real-time translation to provide replies in the language the user is speaking.

An artificial intelligence-based automatic response method performed by a computer system is provided. The computer system includes at least one processor configured to execute computer-readable instructions contained in a memory, and the method includes: receiving, by the at least one processor, the user's uttered voice once a call with the user is connected; generating, by the at least one processor, a reply after analyzing the intent using mid results of the uttered voice; and sending, by the at least one processor, the reply in response to the uttered voice.

According to one aspect, the generating step may include: extracting sample sentences from the conversation logs of the automatic response service; and analyzing the intent from the mid results through conversation learning that uses, as training data, the sample sentences with their sentence endings removed.
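The training-data preparation described in this aspect can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes whitespace-separated tokens and treats the last one or two tokens of each logged sentence as its "ending". The function name `make_truncated_examples` and the `max_dropped_tokens` parameter are hypothetical.

```python
from typing import List, Tuple


def make_truncated_examples(
    log_sentences: List[Tuple[str, str]],
    max_dropped_tokens: int = 2,
) -> List[Tuple[str, str]]:
    """Build (text, intent) training pairs from logged sentences.

    Each full sentence is kept, plus copies with 1..max_dropped_tokens
    trailing tokens removed (a stand-in for stripping the sentence ending),
    so the intent model learns to recognize unfinished utterances.
    """
    examples: List[Tuple[str, str]] = []
    for sentence, intent in log_sentences:
        tokens = sentence.split()
        examples.append((sentence, intent))
        # Always keep at least one token so no training text is empty.
        for drop in range(1, min(max_dropped_tokens, len(tokens) - 1) + 1):
            examples.append((" ".join(tokens[:-drop]), intent))
    return examples
```

Training on these truncated copies is what lets a model assign a confident intent to a mid result that has not yet reached its ending.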

According to another aspect, the generating step may include analyzing the intent syllable by syllable, as the syllables are obtained as mid results.

According to another aspect, the generating step may include determining when to generate the reply based on the confidence of the intent analysis result obtained from the mid results.

According to another aspect, the generating step may include generating the reply in advance once the confidence of the intent analysis result obtained from the mid results reaches a threshold determined through conversation learning on sentences with their endings removed.
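A minimal sketch of deciding when to prepare the reply from successive mid results, under stated assumptions: the keyword table, the smoothed confidence formula, the fixed threshold of 0.6, and the names `classify` and `pregenerate_reply` are all hypothetical stand-ins. In the described aspects, the intent model and its threshold come from conversation learning, and mid results may arrive syllable by syllable rather than word by word.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical keyword lists; a trained intent model would replace these.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "order": ("order", "delivery", "menu"),
    "reservation": ("reserve", "reservation", "table", "book"),
}
# Hypothetical replies prepared per intent.
REPLIES: Dict[str, str] = {
    "order": "Orders are available now. Please tell me your menu selection.",
    "reservation": "Reservations are available. For what date and time?",
}


def classify(partial_text: str) -> Tuple[Optional[str], float]:
    """Return (best intent, confidence) for a partial transcript."""
    words = partial_text.lower().split()
    hits = {intent: sum(w in kws for w in words) for intent, kws in INTENT_KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    best = max(hits, key=lambda intent: hits[intent])
    # Adding 1 to the denominator means a single keyword hit scores 0.5,
    # so one word alone cannot cross a 0.6 threshold.
    return best, hits[best] / (total + 1)


def pregenerate_reply(
    mid_results: Iterable[str], threshold: float = 0.6
) -> Tuple[Optional[str], Optional[int]]:
    """Prepare the reply as soon as a mid result's confidence reaches the threshold.

    Returns (reply, index of the triggering mid result), or (None, None).
    """
    for index, partial in enumerate(mid_results):
        intent, confidence = classify(partial)
        if intent is not None and confidence >= threshold:
            return REPLIES[intent], index
    return None, None
```

With the mid results "can i", "can i order", "can i order delivery", the reply is prepared at the third one, before the user has finished the sentence.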

According to yet another aspect, the sending step may send the reply as a voice signal once an endpoint is detected in the uttered voice.
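The endpoint check that triggers sending can be illustrated with a simple energy-based rule. The disclosure does not specify how the endpoint is detected; this sketch assumes per-frame energy values and declares an endpoint after a run of low-energy frames that follows speech. The function name and the threshold and frame-count defaults are hypothetical.

```python
from typing import Optional, Sequence


def detect_endpoint(
    frame_energies: Sequence[float],
    energy_threshold: float = 0.01,
    min_silence_frames: int = 30,
) -> Optional[int]:
    """Return the frame index at which the endpoint is declared, or None.

    Leading silence is ignored; the endpoint is the frame where a run of
    min_silence_frames consecutive low-energy frames completes after speech.
    """
    heard_speech = False
    silence_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            heard_speech = True
            silence_run = 0
        elif heard_speech:
            silence_run += 1
            if silence_run >= min_silence_frames:
                return i
    return None
```

A pre-generated reply would be sent as a voice signal as soon as this returns a frame index.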

An artificial intelligence-based automatic response method performed by a computer system is also provided. The computer system includes at least one processor configured to execute computer-readable instructions contained in a memory, and the method includes: receiving, by the at least one processor, the user's uttered voice once a call with the user is connected; sending, by the at least one processor, a reply in response to the uttered voice; and interrupting, by the at least one processor, the sending of the reply if the user's uttered voice is received while the reply is being sent.

According to one aspect, when the user's uttered voice is received, the interrupting step may stop sending the reply either immediately or after applying a fade-out.
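The two interruption modes can be sketched on a buffer of PCM samples. This assumes float samples and a linear gain ramp; the function name and the `fade_ms` parameter are hypothetical, and the disclosure does not specify the fade curve.

```python
from typing import List


def interrupt_playback(remaining: List[float], sample_rate: int, fade_ms: float = 0.0) -> List[float]:
    """Return the samples still to be sent after the user barges in.

    fade_ms == 0 stops immediately (nothing more is sent); otherwise the
    next fade_ms of audio is sent with a linear gain ramp ending at zero.
    """
    n = min(len(remaining), int(sample_rate * fade_ms / 1000))
    if n == 0:
        return []
    return [sample * (1 - (i + 1) / n) for i, sample in enumerate(remaining[:n])]
```

A short fade avoids the audible click of cutting a waveform mid-cycle, at the cost of a few more milliseconds of overlap with the user's speech.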

According to another aspect, the interrupting step may include: classifying and learning non-verbal expressions; and continuing to send the reply if the uttered voice received while the reply is being sent corresponds to a learned non-verbal expression, and interrupting the sending of the reply if it does not.
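A minimal sketch of the keep-or-interrupt decision, assuming the barge-in audio has already been transcribed to text. A fixed set of hypothetical backchannel words stands in for the learned non-verbal classifier described in this aspect.

```python
# Hypothetical set standing in for the learned non-verbal expressions.
LEARNED_NONVERBAL = {"uh-huh", "mm", "mm-hmm", "hmm", "ah"}


def should_keep_sending(barge_in_text: str) -> bool:
    """True: keep sending the reply (learned non-verbal expression).
    False: interrupt the reply (anything else the user says)."""
    normalized = barge_in_text.strip().lower().rstrip(".!?,")
    return normalized in LEARNED_NONVERBAL
```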

According to another aspect, the AI-based automatic response method may further include providing, by the at least one processor, a follow-up question to elicit missing information when the recognition result of the uttered voice lacks information needed for the reply.
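One way to decide whether a follow-up question is needed is to check the recognized information against the slots an intent requires. The slot names, prompts, and the `next_question` function below are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical required slots per intent and the question that elicits each.
REQUIRED_SLOTS: Dict[str, Tuple[str, ...]] = {"reservation": ("date", "time", "party_size")}
PROMPTS: Dict[str, str] = {
    "date": "What date would you like?",
    "time": "What time would you like?",
    "party_size": "How many people will be in your party?",
}


def next_question(intent: str, slots: Dict[str, str]) -> Optional[str]:
    """Return a follow-up question for the first missing slot, or None if complete."""
    for slot in REQUIRED_SLOTS.get(intent, ()):
        if slot not in slots:
            return PROMPTS[slot]
    return None
```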

According to another aspect, the method may further include, when multiple intents are recognized from the recognition result of the uttered voice, providing, by the at least one processor, a reply to each intent in turn, following the order in which the intents were recognized.
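Replying to multiple intents in recognition order can be sketched by sorting on the mid-result index at which each intent was recognized. The data layout, the function name, and the sample replies are hypothetical.

```python
from typing import Dict, List, Tuple


def replies_in_recognition_order(
    recognized: List[Tuple[int, str]], replies: Dict[str, str]
) -> List[str]:
    """recognized holds (mid-result index at which the intent was recognized, intent).

    Returns the replies in the order the intents were recognized.
    """
    ordered = sorted(recognized, key=lambda item: item[0])
    return [replies[intent] for _, intent in ordered]
```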

According to another aspect, the method may further include recognizing, by the at least one processor, the user's speech rate; and determining, by the at least one processor, the speech rate of the reply according to the user's speech rate.
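A minimal sketch of matching the reply's speech rate to the user's, assuming the user's rate is measured in syllables per second and passed to the speech synthesizer as a multiplier of its default rate. The baseline of 5 syllables per second and the clamp range are hypothetical values, not taken from the disclosure.

```python
def reply_speech_rate(
    user_syllables: int,
    user_seconds: float,
    baseline_sps: float = 5.0,
    min_rate: float = 0.8,
    max_rate: float = 1.25,
) -> float:
    """Return a synthesis rate multiplier (1.0 = default) matched to the user.

    The user's syllables per second are divided by the synthesizer's
    baseline and clamped so replies never become too slow or too fast.
    """
    if user_seconds <= 0:
        return 1.0
    user_sps = user_syllables / user_seconds
    return max(min_rate, min(max_rate, user_sps / baseline_sps))
```

Clamping keeps the reply intelligible even when the measured rate is extreme, for example when a short utterance yields an unreliable estimate.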

According to yet another aspect, the method may further include recognizing, by the at least one processor, the user's spoken language; and switching, by the at least one processor, the language model for the automatic response service to a language model corresponding to the user's spoken language.
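A sketch of switching the language model, assuming the spoken language can be inferred from the script of the recognized text (a crude stand-in for a real language identifier) and that models are looked up by language code. The model identifiers, `detect_language`, and the `LanguageModelRouter` class are hypothetical.

```python
from typing import Dict

# Hypothetical model identifiers keyed by language code.
LANGUAGE_MODELS: Dict[str, str] = {"ko": "ars-ko", "ja": "ars-ja", "en": "ars-en"}


def detect_language(text: str) -> str:
    """Crude script-based guess: Hangul -> ko, kana -> ja, otherwise en."""
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:  # Hangul Syllables block
            return "ko"
        if 0x3040 <= code <= 0x30FF:  # Hiragana and Katakana blocks
            return "ja"
    return "en"


class LanguageModelRouter:
    """Tracks which language model the automatic response service uses."""

    def __init__(self, default_language: str = "ko") -> None:
        self.current_model = LANGUAGE_MODELS[default_language]

    def update(self, recognized_text: str) -> str:
        """Switch to the model for the detected language and return it."""
        self.current_model = LANGUAGE_MODELS[detect_language(recognized_text)]
        return self.current_model
```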

A non-transitory computer-readable recording medium storing a program that causes a computer to execute the AI-based automatic response method is also provided.

A computer system is also provided that includes at least one processor configured to execute computer-readable instructions contained in a memory. The at least one processor performs: receiving the user's uttered voice once a call with the user is connected; generating a reply after analyzing the intent using mid results of the uttered voice; and sending the reply in response to the uttered voice.

A computer system is also provided that includes at least one processor configured to execute computer-readable instructions contained in a memory. The at least one processor performs: receiving the user's uttered voice once a call with the user is connected; sending a reply in response to the uttered voice; and interrupting the sending of the reply if the user's uttered voice is received while the reply is being sent.

According to embodiments of the present invention, the meaning of a user's utterance is grasped from its mid results and a reply is prepared before the utterance ends, so the reply can be provided quickly.

According to embodiments of the present invention, the sending of a reply is interrupted when a voice signal from the user's utterance is received while the reply is being sent, which makes the exchange feel like talking to an actual person.

According to embodiments of the present invention, varying the response speed to match the user's speech rate provides a more suitable service through interaction adapted to that rate.

According to embodiments of the present invention, real-time translation is used to provide replies in the language the user is speaking, improving the accessibility and convenience of the service regardless of language.

FIG. 1 is a diagram showing an example of a network environment in one embodiment of the present invention.
FIG. 2 is a block diagram for explaining the internal configurations of an electronic device and a server in one embodiment of the present invention.
FIG. 3 is an exemplary diagram for explaining an AI automatic response system in one embodiment of the present invention.
FIG. 4 is a diagram showing examples of components that an AI automatic response system may include in one embodiment of the present invention.
FIG. 5 is a flowchart showing an example of an AI automatic response method in one embodiment of the present invention.
FIG. 6 is a diagram for explaining examples of training sentences in one embodiment of the present invention.
FIG. 7 is an exemplary diagram for explaining a process of analyzing a user's intent using mid results of the user's uttered voice in one embodiment of the present invention.
FIG. 8 is an exemplary diagram for explaining a process of analyzing a user's intent using mid results of the user's uttered voice in one embodiment of the present invention.
FIG. 9 is a diagram for explaining an example of a process of interrupting the sending of a reply in one embodiment of the present invention.
FIG. 10 is a diagram for explaining another example of a process of interrupting the sending of a reply in one embodiment of the present invention.
FIG. 11 is a diagram for explaining an exceptional situation in which the sending of a reply is not interrupted in one embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to an artificial intelligence (AI)-based automatic response system (ARS).

Embodiments, including the matters specifically disclosed in this specification, can provide an AI-based automatic response system implemented to converse like an actual person. This enables more natural calls with users while handling inquiries, reservations, delivery orders, and the like quickly and conveniently.

FIG. 1 is a diagram showing an example of a network environment in one embodiment of the present invention. The network environment of FIG. 1 includes multiple electronic devices 110, 120, 130, 140, multiple servers 150, 160, and a network 170. FIG. 1 is merely an example for explaining the invention; the numbers of electronic devices and servers are not limited to those shown in FIG. 1.

The electronic devices 110, 120, 130, 140 may be fixed or mobile terminals implemented as computer systems. Examples include AI speakers, smartphones, mobile phones, navigation devices, PCs (personal computers), notebook PCs, digital broadcasting terminals, PDAs (personal digital assistants), PMPs (portable multimedia players), tablets, game consoles, wearable devices, IoT (Internet of Things) devices, VR (virtual reality) devices, and AR (augmented reality) devices. As an example, FIG. 1 shows an AI speaker as the electronic device 110; in embodiments of the present invention, however, the electronic device 110 may be any of various physical computer systems capable of communicating with the other electronic devices 120, 130, 140 and/or the servers 150, 160 over the network 170 using wireless or wired communication.

The communication method is not limited and may include not only communication over networks that the network 170 can include (for example, a mobile communication network, the wired Internet, the wireless Internet, a broadcast network, or a satellite network) but also short-range wireless communication between devices. For example, the network 170 may include any one or more of a PAN (personal area network), LAN (local area network), CAN (campus area network), MAN (metropolitan area network), WAN (wide area network), BBN (broadband network), the Internet, and the like. The network 170 may also have any one or more network topologies, including but not limited to bus, star, ring, mesh, star-bus, and tree or hierarchical networks.

Each of the servers 150, 160 may be implemented as one or more computing devices that communicate with the electronic devices 110, 120, 130, 140 over the network 170 to provide instructions, code, files, content, services, and the like. For example, the server 150 may be a system that provides a first service to the electronic devices 110, 120, 130, 140 connected over the network 170, and the server 160 may be a system that provides a second service to those devices. More specifically, the server 150 may provide, as the first service, the service targeted by an application (for example, an automatic response service) through that application, which is a computer program installed and run on the electronic devices 110, 120, 130, 140. As another example, the server 160 may provide, as the second service, distribution of the files for installing and running that application to the electronic devices 110, 120, 130, 140.

FIG. 2 is a block diagram for explaining the internal configurations of an electronic device and a server in one embodiment of the present invention. FIG. 2 describes the internal configurations of the electronic device 110 and the server 150 as examples. The other electronic devices 120, 130, 140 and the server 160 may have internal configurations identical or similar to those of the electronic device 110 or the server 150.

The electronic device 110 and the server 150 may include memories 211, 221, processors 212, 222, communication modules 213, 223, and input/output interfaces 214, 224. The memories 211, 221 are non-transitory computer-readable recording media and may include non-transitory mass storage devices such as RAM (random access memory), ROM (read-only memory), disk drives, SSDs (solid-state drives), and flash memory. Non-transitory mass storage devices such as ROM, SSDs, flash memory, and disk drives may also be included in the electronic device 110 or the server 150 as non-transitory storage devices separate from the memories 211, 221. The memories 211, 221 may also store an operating system and at least one program code (for example, code for a browser installed and run on the electronic device 110, or for an application installed on the electronic device 110 to provide a specific service). These software components may be loaded from a computer-readable recording medium separate from the memories 211, 221, such as a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card. In other embodiments, software components may be loaded into the memories 211, 221 through the communication modules 213, 223 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memories 211, 221 based on a computer program (for example, the application described above) installed from files that developers, or a file distribution system that distributes application installation files (for example, the server 160 described above), provide over the network 170.

The processors 212, 222 may be configured to process computer program instructions by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processors 212, 222 by the memories 211, 221 or the communication modules 213, 223. For example, the processors 212, 222 may be configured to execute received instructions according to program code stored in a storage device such as the memories 211, 221.

The communication modules 213, 223 may provide functionality for the electronic device 110 and the server 150 to communicate with each other over the network 170, and for the electronic device 110 and/or the server 150 to communicate with another electronic device (for example, the electronic device 120) or another server (for example, the server 160). For example, a request generated by the processor 212 of the electronic device 110 according to program code stored in a storage device such as the memory 211 may be transmitted to the server 150 over the network 170 under the control of the communication module 213. Conversely, control signals, instructions, content, files, and the like provided under the control of the processor 222 of the server 150 may pass through the communication module 223 and the network 170 and be received by the electronic device 110 through its communication module 213. For example, control signals, instructions, content, files, and the like from the server 150 received through the communication module 213 may be passed to the processor 212 or the memory 211, and content, files, and the like may be stored on a storage medium (the non-transitory storage device described above) that the electronic device 110 may further include.

The input/output interface 214 may be a means for interfacing with an input/output device 215. For example, input devices may include a keyboard, mouse, microphone, or camera, and output devices may include a display, speaker, or haptic feedback device. As another example, the input/output interface 214 may be a means for interfacing with a device that integrates input and output functions, such as a touch screen. The input/output device 215 may also be integrated with the electronic device 110 into a single device. Similarly, the input/output interface 224 of the server 150 may be a means for interfacing with an input or output device (not shown) connected to or included in the server 150. More specifically, when the processor 212 of the electronic device 110 processes instructions of a computer program loaded into the memory 211, a service screen or content built from data provided by the server 150 or the electronic device 120 may be displayed on the display through the input/output interface 214.

In other embodiments, the electronic device 110 and the server 150 may include more components than those shown in FIG. 2. However, most conventional components need not be explicitly illustrated. For example, the electronic device 110 may be implemented to include at least part of the input/output device 215 described above, or may further include other components such as a transceiver, camera, various sensors, and a database. More specifically, when the electronic device 110 is an AI speaker, it may be implemented to further include various components typically found in AI speakers, such as sensors, a camera module, physical buttons, touch-panel buttons, input/output ports, and a vibrator.

The automatic response system provides an automatic response service platform that conveys various information about restaurants, accommodations, airline tickets, movies, performances, hospitals (medical care), travel, and the like through conversations with users.

As an example, the following describes handling inquiries, reservations, delivery orders, and the like for a restaurant while talking naturally with a user in place of a restaurant employee. This is only an example and is not limiting; the approach applies to any business or field in which an automatic response system can be used.

FIG. 3 is an exemplary diagram for explaining the AI automatic response system in one embodiment of the present invention.

For example, the AI automatic response system 300 according to embodiments of the present invention may be implemented on the server 150 described with reference to FIGS. 1 and 2.

Referring to FIG. 3, the AI automatic response system 300 may provide an automatic response service for handling inquiries, reservations, delivery orders, and the like for multiple businesses 31 to 33.

The AI automatic response system 300 may hold an AI conversation with a user 301 through a phone call or a chatbot, and based on that conversation may provide the information the user 301 wants or convey the user 301's request to the businesses 31 to 33.

To provide automatic response services related to the businesses 31 to 33, the AI automatic response system 300 may be implemented as a single system together with a database system (not shown) containing business information for each of the businesses 31 to 33, or as a separate system that can interoperate with it. Based on the business information, the AI automatic response system 300 may provide the information the user 301 wants or process the user 301's requests.

For example, suppose the AI automatic response system 300 receives the voice input "Can I place an order with Company A?" uttered by the user 301 from the electronic device 110, which includes an interface that operates through conversation with the user 301. The AI automatic response system 300 recognizes and analyzes this voice input, generates the reply "Orders are available now. Please tell me your menu selection." based on the business information, and sends the generated reply to the electronic device 110 as a voice signal. Based on the conversation with the user 301, the AI automatic response system 300 then summarizes the user 301's request, such as the selected menu items and quantities, and conveys the summary to the relevant business (one of 31 to 33).
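The summarize-and-forward step at the end of this example could be represented with a small data structure. The `OrderSummary` class, its fields, and the message format are hypothetical; the description says only that the selected menu items and quantities are summarized and conveyed to the business.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OrderSummary:
    """Accumulates menu items and quantities gathered during the call."""

    business: str
    items: Dict[str, int] = field(default_factory=dict)

    def add(self, menu_item: str, quantity: int) -> None:
        # Repeated mentions of the same item add to its quantity.
        self.items[menu_item] = self.items.get(menu_item, 0) + quantity

    def to_message(self) -> str:
        """Render the summary conveyed to the business, in order of first mention."""
        lines = [f"{name} x{qty}" for name, qty in self.items.items()]
        return f"Order for {self.business}: " + ", ".join(lines)
```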

Specific embodiments of the AI-based automatic response method and system are described below.

FIG. 4 is a diagram showing examples of components that the AI automatic response system may include in one embodiment of the present invention.

The server 150 according to the present embodiment serves as a platform that provides automatic response services related to businesses. In particular, the server 150 may include the AI automatic response system 300, which is implemented on an AI basis to converse like an actual person.

As shown in FIG. 4, the AI automatic response system 300 may include a gateway 401, a speech recognizer 410, a dialog manager 420, a reply generator 430, a speech synthesizer 440, a sensing unit 450, and a translator 460.

The gateway 401 may include a receiving end (RX), which receives the voice input of the user of the electronic device 110 through a phone function or a chatbot of an application installed on the electronic device 110, and a transmitting end (TX), which sends reply voice to the electronic device 110 as an ARS response. The gateway 401 may also include a socket controller for controlling the transmission of the reply voice.

The speech recognizer 410, the dialog manager 420, the reply generator 430, the speech synthesizer 440, the sensing unit 450, and the translator 460 may be included as components of the processor 222 of the server 150. Depending on the embodiment, components of the processor 222 may be selectively included in or excluded from the processor 222. Also depending on the embodiment, components of the processor 222 may be separated or merged to represent the functions of the processor 222.

To execute the AI automatic response method described below, the processor 222 and its components may directly process operations according to control instructions, or may control the server 150. For example, the processor 222 and its components may be implemented to execute instructions according to the code of an operating system and the code of at least one program stored in the memory 221.

Here, the components of the processor 222 may be representations of different functions that the processor 222 performs according to instructions provided by program code stored in the server 150. For example, the speech recognizer 410 may be used as a functional representation of the processor 222 controlling the server 150, according to those instructions, so that the server 150 recognizes voice input received from the electronic device 110.

The processor 222 may read the necessary instructions from the memory 221, into which instructions related to controlling the server 150 have been loaded. These instructions may include instructions for controlling the processor 222 to execute the AI automatic response method described below.

The steps of the AI automatic response method described below need not be performed in the order shown in the figures; some steps may be omitted, and additional steps may be included.

FIG. 5 is a flowchart showing an example of the AI automatic response method according to an embodiment of the present invention.

In step 510, the processor 222 may perform conversation learning using a training data set that contains sentences with their endings removed and the replies to those sentences. To enable conversation between the AI automatic response system 300 and the user, sentences with their endings removed may be labeled as training data, and the labeled data may then be learned through deep learning or machine learning to build a conversation learning model. For each training sentence, a high-probability reply may also be included in the training data set and used for conversation learning.

FIG. 6 is a diagram for explaining examples of training sentences according to an embodiment of the present invention.

Referring to FIG. 6, for example, sample sentences 601 may be extracted from the conversation logs of an automatic response service, and the extracted sample sentences with their endings removed may be used as training sentences 602. For example, from the sample sentence 「今週の土曜日、レストランの予約は可能ですか？」 ("Is it possible to book a restaurant this Saturday?"), the ending 「ですか？」 may be removed and the remainder 「今週の土曜日、レストランの予約は可能？」 used as a training sentence. Training on sentences with their endings removed teaches the model, in advance, a set of correct answers indicating what reply to give before the user has finished an utterance mid-conversation. Besides the sentence-final endings that close a sentence, it is also possible to train on sentences with non-final endings removed, such as connective endings in conjoined clauses and transformative endings in embedded clauses, or on sentences from which every constituent other than the core keywords has been removed.
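The following Python sketch illustrates how training pairs of this kind might be built from a conversation log. It is not the patent's implementation: the fixed `ENDINGS` list, the punctuation handling, and the function names `strip_ending` and `build_training_set` are hypothetical, and a real system would more likely locate endings with a morphological analyzer.

```python
from typing import List, Tuple

# Hypothetical ending list; a production system would identify endings with a
# morphological analyser instead of a fixed list.
ENDINGS = ["お願いします", "ですか", "ますか", "します", "です"]
TRAILING_PUNCT = "？?。.！!"

def strip_ending(sentence: str) -> str:
    """Remove one known sentence-final ending, keeping trailing punctuation."""
    body = sentence.rstrip(TRAILING_PUNCT)
    punct = sentence[len(body):]
    # Try longer endings first so "お願いします" wins over "します".
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if body.endswith(ending):
            return body[: -len(ending)] + punct
    return sentence

def build_training_set(log: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (utterance, reply) pairs from a conversation log to training pairs."""
    return [(strip_ending(utterance), reply) for utterance, reply in log]
```

Applied to the FIG. 6 example, `strip_ending("今週の土曜日、レストランの予約は可能ですか？")` yields 「今週の土曜日、レストランの予約は可能？」, and the reply paired with the original sentence is kept unchanged.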

In other words, the AI automatic response system 300 includes a conversation learning model built in advance by training on sentences with their endings removed.

Referring again to FIG. 5, in step 520, once a call with the user of the electronic device 110 is established, the speech recognizer 410 may receive the user's speech from the electronic device 110 as a real-time stream via the gateway 401. The speech recognizer 410 may convert the received speech into text in real time using speech-to-text (STT) and pass the text to the dialog manager 420.

In step 530, the dialog manager 420 may analyze the user's intent in the text received from the speech recognizer 410, based on natural language understanding (NLU) technology and the conversation learning. In particular, the dialog manager 420 may identify the user's intent syllable by syllable. That is, rather than using the sentence-level final result obtained after the user finishes speaking, the dialog manager 420 may identify the user's intent from the syllable-level intermediate results produced by real-time transcription. This allows the dialog manager 420 to grasp the meaning of the user's utterance at the intermediate-result stage, before the utterance ends.
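As an illustration of intent analysis on intermediate results, the sketch below re-scores intents every time the streaming transcript grows. The keyword lexicon `INTENT_KEYWORDS` is a hypothetical stand-in for the trained NLU model, and one Japanese character is treated as one syllable-sized increment, which is only an approximation.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical keyword lexicon standing in for a trained NLU model.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "reservation": ("予約", "席"),
    "menu_recommendation": ("おすすめ", "メニュー", "年配"),
}

def score_intents(partial_text: str) -> Dict[str, float]:
    """Score each intent by the fraction of its keywords seen so far."""
    return {
        intent: sum(kw in partial_text for kw in keywords) / len(keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }

def stream_intents(partials: Iterable[str]) -> Iterator[Tuple[str, str, float]]:
    """Re-run intent analysis on every intermediate STT result."""
    for text in partials:
        scores = score_intents(text)
        best = max(scores, key=scores.get)
        yield text, best, scores[best]

# Simulate an STT stream that grows one character at a time.
utterance = "年配の方におすすめのメニューはありますか"
partials = [utterance[:i] for i in range(1, len(utterance) + 1)]
results = list(stream_intents(partials))
```

In this simulation the score for `menu_recommendation` reaches 1.0 at 「年配の方におすすめのメニュー」, several characters before the ending 「ありますか」 arrives.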

In step 540, the reply generator 430 may generate a reply in advance based on the intent analysis result. In particular, the reply generator 430 may decide when to generate the reply based on the confidence of the intent analysis result obtained from the intermediate results. For example, when the confidence of the intent analysis result reaches or exceeds a predetermined threshold at an intermediate-result stage before the utterance ends, the reply generator 430 may generate, at that point, a reply corresponding to the user's intent. The confidence used to decide when to generate a reply may be determined through conversation learning using sentences with their endings removed. For example, the process of finding the value at which the confidence of an intermediate result falls within an error range of the confidence of the final result may be repeated, and the confidence threshold for deciding when to generate a reply may then be determined from a statistic of the values obtained through this repeated process.
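The threshold-calibration step can be sketched as follows, under stated assumptions: each training sample is a hypothetical pair of (confidences of successive intermediate results, confidence of the final result), the "error range" is an absolute tolerance, and the statistic is the mean. The source does not specify which statistic is used, so a median or percentile would fit the description equally well.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

def first_converged(intermediate: Sequence[float], final: float,
                    tolerance: float) -> Optional[float]:
    """Earliest intermediate confidence within `tolerance` of the final one."""
    for conf in intermediate:
        if abs(conf - final) <= tolerance:
            return conf
    return None

def calibrate_threshold(samples: Sequence[Tuple[Sequence[float], float]],
                        tolerance: float = 0.05) -> float:
    """Confidence threshold = mean of the converged values over all samples."""
    values: List[float] = []
    for intermediate, final in samples:
        value = first_converged(intermediate, final, tolerance)
        if value is not None:
            values.append(value)
    if not values:
        raise ValueError("no sample converged within the tolerance")
    return mean(values)

def should_pre_generate(confidence: float, threshold: float) -> bool:
    """Generate the reply once the intermediate confidence reaches the threshold."""
    return confidence >= threshold
```

For two samples whose intermediate confidences first come within 0.05 of the final value at 0.82 and 0.78, the calibrated threshold is their mean, 0.80.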

FIGS. 7 and 8 are exemplary diagrams for explaining the process of analyzing the user's intent using intermediate results of the user's speech according to an embodiment of the present invention.

Referring to FIG. 7, when the user's intent is analyzed from intermediate results of the utterance 「両親を連れて行く予定なのですが、年配の方におすすめのメニューはありますか？」 ("I'm planning to bring my parents; do you have any menu items you would recommend for older people?") before the utterance ends, the confidence may reach the threshold for generating a reply once "parents" (両親), "older people" (年配), and "menu" (メニュー) have been identified. The reply can therefore be generated before the user finishes speaking, that is, while the user is still uttering the ending 「ありますか？」.

As another example, referring to FIG. 8, in an automatic response service for reservations, the user's intent in the utterance 「今週の土曜日、３人で予約お願いします。」 ("A reservation for three this Saturday, please.") is analyzed from intermediate results. If the intent analysis result shows that all of the predetermined information required for a reservation (the information slots) has been filled, the system may determine that the confidence needed to generate a reply has been reached. The reply can then be generated before the user finishes speaking, that is, while the user is still uttering the ending 「お願いします。」.
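For the slot-based case in FIG. 8, a minimal sketch might check whether every required slot can already be filled from the partial transcript. The slot names, the two regular expressions, and the choice of date and party size as the required slots are assumptions made for illustration.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical required slots for a reservation intent.
REQUIRED_SLOTS = ("date", "party_size")
DAY_PATTERN = re.compile(r"(月|火|水|木|金|土|日)曜日")
PARTY_PATTERN = re.compile(r"(\d+)人")  # \d also matches full-width digits

def fill_slots(partial_text: str) -> Dict[str, Optional[Union[str, int]]]:
    """Extract the reservation slots available so far from a partial transcript."""
    day = DAY_PATTERN.search(partial_text)
    party = PARTY_PATTERN.search(partial_text)
    return {
        "date": day.group(0) if day else None,
        "party_size": int(party.group(1)) if party else None,
    }

def ready_to_reply(partial_text: str) -> bool:
    """True once every required slot is filled, i.e. the reply can be pre-generated."""
    slots = fill_slots(partial_text)
    return all(slots[name] is not None for name in REQUIRED_SLOTS)
```

With this rule, the reply becomes ready at 「今週の土曜日、３人」, before the ending 「お願いします。」 is spoken. Python's `\d` matches full-width digits such as ３ in `str` patterns, and `int()` accepts them.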

Referring again to FIG. 5, in step 550, the speech synthesizer 440 may synthesize the reply generated in step 540 into a voice signal and, once the user finishes speaking, send it to the electronic device 110 via the gateway 401. For example, the speech synthesizer 440 may determine that the user has finished speaking when an endpoint is detected in the user's speech; a state in which no speech is detected may be recognized as an endpoint if it persists for at least a predetermined time. In this way, the speech synthesizer 440 may synthesize the reply generated from the intermediate results into a voice signal before the user finishes speaking, and send the reply voice to the electronic device 110 as soon as the user finishes.
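The endpoint rule (silence that lasts at least a predetermined time) can be sketched with a simple energy-based detector. The frame energies, the energy threshold, and the frame count are hypothetical; with 20 ms frames, for instance, `min_silence_frames=25` would correspond to 0.5 s of silence.

```python
from typing import Optional, Sequence

def detect_endpoint(frame_energies: Sequence[float], energy_threshold: float,
                    min_silence_frames: int) -> Optional[int]:
    """Return the frame index at which the endpoint is declared, or None.

    A frame counts as silent when its energy is below `energy_threshold`. The
    endpoint is declared once `min_silence_frames` consecutive silent frames
    follow at least one voiced frame, so leading silence is ignored.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None
```

The pre-synthesized reply would be released for sending at the returned frame index.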

Accordingly, the AI automatic response system 300 identifies the user's intent from intermediate results before the user finishes speaking, generates and synthesizes the reply while the user is still uttering the ending, and delivers the prepared reply as soon as the user finishes, so that responses are provided more quickly.

FIG. 9 is a flowchart showing another example of the AI automatic response method according to an embodiment of the present invention. The method described below may be included in step 550 described above.

In step 901, the dialog manager 420 may continuously check whether the user's speech is being received while the reply voice is being sent from the speech synthesizer 440 to the electronic device 110.

In step 902, if the user's speech is received while the reply voice is being sent, the speech synthesizer 440 may stop sending the reply voice.

The AI automatic response system 300 may basically have a structure in which a receiving end (RX) and a transmitting end (TX) coexist in the gateway 401, and the gateway 401 may include a socket controller for controlling the reception of the user's speech and the transmission of the reply voice. If a voice signal from the user's speech arrives at the receiving end (RX) while the transmitting end (TX) is sending the reply voice, the AI automatic response system 300 may interrupt the voice signal being sent from the transmitting end (TX).

As one example, the speech synthesizer 440 may stop the reply voice being sent immediately when the user's speech is input. As another example, when the user's speech is input, the speech synthesizer 440 may apply a fade-out of a predetermined length to the reply voice being sent and then stop it.
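A fade-out of predetermined length might be applied to the outgoing reply as in the sketch below, assuming the reply is a sequence of float samples and a linear gain ramp (the source does not specify the fade shape). A `fade_len` of 0 reproduces the immediate-stop variant; at an 8 kHz telephony sample rate, a 50 ms fade would be 400 samples.

```python
from typing import List, Sequence

def fade_out_and_stop(samples: Sequence[float], position: int,
                      fade_len: int) -> List[float]:
    """Return the reply audio actually played when barge-in occurs at `position`.

    Samples before `position` play unchanged, the next `fade_len` samples are
    scaled by a linear ramp down to zero, and the rest of the reply is dropped.
    fade_len == 0 stops immediately.
    """
    played = list(samples[:position])
    tail = samples[position:position + fade_len]
    n = len(tail)
    for k, sample in enumerate(tail):
        gain = 1.0 - (k + 1) / n  # the last faded sample has gain 0.0
        played.append(sample * gain)
    return played
```

If fewer than `fade_len` samples remain, the ramp is compressed into the remaining samples so playback still ends at zero gain.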

Rather than interrupting the reply voice for every utterance received while it is being sent, the system may use a non-verbal filter to interrupt the reply voice selectively.

FIG. 10 is a diagram for explaining another example of the process of interrupting the sending of a reply according to an embodiment of the present invention.

Referring to FIG. 10, in step 1001, the sensing unit 450 may determine whether speech received while the reply voice is being sent is a non-verbal expression, so that non-verbal expressions can be filtered out as speech to be ignored. The sensing unit 450 may treat a non-verbal expression as a continuer, that is, an utterance for which the reply being sent need not be interrupted, and may further treat it as an utterance whose intent need not be analyzed.

FIG. 11 is a diagram for explaining exceptional situations in which the sending of a reply is not interrupted according to an embodiment of the present invention.

Referring to FIG. 11, for example, expressions of agreement or acknowledgment such as [はい, ええ, はいはい, ...] ("yes", "yeah", "yes, yes") and various interjections such as [うん, ああ, あ, ...] ("uh-huh", "ah", "oh") may be classified as non-verbal expressions and learned as such. Speech containing emotional expressions may also be learned as another kind of non-verbal expression; in this case, the voice waveform and the characteristics of the end of the utterance may be analyzed to distinguish questions from statements, so that even heightened emotion can be detected. Based on the results of learning non-verbal expressions, the sensing unit 450 may determine whether speech received while the reply voice is being sent corresponds to a non-verbal expression.

Referring again to FIG. 10, in step 1002, if the received speech corresponds to a non-verbal expression, the speech synthesizer 440 may judge it to be speech that should be ignored and continue sending the reply voice. If the received speech does not correspond to a non-verbal expression, the speech synthesizer 440 may judge it to be meaningful speech that must not be ignored and stop sending the reply voice.
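The decision in step 1002 can be sketched with a lexicon-based filter. This replaces the learned classifier described above with a fixed set built from the FIG. 11 examples, so `CONTINUERS`, the punctuation stripping, and the treatment of an empty transcript as non-verbal are all assumptions.

```python
# Lexicon stand-in for the learned non-verbal classifier; the entries come
# from the FIG. 11 examples, the matching rule itself is an assumption.
CONTINUERS = {"はい", "ええ", "はいはい", "うん", "ああ", "あ"}
STRIP_CHARS = "、。！？!?,. ー〜"

def is_non_verbal(transcript: str) -> bool:
    """True for backchannels and interjections (and for an empty transcript)."""
    token = transcript.strip(STRIP_CHARS)
    return token == "" or token in CONTINUERS

def should_interrupt(transcript_during_playback: str) -> bool:
    """Keep playing over continuers; stop for meaningful speech."""
    return not is_non_verbal(transcript_during_playback)
```

A user saying 「うん。」 during playback leaves the reply running, while 「いえ、日曜日でお願いします」 stops it.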

Accordingly, by interrupting the reply when a voice signal from the user's speech is received while the reply is being sent, the AI automatic response system 300 can behave the way a human does on an actual phone call.

To support natural, human-like conversation in the automatic response service, the dialog manager 420 may handle failure intervals appropriately. As one example, when the speech recognition result contains too little information, for example, when no intent is recognized or the slot information related to the service is insufficient, the dialog manager 420 may ask a follow-up question to elicit an utterance containing that information.

As another example, when the speech recognition result contains too much information, for example, when two or more intents are recognized at once, the dialog manager 420 may, before replying, ask the user a question to clarify the intent. For example, if both a "recommended menu" intent and a "reservation" intent are identified in the user utterance "If you have menu items recommended for older people, I'd like to book the restaurant this Saturday," the dialog manager 420 may ask a confirmation question such as "May I first tell you about our recommended menu and then take your reservation?"

Also, when a question carries an additional intent, as in the user utterance "Do you have chairs for children?", the system may not only answer whether such chairs are available but also ask whether the reservation will include children.

As yet another example, when the speech recognition result contains too much information, for example, when two or more intents are recognized at once, the dialog manager 420 may reply to each intent in turn. For example, the intents may be numbered (first, second, and so on) in the order in which they were recognized, and replies provided in that order. In response to the user utterance "If you have menu items recommended for older people, I'd like to book the restaurant this Saturday," replies may be given intent by intent, such as "First, we offer a Korean set-course meal as our recommendation for older guests. Second, what time would you like to book this Saturday?"
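The failure-interval strategies above (a follow-up question when information is missing, a confirmation question or numbered replies when several intents are recognized) could be organized as a small policy function. The reply templates, the `answers` mapping, and the `confirm_multi` switch are hypothetical.

```python
from typing import Dict, List

ORDINALS = ["First", "Second", "Third", "Fourth"]

def plan_response(intents: List[str], missing_slots: List[str],
                  answers: Dict[str, str], confirm_multi: bool = False) -> str:
    """Choose a reply strategy based on how much was understood.

    intents: recognised intents, in recognition order.
    missing_slots: required slots still empty for a single recognised intent.
    answers: hypothetical prepared reply text per intent.
    """
    if not intents:  # too little information: ask again
        return "Sorry, could you tell me what you would like to do?"
    if len(intents) == 1:
        if missing_slots:  # elicit the missing slot
            return f"Could you tell me the {missing_slots[0]}?"
        return answers[intents[0]]
    if confirm_multi:  # too much information: confirm the order first
        rest = ", ".join(intents[1:])
        return f"May I help with {intents[0]} first, and then with {rest}?"
    # Otherwise answer every intent in recognition order, numbered.
    parts = [
        f"{ORDINALS[i] if i < len(ORDINALS) else str(i + 1)}, {answers[intent]}"
        for i, intent in enumerate(intents)
    ]
    return " ".join(parts)
```

Whether to confirm first or answer in numbered order is left as a configuration choice here, since the source presents both as alternative embodiments.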

A user utterance such as "I'd like to make a reservation; do you have parking?" can be read as meaning that the user will book only if parking is available, or that the user wants to book regardless and simply will not drive if there is no parking. The present invention can provide replies that take such varied user intents into account.

As another example, when the same or a similar reply is repeated more than a predetermined number of times within a unit time or consecutively, the dialog manager 420 may treat this as a failure interval and handle it according to the amount of information available. As yet another example, the dialog manager 420 may provide replies based on a multi-turn conversation technique that reflects the previous conversation context. In addition, by appropriately handling issues such as poorly defined system utterances and natural language understanding (NLU) errors, the system can sustain a natural conversation, as if talking with a real person, in any situation.
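Detecting repeated replies within a unit time might look like the sketch below. The window length, repeat limit, and similarity cut-off are hypothetical parameters, and `difflib.SequenceMatcher.ratio()` is used only as a simple stand-in similarity measure; the source does not say how similarity is judged.

```python
from collections import deque
from difflib import SequenceMatcher
from typing import Deque, Tuple

class RepetitionMonitor:
    """Flags a failure interval when (near-)identical replies keep recurring."""

    def __init__(self, window_s: float = 60.0, max_repeats: int = 3,
                 similarity: float = 0.9) -> None:
        self.window_s = window_s
        self.max_repeats = max_repeats
        self.similarity = similarity
        self.history: Deque[Tuple[float, str]] = deque()

    def record(self, timestamp: float, reply: str) -> bool:
        """Record a reply; return True if it should be treated as a failure."""
        # Drop replies that fell out of the time window.
        while self.history and timestamp - self.history[0][0] > self.window_s:
            self.history.popleft()
        self.history.append((timestamp, reply))
        # Count replies in the window (including this one) similar to it.
        similar = sum(
            SequenceMatcher(None, reply, past).ratio() >= self.similarity
            for _, past in self.history
        )
        return similar >= self.max_repeats
```

With the defaults, the third near-identical reply within 60 seconds is flagged, at which point the dialog manager could switch strategy instead of repeating itself again.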

The dialog manager 420 may also recognize the user's speech rate through the speech recognizer 410 and adjust the response speed to match it. For example, the dialog manager 420 may set the speech rate of the reply voice in proportion to the user's speech rate, so that a user who asks questions quickly is answered quickly.

Accordingly, by varying its response speed according to the user's speech rate, the AI automatic response system 300 can provide a better-suited service through interaction paced to the user.
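A reply speech rate proportional to the user's could be computed as follows. The assumptions are that one non-space character approximates one syllable (roughly true for Hangul and kana, not for kanji or English), that 7 syllables per second is a neutral reference rate, and that the multiplier is clamped to the range 0.8 to 1.3 to keep synthesized speech intelligible.

```python
def user_speech_rate(transcript: str, duration_s: float) -> float:
    """Syllables per second, approximating one syllable per non-space character."""
    syllables = sum(1 for ch in transcript if not ch.isspace())
    return syllables / duration_s

def tts_rate(user_rate: float, reference_rate: float = 7.0,
             min_scale: float = 0.8, max_scale: float = 1.3) -> float:
    """TTS playback-rate multiplier proportional to the user's rate, clamped."""
    scale = user_rate / reference_rate
    return max(min_scale, min(max_scale, scale))
```

The clamp is a design choice: without it, a very fast or very slow caller would produce synthesized speech that is hard to follow.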

In addition, the dialog manager 420 may use real-time translation to provide the automatic response service in the language the user is speaking. The dialog manager 420 may include, or be configured to interoperate with, the translator 460, which contains at least one language model, and may switch to the language model corresponding to the user's spoken language to provide the automatic response service with real-time translation. For example, the translator 460 may maintain a Korean model, an English model, and a Japanese model, and the sensing unit 450 may detect the language of the opening portion of the user's speech as a wake-up word for setting the language. The dialog manager 420 may then recognize the user's spoken language from the wake-up word and branch to the language model for that language to provide the automatic response service. For example, the dialog manager 420 may branch the translator 460 to the Korean model if the Korean telephone greeting shown as (Ext. 1) is recognized as the first word of the user's utterance, to the English model if "hello" is recognized, and to the Japanese model if 「もしもし」 (the Japanese telephone greeting) is recognized. In other words, even when a call comes from a foreign speaker, the dialog manager 420 can detect the language of the first utterance and use the corresponding language model to provide automatic responses for restaurant inquiries, reservations, delivery orders, and the like.
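Language branching on the opening word could be sketched as a lookup from greeting to language-model key. The greeting 여보세요 (the usual Korean telephone greeting) is an assumption about the word the source renders as an inline image, and the Korean fallback `default="ko"` is likewise hypothetical.

```python
from typing import Dict

# Opening greeting -> language-model key. "여보세요" (the usual Korean telephone
# greeting) is an assumption about the greeting the source shows as an image.
GREETING_TO_MODEL: Dict[str, str] = {
    "여보세요": "ko",
    "hello": "en",
    "もしもし": "ja",
}

def select_language_model(first_utterance: str, default: str = "ko") -> str:
    """Pick a language model from the greeting that opens the call."""
    text = first_utterance.strip().lower()
    for greeting, model in GREETING_TO_MODEL.items():
        if text.startswith(greeting):
            return model
    return default  # hypothetical fallback when no known greeting is heard
```

Prefix matching is used because Japanese and Korean transcripts may not contain a space after the greeting.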

Accordingly, by using real-time translation to reply in the language the user is speaking, the AI automatic response system 300 can improve the accessibility and convenience of the service regardless of the user's language.

The apparatus described above may be implemented by hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field-programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on the OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will appreciate that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct processing devices independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device so as to be interpreted by a processing device or to provide instructions or data to it. The software may be distributed over network-connected computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store the computer-executable program persistently, or temporarily for execution or download. The medium may be any of various recording or storage means in the form of single or combined hardware; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, by sites that supply or distribute other software, and by servers.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those skilled in the art can make various modifications and variations based on this description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that described, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other embodiments also fall within the scope of the appended claims as long as they are equivalent to the claims.

300: AI automatic response system
401: Gateway
410: Speech recognizer
420: Dialog manager
430: Reply generator
440: Speech synthesizer
450: Sensing unit
460: Translator

Claims

1. An artificial intelligence (AI)-based automatic response method executed by a computer system, wherein
the computer system includes at least one processor configured to execute computer-readable instructions contained in a memory, and
the AI-based automatic response method comprises:
receiving, by the at least one processor, speech uttered by a user over a call established with the user;
generating, by the at least one processor, a reply in advance after analyzing intent using intermediate results of the uttered speech; and
sending the pre-generated reply.

2. The AI-based automatic response method according to claim 1, wherein the generating comprises:
extracting sample sentences from a conversation log of an automatic response service; and
analyzing the intent from the intermediate results through conversation learning that uses, as training data, the sample sentences with their endings removed.

3. The AI-based automatic response method according to claim 1, wherein the generating comprises analyzing the intent syllable by syllable, using the syllables obtained as the intermediate results.

4. The AI-based automatic response method of claim 1, wherein the pre-generating step includes determining when to generate the reply based on a confidence of the intent analysis result obtained from the intermediate results.
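One way to decide when to generate the reply from the confidence of the intent analysis is to pick the first intermediate result whose confidence reaches a threshold. The sketch below assumes results arrive as hypothetical `(partial_text, intent, confidence)` tuples; the threshold value used in the example is illustrative.

```python
from typing import Optional, Sequence, Tuple


def generation_point(
    results: Sequence[Tuple[str, str, float]], threshold: float
) -> Optional[int]:
    """Index of the first intermediate result whose intent confidence reaches
    the threshold, or None if none does (the reply would then be generated
    from the final recognition result)."""
    for i, (_text, _intent, confidence) in enumerate(results):
        if confidence >= threshold:
            return i
    return None


results = [("I'd", "unknown", 0.1), ("I'd like", "unknown", 0.3),
           ("I'd like to book", "reservation", 0.86)]
idx = generation_point(results, threshold=0.85)
if idx is not None:
    print("pre-generate reply for", results[idx][1], "after", repr(results[idx][0]))
```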

5. The AI-based automatic response method of claim 1, wherein the pre-generating step includes:
determining, as a generation time of the reply, the time at which a confidence of the intent analysis result obtained from the intermediate results reaches a threshold; and
pre-generating the reply at the determined generation time, before the uttered speech ends,
wherein the threshold is determined, through conversation learning that uses sentences with their endings removed, as a statistic of the values obtained by repeating a process of finding the confidence value of an intermediate result that lies within an error range of the confidence of the final result.
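A sketch of the threshold statistic of claim 5, assuming each training utterance yields a trace of confidences (one per intermediate result, ending with the final result) and assuming the mean as the statistic, since the claim does not name one. `calibrate_threshold` and `error_range` are illustrative names.

```python
import statistics
from typing import List, Sequence


def calibrate_threshold(
    traces: Sequence[Sequence[float]], error_range: float = 0.05
) -> float:
    """Each trace holds the intent confidence after each intermediate result
    of one training utterance; its last element is the final result.

    For every trace, record the first intermediate confidence within
    error_range of the final confidence, then return the mean of the
    recorded values (the mean is an assumed choice of statistic).
    """
    found: List[float] = []
    for trace in traces:
        final = trace[-1]
        # The final value always qualifies, so each non-empty trace adds one value.
        for value in trace:
            if abs(value - final) <= error_range:
                found.append(value)
                break
    return statistics.mean(found)


print(calibrate_threshold([[0.2, 0.6, 0.88, 0.9], [0.3, 0.82, 0.85]]))
```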

6. The AI-based automatic response method of claim 1, wherein the sending step includes sending the reply as a voice signal when an endpoint is detected in the uttered speech.
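Claim 6 does not say how the endpoint is detected. A common, simple technique, swapped in here as an assumption, is energy-based endpointing: declare the end of speech after a run of low-energy frames that follows some speech. The function names, frame sizes, and thresholds below are hypothetical.

```python
import math
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def find_endpoint(
    frames: Sequence[Sequence[float]],
    energy_threshold: float = 0.01,
    min_silence_frames: int = 3,
) -> Optional[int]:
    """Index of the frame at which the endpoint is declared: the point where,
    after speech has been seen, min_silence_frames consecutive frames fall
    below energy_threshold. Returns None if no endpoint is found."""
    seen_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            seen_speech = True
            silent_run = 0
        elif seen_speech:  # leading silence before any speech is ignored
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None


speech, silence = [0.5] * 160, [0.0] * 160
frames = [silence, speech, speech, silence, silence, silence]
print(find_endpoint(frames))
```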

7. An artificial intelligence (AI)-based automatic response method executed by a computer system, the computer system including at least one processor configured to execute computer-readable instructions contained in a memory, the method comprising:
receiving, by the at least one processor, speech uttered by a user by connecting a call with the user;
pre-generating, by the at least one processor, a reply after analyzing intent using intermediate results of the uttered speech;
sending, by the at least one processor, the pre-generated reply to the uttered speech after the user finishes speaking; and
suspending, by the at least one processor, the sending of the reply when uttered speech of the user is received while the reply is being sent.

8. The AI-based automatic response method of claim 7, wherein the suspending step includes, when uttered speech of the user is received, interrupting the sending of the reply either immediately or after applying a fade-out.
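A sketch of the two interruption options of claim 8, assuming the remaining reply audio is a list of float samples and assuming a linear gain ramp as the fade-out. `interrupt_reply` and `fade_len` are hypothetical names.

```python
from typing import List, Sequence


def interrupt_reply(remaining: Sequence[float], fade_len: int = 0) -> List[float]:
    """Return the reply samples still played after the user barges in.

    fade_len == 0 stops playback immediately (nothing more is played);
    otherwise the next fade_len samples are played with a linear gain
    that falls to 0.0 on the last sample, and playback then stops.
    """
    n = min(fade_len, len(remaining))
    return [remaining[i] * (1.0 - (i + 1) / n) for i in range(n)]


reply_audio = [1.0] * 8
print(interrupt_reply(reply_audio))              # immediate stop
print(interrupt_reply(reply_audio, fade_len=4))  # short fade-out
```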

9. The AI-based automatic response method of claim 7, wherein the suspending step includes:
classifying and learning non-verbal expressions;
maintaining the sending of the reply when the speech received while the reply is being sent corresponds to a learned non-verbal expression; and
interrupting the sending of the reply when the received speech does not correspond to a learned non-verbal expression.
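A keyword-set sketch of the decision in claim 9, assuming the learned non-verbal expressions can be reduced to a set of backchannel and filler tokens in the speech recognizer's transcript. The token set and `keep_sending` are hypothetical; the claim describes expressions that are classified and learned, not a fixed list.

```python
import re
from typing import FrozenSet

# Hypothetical stand-in for the learned non-verbal expressions
# (backchannels and fillers).
NON_VERBAL: FrozenSet[str] = frozenset({"uh-huh", "mhm", "mm", "hmm", "uh", "um", "ah"})


def keep_sending(barge_in_text: str, non_verbal: FrozenSet[str] = NON_VERBAL) -> bool:
    """True if the speech received during the reply consists only of
    non-verbal expressions, so the reply continues; False means interrupt.
    A transcript with no words at all (e.g. a cough) counts as non-verbal."""
    tokens = re.findall(r"[a-z'-]+", barge_in_text.lower())
    return all(token in non_verbal for token in tokens)


for heard in ["Uh-huh", "mm, hmm", "wait, stop"]:
    print(heard, "->", "keep sending" if keep_sending(heard) else "interrupt")
```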

10. The AI-based automatic response method of claim 1 or 7, further comprising providing, by the at least one processor, when the recognition result of the uttered speech lacks information necessary for the reply, a question that prompts the user for that information.
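A slot-filling sketch of claim 10, assuming each intent has a hypothetical list of required pieces of information (slots) and a prompt question per slot. The tables and `follow_up_question` are illustrative, not taken from the patent.

```python
from typing import Dict, List, Optional

# Hypothetical required slots per intent and the question asked for each slot.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "reservation": ["date", "time", "party_size"],
}
PROMPTS: Dict[str, str] = {
    "date": "What date would you like to book?",
    "time": "What time would you like to come?",
    "party_size": "How many people will be in your party?",
}


def follow_up_question(intent: str, slots: Dict[str, str]) -> Optional[str]:
    """Question for the first required slot missing from the recognition
    result, or None when nothing is missing and a reply can be given."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not slots.get(slot):
            return PROMPTS[slot]
    return None


print(follow_up_question("reservation", {"date": "tomorrow"}))
```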

11. The AI-based automatic response method of claim 1 or 7, further comprising, when the at least one processor recognizes a plurality of intents in the recognition result of the uttered speech, sequentially providing a reply to each intent in the order in which the intents were recognized.
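A sketch of claim 11, assuming each recognized intent carries its position in the utterance and that "recognition order" is the order of those positions. `replies_in_order` is hypothetical, and `reply_for` stands in for the reply generator.

```python
from typing import Callable, List, Tuple


def replies_in_order(
    intents: List[Tuple[int, str]], reply_for: Callable[[str], str]
) -> List[str]:
    """intents holds (position, intent) pairs, where position is the offset in
    the utterance at which the intent was recognized. One reply is produced
    per intent, in recognition order."""
    ordered = sorted(intents, key=lambda pair: pair[0])
    return [reply_for(intent) for _position, intent in ordered]


answers = {"reservation": "Your table is booked.", "hours": "We open at 11 a.m."}
print(replies_in_order([(24, "hours"), (0, "reservation")], answers.__getitem__))
```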

12. The AI-based automatic response method of claim 1 or 7, further comprising:
recognizing, by the at least one processor, a speech rate of the user; and
determining, by the at least one processor, a speech rate of the reply according to the speech rate of the user.
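A sketch of claim 12, assuming the user's speech rate is measured in syllables per second and mapped to a speed multiplier for the speech synthesizer. The 5 syllables/s baseline and the clamping range are hypothetical values chosen for illustration.

```python
def user_speech_rate(num_syllables: int, duration_s: float) -> float:
    """User's speech rate in syllables per second over one utterance."""
    return num_syllables / duration_s


def reply_rate(user_rate: float, baseline: float = 5.0,
               lo: float = 0.8, hi: float = 1.25) -> float:
    """Speed multiplier for the synthesized reply: the user's rate relative
    to a hypothetical baseline, clamped to [lo, hi] to stay intelligible."""
    return max(lo, min(hi, user_rate / baseline))


rate = user_speech_rate(30, 5.0)  # 30 syllables in 5 seconds
print(rate, reply_rate(rate))
```

Clamping keeps the reply from becoming unnaturally fast or slow when the user's rate is extreme.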

13. The AI-based automatic response method of claim 1 or 7, further comprising:
recognizing, by the at least one processor, a language spoken by the user; and
switching, by the at least one processor, a language model of the automatic response service to a language model corresponding to the language spoken by the user.
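A sketch of claim 13 under two assumptions: a crude script-based language guess using Python's standard `unicodedata` module, and a hypothetical mapping from language codes to model identifiers. A production system would use a trained language identifier; for example, this guess misses Japanese text written only in kanji.

```python
import unicodedata
from typing import Dict

# Hypothetical mapping from language code to language model identifier.
LANGUAGE_MODELS: Dict[str, str] = {
    "ko": "ars-model-ko",
    "ja": "ars-model-ja",
    "en": "ars-model-en",
}


def detect_language(text: str) -> str:
    """Rough script-based guess: Hangul -> ko, Hiragana/Katakana -> ja,
    otherwise en."""
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("HANGUL"):
            return "ko"
        if name.startswith(("HIRAGANA", "KATAKANA")):
            return "ja"
    return "en"


class ResponseSession:
    """Holds the language model currently used by the automatic response service."""

    def __init__(self, default_lang: str = "ko") -> None:
        self.model = LANGUAGE_MODELS[default_lang]

    def on_utterance(self, text: str) -> str:
        # Switch to the model for the detected language, if one is configured.
        self.model = LANGUAGE_MODELS.get(detect_language(text), self.model)
        return self.model


session = ResponseSession()
print(session.on_utterance("Hello, I need help"))
```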

14. A non-transitory computer-readable recording medium storing a program for causing a computer to execute the AI-based automatic response method of any one of claims 1 to 9.

15. A computer system comprising:
at least one processor configured to execute computer-readable instructions contained in a memory,
wherein the at least one processor performs:
a process of receiving speech uttered by a user by connecting a call with the user;
a process of pre-generating a reply after analyzing intent using intermediate results of the uttered speech; and
a process of sending the pre-generated reply to the uttered speech after the user finishes speaking.

16. A computer system comprising:
at least one processor configured to execute computer-readable instructions contained in a memory,
wherein the at least one processor performs:
a process of receiving speech uttered by a user by connecting a call with the user;
a process of pre-generating a reply after analyzing intent using intermediate results of the uttered speech;
a process of sending the pre-generated reply to the uttered speech after the user finishes speaking; and
a process of interrupting the sending of the reply when uttered speech of the user is received while the reply is being sent.