JP6454807B1

JP6454807B1 - Voice authentication payment system

Info

Publication number: JP6454807B1
Application number: JP2018081031A
Authority: JP
Inventors: 麻衣子高宮; 理奈大石; 傑井之上; 哲潮村
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2018-04-20
Filing date: 2018-04-20
Publication date: 2019-01-16
Anticipated expiration: 2038-04-20
Also published as: JP2019191716A

Abstract

PROBLEM TO BE SOLVED: Smart speakers, and systems that use them, cannot complete an online purchase. The invention provides that capability. SOLUTION: The system receives voice data produced when a user speaks to a smart speaker 10 that has at least a microphone and a speaker. It performs voiceprint authentication on that voice data to authenticate the target user. It also receives from the smart speaker 10 information identifying the product the user wishes to buy, and a payment processing request from the user. If the target user has been successfully authenticated, it executes payment processing in accordance with the request. [Selected drawing] FIG. 1

Description

The present invention relates to a voice authentication settlement system that performs payment using voice authentication.

Generally, to pay for an online purchase, the user logs in to the target EC site (ID and password), selects a product, and then enters a credit card number, the cardholder's name, and the security code. If the user's EC site membership is already linked to credit card information, the card details do not need to be entered again.

Beyond this conventional style of online shopping, Patent Document 1 discloses a settlement support apparatus that can execute a purchase without disclosing the purchaser's personal information.

Patent Document 1: JP-A-2016-81467

In this kind of online shopping, the purchaser uses a device such as a desktop PC, a notebook PC, or a smartphone, and performs the necessary key or touch input to log in, select products, and pay. Meanwhile, smart speakers are now on the market that accept operations spoken by the end user and carry them out, such as web searches, reading online news aloud, and playing music and video. With a smart speaker, the end user can operate the device by speaking, even while their hands are busy with other work. However, even the smart speakers currently on the market, and the systems that use them, cannot complete an online purchase.

The present invention was made in view of this problem. Its object is to provide a function for carrying out online shopping using a smart speaker.

The voice authentication settlement system according to the present invention performs the following steps:
- it receives voice data produced when a user speaks to a smart speaker that has at least a microphone and a speaker;
- it performs voiceprint authentication on the received voice data to authenticate the target user;
- it receives from the smart speaker information identifying the product the user wishes to purchase;
- it receives from the smart speaker a payment processing request from the user;
- if the target user has been successfully authenticated, it executes payment processing in accordance with the payment processing request.

According to the present invention, once the user has been authenticated, the system can receive operation instructions from the user through the smart speaker, identify the product to be purchased, and perform payment processing.

FIG. 1 is a configuration diagram of the voice authentication settlement system according to the first embodiment of the present invention.
FIG. 2 is a sequence diagram according to the first embodiment.
FIG. 3 is a sequence diagram according to another embodiment.
FIG. 4 is a sequence diagram according to another embodiment.
FIG. 5 is a configuration diagram of the voice authentication settlement system according to another embodiment.
FIG. 6 is a display example on the display device according to another embodiment.

(First embodiment)
In the following, identical or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and repeated descriptions are omitted where appropriate. Members that are not important to the explanation are partly omitted from the drawings.

FIG. 1 is a configuration diagram of the voice authentication settlement system according to the present embodiment. The system consists of a smart speaker 10, a voice authentication server 30, an EC server 40, and a payment server 50. Each is connected to a network by wire or wirelessly. In the example of FIG. 1, the smart speaker 10 connects wirelessly to the network via an access point 20, and the other components connect by wire. The access point 20 is a wireless device that connects wireless terminals to one another and to a network such as a wired network. In the present embodiment, the voice authentication server 30, the EC server 40, and the payment server 50 are separate servers. They may instead be implemented on a single computer, and each server may also be distributed across several computers.

The smart speaker 10 receives voice input from an end user, converts it into voice data, and outputs the voice data to other devices. In the configuration of FIG. 1, it transmits the voice data to the voice authentication server 30 or the EC server 40. The smart speaker 10 can also play back voice data stored internally or received from outside.

As an example hardware configuration, the smart speaker 10 comprises:
- a microphone that detects external sound and converts it into an electrical signal;
- a speaker that outputs voice data as sound;
- a communication module that communicates with external devices;
- an LED (light-emitting element) that visually indicates the status of the smart speaker;
- operation buttons for entering various operation instructions;
- a CPU (control unit) that controls these modules and elements.

Many types of smart speaker are already on the market, and some have several microphones and several speakers. For example, microphones can be spaced evenly around the edge of the top surface and speakers spaced evenly around the side surface. The end user can then speak to the device, and hear its output, from any direction.
Some microphone and speaker modules are directional, and their directivity can be changed under software control. When voice input from an end user is detected, the microphones can be steered toward the end user. The speakers can likewise be steered toward the end user when outputting sound.
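The source does not say how the speaker's direction is found before the modules are steered toward the end user. The sketch below is one coarse possibility, under stated assumptions. It assumes a hypothetical ring of N equally spaced top-surface microphones, with microphone i at 360·i/N degrees, and picks the bearing of the microphone with the highest RMS energy. This is a simple energy heuristic, not true beamforming, which would use inter-microphone time delays.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one microphone's audio frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_speaker_bearing(mic_frames: Sequence[Sequence[float]]) -> float:
    """Return the bearing, in degrees, of the loudest microphone.

    Assumption (hypothetical layout): microphone i sits at 360 * i / N degrees
    on a ring of N equally spaced microphones on the top surface.
    """
    if not mic_frames:
        raise ValueError("need at least one microphone frame")
    n = len(mic_frames)
    loudest = max(range(n), key=lambda i: rms(mic_frames[i]))
    return 360.0 * loudest / n
```

For example, with four microphones where the second one hears the user loudest, the estimated bearing is 90 degrees. A directional microphone or speaker could then be steered toward that bearing.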

The voice authentication server 30 authenticates users from received voice data. It performs voiceprint authentication by comparing the received voice data with voice data stored in advance. In FIG. 1, the voice authentication server 30 receives the voice data sent by the smart speaker 10 via the access point 20 and the network, and performs voiceprint authentication. Any known, commonly used voiceprint authentication technique can be used. One technique converts the voice data into a spectrum and identifies the individual from its frequency distribution. Another has the end user speak a specific keyword and compares the result with voice data recorded earlier when the same user spoke the same keyword. Even when several end users share the smart speaker, this personal authentication identifies which end user was authenticated, that is, who the current user is.
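To make the spectrum-based approach concrete, here is a toy sketch. It is not the patent's method, which leaves the technique open. It computes a naive DFT magnitude spectrum of the incoming frame and compares it by cosine similarity with spectra enrolled in advance for each user. The best match is accepted only above a hypothetical threshold of 0.9. Production voiceprint systems use far richer features and models, such as MFCCs or speaker embeddings.

```python
import cmath
import math
from typing import Dict, List, Optional, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (adequate for a short illustrative frame)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two spectra (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(samples: Sequence[float],
                 enrolled: Dict[str, List[float]],
                 threshold: float = 0.9) -> Optional[str]:
    """Return the enrolled user whose stored spectrum best matches, or None.

    `enrolled` maps a user ID to a spectrum registered in advance;
    `threshold` is a hypothetical acceptance level.
    """
    probe = magnitude_spectrum(samples)
    best_user, best_score = None, threshold
    for user, template in enrolled.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user
```

Returning the matching user ID, rather than a plain yes or no, mirrors the point in the text: voiceprint authentication also identifies which of several users is currently speaking.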

The EC server 40 holds a dialogue with the end user about EC (electronic commerce), narrows down the products to purchase, and performs order processing. Within order processing, payment is handled by the payment server 50. Known, commonly used techniques already exist for recognizing speech and holding a dialogue with an end user in a specific domain, and the EC server 40 is implemented with such techniques. These techniques come in two types:
- intent-interpretation type: the system understands the end user's utterance and determines which task to run as the next action;
- scenario-dialogue type: a scenario is defined in advance and the conversation follows it.

For example, suppose the end user says "I want tea." The EC server 40 interprets the intent, finds the best-selling product among the teas it sells, and proposes it: "How about ○○ Tea, 500 ml? One unit costs 200 yen including shipping." If the end user replies "Purchase," the EC server 40 confirms the order and then asks the payment server 50 to process the payment. For voice output, the EC server 40 sends the corresponding voice data to the smart speaker 10, which plays it.
In the present embodiment, the EC server 40 has the speech recognition and dialogue functions. The voice authentication server 30, the payment server 50, or a separate device may provide them instead. For order confirmation, the product name, quantity, and purchase price are read out. If the end user approves, the process moves on to the payment processing request. The buyer's delivery address and payment information (credit card information) are assumed to be registered in the EC server 40 in advance.
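The "I want tea" example can be sketched as a tiny intent-interpretation step followed by best-seller selection. This is a minimal illustration under assumptions, not the EC server 40's actual implementation. The keyword table, the `Product` fields, and the sales figures are hypothetical stand-ins for a real intent interpreter and catalog.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    category: str
    price_yen: int   # hypothetical: price per unit including shipping
    sales: int       # hypothetical sales figure used to rank products


# Hypothetical keyword-to-category table standing in for a real intent interpreter.
INTENT_KEYWORDS = {"tea": "tea", "coffee": "coffee"}


def interpret_intent(utterance: str) -> Optional[str]:
    """Map an utterance to a product category by simple keyword matching."""
    text = utterance.lower()
    for keyword, category in INTENT_KEYWORDS.items():
        if keyword in text:
            return category
    return None


def propose(utterance: str, catalog: List[Product]) -> Optional[str]:
    """Propose the best-selling product in the requested category, if any."""
    category = interpret_intent(utterance)
    candidates = [p for p in catalog if p.category == category]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: p.sales)
    return f"How about {best.name}? One unit costs {best.price_yen} yen including shipping."
```

The returned sentence is what the EC server 40 would turn into voice data and send to the smart speaker 10 for playback. A scenario-dialogue system would instead step through a predefined script.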

The payment server 50 receives the payment processing request and executes payment processing after payment authentication. For payment authentication, the payment server 50 sends a request to the voice authentication server 30. If the voice authentication server 30 has already authenticated the target user, authentication succeeds; if not, voice authentication is performed. For online payments such as credit card payments, payment processing is carried out by communicating with the financial system of a credit card company or similar institution.
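The payment-authentication rule above has two branches: accept if the user is already authenticated, otherwise run voice authentication now. A minimal sketch under assumptions follows. The class, the in-memory session table, and the injected `verify` matcher are all hypothetical illustrations, not an interface from the source.

```python
from typing import Callable, Dict, Optional


class VoiceAuthServer:
    """Sketch of the voice authentication server 30's payment-authentication check."""

    def __init__(self, verify: Callable[[str, bytes], bool]):
        self._verify = verify                        # hypothetical voiceprint matcher
        self._authenticated: Dict[str, bool] = {}    # per-user result of the last check

    def authenticate(self, user_id: str, voice: bytes) -> bool:
        """Run voice authentication and remember the result."""
        ok = self._verify(user_id, voice)
        self._authenticated[user_id] = ok
        return ok

    def payment_authenticate(self, user_id: str, voice: Optional[bytes] = None) -> bool:
        """Answer a payment-authentication request from the payment server 50."""
        # Already authenticated earlier in this session: treat as success.
        if self._authenticated.get(user_id):
            return True
        # Not yet authenticated: authenticate now if voice data is available.
        if voice is None:
            return False
        return self.authenticate(user_id, voice)
```

In this sketch the payment server would call `payment_authenticate` and contact the card company's system only when it returns `True`.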

On receiving the payment completion notification from the payment server 50, the EC server 40 performs order shipment processing. It then notifies the end user of order completion via the smart speaker 10.

Next, the operation of the system according to the present embodiment is described with reference to FIG. 2.

Authentication and ordering:
- On hearing a trigger utterance from the end user, the smart speaker 10 switches from standby to active (step 5).
- The end user keeps speaking after the trigger utterance. The smart speaker 10 converts these utterances into voice data and sends it to the voice authentication server 30 (step 10).
- The voice authentication server 30 performs voice authentication on this data using voiceprint technology (step 15).
- If authentication succeeds, the voice authentication server 30 notifies the EC server 40. The EC server 40 then enters the online-shopping hearing state, in which it interviews the user to take an order (step 20).
- The voice authentication server 30 also sends the voice data used for authentication to the EC server 40. The EC server 40 conducts the hearing using this data together with any voice data received from the smart speaker 10 after entering the hearing state (step 20). Through the hearing, the EC server 40 receives an order for a product from the end user.
- The EC server 40 sends order-confirmation voice data to the smart speaker 10 (step 30), which plays it (step 35).

Payment:
- If the end user approves the order, the EC server 40 receives voice data to that effect via the smart speaker 10. It then sends the payment method included in the order to the payment server 50 together with a payment processing request (step 40).
- The payment server 50 receives the request (step 45), identifies the target user, and sends a payment authentication request to the voice authentication server 30 (step 50).
- The voice authentication server 30 performs payment authentication of the target user (step 55). If voice authentication was already completed in step 15, authentication succeeds. If it has not been performed, the voice authentication of step 15 is carried out now. This authentication may reuse the end user's voice data acquired in step 20, or new voice data may be obtained by asking the end user to speak again.
- When payment authentication finishes, the voice authentication server 30 reports the result to the payment server 50. If authentication succeeded, the payment server 50 executes payment processing with the target user's payment information, in cooperation with an external financial institution's system (step 60).

Completion:
- When payment processing is complete, the payment server 50 notifies the EC server 40 (step 65).
- On receiving this notification, the EC server 40 executes order shipment processing (step 70).
- After shipment processing is complete, the EC server 40 notifies the end user of order completion via the smart speaker 10 (step 75).
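The FIG. 2 flow can be summarized as an ordered step table with a few gate steps that can stop the sequence. This is a sketch for orientation only. The step list follows the text, and no separate step between 20 and 30 is named there. Treating steps 15, 35, and 55 as the only gates is an assumption of this illustration.

```python
from typing import Dict, List, Tuple

# (step number, actor, action) in the order described for FIG. 2.
SEQUENCE: List[Tuple[int, str, str]] = [
    (5, "smart speaker 10", "wake on trigger utterance"),
    (10, "smart speaker 10", "send voice data to voice authentication server 30"),
    (15, "voice authentication server 30", "voiceprint authentication"),
    (20, "EC server 40", "enter hearing state and take the order"),
    (30, "EC server 40", "send order-confirmation voice data"),
    (35, "smart speaker 10", "play order confirmation"),
    (40, "EC server 40", "send payment processing request"),
    (45, "payment server 50", "receive payment processing request"),
    (50, "payment server 50", "request payment authentication"),
    (55, "voice authentication server 30", "payment authentication"),
    (60, "payment server 50", "execute payment with financial institution"),
    (65, "payment server 50", "notify EC server 40 of payment completion"),
    (70, "EC server 40", "order shipment processing"),
    (75, "EC server 40", "notify end user of order completion"),
]


def run_sequence(outcomes: Dict[int, bool]) -> List[int]:
    """Return the step numbers executed, stopping after any step whose outcome is False.

    `outcomes` is a hypothetical map from gate steps (15: voice authentication,
    35: user approves the order, 55: payment authentication) to their results;
    steps not listed are assumed to succeed.
    """
    executed: List[int] = []
    for step, _actor, _action in SEQUENCE:
        executed.append(step)
        if not outcomes.get(step, True):
            break
    return executed
```

For example, `run_sequence({55: False})` stops right after payment authentication, so no payment, shipment, or completion notice follows.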

(Other embodiments)
In the first embodiment, voice authentication is essentially not repeated after step 15. However, as shown in FIG. 3, it can be repeated between step 15 and the completion of payment authentication in step 55. The voice authentication server 30 may receive voice data from the smart speaker 10 periodically or with each utterance, and re-authenticate as needed. If the authenticated end user leaves after step 15, later voice authentication fails, so an appropriate authentication state is maintained. For example, payment authentication then fails, which prevents anyone other than the authenticated end user from placing an order.
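Per-utterance re-authentication can be sketched as a session object that re-checks every utterance until payment authentication. This is a minimal illustration with assumptions. `verify` is a hypothetical voiceprint matcher. Once a check fails the session stays revoked, a design choice made here so that the original user returning does not silently restore the session; the source does not specify this. A periodic variant would call `on_utterance` on a timer instead.

```python
from typing import Callable


class ContinuousAuthSession:
    """Sketch: re-check the voiceprint on every utterance until payment authentication."""

    def __init__(self, user_id: str, verify: Callable[[str, str], bool]):
        self.user_id = user_id
        self._verify = verify        # hypothetical voiceprint matcher
        self.authenticated = True    # set after the initial authentication in step 15

    def on_utterance(self, voice: str) -> bool:
        """Re-authenticate one utterance; a single failure revokes the session."""
        if self.authenticated and not self._verify(self.user_id, voice):
            self.authenticated = False
        return self.authenticated

    def payment_authentication(self) -> bool:
        """Payment authentication succeeds only if every check so far passed."""
        return self.authenticated
```

If someone other than the authenticated user speaks before step 55, the session is revoked and payment authentication fails. This matches the behavior the text describes.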

As a variation of the first embodiment, shown in FIG. 4, the voice data received in step 11 may be separated into per-speaker voice data. The voice authentication of step 15 is then performed on the separated voice data of the end user who spoke first in time. If authentication succeeds, the EC server 40 enters the hearing standby state (step 20) and holds the hearing with that authenticated end user. Once that order is complete (step 75), voice authentication is performed on the separated voice data of the end user who spoke next, and that user's order is processed in the same way. Each subsequent end user is handled in turn. In this way, even when several end users are gathered around the smart speaker 10, their orders can be taken separately.

The authenticated end user in the hearing may also say that they want to take the other users' orders together. In that case, the EC server 40 may recognize this intent, accept all the users' orders as one, and settle them with the authenticated end user's payment information. One end user can then treat the others, or pay for everyone first and collect each person's share separately.
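The "first speaker first" rule can be sketched as follows. It assumes a hypothetical speaker-separation step has already labeled each utterance segment with a speaker and a start time. The sketch then orders distinct speakers by when they first spoke. The separation itself, which the source does not describe, is outside the scope of this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    speaker: str    # label produced by a (hypothetical) speaker-separation step
    start_s: float  # start time of the utterance, in seconds
    text: str


def speaker_queue(segments: List[Segment]) -> List[str]:
    """Order distinct speakers by the time they first spoke (earliest first)."""
    first_start: Dict[str, float] = {}
    for seg in segments:
        if seg.speaker not in first_start or seg.start_s < first_start[seg.speaker]:
            first_start[seg.speaker] = seg.start_s
    return sorted(first_start, key=lambda spk: first_start[spk])


def utterances_of(segments: List[Segment], speaker: str) -> List[str]:
    """Return one speaker's utterances in time order, for authentication and hearing."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return [s.text for s in ordered if s.speaker == speaker]
```

The EC server would authenticate and serve the first speaker in the queue through step 75, then move on to the next one. This corresponds to the sequential handling in FIG. 4.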

A display device 11 may be added to the system configuration of the first embodiment, as shown in FIG. 5. The display device 11 also connects to the network via the access point 20. It receives instructions from the EC server 40 or the smart speaker 10 and displays the URL given in the instruction. This lets the smart speaker 10 propose a product by voice while the display device 11 shows the product information, as in FIG. 6.
The display device 11 may be a touch-panel display that the end user operates by touch, but the end user can also control it through the smart speaker 10. Some user-controllable objects in the displayed information have no visible label, such as the detail button at the upper right, and can be hard to operate by voice alone. Assigning control labels such as <1> to <6> to these objects makes voice control possible. The web server linked to the EC server holds ordinary web pages and adds the control labels only when the display is controlled via the smart speaker 10.
The display device 11 has a web browser function and displays these pages by accessing the URL specified by the smart speaker. So that the web server can tell that an access comes from the smart speaker, a variable marking a smart-speaker access may be added to the parameter (query) part of the URL. The display device 11 is a computer with a display, running an operating system and providing browser functionality, so it can show the user the page at any specified URL.
For the smart speaker 10 to control the display device 11, a smart speaker control module must be installed on the display device 11. This module may be software installed in the operating system of the display device 11, or a browser add-in. The module queries the operating system, identifies which of the objects displayed in the browser the user can control, and assigns control labels to them. The same query also reveals what kind of control each object supports.
For example, link objects and button objects can be clicked. Clicking a link object jumps to its target URL, and clicking a button object runs the action bound to its click event. So, while the screen of FIG. 6 is displayed, the user can say "click <1>" to the smart speaker 10. The command reaches the smart speaker control module on the display device 11 via the EC server 40, <1> is clicked, and the detail screen for product ID 00001 is displayed.
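The labeling and voice-click mechanism can be sketched in three small pieces:
- number every clickable object <1>, <2>, and so on;
- mark a URL as a smart-speaker access with an extra query parameter;
- resolve a spoken "click <n>" command to the labeled object.

This is an illustration under assumptions, not the control module's actual interface. `UIObject`, the parameter name `via`, and its value `smartspeaker` are hypothetical. Only `re` and `urllib.parse` are real standard-library modules.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


@dataclass
class UIObject:
    kind: str          # "link", "button", or a non-clickable kind such as "text"
    visible_text: str  # empty when the object carries no readable label
    target: str        # URL for links; bound click handler name for buttons (hypothetical)


def assign_control_labels(objects: List[UIObject]) -> Dict[int, UIObject]:
    """Number every clickable object <1>, <2>, ... so it can be named by voice."""
    clickable = [o for o in objects if o.kind in ("link", "button")]
    return {i: o for i, o in enumerate(clickable, start=1)}


def mark_smart_speaker_access(url: str, flag: str = "via") -> str:
    """Append a query parameter (hypothetical name and value) marking a smart-speaker access."""
    parts = urlparse(url)
    query = parse_qsl(parts.query) + [(flag, "smartspeaker")]
    return urlunparse(parts._replace(query=urlencode(query)))


CLICK_COMMAND = re.compile(r"click\s*<(\d+)>", re.IGNORECASE)


def resolve_click(command: str, labels: Dict[int, UIObject]) -> Optional[UIObject]:
    """Map a recognized command such as "click <1>" to the labeled object, if any."""
    match = CLICK_COMMAND.search(command)
    return labels.get(int(match.group(1))) if match else None
```

The web server can check for the marker parameter to decide whether to add labels. Rendering the labels next to each object is left to the browser add-in or OS-level module.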

INDUSTRIAL APPLICABILITY The present invention is suitable for use in a voice authentication settlement system that authenticates an end user from their spoken voice and then performs payment.

Smart speaker 10
Display device 11
Access point 20
Voice authentication server 30
EC server 40
Payment server 50

Claims

1. A voice authentication settlement method in which one or more computers execute:
a voice data receiving step of receiving, from a smart speaker having at least a microphone and a speaker, voice data produced when a user speaks;
an authentication step of performing voice separation on the received voice data, and authenticating a target user by voiceprint authentication using the voice data of the user who spoke first in time among the separated voice data;
a product information receiving step of receiving, from the smart speaker, information identifying a product that the user wishes to purchase;
a payment processing request receiving step of receiving, from the smart speaker, a payment processing request from the user; and
a settlement processing step of executing settlement processing in accordance with the payment processing request when authentication of the target user has succeeded,
wherein, after the payment processing request from the user who spoke first, the authentication step performs voice authentication using the separated voice data of the user who spoke next after the user who spoke first, and the product information receiving step, the payment processing request receiving step, and the settlement processing step are then executed for the user who spoke next.

2. The voice authentication settlement method according to claim 1, further comprising an additional authentication step of, when authentication of the target user succeeds, receiving from the smart speaker additional voice data that arrives after the voice data used for the first authentication, and performing additional authentication using the additional voice data.

3. The voice authentication settlement method according to claim 2, wherein the additional authentication step is executed periodically.

4. The voice authentication settlement method according to claim 1, further comprising a step of receiving voice data indicating that the target user authenticated in the authentication step wishes to take orders from other users together, interpreting that intent, and receiving the orders from the users together,
wherein the settlement processing step settles the orders received together on behalf of the authenticated target user.

5. The voice authentication settlement method according to claim 1, further comprising a step of controlling a display device so that it displays display information with a user-visible control label attached to each link object or button object that the user can operate, wherein the display device displays display information received from an external device under display control from the smart speaker, which acts on a request from the user.

6. A voice authentication settlement system comprising:
voice data receiving means for receiving, from a smart speaker having at least a microphone and a speaker, voice data produced when a user speaks;
authentication means for performing voice separation on the received voice data, and authenticating a target user by voiceprint authentication using the voice data of the user who spoke first in time among the separated voice data;
product information receiving means for receiving, from the smart speaker, information identifying a product that the user wishes to purchase;
payment processing request receiving means for receiving, from the smart speaker, a payment processing request from the user; and
settlement processing means for executing settlement processing in accordance with the payment processing request when the target user has been successfully authenticated,
wherein, after the payment processing request from the user who spoke first has been processed, the authentication means performs voice authentication using the separated voice data of the user who spoke next after the user who spoke first, and the product information receiving means, the payment processing request receiving means, and the settlement processing means then operate for the user who spoke next.