JP2020010144A

JP2020010144A - Smart speaker, secure element, program, information processing method, and distribution method

Info

Publication number: JP2020010144A
Application number: JP2018128565A
Authority: JP
Inventors: 正徳浅野; Masanori Asano
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2018-07-05
Filing date: 2018-07-05
Publication date: 2020-01-16
Anticipated expiration: 2038-07-05
Also published as: JP7119660B2

Abstract

To provide a smart speaker and the like that can operate appropriately while reducing security and privacy risks. SOLUTION: A smart speaker 1 includes: a speaker body including a microphone that accepts voice input, a speaker that outputs voice, and a communication unit that communicates with the outside; and a secure unit to which access from the speaker body is restricted. The secure unit includes: a storage unit that stores biometric information of a user; an acquisition unit that, when the microphone accepts voice input, acquires from the speaker body the biometric information of the user who input the voice; and an authentication unit that performs authentication based on the biometric information. SELECTED DRAWING: Figure 1

Description

The present invention relates to a smart speaker, a secure element, a program, an information processing method, and a distribution method.

Devices called smart speakers, which combine a communication function with a voice-operated assistant function, are becoming widespread. However, their adoption is currently driven by convenience, and security concerns have been pointed out. For example, cases have been reported in which a smart speaker recognized an utterance by a third party who was not the legitimate user and confirmed an order on an e-commerce (EC) site, so that a product was shipped.

Meanwhile, Patent Literature 1 discloses an electronic device, such as a smartphone, that launches an application in accordance with voice input by a user. When the device detects a predetermined action by the user, such as a tap, it detects biometric information such as the user's face or iris and performs user authentication; if authentication succeeds, it accepts voice input and launches the application.

Patent Literature 1: JP 2018-74366 A

It is conceivable to apply the technology described in Patent Literature 1 to a smart speaker so that the smart speaker detects a user's actions, biometric information, and the like and performs user authentication. However, if important information such as biometric information is placed on the smart speaker or in the cloud, there is a security risk from cracking or the like, as well as a risk of privacy leakage.

In one aspect, an object is to provide a smart speaker and the like that can operate appropriately while reducing security and privacy risks.

In one aspect, a smart speaker includes: a speaker body having a microphone that accepts voice input, a speaker that outputs voice, and a communication unit that communicates with the outside; and a secure unit to which access from the speaker body is restricted. The secure unit includes: a storage unit that stores biometric information of a user; an acquisition unit that, when the microphone accepts voice input, acquires from the speaker body the biometric information of the user who input the voice; and an authentication unit that performs authentication based on the biometric information.

In one aspect, the smart speaker can operate appropriately while reducing security and privacy risks.

A schematic diagram showing a configuration example of the speaker system.
A block diagram showing a configuration example of the smart speaker.
An explanatory diagram outlining Embodiment 1.
A flowchart showing an example of the processing procedure executed by the speaker system.
An explanatory diagram outlining Embodiment 2.
A flowchart showing an example of the processing procedure executed by the speaker system according to Embodiment 2.
An explanatory diagram outlining Embodiment 3.
A flowchart showing an example of the processing procedure executed by the speaker system according to Embodiment 3.
An explanatory diagram outlining Embodiment 4.
A flowchart showing an example of the processing procedure executed by the speaker system according to Embodiment 4.
An explanatory diagram outlining Embodiment 5.
A flowchart showing an example of the processing procedure executed by the speaker system according to Embodiment 5.
An explanatory diagram outlining Embodiment 6.
A flowchart showing an example of the processing procedure executed by the speaker system according to Embodiment 6.
A block diagram showing a configuration example of the smart speaker according to Embodiment 7.
A functional block diagram showing the operation of the smart speaker of the above-described form.

Hereinafter, the present invention will be described in detail with reference to the drawings showing its embodiments.
(Embodiment 1)
FIG. 1 is a schematic diagram showing a configuration example of a speaker system. The following describes a speaker system that provides a predetermined service to a user by means of the smart speaker 1 and in which the smart speaker 1 authenticates the user at the time of voice input. The speaker system includes the smart speaker 1, a management server 3, and service servers 4 (4a, 4b, ...). The devices are communicatively connected to one another via a network N such as the Internet.

The smart speaker 1 is a speaker device that has a function for communicating with the outside in addition to voice input and output functions; it is a speaker device that interacts with the user, such as Google Home from Google (registered trademark) or Amazon Echo (registered trademark) from Amazon (registered trademark). The smart speaker 1 accepts voice input from the user and transmits it to the outside, a predetermined external device performs voice recognition, natural language processing, and the like, and the smart speaker 1 plays back the output voice returned in response to the input voice.

The management server 3 is a device of the administrator who manages this system, and is a management device that manages the key information with which the smart speaker 1 encrypts the confidential information described later. The key information is, for example, a public key of a public-key cryptosystem or a common key of a common-key cryptosystem. As described in detail later, the smart speaker 1 stores the key information distributed from the management server 3 and uses it to encrypt data to be transmitted to the outside before outputting the data.

The service server 4 is a server device of a service provider that provides a predetermined service to the user based on the voice input to the smart speaker 1. The following description assumes that there are a plurality of service providers, and the service servers 4 of the respective service providers are denoted by reference numerals 4a, 4b, .... The services provided by the service providers are assumed to be EC services and the like, but the service content is not particularly limited. The service server 4 acquires the input voice data from the smart speaker 1 and executes information processing appropriate to the service provided to the user (for an EC service, processing related to ordering products and the like). The service server 4 returns output voice data relating to the service provided to the user to the smart speaker 1 and causes the smart speaker 1 to output the voice.

When an existing platform such as Google Home is used, an external server that performs voice recognition, natural language processing, and the like is in practice located between the smart speaker 1 and the service server 4 and processes the input and output voice of the smart speaker 1; in the present embodiment, however, the external server is omitted from the description for brevity. Alternatively, the service server 4 may itself perform voice recognition, natural language processing, and the like to analyze the input voice and generate the output voice.

In the present embodiment, the smart speaker 1 stores confidential information that relates to the user and is necessary for using the services provided by the service providers, and uses a service by transmitting the confidential information to the service server 4 together with the input voice. The confidential information is, for example, a credit card number or a personal number (My Number), but it may be any information that should be kept confidential, and its content is not particularly limited. For example, when the user uses an EC service via the smart speaker 1, the smart speaker 1 transmits the purchase information necessary for purchasing a product (for example, a credit card number) to the service server 4 as confidential information.

In the present embodiment, the smart speaker 1 authenticates biometric information when the user inputs voice, in order to prevent unauthorized use by a malicious third party, in particular misuse or theft of the confidential information. Specifically, the smart speaker 1 extracts from the input voice a feature amount relating to the user's voiceprint, that is, voiceprint information, and authenticates the user by comparing it with voiceprint information stored in advance.

FIG. 2 is a block diagram showing a configuration example of the smart speaker 1. The smart speaker 1 includes a control unit 11, a main storage unit 12, a communication unit 13, a microphone 14, a speaker 15, an auxiliary storage unit 19, and a secure element 20.
The control unit 11 has one or more arithmetic processing devices such as a CPU (Central Processing Unit) or an MPU (Micro-Processing Unit), and performs the various information processing, control processing, and the like of the smart speaker 1 by reading and executing the program P stored in the auxiliary storage unit 19. The main storage unit 12 is a temporary storage area such as a RAM (Random Access Memory), and temporarily stores the data that the control unit 11 needs in order to execute arithmetic processing. The communication unit 13 includes a processing circuit and the like for communication-related processing, and transmits and receives information to and from the outside. The communication performed by the communication unit 13 may be wired or wireless. The microphone 14 is a microphone through which the user inputs instructions by voice. The speaker 15 is a speaker that outputs, as voice, the result of the operation that the user instructed by the voice input to the microphone 14. The auxiliary storage unit 19 is a nonvolatile storage area such as a ROM (Read-Only Memory), and stores the program P and other data that the control unit 11 needs in order to execute processing.

The secure element 20 is a tamper-resistant hardware module, for example an IC card or IC chip such as a SIM (Subscriber Identity Module), a UIM (User Identity Module), or a TPM (Trusted Platform Module). Access to the secure element 20 from the speaker body, which includes the control unit 11, the communication unit 13, the microphone 14, the speaker 15, and so on, is restricted, and the secure element 20 stores data that is important in the processing performed by the smart speaker 1, such as the confidential information described above.

In the present embodiment, the secure element 20 is configured as a security chip connected to the speaker body and is removable from the speaker body. Making the secure element 20 removable from the speaker body allows the user's profile to be moved easily and intuitively.

Needless to say, the secure element 20 need not be removable and may instead be mounted non-removably on the speaker body, for example by soldering.

The secure element 20 includes an authentication unit 21, an encryption unit 22, and a storage unit 23. The authentication unit 21 performs authentication processing based on the user's biometric information. The encryption unit 22 encrypts the data output from the secure element 20. The storage unit 23 is a memory area mounted on the secure element 20, and stores voiceprint information 231, confidential information 232, and key information 233. The voiceprint information 231 is biometric information of the user that can be extracted from input voice, and is a voice feature amount representing the user's voiceprint. The confidential information 232 is confidential information that relates to the user and is necessary for using the services provided by the service servers 4.

The key information 233 is key information for encrypting the data output from the secure element 20, and is, for example, a public key of a public-key cryptosystem or a common key of a common-key cryptosystem. The key information 233 may also be, for example, an encryption key for an electronic signature, and the encryption algorithm is not particularly limited. The encryption unit 22 encrypts the confidential information 232 using the key information 233 and outputs it to the speaker body. The key information (for example, a private key) that corresponds to the key information 233 (for example, a public key) stored in the storage unit 23 is held by the service server 4. The service server 4 uses the key information it holds to decrypt and verify the confidential information 232, and provides the service to the user. In the present embodiment, the key information used by the smart speaker 1 and the service servers 4 is managed by the management server 3, and each device holds the key information obtained from the management server 3 and uses it to encrypt and decrypt data.

For convenience of illustration, FIG. 2 shows a single set of voiceprint information 231 and confidential information 232. In the present embodiment, however, one smart speaker 1 is used by a plurality of users, and separate voiceprint information 231 and confidential information 232 are stored for each user. For example, the storage unit 23 stores the voiceprint information 231 and confidential information 232 of each user in association with the user's name. The secure element 20 identifies the user based on the voiceprint information 231 and outputs the confidential information 232 corresponding to the identified user.

Similarly, the key information 233 is shown as a single item for convenience, but in the present embodiment separate key information 233 is stored for each of the service servers 4a, 4b, .... The secure element 20 selects the key information 233 according to the service the user is using, encrypts the confidential information 232 with the selected key information 233, and outputs it.
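
The per-user and per-service organization of the storage unit 23 described above can be pictured with a minimal Python sketch. The class and field names (`UserRecord`, `StorageUnit`, `service_keys`) and the use of a list of floats as the voiceprint representation are assumptions made for illustration; the source does not specify a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserRecord:
    """One registered user (hypothetical layout)."""
    voiceprint: List[float]  # voiceprint information 231 (representation assumed)
    confidential: bytes      # confidential information 232, e.g. a card number


@dataclass
class StorageUnit:
    """Sketch of storage unit 23: records keyed by user name, keys keyed by service."""
    users: Dict[str, UserRecord] = field(default_factory=dict)
    service_keys: Dict[str, bytes] = field(default_factory=dict)  # key information 233

    def confidential_for(self, user_name: str) -> bytes:
        # Return the confidential information of the identified user.
        return self.users[user_name].confidential

    def key_for(self, service_id: str) -> bytes:
        # Return the key that belongs to the service the user is calling.
        return self.service_keys[service_id]
```

Keying the two tables independently reflects the text: the user is chosen by voiceprint, while the key is chosen by the service being used.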

FIG. 3 is an explanatory diagram outlining Embodiment 1. An outline of the present embodiment is described below with reference to FIG. 3.
To use a service provided by the service server 4, the user inputs to the smart speaker 1 a predetermined instruction voice requesting use of the service. For example, when using an EC service, the user speaks information about the product that the user wishes to purchase. The microphone 14 of the smart speaker 1 accepts the input of the instruction voice that instructs the purchase of the product.

The smart speaker 1 extracts, as biometric information, a feature amount representing the user's voiceprint, that is, voiceprint information, from the voice input to the microphone 14. The voiceprint information is, for example, a frequency pattern of a sound spectrogram. The smart speaker 1 inputs the extracted voiceprint information to the secure element 20.

The authentication unit 21 of the secure element 20 authenticates the user based on the voiceprint information acquired from the speaker body. That is, the authentication unit 21 reads the voiceprint information 231 stored in advance in the storage unit 23 and checks whether it matches the voiceprint information acquired from the speaker body. Specifically, the authentication unit 21 compares the acquired voiceprint information with the voiceprint information 231 of each of the plurality of users stored in the storage unit 23 and identifies the user who input the voice. If the voiceprint information matches, the authentication unit 21 determines that authentication has succeeded. If it determines that the voiceprint information does not match, the authentication unit 21 determines that authentication has failed, outputs the authentication result to the speaker body, and ends the series of processes.
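
As a rough illustration of the comparison against each registered user's voiceprint, the following Python sketch treats a voiceprint as a numeric feature vector and accepts the best match above a threshold. The source does not specify the matching method; cosine similarity, the function names, and the threshold of 0.9 are all assumptions chosen only to make the idea concrete.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_user(probe: List[float],
                  enrolled: Dict[str, List[float]],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody reaches the threshold.

    `probe` stands for the voiceprint information received from the speaker body;
    `enrolled` stands for the stored voiceprint information 231 of each user.
    """
    best_user, best_score = None, threshold
    for name, template in enrolled.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_user, best_score = name, score
    return best_user
```

Returning `None` corresponds to the authentication-failure branch, in which no confidential information is released.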

If authentication succeeds, the authentication unit 21 notifies the encryption unit 22 accordingly. On receiving the notification, the encryption unit 22 reads the user's confidential information 232 from the storage unit 23. As described above, the confidential information 232 is information relating to the user that should be kept confidential and that the user needs in order to use the service. For example, when the service provided by the service server 4 is an EC service, purchase information necessary for purchasing a product, such as a credit card number, is stored in the storage unit 23. From the confidential information 232 of the plurality of users stored in the storage unit 23, the encryption unit 22 reads the confidential information 232 corresponding to the user whom the authentication unit 21 identified based on the voiceprint information 231.

The encryption unit 22 also reads the key information 233 from the storage unit 23. As described above, the key information 233 is a public key of a public-key cryptosystem, a common key of a common-key cryptosystem, or the like, and corresponds to the key information held by the service server 4. In the present embodiment, separate key information 233 is prepared for each service provided by the service servers 4a, 4b, ..., and the storage unit 23 stores the key information 233 corresponding to each service. The smart speaker 1 (speaker body) identifies from the input voice the service that the user is using, and the encryption unit 22 reads the key information 233 corresponding to the identified service.

The method of identifying the service from the input voice is not particularly limited. For example, the smart speaker 1 may identify the service by recognizing a predetermined command voice, such as a so-called wake word, in the input voice. Alternatively, the smart speaker 1 may first output only the voice to an external server, have the external server perform voice recognition to identify the service, and obtain the identification result from the external server.
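
The first option, identifying the service from a predetermined command phrase, might look like the following Python sketch. It assumes that a text transcript of the utterance is already available, and the phrases and service identifiers are hypothetical; the source leaves the identification method open.

```python
from typing import Optional

# Hypothetical command phrases mapped to hypothetical service identifiers.
COMMAND_PHRASES = {
    "ask shop a": "service_4a",
    "ask shop b": "service_4b",
}


def identify_service(transcript: str) -> Optional[str]:
    """Return the service whose command phrase starts the utterance, else None."""
    text = transcript.strip().lower()
    for phrase, service_id in COMMAND_PHRASES.items():
        if text.startswith(phrase):
            return service_id
    return None
```

The returned identifier would then be used to select the matching key information 233.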

As described above, the encryption unit 22 reads the confidential information 232 corresponding to the user who input the voice and the key information 233 corresponding to the service that the user is using. The encryption unit 22 encrypts the read confidential information 232 with the key information 233 and outputs it to the speaker body.

The communication unit 13 of the speaker body transmits the confidential information 232 output from the secure element 20 to the service server 4 together with the data of the voice input to the microphone 14. For example, when an EC service is used, the communication unit 13 transmits to the service server 4 the data of the instruction voice that instructs the purchase of the product and the purchase information necessary for the purchase. In this way, the smart speaker 1 requests the service server 4 to provide the service.

The service server 4 receives the input voice and the data of the confidential information 232 from the smart speaker 1. The service server 4 decrypts the received confidential information 232 with the key information it holds and verifies it. As described above, the key information 233 that the smart speaker 1 uses for encryption is separate for each service server 4 and corresponds to the key information that server holds. Therefore, even if the smart speaker 1 mistakenly transmits the confidential information 232 to an unintended service server 4, that service server 4 cannot decrypt the confidential information 232, which reduces the risk of unauthorized use.
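
The isolation property described here, namely that data encrypted for one service cannot be opened by another, can be demonstrated with a small Python sketch for the common-key case. The construction below (a SHA-256 counter keystream plus an HMAC tag) is a toy stand-in built only from the standard library so that the example runs anywhere; it is an assumption for illustration, not the scheme used in the source, and a real implementation would use a vetted authenticated cipher. The names `seal` and `open_sealed` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

NONCE_LEN = 16
TAG_LEN = 32


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from the per-service key.
    return hashlib.sha256(label + key).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce, and a block counter.
    blocks = []
    counter = 0
    while 32 * len(blocks) < length:
        blocks.append(hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and tag `plaintext` under the per-service `key` (illustrative only)."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def open_sealed(key: bytes, blob: bytes) -> Optional[bytes]:
    """Return the plaintext, or None if `key` is not the key the blob was sealed with."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        return None
    nonce, body, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))
```

A server holding the key for service 4a recovers the data, while a server holding only the key for service 4b gets `None`, which mirrors the misdelivery case in the text.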

If the confidential information 232 is successfully decrypted, the service server 4 provides the service to the user in accordance with the input voice. For example, when providing an EC service, the service server 4 performs processing related to ordering and shipping the product. The service server 4 transmits output voice data relating to the provided service to the smart speaker 1. The communication unit 13 receives the data and passes it to the speaker 15. The speaker 15 plays back (outputs) the output voice transmitted from the service server 4.

As described above, the smart speaker 1 includes the tamper-resistant secure element 20 and stores the voiceprint information 231 in the secure element 20 as the user's biometric information. When the user uses a service by voice input, the smart speaker 1 inputs voiceprint information to the secure element 20 and has it authenticate the user. The smart speaker 1 outputs the confidential information 232 according to the result of authentication by the secure element 20 and provides the service to the user. This prevents unauthorized use of the smart speaker 1, in particular unauthorized use of the confidential information 232, and improves the security of the speaker system.

FIG. 4 is a flowchart showing an example of the processing procedure executed by the speaker system. The processing executed by the speaker system is described below with reference to FIG. 4.
The control unit 11 of the smart speaker 1 accepts the input of an instruction voice via the microphone 14 (step S11). The control unit 11 extracts the user's voiceprint information from the input voice (step S12).

The control unit 11 inputs the extracted voiceprint information to the secure element 20, and the secure element 20 performs authentication processing by comparing it with the user's voiceprint information 231 (biometric information) stored in the storage unit 23 (step S13). When the voiceprint information 231 of a plurality of users is stored (registered) in the storage unit 23, the secure element 20 compares the voiceprint information extracted from the input voice with the voiceprint information 231 of each user stored in the storage unit 23 and identifies the user who input the voice.

Based on the result of the authentication processing in step S13, the secure element 20 determines whether the voiceprints match (step S14). If it determines that the voiceprints do not match (S14: NO), the secure element 20 ends the series of processes.

If it determines that the voiceprints match (S14: YES), the secure element 20 reads the confidential information 232 from the storage unit 23 (step S15). When the confidential information 232 of a plurality of users is stored in the storage unit 23, the secure element 20 reads the confidential information 232 corresponding to the user authenticated (identified) in step S13.

The secure element 20 reads the key information 233 from the storage unit 23 (step S16). For example, the storage unit 23 stores separate key information 233 for each service that the user uses by voice input; the control unit 11 of the speaker body identifies the service from the input voice, and the secure element 20 then reads the key information 233 corresponding to the identified service from the storage unit 23. The secure element 20 encrypts the confidential information 232 with the read key information 233 (step S17). The secure element 20 outputs the encrypted confidential information 232 to the speaker body, and the control unit 11 transmits the output confidential information 232 to the service server 4 together with the data of the voice input in step S11 (step S18).

サービスサーバ４は、機密情報２３２を含む入力音声のデータをスマートスピーカ１から受信する（ステップＳ１９）。サービスサーバ４は、自らが保持する鍵情報であって、スマートスピーカ１のセキュアエレメント２０に記憶されている鍵情報２３３に対応する鍵情報を用いて、スマートスピーカ１から受信した機密情報２３２を復号する（ステップＳ２０）。サービスサーバ４は、復号した機密情報２３２を用いて、ユーザの入力音声に従い提供するサービスに関する情報処理を実行する（ステップＳ２１）。例えばＥＣサービスを提供する場合、サービスサーバ４は商品購入に必要な購買情報（機密情報２３２）を用いて、商品代金の引き落とし、商品の発送等の関わる情報処理を実行する。サービスサーバ４は、提供されたサービスに関する出力音声のデータをスマートスピーカ１に返信する（ステップＳ２２）。 The service server 4 receives the input voice data including the confidential information 232 from the smart speaker 1 (Step S19). The service server 4 decrypts the confidential information 232 received from the smart speaker 1 using the key information held by the service server 4 and corresponding to the key information 233 stored in the secure element 20 of the smart speaker 1. (Step S20). The service server 4 uses the decrypted confidential information 232 to execute information processing on a service to be provided according to the voice input by the user (step S21). For example, in the case of providing an EC service, the service server 4 executes information processing related to withdrawal of a product price and shipping of a product using purchase information (confidential information 232) necessary for product purchase. The service server 4 returns data of the output voice related to the provided service to the smart speaker 1 (Step S22).

スマートスピーカ１の制御部１１は、サービスサーバ４から出力音声のデータを受信する（ステップＳ２３）。制御部１１は、受信した音声をスピーカ１５により出力し（ステップＳ２４）、一連の処理を終了する。 The control unit 11 of the smart speaker 1 receives the output audio data from the service server 4 (Step S23). The control unit 11 outputs the received voice through the speaker 15 (step S24), and ends a series of processes.
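As a rough illustration of steps S15 to S20, the sketch below encrypts a user's confidential information under a per-service key and decrypts it on the server side. Everything in it is an assumption made for the example: the class `SecureElement`, the method `release`, the service names, and above all the cipher, which is a toy SHA-256 keystream used only so that the example runs with the standard library. It is not secure, and this description does not name a specific encryption algorithm.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter. Placeholder only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in for step S17. NOT a secure cipher."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Stand-in for step S20 on the service server."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class SecureElement:
    """Hypothetical model of the storage unit 23 and the encryption step."""

    def __init__(self, confidential, service_keys):
        self._confidential = confidential  # user ID -> confidential information 232
        self._service_keys = service_keys  # service ID -> key information 233

    def release(self, user_id, service_id):
        """S15 to S17: read the authenticated user's data, encrypt it for one service."""
        key = self._service_keys[service_id]
        return toy_encrypt(key, self._confidential[user_id])


key_a, key_b = os.urandom(32), os.urandom(32)
secret = b"purchase-info-for-alice"
se = SecureElement({"alice": secret}, {"shop_a": key_a, "shop_b": key_b})

blob = se.release("alice", "shop_a")
print(toy_decrypt(key_a, blob) == secret)  # the intended service server recovers the data
print(toy_decrypt(key_b, blob) == secret)  # a server holding a different key does not
```

Because the key is selected per service, ciphertext that reaches a service server other than the intended one cannot be decrypted with that server's own key, which is the property the per-service key information 233 is meant to provide.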

以上より、本実施の形態１によれば、スピーカ本体からのアクセスが制限されたセキュアエレメント２０に生体情報（声紋情報２３１）を格納し、音声入力時に音声を入力したユーザの生体認証を行う。これにより、セキュリティ上及びプライバシー上のリスクを低減しつつ、スマートスピーカ１を適切に動作させることができる。 As described above, according to the first embodiment, biometric information (voiceprint information 231) is stored in the secure element 20 to which access from the speaker body is restricted, and biometric authentication of a user who has input a voice at the time of voice input is performed. This allows the smart speaker 1 to operate properly while reducing security and privacy risks.

Further, according to the first embodiment, the biometric information is stored in the tamper-resistant secure element 20, which makes the configuration robust against external physical attacks and further reduces security risks.

また、本実施の形態１によれば、セキュアエレメント２０をスピーカ本体から取外し可能な構成とすることで、ユーザのプロファイルの移動を簡便かつ直感的にできるようになる。 Further, according to the first embodiment, the configuration in which the secure element 20 is detachable from the speaker body makes it easy and intuitive to move the profile of the user.

また、本実施の形態１によれば、セキュアエレメント２０に機密情報２３２を格納しておき、生体情報の認証結果に応じて機密情報２３２をスピーカ本体に出力する。これにより、機密情報２３２を安全に取り扱いながらも、ユーザはスマートスピーカ１を活用したサービスを享受することができる。 Further, according to the first embodiment, the confidential information 232 is stored in the secure element 20, and the confidential information 232 is output to the speaker body according to the authentication result of the biometric information. Thus, the user can enjoy the service utilizing the smart speaker 1 while safely handling the confidential information 232.

Further, according to the first embodiment, the biometric information and the confidential information 232 of each of a plurality of users are stored in the secure element 20, the user who input the voice is identified from the biometric information, and the confidential information 232 of the identified user is output to the speaker body. As a result, the present system can be operated properly even when a plurality of people share the smart speaker 1.

また、本実施の形態１によれば、セキュアエレメント２０は鍵情報２３３により機密情報２３２を暗号化した上で出力する。これにより、スマートスピーカ１及びサービスサーバ４の間で機密情報２３２を安全に送受信することができる。 According to the first embodiment, the secure element 20 encrypts the confidential information 232 with the key information 233 and outputs the encrypted confidential information 232. Thereby, the confidential information 232 can be securely transmitted and received between the smart speaker 1 and the service server 4.

Further, according to the first embodiment, the key information 233 is individualized for each service provided by the service servers 4a, 4b, ..., and the confidential information 232 is encrypted using the key information 233 corresponding to the service used by the user. Therefore, even if the confidential information 232 is transmitted to an unintended service server 4, the risk of unauthorized use can be reduced.

(Embodiment 2)
In the present embodiment, a mode in which authentication is performed using a user's fingerprint as biometric information in addition to a voiceprint will be described. Content that overlaps with the first embodiment is given the same reference numerals, and its description is omitted.
FIG. 5 is an explanatory diagram for explaining the outline of the second embodiment. The smart speaker 1 according to the present embodiment includes a fingerprint sensor 16. The fingerprint sensor 16 is a touch sensor for reading a user's fingerprint. Note that the fingerprint sensor 16 may be provided on the speaker body, or may be an external device. The fingerprint sensor 16 gives the control unit 11 image data obtained by reading the fingerprint of the user.

また、本実施の形態においてセキュアエレメント２０の記憶部２３は、指紋情報２３４を記憶している。指紋情報２３４は、ユーザの指紋の特徴量を示すデータであり、指紋センサ１６を用いてユーザが事前に登録したものである。なお、実施の形態１と同じく、記憶部２３はユーザ毎に指紋情報２３４を記憶している。 In the present embodiment, the storage unit 23 of the secure element 20 stores fingerprint information 234. The fingerprint information 234 is data indicating a feature amount of the user's fingerprint, and is registered by the user in advance using the fingerprint sensor 16. Note that, as in the first embodiment, the storage unit 23 stores fingerprint information 234 for each user.

実施の形態１と同じく、スマートスピーカ１はマイク１４によりユーザから所定の指示音声の入力を受け付ける。この場合に、スマートスピーカ１は指紋センサ１６により、音声だけでなく指紋の入力を受け付ける。 As in the first embodiment, the smart speaker 1 receives an input of a predetermined instruction voice from the user through the microphone 14. In this case, the smart speaker 1 accepts not only voice input but also fingerprint input by the fingerprint sensor 16.

スマートスピーカ１は、指紋センサ１６により読み取った指紋の画像データからユーザの指紋の特徴量、すなわち指紋情報を抽出する。そしてスマートスピーカ１は、抽出した指紋情報と、指示音声から抽出した声紋情報とをセキュアエレメント２０に入力する。 The smart speaker 1 extracts a feature amount of a user's fingerprint from fingerprint image data read by the fingerprint sensor 16, that is, fingerprint information. Then, the smart speaker 1 inputs the extracted fingerprint information and the voiceprint information extracted from the instruction voice to the secure element 20.

The authentication unit 21 of the secure element 20 performs voiceprint authentication as in the first embodiment and, in the present embodiment, additionally performs fingerprint authentication. That is, the authentication unit 21 checks whether the fingerprint information input from the speaker body matches the fingerprint information 234 stored in the storage unit 23. For example, the authentication unit 21 performs voiceprint authentication first, compares the voiceprint information extracted from the input voice with the voiceprint information 231 of each user stored in the storage unit 23, and identifies the user who input the voice. The authentication unit 21 then reads the fingerprint information 234 of the identified user from the storage unit 23 and checks whether it matches the fingerprint information acquired via the fingerprint sensor 16. Needless to say, fingerprint authentication may instead be performed first and voiceprint authentication afterward.

When authentication of either the voiceprint or the fingerprint fails, the authentication unit 21 outputs an authentication result indicating the failure to the speaker body and ends the processing. When authentication of both the voiceprint and the fingerprint succeeds, the authentication unit 21 notifies the encryption unit 22 of the success. Subsequent processing is the same as in the first embodiment: the secure element 20 encrypts the confidential information 232 and outputs it to the speaker body.

FIG. 6 is a flowchart illustrating an example of a processing procedure executed by the speaker system according to the second embodiment.
The control unit 11 of the smart speaker 1 receives a voice input via the microphone 14 and a fingerprint input via the fingerprint sensor 16 (step S201). The control unit 11 extracts voiceprint information from the input voice and also extracts fingerprint information indicating the feature amount of the input fingerprint (step S202). The control unit 11 inputs the extracted voiceprint information and fingerprint information to the secure element 20.

The secure element 20 performs an authentication process in which the voiceprint information and the fingerprint information input from the speaker body are collated with the user's voiceprint information 231 and fingerprint information 234, respectively, stored in the storage unit 23 (step S203). As a result of the authentication process in step S203, the secure element 20 determines whether both the voiceprint and the fingerprint match. If it determines that the voiceprint and the fingerprint do not both match (S204: NO), the secure element 20 ends the series of processes. If it determines that the voiceprint and the fingerprint match (S204: YES), the secure element 20 shifts the processing to step S15.

なお、上記では声紋以外の生体情報として指紋を一例に挙げたが、本実施の形態はこれに限定されるものではなく、例えば顔、虹彩等の情報を生体情報としてもよい。 In the above description, a fingerprint is taken as an example of biometric information other than a voiceprint. However, the present embodiment is not limited to this, and information such as a face and an iris may be used as biometric information.

以上より、本実施の形態２によれば、声紋情報２３１及び指紋情報２３４という複数の生体情報を用いてユーザの認証を行う。これにより、セキュリティをより高めることができる。 As described above, according to the second embodiment, user authentication is performed using a plurality of pieces of biometric information such as voiceprint information 231 and fingerprint information 234. As a result, security can be further improved.

(Embodiment 3)
In the present embodiment, a mode will be described in which the user is made to input other authentication information, different from biometric information, and authentication is performed.
FIG. 7 is an explanatory diagram for explaining the outline of the third embodiment. The smart speaker 1 according to the present embodiment includes an input unit 17. The input unit 17 is an input interface for a user to input a PIN code, and is, for example, an input pad such as mechanical keys or a touch panel. Note that the input unit 17 may be provided in the speaker body or may be an external device.

Further, in the present embodiment, the storage unit 23 of the secure element 20 stores a PIN code 235 as authentication information for authenticating the user by means other than biometric information. The PIN code 235 is a personal identification number with a predetermined number of digits that the user has registered in advance using the input unit 17 or the like. As in the first embodiment, the storage unit 23 stores a PIN code 235 for each user.

実施の形態１と同じく、スマートスピーカ１はマイク１４によりユーザから所定の指示音声の入力を受け付ける。この場合に、スマートスピーカ１は入力部１７により、音声だけでなくＰＩＮコードの入力を受け付ける。スマートスピーカ１は、指示音声から声紋情報を抽出してセキュアエレメント２０に入力すると共に、入力部１７を介して入力を受け付けたＰＩＮコードをセキュアエレメント２０に入力する。 As in the first embodiment, the smart speaker 1 receives an input of a predetermined instruction voice from the user through the microphone 14. In this case, the smart speaker 1 receives the input of the PIN code as well as the voice through the input unit 17. The smart speaker 1 extracts voiceprint information from the instruction voice and inputs the voiceprint information to the secure element 20, and also inputs the PIN code accepted via the input unit 17 to the secure element 20.

セキュアエレメント２０の認証部２１は、実施の形態１と同じく声紋認証を行うと共に、本実施の形態ではＰＩＮコードの認証を行う。すなわち認証部２１は、スピーカ本体から入力されたＰＩＮコードが、記憶部２３に記憶されているＰＩＮコード２３５と一致するか否か照合する。例えば認証部２１は、先に声紋認証を行い、入力音声から抽出した声紋情報を記憶部２３に記憶されている各ユーザの声紋情報２３１と比較して、音声を入力したユーザを特定する。さらに認証部２１は、特定したユーザのＰＩＮコード２３５を記憶部２３から読み出し、入力部１７を介して取得したＰＩＮコードと一致するか否か照合する。 The authentication unit 21 of the secure element 20 performs voiceprint authentication as in the first embodiment, and also performs PIN code authentication in the present embodiment. That is, the authentication unit 21 checks whether the PIN code input from the speaker body matches the PIN code 235 stored in the storage unit 23. For example, the authentication unit 21 performs voiceprint authentication first, compares the voiceprint information extracted from the input voice with the voiceprint information 231 of each user stored in the storage unit 23, and specifies the user who has input the voice. Further, the authentication unit 21 reads out the PIN code 235 of the specified user from the storage unit 23 and checks whether the PIN code 235 matches the PIN code acquired via the input unit 17.

When authentication of either the voiceprint or the PIN code fails, the authentication unit 21 outputs an authentication result indicating the failure to the speaker body and ends the processing. When authentication of both the voiceprint and the PIN code succeeds, the authentication unit 21 notifies the encryption unit 22 of the success. Subsequent processing is the same as in the first embodiment: the secure element 20 encrypts the confidential information 232 and outputs it to the speaker body.

なお、上記では生体情報と異なる認証情報としてＰＩＮコードを用いたが、認証情報はＰＩＮコードに限定されるものではなく、例えば文字及び数字から成るパスワードなどであってもよい。認証情報は生体情報と組み合わせて利用可能なものであればよく、その内容は特に限定されない。 In the above description, the PIN code is used as the authentication information different from the biometric information. However, the authentication information is not limited to the PIN code, and may be, for example, a password including characters and numerals. The authentication information only needs to be usable in combination with the biometric information, and the content is not particularly limited.

FIG. 8 is a flowchart illustrating an example of a processing procedure executed by the speaker system according to the third embodiment.
After extracting voiceprint information from the voice input to the microphone 14 (step S12), the control unit 11 of the smart speaker 1 executes the following processing. The control unit 11 receives, via the input unit 17, an input of a PIN code (authentication information) different from the biometric information (step S301). The authentication information input in step S301 may be any information for authentication that is different from biometric information, and is not limited to a PIN code. The control unit 11 inputs the voiceprint information extracted in step S12 and the PIN code received in step S301 to the secure element 20. The secure element 20 performs an authentication process of collating the voiceprint information acquired from the speaker body with the voiceprint information 231 stored in the storage unit 23 (step S302).

As a result of the authentication process in step S302, the secure element 20 determines whether the voiceprints match (step S303). If it is determined that the voiceprints match (S303: YES), the control unit 11 further performs authentication of the PIN code and determines whether the PIN code acquired from the speaker body matches the PIN code 235 stored in the storage unit 23 (step S304). If it is determined that they match (S304: YES), the secure element 20 shifts the processing to step S15.

声紋が一致しないと判定した場合（Ｓ３０３：ＮＯ）、又はＰＩＮコードが一致しないと判定した場合（Ｓ３０４：ＮＯ）、セキュアエレメント２０は一連の処理を終了する。 If it is determined that the voiceprints do not match (S303: NO) or if the PIN codes do not match (S304: NO), the secure element 20 ends a series of processing.
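The two-stage decision in steps S303 and S304 might be sketched as follows. The function names and the sample PINs are hypothetical, and the use of a constant-time comparison is an added assumption about how a careful implementation could compare PINs; the description itself only requires that the entered PIN code be checked against the stored PIN code 235 of the user identified by voiceprint.

```python
import hmac


def verify_pin(entered: str, stored: str) -> bool:
    """Compare PIN codes in constant time to avoid leaking where they differ."""
    return hmac.compare_digest(entered.encode(), stored.encode())


def authenticate(voice_user, entered_pin, stored_pins):
    """Return True only when both the voiceprint and the PIN code match.

    `voice_user` is the user identified by voiceprint authentication
    (steps S302 and S303), or None when no voiceprint matched.
    `stored_pins` maps a user ID to that user's PIN code 235.
    """
    if voice_user is None:  # S303: NO
        return False
    stored = stored_pins.get(voice_user)
    if stored is None:  # no PIN code registered for this user
        return False
    return verify_pin(entered_pin, stored)  # S304


pins = {"alice": "4821", "bob": "0073"}
print(authenticate("alice", "4821", pins))  # voiceprint and PIN both match
print(authenticate("alice", "0073", pins))  # another user's PIN is rejected
print(authenticate(None, "4821", pins))     # voiceprint failed, PIN is not consulted
```

Looking up the PIN by the voiceprint-identified user means that knowing some registered user's PIN is not enough; the PIN must belong to the person whose voice was recognized.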

以上より、本実施の形態３によれば、ＰＩＮコード等の認証情報をユーザ本人に入力させることで、セキュリティをより高めることができる。 As described above, according to the third embodiment, security can be further improved by allowing the user to input authentication information such as a PIN code.

(Embodiment 4)
In the present embodiment, a mode will be described in which the position information of the smart speaker 1 is used for authentication in addition to the biometric information.
FIG. 9 is an explanatory diagram for describing an outline of the fourth embodiment. The smart speaker 1 according to the present embodiment includes a position acquisition unit 18. The position acquisition unit 18 is, for example, a GPS antenna that acquires GPS (Global Positioning System) information, and acquires position information of the smart speaker 1.

また、セキュアエレメント２０の記憶部２３は、正規位置情報２３６を記憶している。正規位置情報２３６は、スマートスピーカ１の正規配置座標を示す位置情報であり、スマートスピーカ１を通常使用する適正位置を示す情報である。セキュアエレメント２０には、正規位置情報２３６が予め登録（記憶）されている。 Further, the storage unit 23 of the secure element 20 stores regular position information 236. The regular position information 236 is position information indicating the regular arrangement coordinates of the smart speaker 1, and is information indicating an appropriate position where the smart speaker 1 is normally used. Regular position information 236 is registered (stored) in the secure element 20 in advance.

スマートスピーカ１の位置取得部１８は、マイク１４において音声入力を受け付けた場合、スマートスピーカ１の現在位置を示す位置情報を取得する。スマートスピーカ１は、マイク１４に入力された音声から抽出した声紋情報と、位置取得部１８が取得した位置情報とをセキュアエレメント２０に入力する。 When the microphone 14 receives a voice input, the position acquisition unit 18 of the smart speaker 1 acquires position information indicating the current position of the smart speaker 1. The smart speaker 1 inputs the voiceprint information extracted from the voice input to the microphone 14 and the position information acquired by the position acquisition unit 18 to the secure element 20.

セキュアエレメント２０の認証部２１は、声紋認証に先立ち、位置情報に基づく認証を行う。認証部２１は記憶部２３から正規位置情報２３６を読み出し、スピーカ本体から取得した位置情報と照合して、スマートスピーカ１の現在位置が通常使用する適正位置にあるか否かを判定する。例えば認証部２１は、スマートスピーカ１の現在の位置座標が、正規配置座標から所定の許容範囲内にあるか否かを判定する。許容範囲内にあると判定した場合、認証部２１は位置情報について認証に成功したものと判定し、声紋認証に処理を移行する。その後の処理は実施の形態１と同様であり、暗号化部２２が機密情報２３２の暗号化を行ってスピーカ本体に出力する。 The authentication unit 21 of the secure element 20 performs authentication based on position information prior to voiceprint authentication. The authentication unit 21 reads out the regular position information 236 from the storage unit 23 and checks with the position information acquired from the speaker body to determine whether the current position of the smart speaker 1 is a proper position for normal use. For example, the authentication unit 21 determines whether the current position coordinates of the smart speaker 1 are within a predetermined allowable range from the regular arrangement coordinates. If it is determined that the position information is within the allowable range, the authentication unit 21 determines that the position information has been successfully authenticated, and shifts the processing to voiceprint authentication. Subsequent processing is the same as in the first embodiment. The encryption unit 22 encrypts the confidential information 232 and outputs it to the speaker body.

FIG. 10 is a flowchart illustrating an example of a processing procedure executed by the speaker system according to the fourth embodiment.
After extracting voiceprint information from the voice input to the microphone 14 (step S12), the control unit 11 of the smart speaker 1 executes the following processing. The control unit 11 acquires the position information of the smart speaker 1 (step S401). The control unit 11 inputs the voiceprint information extracted in step S12 and the position information acquired in step S401 to the secure element 20.

The secure element 20 compares the position information of the smart speaker 1 with the regular position information 236 stored in the storage unit 23 to determine whether the smart speaker 1 is at an appropriate position (step S402). For example, the secure element 20 determines whether the current position coordinates of the smart speaker 1 are within a predetermined allowable range from the regular arrangement coordinates. If it determines that the smart speaker 1 is at the appropriate position (S402: YES), the secure element 20 determines that the position-based authentication has succeeded. In this case, the secure element 20 performs authentication based on the voiceprint information (step S403) and determines whether it matches the user's voiceprint (step S404). If it is determined that the voiceprints match (S404: YES), the control unit 11 shifts the processing to step S15. In the case of NO in step S402 or S404, the secure element 20 determines that the authentication has failed and ends the series of processes.
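One way to realize the check in step S402 is to compute the great-circle distance between the current GPS coordinates and the registered coordinates and compare it with a tolerance. The sketch below uses the haversine formula; the 50 m tolerance, the function names, and the sample coordinates are assumptions for illustration, since the description only speaks of a "predetermined allowable range".

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def at_regular_position(current, regular, tolerance_m=50.0):
    """Step S402: is the current position within the allowable range?

    `regular` stands in for the regular position information 236.
    Both arguments are (latitude, longitude) pairs in degrees.
    """
    return distance_m(*current, *regular) <= tolerance_m


regular = (35.6895, 139.6917)  # hypothetical registered coordinates
print(at_regular_position((35.6896, 139.6918), regular))  # a few metres away
print(at_regular_position((34.6937, 135.5023), regular))  # hundreds of kilometres away
```

A tolerance is needed in practice because GPS fixes taken at the same physical location vary from reading to reading, so an exact coordinate comparison would reject legitimate use.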

以上より、本実施の形態４によれば、スマートスピーカ１が物理的に持ち去られた場合などに対応することができ、セキュリティを強固にすることができる。 As described above, according to the fourth embodiment, it is possible to cope with a case where the smart speaker 1 is physically taken away, and to strengthen security.

(Embodiment 5)
In the present embodiment, a mode in which the key information 233 for encrypting the confidential information 232 is updated will be described.
In the present embodiment, the key information 233 used when the smart speaker 1 and the service server 4 transmit and receive the confidential information 232 is updated via the cloud. More specifically, the management server 3 that manages the key information 233 distributes the key information 233 to each device and causes it to be updated.

FIG. 11 is an explanatory diagram for describing an outline of the fifth embodiment. The secure element 20 of the smart speaker 1 according to the present embodiment includes a secret communication processing unit 24. The secret communication processing unit 24 performs processing related to secret communication with the management server 3. The storage unit 23 of the secure element 20 stores a secret communication key 237 (second key information) for performing secret communication. The secret communication key 237 is an encryption key for encrypting and decrypting communication contents exchanged with the management server 3, and is, for example, a common key of a common key cryptosystem. In the present embodiment, the smart speaker 1 performs encrypted communication based on TLS (Transport Layer Security) to conceal communication contents exchanged with the management server 3, and securely receives the key information 233 for updating.

Hereinafter, an outline of the present embodiment will be described. First, the management server 3 transmits to the service server 4 the key information that corresponds to the key information 233 held by the smart speaker 1 and that the service server 4 uses to decrypt the confidential information 232, and causes the service server 4 to update it. As described above, the key information differs for each of the service servers 4a, 4b, ... and is individualized for each service used by the user. The management server 3 transmits the key information corresponding to each of the service servers 4a, 4b, ... to the corresponding service server 4.

次に管理サーバ３は、スマートスピーカ１に鍵情報２３３を送信し、セキュアエレメント２０に格納してある鍵情報２３３を更新させる。この場合に管理サーバ３は、自身が保持する秘匿通信用鍵２３７を用いて秘匿通信路をスマートスピーカ１との間で確立し、鍵情報２３３を暗号化した上で送信する。 Next, the management server 3 transmits the key information 233 to the smart speaker 1 and updates the key information 233 stored in the secure element 20. In this case, the management server 3 establishes a confidential communication path with the smart speaker 1 using the confidential communication key 237 held by the management server 3, and transmits the key information 233 after encrypting it.

For example, the management server 3 performs secret communication based on TLS. The management server 3 issues an electronic certificate including a public key of a public key cryptosystem to the smart speaker 1 (secure element 20) in advance, performs encrypted communication based on that public key, and shares a common key of a common key cryptosystem with the smart speaker 1 as the secret communication key 237. Here, for simplicity of description, it is assumed that this encrypted communication has already taken place and the secret communication key 237 has been shared. The management server 3 encrypts the communication with the smart speaker 1 using the secret communication key 237, and securely transmits the key information 233 for updating.

なお、管理サーバ３はスマートスピーカ１との間で共有した秘匿通信用の鍵（第２の鍵情報）を用いて通信内容を暗号化することができればよく、そのアルゴリズムはＴＬＳに限定されない。 The management server 3 only needs to be able to encrypt communication contents using a secret communication key (second key information) shared with the smart speaker 1, and the algorithm is not limited to TLS.

スマートスピーカ１は、通信部１３を介して、ユーザが利用するサービスに対応する鍵情報２３３を受信する。なお、セキュアエレメント２０がスピーカ本体から独立して通信手段を有する場合、スピーカ本体（通信部１３）ではなくセキュアエレメント２０が鍵情報２３３を直接受信するようにしてもよい。スマートスピーカ１は、受信した鍵情報２３３をセキュアエレメント２０に入力する。 The smart speaker 1 receives the key information 233 corresponding to the service used by the user via the communication unit 13. When the secure element 20 has a communication unit independent of the speaker body, the secure element 20 may directly receive the key information 233 instead of the speaker body (communication unit 13). The smart speaker 1 inputs the received key information 233 to the secure element 20.

The secret communication processing unit 24 of the secure element 20 decrypts the key information 233 input from the speaker body using the secret communication key 237. The secret communication processing unit 24 stores the decrypted key information 233 in the storage unit 23 and updates the key information 233. As described above, the key information 233 can be remotely managed, and a decrease in security due to the obsolescence of the key information 233 can be prevented.

FIG. 12 is a flowchart illustrating an example of a processing procedure executed by the speaker system according to the fifth embodiment.
The management server 3 transmits, to each of the service servers 4a, 4b, ..., key information that differs for each service provided by the service servers 4 (step S501). Further, the management server 3 encrypts the key information 233 to be transmitted to the smart speaker 1 with the secret communication key 237 shared with the smart speaker 1 in advance (step S502). The secret communication key 237 is, for example, a common key of a common key cryptosystem, and is an encryption key unique to each smart speaker 1. The management server 3 transmits the encrypted key information 233 to the smart speaker 1 (step S503).

スマートスピーカ１の制御部１１は、通信部１３を介して鍵情報２３３を受信し、セキュアエレメント２０に入力する（ステップＳ５０４）。セキュアエレメント２０は、記憶部２３に記憶している秘匿通信用鍵２３７に基づき、鍵情報２３３を復号する（ステップＳ５０５）。セキュアエレメント２０は、復号した鍵情報２３３を記憶部２３に格納する（ステップＳ５０６）。セキュアエレメント２０は、一連の処理を終了する。 The control unit 11 of the smart speaker 1 receives the key information 233 via the communication unit 13 and inputs the key information 233 to the secure element 20 (Step S504). The secure element 20 decrypts the key information 233 based on the secret communication key 237 stored in the storage unit 23 (Step S505). The secure element 20 stores the decrypted key information 233 in the storage unit 23 (Step S506). The secure element 20 ends the series of processing.
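The update sequence of steps S502 to S506 can be sketched as below. The names `management_server_wrap`, `SecureElement.update_key`, and `xor_stream` are hypothetical, and the cipher is again a toy SHA-256 keystream chosen only so that the example runs with the standard library; it is not secure and is not the TLS-based protection this description mentions.

```python
import hashlib
import os


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: the same call encrypts and decrypts. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def management_server_wrap(comm_key: bytes, new_service_key: bytes):
    """S502: encrypt the new key information with the shared secret communication key."""
    nonce = os.urandom(16)
    return nonce, xor_stream(comm_key, nonce, new_service_key)


class SecureElement:
    """Hypothetical model of the secret communication processing unit 24."""

    def __init__(self, comm_key: bytes):
        self._comm_key = comm_key  # secret communication key 237
        self.service_keys = {}     # service ID -> key information 233

    def update_key(self, service_id, nonce, wrapped):
        """S505 and S506: decrypt the received key information and store it."""
        self.service_keys[service_id] = xor_stream(self._comm_key, nonce, wrapped)


comm_key = os.urandom(32)  # shared in advance, unique to this smart speaker
se = SecureElement(comm_key)
se.service_keys["shop_a"] = os.urandom(32)  # key currently in use

new_key = os.urandom(32)
nonce, wrapped = management_server_wrap(comm_key, new_key)  # S502, sent in S503
se.update_key("shop_a", nonce, wrapped)                     # S504 to S506
print(se.service_keys["shop_a"] == new_key)
```

In this sketch the speaker body only relays `nonce` and `wrapped` to the secure element, so the new key information never appears in plaintext outside the secure element.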

As described above, Embodiment 5 enables remote management of the key information 233 and prevents the loss of security that would result from the key information 233 becoming stale.

(Embodiment 6)
This embodiment describes a form in which the various information stored in the secure element 20 is updated locally, by the smart speaker 1 alone.
While the user uses the present system, cases may arise in which the biometric information, the confidential information 232, and the like in the secure element 20 need to be updated, for example when a new user is registered in the smart speaker 1 or a credit card number is changed. Managing such private information on a cloud (for example, on the management server 3), however, raises security and privacy concerns. In this embodiment, therefore, the smart speaker 1 updates the various information by itself.

FIG. 13 is an explanatory diagram outlining Embodiment 6. The secure element 20 according to this embodiment includes an update unit 25. The update unit 25 stores in the storage unit 23, and thereby updates, the various information input from the speaker body, such as the voiceprint information 231, the confidential information 232, the fingerprint information 234, the PIN code 235, and the regular position information 236.

First, the microphone 14 of the smart speaker 1 receives from the user a predetermined instruction voice that instructs an update of the voiceprint information 231 or the like. At the same time, the fingerprint sensor 16 of the smart speaker 1 reads the fingerprint of the user. The smart speaker 1 extracts voiceprint information from the input voice and fingerprint information (fingerprint feature amounts) from the image of the read fingerprint.

The smart speaker 1 further acquires, by means of the microphone 14, the fingerprint sensor 16, and the like, the information that the user wishes to update, as update information. The update information is at least one of voiceprint information, confidential information, fingerprint information, a PIN code, and position information, and which information is updated is not particularly limited. The information to be updated may belong to an existing user whose voiceprint information 231 and the like are already registered (stored) in the secure element 20, or to a new user. For example, when a new user is registered, the existing user inputs the instruction voice as described above, and the new user then inputs a voice. The smart speaker 1 extracts voiceprint information from the voice of the new user as the update information.

The smart speaker 1 inputs the above voiceprint information and fingerprint information, together with the update information, to the secure element 20.

The update unit 25 of the secure element 20 collates the voiceprint information and the fingerprint information input from the speaker body with the user's voiceprint information 231 and fingerprint information 234 stored in the storage unit 23, and thereby authenticates whether a legitimate user is requesting the update. If both the voiceprint and the fingerprint match, the update unit 25 determines that the authentication has succeeded.

As described above, the update unit 25 performs authentication based on the voiceprint information and the fingerprint information, that is, on biometric information. When the biometric authentication succeeds, the update unit 25 next verifies the PIN code. The update unit 25 requests the speaker body to obtain a PIN code. On receiving the request from the secure element 20, the smart speaker 1 prompts the user to input the PIN code, for example by outputting a predetermined guidance voice from the speaker 15.

The input unit 17 receives the PIN code from the user and inputs it to the secure element 20. The update unit 25 of the secure element 20 collates the input PIN code with the PIN code 235 stored in the storage unit 23 and authenticates whether the two match. If the PIN codes match, the update unit 25 determines that the authentication has succeeded.

When both the biometric authentication and the PIN code authentication succeed, the update unit 25 stores the update information acquired from the speaker body in the storage unit 23 and updates the various information.

FIG. 14 is a flowchart illustrating an example of a processing procedure executed by the speaker system according to Embodiment 6.
The control unit 11 of the smart speaker 1 accepts an input of a fingerprint via the fingerprint sensor 16 and, via the microphone 14, an input of an instruction voice requesting an update of the information stored in the secure element 20 (step S601). The control unit 11 extracts the voiceprint information of the user from the input voice (step S602). The control unit 11 also extracts fingerprint information (feature amounts of the fingerprint) from the fingerprint image input in step S601 (step S603).

The control unit 11 also acquires the update information (step S604). The update information is at least one of voiceprint information, confidential information, fingerprint information, a PIN code, and position information, and which of these it is is not particularly limited. The information to be updated may belong to an existing user whose voiceprint information 231 and the like are already registered (stored) in the secure element 20, or to a new user. The control unit 11 inputs the voiceprint information and the fingerprint information extracted in steps S602 and S603 to the secure element 20, together with the update information.

The secure element 20 performs authentication processing in which the voiceprint information and the fingerprint information input from the speaker body are collated with the voiceprint information 231 and the fingerprint information 234 stored in the storage unit 23, respectively (step S605). Based on the result of the authentication processing in step S605, the secure element 20 determines whether the biometric information (the voiceprint information and the fingerprint information) matches (step S606). If it determines that the biometric information does not match (S606: NO), the secure element 20 treats the authentication as failed, does not update the information, and ends the series of processing.

If it determines that the biometric information matches (S606: YES), the secure element 20 requests the speaker body to obtain a PIN code, and the control unit 11 of the smart speaker 1 accepts an input of the PIN code via the input unit 17 (step S607). The control unit 11 inputs the entered PIN code to the secure element 20.

The secure element 20 determines whether the PIN code input from the speaker body matches the PIN code 235 stored in the storage unit 23 (step S608). If it determines that the PIN codes do not match (S608: NO), the secure element 20 treats the authentication as failed, does not update the information, and ends the series of processing.

If it determines that the PIN codes match (S608: YES), the secure element 20 stores the update information acquired in step S604 in the storage unit 23 and updates the information (step S609). The secure element 20 then ends the series of processing.
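
The two-stage check of steps S605 to S609 can be sketched as follows. This is a minimal illustration, not the implementation described here: the class and method names, the use of cosine similarity on feature vectors, and the threshold value are all assumptions introduced for the example, since the description does not specify how biometric features are matched. The PIN is obtained through a callback so that it is requested only after the biometric check succeeds, as in step S607.

```python
import hmac
import math
from dataclasses import dataclass, field

MATCH_THRESHOLD = 0.95  # hypothetical similarity threshold, not from the source


def similarity(a, b) -> float:
    """Cosine similarity between two feature vectors (assumed matching method)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SecureElement:
    """Hypothetical model of secure element 20 with update unit 25."""

    voiceprint: list   # voiceprint information 231
    fingerprint: list  # fingerprint information 234
    pin_code: str      # PIN code 235
    store: dict = field(default_factory=dict)  # other contents of storage unit 23

    def _biometrics_match(self, voiceprint, fingerprint) -> bool:
        # Steps S605 and S606: both the voiceprint and the fingerprint must match.
        return (
            similarity(voiceprint, self.voiceprint) >= MATCH_THRESHOLD
            and similarity(fingerprint, self.fingerprint) >= MATCH_THRESHOLD
        )

    def _pin_matches(self, pin: str) -> bool:
        # Step S608: constant-time comparison against the stored PIN code.
        return hmac.compare_digest(pin.encode(), self.pin_code.encode())

    def update(self, voiceprint, fingerprint, request_pin, update_info: dict) -> bool:
        """Return True and apply update_info only if both checks succeed."""
        if not self._biometrics_match(voiceprint, fingerprint):
            return False  # S606: NO, nothing is updated
        pin = request_pin()  # S607: ask the speaker body for the PIN code
        if not self._pin_matches(pin):
            return False  # S608: NO, nothing is updated
        self.store.update(update_info)  # S609
        return True
```

A caller would pass the extracted feature vectors, a function that prompts the user for the PIN, and a dictionary of update information. A failed biometric check returns before the PIN prompt is ever issued.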

As described above, according to Embodiment 6, the various information stored in the secure element 20 can be updated by the smart speaker 1 alone, so that updates are carried out appropriately from the viewpoint of security and privacy.

(Embodiment 7)
Embodiment 1 described a form in which various information is stored in the tamper-resistant secure element 20 and authentication is performed there. This embodiment describes a form in which the software execution environment in the smart speaker 1 is virtually separated into a normal execution environment 201 (first execution environment), which performs the general processing of the smart speaker 1, and a trusted execution environment 202 (second execution environment), which is more secure than the normal execution environment 201, and in which the authentication processing is performed in the trusted execution environment 202.

FIG. 15 is a block diagram showing a configuration example of the smart speaker 1 according to Embodiment 7. The smart speaker 1 according to this embodiment uses, for example, a technology called TrustZone (registered trademark) to separate the execution environment of software (OS, applications, and the like) into the normal execution environment (REE; Rich Execution Environment) 201 and the trusted execution environment 202.

The normal execution environment 201 is the execution environment of a general-purpose OS 211 that runs as the basic OS of the smart speaker 1. Apart from the restriction on access to the trusted execution environment 202, it has no particular functional constraints. The general-purpose OS 211 is software that serves as the OS in the normal execution environment 201 and, in response to requests from an application 212, provides various OS functions including control of the hardware of the smart speaker 1. By executing the application 212 on the general-purpose OS 211, the control unit 11 carries out the basic, general-purpose processing of the smart speaker 1, including voice input and output and communication with the outside.

The trusted execution environment 202 is an independent execution environment provided on the same SoC (System on Chip) separately from the normal execution environment 201 for the purpose of isolating security functions. Access to the trusted execution environment 202 from the normal execution environment 201 is restricted, and the functions it can execute are also limited. The trusted execution environment is not limited to one called a TEE (Trusted Execution Environment); any execution environment, whatever its name, will do as long as it is separated from the normal execution environment 201 and is more secure. The smart speaker 1 ensures safety by placing the software and data that need protection in the trusted execution environment 202 and by restricting access to them from the normal execution environment 201 and from outside the smart speaker 1.

As described above, the normal execution environment 201 is restricted from accessing the trusted execution environment 202 and cannot recognize the existence of the trusted execution environment 202. To invoke processing executed in the trusted execution environment 202 from the normal execution environment 201, the call must pass through a secure monitor 203 implemented in software.
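
The role of the secure monitor 203 as the single entry point can be modelled conceptually as follows. This is only an illustrative sketch: the class names, the allow-list, and the byte-equality voiceprint check are assumptions made for the example. Python cannot enforce real isolation, which on an actual device comes from hardware mechanisms such as TrustZone.

```python
import hmac


class TrustedExecutionEnvironment:
    """Hypothetical stand-in for trusted execution environment 202."""

    def __init__(self, voiceprint: bytes) -> None:
        self._voiceprint = voiceprint  # voiceprint information 231, kept on the trusted side

    def authenticate(self, candidate: bytes) -> bool:
        # Placeholder match: exact comparison instead of real voiceprint matching.
        return hmac.compare_digest(candidate, self._voiceprint)


class SecureMonitor:
    """Hypothetical stand-in for secure monitor 203."""

    # Only these commands may be invoked from the normal execution environment.
    ALLOWED_COMMANDS = frozenset({"authenticate"})

    def __init__(self, tee: TrustedExecutionEnvironment) -> None:
        self._tee = tee

    def call(self, command: str, *args):
        """Forward an allowed command to the trusted side; reject anything else."""
        if command not in self.ALLOWED_COMMANDS:
            raise PermissionError(f"command not exposed to the normal world: {command}")
        return getattr(self._tee, command)(*args)
```

Code in the normal execution environment would hold only a `SecureMonitor` reference and call, for example, `monitor.call("authenticate", candidate)`. It receives a yes or no answer and never sees the stored voiceprint.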

The trusted OS 221 is software that serves as the OS in the trusted execution environment 202 and, in response to requests from an application 222, provides OS functions centered on security functions. By executing the application 222 on the trusted OS 221, the control unit 11 carries out security-critical processing, including the user authentication processing according to this embodiment.

In this embodiment, whether each function of the smart speaker 1 is implemented in the OS or in an application is not essential but is a design matter to be chosen by the implementer, so the description of how functions are divided between the OS and the applications is omitted.

As shown in FIG. 15, in this embodiment the control unit 11 of the smart speaker 1 places the voiceprint information 231, the confidential information 232, the key information 233, and the like in the trusted execution environment 202. The control unit 11 inputs data such as voiceprint information from the normal execution environment 201 to the trusted execution environment 202, and performs the user authentication and the encryption of the confidential information 232 in the trusted execution environment 202. The control unit 11 outputs the encrypted confidential information 232 from the trusted execution environment 202 to the normal execution environment 201 and transmits it to the service server 4 via the communication unit 13.

As described above, according to Embodiment 7, the smart speaker 1 constructs the normal execution environment 201 and the trusted execution environment 202, which is more secure than the normal execution environment 201, and executes the authentication processing in the trusted execution environment 202. Safety can thus be ensured by a software configuration, without mounting the secure element 20.

As described above, it suffices for the smart speaker 1 to have a component (secure unit) that is more secure than the component (speaker body) that performs voice input and output and communication with the outside, and for that secure component to hold the various information, including the biometric information, and to be able to execute the authentication processing. The secure component may be the secure element 20, which is separated in hardware, or the trusted execution environment 202, which is separated in software.

Since this embodiment is the same as Embodiment 1 except that the trusted execution environment 202 is implemented in place of the secure element 20, detailed illustration and description are omitted.

(Embodiment 8)
FIG. 16 is a functional block diagram showing the operation of the smart speaker 1 of the embodiments described above. When the control unit 11 executes the program P, the smart speaker 1 operates as follows.
The speaker body 161 has a microphone 1611 that receives voice input, a speaker 1612 that outputs voice, and a communication unit 1613 that communicates with the outside. The secure unit 162, to which access from the speaker body 161 is restricted, includes a storage unit 1621 that stores biometric information of a user, an acquisition unit 1622 that, when the microphone receives voice input, acquires from the speaker body 161 the biometric information of the user who input the voice, and an authentication unit 1623 that performs authentication based on the biometric information.

Embodiment 8 is as described above and is otherwise the same as Embodiments 1 to 7, so corresponding parts are given the same reference numerals and their detailed description is omitted.

The embodiments disclosed here are illustrative in all respects and should not be considered restrictive. The scope of the present invention is indicated by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

Reference Signs List
1 Smart speaker
11 Control unit
12 Main storage unit
13 Communication unit
14 Microphone
15 Speaker
16 Fingerprint sensor
17 Input unit
18 Position acquisition unit
19 Auxiliary storage unit
20 Secure element
21 Authentication unit
22 Encryption unit
23 Storage unit
231 Voiceprint information
232 Confidential information
233 Key information
234 Fingerprint information
235 PIN code
236 Regular position information
237 Secret communication key
24 Secret communication processing unit
25 Update unit
3 Management server
4 Service server

Claims

A smart speaker comprising:
a speaker body having a microphone that receives voice input, a speaker that outputs voice, and a communication unit that communicates with the outside; and
a secure unit to which access from the speaker body is restricted,
wherein the secure unit includes:
a storage unit that stores biometric information of a user;
an acquisition unit that, when the microphone receives voice input, acquires from the speaker body the biometric information of the user who input the voice; and
an authentication unit that performs authentication based on the biometric information.

The smart speaker according to claim 1, wherein the secure unit is a tamper-resistant secure element.

The smart speaker according to claim 2, wherein the secure element is configured to be detachable from the speaker main body.

The speaker main body is a first execution environment for executing a sound input / output process and a communication process with the outside,
The smart speaker according to claim 1, wherein the secure unit is a second execution environment virtually separated from the first execution environment.

The storage unit stores a plurality of pieces of the biometric information in association with the user,
The acquisition unit acquires the plurality of pieces of biometric information,
The smart speaker according to claim 1, wherein the authentication unit performs authentication based on the plurality of pieces of biometric information.

The speaker body includes an input unit that receives an input of authentication information different from the biological information,
The storage unit stores the authentication information,
The acquisition unit acquires the biometric information and the authentication information,
The smart speaker according to any one of claims 1 to 5, wherein the secure unit includes a second authentication unit that performs authentication based on the authentication information.

The storage unit stores confidential information about the user,
The smart speaker according to any one of claims 1 to 6, wherein the secure unit includes an output unit that outputs the confidential information to the speaker body in accordance with an authentication result by the authentication unit.

The storage unit stores the biometric information and the confidential information corresponding to each of a plurality of the users,
The authentication unit identifies the user based on the biometric information acquired from the speaker body,
The smart speaker according to claim 7, wherein the output unit outputs the confidential information corresponding to the identified user.

The storage unit stores key information for encrypting the confidential information,
An encryption unit that encrypts the confidential information based on the key information,
The smart speaker according to claim 7, wherein the output unit outputs the encrypted confidential information.

The storage unit stores the individual key information for each of a plurality of services used by the user by inputting a voice to the microphone,
The speaker body includes a specification unit that specifies the service used by the user based on a voice input to the microphone,
The smart speaker according to claim 9, wherein the encryption unit performs encryption based on the key information corresponding to the specified service.

The storage unit stores second key information shared with a distribution source of the key information,
The secure unit,
A second acquisition unit that acquires the key information encrypted based on the second key information from the distribution source via a communication network;
A decryption unit that decrypts the key information based on the second key information;
The smart speaker according to claim 9, further comprising a storing unit that stores the decrypted key information in the storage unit.

The speaker body includes a position acquisition unit that acquires position information of the smart speaker,
The storage unit stores regular position information of the smart speaker,
The smart speaker according to claim 1, wherein the secure unit includes a position authentication unit that performs authentication based on the position information acquired by the position acquisition unit and the regular position information.

The acquisition unit acquires update information from the speaker body,
The smart speaker according to any one of claims 1 to 11, wherein the secure unit includes an update unit that updates the information stored in the storage unit according to an authentication result by the authentication unit.

A secure element to which access from a speaker body of a smart speaker is restricted, the speaker body having a microphone that receives voice input, a speaker that outputs voice, and a communication unit that communicates with the outside, the secure element comprising:
a storage unit that stores biometric information of a user;
an acquisition unit that, when the microphone receives voice input, acquires from the speaker body the biometric information of the user who input the voice; and
an authentication unit that performs authentication based on the biometric information.

A program causing a smart speaker having a microphone that receives voice input, a speaker that outputs voice, and a communication unit that communicates with the outside to execute processing of:
executing voice input and output processing and communication processing with the outside in a first execution environment; and
in a second execution environment virtually separated from the first execution environment,
holding biometric information of a user,
acquiring from the first execution environment, when the microphone receives voice input, the biometric information of the user who input the voice, and
performing authentication based on the biometric information.

An information processing method comprising causing a computer to execute processing of:
acquiring, from a smart speaker that stores biometric information of a user, purchase information required when the user purchases a product, and key information in a secure unit to which access from a speaker body is restricted, the purchase information that the secure unit outputs according to an authentication result of the biometric information and that is encrypted based on the key information;
decrypting the acquired purchase information based on key information corresponding to the key information; and
performing information processing relating to the purchase of the product based on the decrypted purchase information.

A distribution method comprising causing a computer to execute processing of:
transmitting, to a service provider that provides services to a user via a smart speaker, individual key information for each of the services;
encrypting key information corresponding to the key information transmitted to the service provider, based on second key information shared with the smart speaker; and
transmitting the encrypted key information to the smart speaker.