JP2020154140A

JP2020154140A - Speech processing apparatus, agent system, program, and speech processing method

Info

Publication number: JP2020154140A
Application number: JP2019052608A
Authority: JP
Inventors: 東坪田; Azuma Tsubota
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2019-03-20
Filing date: 2019-03-20
Publication date: 2020-09-24
Anticipated expiration: 2039-03-20
Also published as: JP7259446B2

Abstract

PROBLEM TO BE SOLVED: To return, to each of an unspecified large number of users of an agent service, an answer linked to that user.

SOLUTION: An agent device 100 comprises: a speech data conversion unit 103 which converts between input speech and speech data; a mobile terminal authentication unit 104 which receives, from a mobile terminal 200, authentication data for accessing an agent server 300; and an agent service access unit 105 which transmits the speech data converted by the speech data conversion unit 103 and the authentication data received by the mobile terminal authentication unit 104 to the agent server 300, and acquires from the agent server 300 an answer to the instruction content of the speech data.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice processing device, an agent system, a voice processing program, and a voice processing method. It describes, for example, a system in which a user gives an instruction by voice to an agent on a server and the result is returned by voice.

In recent years, technologies for agent services, which perform tasks (for example, searches) on behalf of a user, have been actively developed. Conventional agent systems that implement such services have used one of two methods. In the first method, when a user makes a request to an agent device, the device accesses the server using a server access account registered in the device in advance, and no distinction is made between users of the same agent device. In the second method, multiple users are registered in advance for each agent device of the agent system; after the device accesses the server with the account set in the device, the server further identifies the registered user by means such as biometric information.

Japanese Unexamined Patent Application Publication No. 2018-85053 (JP-A-2018-85053)

In the first method, a server access account is assigned to each agent device. However, the content of the server's answer to a user's instruction is then tied to the device installed at a given location (e.g., the vicinity of the agent device). In other words, the answer is specific to the device the user is using and to the user associated with it. Consequently, when an agent device available to the user has an account other than the user's own registered in it, the device may return an answer different from the one the user would receive from the equivalent agent device that he or she normally uses, which is inconvenient.

In the second method, the registration process for each agent device is itself a considerable burden. The second method may be acceptable for use at home or the like, but it is not practical for a shared agent device that can be used by an unspecified large number of people.

In view of the above circumstances, an object of the present invention is to return, to each of an unspecified large number of users of an agent service, an answer linked to that user.

To solve the above problem, a voice processing device of the present invention comprises: a voice data conversion unit that converts between input voice and voice data; a wireless communication device authentication unit that receives, from a wireless communication device, authentication data for accessing an agent server; and an agent service access unit that transmits the voice data converted by the voice data conversion unit and the authentication data received by the wireless communication device authentication unit to the agent server, and acquires from the agent server an answer to the instruction content of the voice data.

The present invention is also an agent system comprising a voice processing device and an agent server, wherein the voice processing device comprises: a voice data conversion unit that converts between input voice and voice data; a wireless communication device authentication unit that receives, from a wireless communication device, authentication data for accessing the agent server; and an agent service access unit that transmits the voice data converted by the voice data conversion unit and the authentication data received by the wireless communication device authentication unit to the agent server, and acquires from the agent server an answer to the instruction content of the voice data.

The present invention is also a program for causing a computer of a voice processing device to function as: a voice data conversion unit that converts between input voice and voice data; a wireless communication device authentication unit that receives, from a wireless communication device, authentication data for accessing an agent server; and an agent service access unit that transmits the voice data converted by the voice data conversion unit and the authentication data received by the wireless communication device authentication unit to the agent server, and acquires from the agent server an answer to the instruction content of the voice data.

The present invention is also a voice processing method in a voice processing device, wherein the voice processing device executes: a voice data conversion step of converting between input voice and voice data; a wireless communication device authentication step of receiving, from a wireless communication device, authentication data for accessing an agent server; and an agent service access step of transmitting the voice data converted in the voice data conversion step and the authentication data received in the wireless communication device authentication step to the agent server, and acquiring from the agent server an answer to the instruction content of the voice data.

According to the present invention, an answer linked to the user can be returned to each of an unspecified large number of users of an agent service.

FIG. 1 is a functional block diagram of the agent system.
FIG. 2 is a sequence diagram of the processing executed in the agent system.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings as appropriate.
Each figure is shown only schematically, to the extent needed for the present invention to be fully understood. The present invention is therefore not limited to the illustrated examples. In the present embodiment, description of configurations not directly related to the present invention and of well-known configurations may be omitted. In the figures, common or similar components are given the same reference numerals, and duplicate descriptions of them are omitted.

≪Configuration≫
As shown in FIG. 1, the agent system of the present embodiment includes an agent device 100, an agent server 300, and an access point 400. Each of the mobile terminals 200, 200-1 to 200-3 shown in FIG. 1 is a terminal carried by a user of the agent service provided by the agent system. The access point 400 can be, for example, a wireless LAN (Local Area Network) router, but is not limited thereto. The access point 400 is a connection point through which the mobile terminals 200, 200-1 to 200-3 connect communicably to the agent device 100 and, via a communication network, to the agent server 300; it is not an essential component of the present invention. The communication network shown in FIG. 1 is, for example, the Internet, but is not limited thereto.

(Agent device)
The agent device 100 is a voice processing device that processes voice input by a user. It is also a voice input/output terminal that outputs, by voice, an answer to the user's voice instruction. The agent device 100 can be, for example, a smart speaker or an emotion-recognizing humanoid robot, but is not limited thereto. The agent device 100 is communicably connected to the agent server 300 via the communication network. The agent device 100 includes a microphone 101, a speaker 102, a voice data conversion unit 103, a mobile terminal authentication unit 104 (wireless communication device authentication unit), and an agent service access unit 105.

The microphone 101 collects sound around the agent device 100.
The speaker 102 outputs sound.

The voice data conversion unit 103 converts voice acquired from the microphone 101 into voice data. It also converts voice data into voice and outputs the voice from the speaker 102. Well-known techniques can be used for the conversion from voice to voice data and from voice data to voice, so their description is omitted.

The mobile terminal authentication unit 104 transmits data to and receives data from the mobile terminals 200, 200-1 to 200-3 present in the vicinity of the agent device 100 and from other communication devices (not shown) that are its communication partners.

The agent service access unit 105 is the interface through which the agent device 100 accesses the agent server 300. The agent service access unit 105 can send a user's voice-input instruction to the agent server 300 and acquire the answer to the instruction from the agent server 300.

(Mobile terminal)
The mobile terminal 200 is an example of a wireless communication device. The mobile terminal 200 can be, for example, a smartphone or a tablet terminal, but is not limited thereto. The mobile terminal 200 includes a voice authentication unit 201 and an authentication data management unit 202. With respect to the features of the present invention, the mobile terminals 200-1 to 200-3 have the same functions as the mobile terminal 200, so only the mobile terminal 200 is described.

The voice authentication unit 201 determines, from the voice data converted by the voice data conversion unit 103, whether or not the speaker is the owner (user) of the mobile terminal 200.
The authentication data management unit 202 stores and manages the authentication data with which the individual user of the mobile terminal 200 uses the agent service.

(Agent server)
The agent server 300 is a server that provides the agent service and can be, for example, a cloud server. As the agent service, the agent server 300 can return an answer suited to the user, based on the instruction given as voice data and on the user's authentication data.

≪Processing≫
The processing executed in the agent system is described with reference to FIG. 2. In advance, the user registers his or her own voice as voice data in the voice authentication unit 201 of his or her mobile terminal 200. The content of the registered utterance is not particularly limited. The user carries the mobile terminal 200 and moves to the vicinity of the agent device 100. As a result, when the agent system is used, the mobile terminal 200 is inside the area covered by the same network as the agent device 100 (a network different from the communication network shown in FIG. 1). The area covered by this same network can be, for example, the surroundings of the agent device 100 within a predetermined distance from it. Hereinafter, this same network may be referred to as the "specific communication area". When the mobile terminal 200 enters the specific communication area, the voice authentication unit 201 switches to a state in which it can receive the voice data that it is expected to obtain from the agent device 100.

The microphone 101 of the agent device 100 continuously receives sound around the agent device 100. When the microphone 101 receives voice at or above a certain volume, it outputs the received voice to the voice data conversion unit 103 (step S1). Here, the received voice is an instruction spoken by the user of the mobile terminal 200.
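The volume gate of step S1 can be illustrated with a short sketch. The source states only that voice "at or above a certain volume" is forwarded; the RMS level measure, the threshold value, and the function names `rms` and `should_forward` are assumptions made for illustration.

```python
import math
from typing import Sequence

# Hypothetical threshold; the source does not specify a value or a unit.
VOLUME_THRESHOLD = 0.1


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_forward(samples: Sequence[float],
                   threshold: float = VOLUME_THRESHOLD) -> bool:
    """Return True when the frame is loud enough to pass on for conversion (step S1)."""
    return rms(samples) >= threshold
```

A frame of background noise such as `[0.01, -0.01, 0.01, -0.01]` stays below the assumed threshold, while a frame such as `[0.5, -0.5, 0.5, -0.5]` would be forwarded.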

Next, the voice data conversion unit 103 converts the voice output from the microphone 101 into voice data and outputs the voice data to the mobile terminal authentication unit 104 (step S2). The mobile terminal authentication unit 104 then transmits the voice data received from the voice data conversion unit 103 to the voice authentication unit 201 of the mobile terminal 200 in the specific communication area (step S3). At the same time, the mobile terminal authentication unit 104 transmits the same voice data to the voice authentication units 201 of the other mobile terminals 200, 200-1 to 200-3 in the specific communication area (not shown in FIG. 2).

The voice authentication unit 201 of the mobile terminal 200 compares the voice data received from the mobile terminal authentication unit 104 with the voice data of the user's own voice registered in advance in the mobile terminal 200 (step S4). Based on the comparison, the voice authentication unit 201 determines whether or not the two sets of voice data belong to the same person. In other words, this step determines which user (the user of which mobile terminal) the voiceprint or similar characteristics of the voice received in step S1 belong to. In this example, the mobile terminal 200 determines that the voice data belongs to the same person. The mobile terminals 200-1 to 200-3 do not determine that it is the same person, and for them the processing of FIG. 2 ends.
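Steps S3 and S4 can be sketched as a broadcast followed by a per-terminal comparison. The source does not say how the two sets of voice data are compared; the sketch below assumes that each utterance has already been reduced to a numeric feature vector and uses cosine similarity with a hypothetical threshold. The class and function names are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical acceptance threshold for the similarity score.
MATCH_THRESHOLD = 0.9


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class MobileTerminal:
    terminal_id: str
    enrolled_features: Sequence[float]  # voice registered in advance by the owner

    def is_owner(self, features: Sequence[float]) -> bool:
        """Step S4: compare received voice features with the enrolled ones."""
        return cosine_similarity(self.enrolled_features, features) >= MATCH_THRESHOLD


def broadcast_for_match(features: Sequence[float],
                        terminals: List[MobileTerminal]) -> Optional[MobileTerminal]:
    """Step S3: offer the voice to every terminal in the area; return the one that matches."""
    for terminal in terminals:
        if terminal.is_owner(features):
            return terminal
    return None  # no terminal in the area recognized its owner
```

Because the comparison runs on each terminal against that terminal's own enrolled voice, the agent device itself never needs to hold any user's registration data, which is the point the embodiment relies on.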

Next, the voice authentication unit 201 of the mobile terminal 200 requests the authentication data for using the agent service from the authentication data management unit 202 (step S5). The authentication data is data for accessing the agent server 300 with the owner's authority, for example the ID of the owner (the user of the mobile terminal 200). The authentication data can also be generated by a one-time password generation algorithm such as those used in banks' mobile banking applications (an example of temporarily valid data). Alternatively, upon the request of step S5, the authentication data management unit 202 may access the agent server 300, have a one-time token issued, and pass on the issued token, so that the authentication data is valid only once or only for a short period (another example of temporarily valid data).
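As one concrete illustration of "temporarily valid data", the sketch below implements a time-based one-time password in the style of RFC 6238 (TOTP) on top of RFC 4226 (HOTP). The source names no specific algorithm, so the choice of TOTP, the 30-second step, and the existence of a secret shared between the mobile terminal and the agent server are all assumptions.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA-1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password: HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, candidate: str, at: Optional[float] = None,
           step: int = 30) -> bool:
    """Server-side check that a submitted code matches the current time step."""
    return hmac.compare_digest(totp(secret, at, step), candidate)
```

In such a scheme the authentication data management unit 202 would call `totp` with the user's secret when step S5 arrives, and the agent server would call `verify` with the same secret; a code captured by a shared agent device stops being accepted once the time step passes.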

Next, the authentication data management unit 202 of the mobile terminal 200 returns the authentication data to the voice authentication unit 201 (step S6). The voice authentication unit 201 of the mobile terminal 200 then transmits the authentication data to the mobile terminal authentication unit 104 of the agent device 100 (step S6).

Next, the mobile terminal authentication unit 104 outputs the authentication data received from the voice authentication unit 201 in step S6, together with the voice data received from the voice data conversion unit 103 in step S2, to the agent service access unit 105 (step S7). The agent service access unit 105 then transmits the authentication data and voice data received from the mobile terminal authentication unit 104 to the agent server 300 via the communication network (step S8).
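The message sent in step S8 carries both the authentication data and the voice data. The source does not define a wire format; the following sketch assumes a JSON envelope with base64-encoded audio, and the field names `auth_data` and `voice_data` are hypothetical.

```python
import base64
import json
from typing import Tuple


def build_request(auth_data: str, voice_data: bytes) -> str:
    """Steps S7-S8: bundle authentication data and voice data into one message."""
    return json.dumps({
        "auth_data": auth_data,
        # Raw audio bytes are not valid JSON, so they are base64-encoded.
        "voice_data": base64.b64encode(voice_data).decode("ascii"),
    })


def parse_request(message: str) -> Tuple[str, bytes]:
    """Server side: recover the authentication data and the raw voice data."""
    body = json.loads(message)
    return body["auth_data"], base64.b64decode(body["voice_data"])
```

Sending the two items in a single message keeps each instruction tied to the credential that was obtained for that particular utterance.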

Next, the agent server 300 identifies the user based on the authentication data received from the agent service access unit 105 of the agent device 100 (step S9). Various techniques can be used to identify the user from the authentication data, chosen according to the type of authentication data.

Next, when the agent server 300 determines that the user is a legitimate user, it analyzes the voice data received from the agent service access unit 105 of the agent device 100 (step S10). Various techniques can be used to analyze the voice data, for example cepstrum analysis or LPC (linear predictive coding) analysis.

Next, for each user identified by the authentication data, the agent server 300 creates, as answer data, an answer to the instruction (inquiry) given by the voice data (step S11). The answer data is created, for example, by processing with various rules or AI. The agent server 300 holds, for example, information about the user (e.g., gender, age) and past responses in association with the authentication data, and can therefore give an answer suited to each user.
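Steps S9 to S11 on the server side can be sketched as follows. This is a minimal illustration under stated assumptions: the token table, the `recognize` callback standing in for the speech analysis of step S10, and the answer format are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class UserProfile:
    user_id: str
    history: List[str] = field(default_factory=list)  # past instructions


class AgentServer:
    def __init__(self, tokens: Dict[str, UserProfile],
                 recognize: Callable[[bytes], str]) -> None:
        self._tokens = tokens        # currently valid authentication data -> user
        self._recognize = recognize  # stand-in for voice data analysis (step S10)

    def handle(self, auth_data: str, voice_data: bytes) -> Optional[str]:
        # Step S9: identify the user; pop() makes each token single-use.
        user = self._tokens.pop(auth_data, None)
        if user is None:
            return None  # not a legitimate user, so no answer is produced
        # Step S10: analyze the voice data to obtain the instruction.
        instruction = self._recognize(voice_data)
        # Step S11: build an answer using what is stored for this user.
        answer = (f"{user.user_id}: answer to '{instruction}' "
                  f"(request #{len(user.history) + 1})")
        user.history.append(instruction)
        return answer
```

For example, with `AgentServer({"tok-1": UserProfile("alice")}, lambda v: v.decode("utf-8"))`, the first call `handle("tok-1", b"weather")` returns an answer for alice and a second call with the same token returns `None`, reflecting the single-use behavior assumed here.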

Next, the agent server 300 transmits the created answer data to the voice data conversion unit 103 via the agent service access unit 105 of the agent device 100 (step S12). The agent service access unit 105 thereby acquires, from the agent server 300, the answer indicated by the answer data.

Next, the voice data conversion unit 103 converts the answer data from the agent server 300 into a voice signal and outputs it to the speaker 102 (step S13). The speaker 102 then outputs the answer of the agent server 300, converted into a voice signal, as voice and conveys it to the user (step S14).
This completes the processing of FIG. 2.

≪Summary≫
According to the present embodiment, the user who input voice to the agent device 100 can be authenticated and the agent service can be executed for that user. This eliminates the very process that was regarded as a problem in the conventional methods, namely registering the accounts of an unspecified large number of users in the agent device 100.
Therefore, an answer linked to the user can be returned to each of an unspecified large number of users of the agent service.

≪Others≫
(a): When the agent server 300 has a self-learning function based on AI or the like, instructions issued by another person from the agent device 100 would cause the server to learn preferences different from those of the account holder, which acts as noise. According to the present invention, however, even when there are multiple agent devices 100 scattered across various locations, the agent server 300 is accessed with the account of the person who actually spoke, whichever agent device 100 the access comes from. Noise is therefore kept out of the self-learning function, and learning can be performed using only data from that person's own instructions.
(b): The agent device 100 of the present embodiment is configured as a computer having a CPU (Central Processing Unit), a memory, a storage means (storage unit) such as a hard disk, and a network interface. In this computer, the various functions are realized by the CPU executing a program loaded into the memory.
(c): For example, the agent device 100 of the present embodiment can be placed at an unmanned service counter in a shopping mall visited by an unspecified large number of customers, to realize an agent service that accepts voice inquiries from customers and outputs appropriate answers by voice.

100 Agent device
101 Microphone
102 Speaker
103 Voice data conversion unit
104 Mobile terminal authentication unit (wireless communication device authentication unit)
105 Agent service access unit
200 Mobile terminal
201 Voice authentication unit
202 Authentication data management unit
300 Agent server

Claims

1. A voice processing device characterized by comprising:
a voice data conversion unit that converts between input voice and voice data;
a wireless communication device authentication unit that receives, from a wireless communication device, authentication data for accessing an agent server; and
an agent service access unit that transmits the voice data converted by the voice data conversion unit and the authentication data received by the wireless communication device authentication unit to the agent server, and acquires from the agent server an answer to the instruction content of the voice data.

2. The voice processing device according to claim 1, wherein the wireless communication device authentication unit transmits the voice data to the wireless communication device and causes the wireless communication device to authenticate the user who input the voice.

3. The voice processing device according to claim 2, wherein the wireless communication device authentication unit transmits the voice data to a wireless communication device in the vicinity of the voice processing device.

4. The voice processing device according to any one of claims 1 to 3, wherein the authentication data is temporarily valid data.

5. An agent system comprising a voice processing device and an agent server, characterized in that the voice processing device comprises:
a voice data conversion unit that converts between input voice and voice data;
a wireless communication device authentication unit that receives, from a wireless communication device, authentication data for accessing the agent server; and
an agent service access unit that transmits the voice data converted by the voice data conversion unit and the authentication data received by the wireless communication device authentication unit to the agent server, and acquires from the agent server an answer to the instruction content of the voice data.

6. A program for causing a computer of a voice processing device to function as:
a voice data conversion unit that converts between input voice and voice data;
a wireless communication device authentication unit that receives, from a wireless communication device, authentication data for accessing an agent server; and
an agent service access unit that transmits the voice data converted by the voice data conversion unit and the authentication data received by the wireless communication device authentication unit to the agent server, and acquires from the agent server an answer to the instruction content of the voice data.

7. A voice processing method in a voice processing device, characterized in that the voice processing device executes:
a voice data conversion step of converting between input voice and voice data;
a wireless communication device authentication step of receiving, from a wireless communication device, authentication data for accessing an agent server; and
an agent service access step of transmitting the voice data converted in the voice data conversion step and the authentication data received in the wireless communication device authentication step to the agent server, and acquiring from the agent server an answer to the instruction content of the voice data.