JP2021033315A

JP2021033315A - Information processing apparatus and information processing program

Info

Publication number: JP2021033315A
Application number: JP2019148427A
Authority: JP
Inventors: Akira Misumi; Hideki Sato
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2021-03-01
Anticipated expiration: 2039-08-13
Also published as: JP7326983B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus that, when authenticating a user by voice, improves security compared with authentication based on the voice of a user uttering a predetermined password.

SOLUTION: For a character string including a plurality of characters, display control means of an information processing apparatus performs control so that one or more characters in the character string are displayed. Receiving means receives the voice of a user who utters the characters displayed by the display control means. First authentication means performs authentication for each voice of the one or more characters. Second authentication means authenticates the user who uttered the voice by applying a predetermined rule to a plurality of authentication results from the first authentication means.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus and an information processing program.

Patent Document 1 addresses the problem of providing a highly secure authentication system that can detect and eliminate unauthorized access resulting from theft of a source address or password. It discloses that a connecting device and a terminal each store a plurality of passwords (or a plurality of password generation algorithms) and setting information; the clocks of the connecting device and the terminal are synchronized; the password (or password generation algorithm) is changed over time in accordance with the setting information; the terminal appends the password to each frame it transmits; and the connecting device compares the password in the received frame with its own password, permitting the terminal to communicate if the passwords match and, if they do not, refusing communication and discarding the frame.

Patent Document 2 addresses the problem of making it easy to change a voice password in voiceprint authentication. It discloses a voiceprint authentication system comprising: voice recording means that records, for each registrant, voiceprint data of the numeral and/or character elements that make up a password and of the elements that connect them; password forming means that forms a random password from those elements; voice data forming means that forms voice data of the password formed by the password forming means, using the voiceprint data; and password determination means that determines whether a determination requester is a registrant by collating the voice data obtained when the determination requester speaks the password with the voice data formed by the voice data forming means. Instead of registering the spoken voice password itself, the system registers voiceprint data for each element that makes up the password.

Patent Document 1: Japanese Unexamined Patent Publication No. 2001-209614
Patent Document 2: Japanese Unexamined Patent Publication No. 2005-128307

When authentication uses a user's voice and is based on the user uttering a predetermined password, there is a security risk that, if the utterance is recorded, another person may misuse the recording. An object of the present invention is therefore to provide an information processing apparatus and an information processing program that, when authenticating with a user's voice, can improve the security of authenticating the speaker compared with authentication based on the voice of a user uttering a predetermined password.

The gist of the present invention for achieving this object lies in the inventions of the following items.
The invention of claim 1 is an information processing apparatus comprising: display control means that, for a character string including a plurality of characters, performs control so that one or more characters in the character string are displayed; receiving means that receives the voice of a user who utters the characters displayed by the display control means; first authentication means that performs authentication for each voice of one or more characters; and second authentication means that authenticates the user who uttered the voice by applying a predetermined rule to a plurality of authentication results from the first authentication means.

The invention of claim 2 is the information processing apparatus according to claim 1, wherein the display control means performs control so that the one or more characters in the character string are displayed in a plurality of separate steps, and the first authentication means performs authentication for each character displayed by the display control means.

The invention of claim 3 is the information processing apparatus according to claim 2, wherein the second authentication means determines that authentication has failed when the first authentication means has failed authentication a predetermined number of times.

The invention of claim 4 is the information processing apparatus according to claim 1, wherein the display control means performs control so that the reading (kana) of each displayed character is also displayed.

The invention of claim 5 is the information processing apparatus according to claim 2, wherein, when the receiving means does not receive a voice within a predetermined time, or when authentication by the first authentication means is not performed within that time, the display control means performs control so that the next character is displayed.

The invention of claim 6 is the information processing apparatus according to claim 1, further comprising generation means that generates, in accordance with a security level, the character string to be displayed by the display control means, wherein the display control means performs control so that one or more characters in the character string generated by the generation means are displayed.

The invention of claim 7 is an information processing apparatus comprising: receiving means that receives voices uttered by a plurality of predetermined users for one or more predetermined characters; generation means that generates, as learning data for authentication, data in which information predetermined for each of the one or more characters is added to the voice; and learning model generation means that uses the learning data generated by the generation means to perform learning for voice-based authentication and generates a single learning model.

The invention of claim 8 is the information processing apparatus according to claim 1, wherein the first authentication means performs authentication using the learning model generated by the information processing apparatus according to claim 7.

The invention of claim 9 is an information processing apparatus comprising: receiving means that receives voices uttered by a plurality of predetermined users for one or more predetermined characters; generation means that generates, as learning data for authentication, the voice for each of the one or more predetermined characters; and learning model generation means that uses the learning data generated by the generation means to perform learning for voice-based authentication and generates a learning model for each of the one or more characters.

The invention of claim 10 is the information processing apparatus according to claim 1, wherein the first authentication means performs authentication using, among the learning models generated by the information processing apparatus according to claim 9, the learning model corresponding to the one or more characters displayed by the display control means.

The invention of claim 11 is an information processing program that causes a computer to function as: display control means that, for a character string including a plurality of characters, performs control so that one or more characters in the character string are displayed; receiving means that receives the voice of a user who utters the characters displayed by the display control means; first authentication means that performs authentication for each voice of one or more characters; and second authentication means that authenticates the user who uttered the voice by applying a predetermined rule to a plurality of authentication results from the first authentication means.

The invention of claim 12 is an information processing program that causes a computer to function as: receiving means that receives voices uttered by a plurality of predetermined users for one or more predetermined characters; generation means that generates, as learning data for authentication, data in which information predetermined for each of the one or more characters is added to the voice; and learning model generation means that uses the learning data generated by the generation means to perform learning for voice-based authentication and generates a single learning model.

The invention of claim 13 is an information processing program that causes a computer to function as: receiving means that receives voices uttered by a plurality of predetermined users for one or more predetermined characters; generation means that generates, as learning data for authentication, the voice for each of the one or more predetermined characters; and learning model generation means that uses the learning data generated by the generation means to perform learning for voice-based authentication and generates a learning model for each of the one or more characters.

According to the information processing apparatus of claim 1, when authentication is performed using a user's voice, the security of authenticating the speaker can be improved compared with authentication based on the voice of a user uttering a predetermined password.

According to the information processing apparatus of claim 2, the first authentication can be performed for each displayed character.

According to the information processing apparatus of claim 3, authentication can be treated as failed when the first authentication has failed a predetermined number of times.

According to the information processing apparatus of claim 4, the reading used for authentication can be made uniform even for characters that have multiple readings.

According to the information processing apparatus of claim 5, the next character can be displayed when no voice is received within a predetermined time or when the first authentication is not performed.

According to the information processing apparatus of claim 6, a character string suited to the security level can be generated for authentication.

According to the information processing apparatus of claim 7, a learning model can be generated such that authentication is possible with a single learning model even when the readings of the characters used for authentication are similar.

According to the information processing apparatus of claim 8, a user can be authenticated with a single learning model even when the readings of the characters used for authentication are similar.

According to the information processing apparatus of claim 9, a learning model can be generated for each of the one or more characters.

According to the information processing apparatus of claim 10, a user can be authenticated using the learning model corresponding to the displayed one or more characters.

According to the information processing program of claim 11, when authentication is performed using a user's voice, the security of authenticating the speaker can be improved compared with authentication based on the voice of a user uttering a predetermined password.

According to the information processing program of claim 12, a learning model can be generated such that authentication is possible with a single learning model even when the readings of the characters used for authentication are similar.

According to the information processing program of claim 13, a learning model can be generated for each of the one or more characters.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the first embodiment.
FIG. 2 is an explanatory diagram showing an example of a system configuration using the present embodiment.
FIG. 3 is a flowchart showing a processing example according to the first embodiment.
FIG. 4 is an explanatory diagram showing a processing example according to the first embodiment.
FIG. 5 is an explanatory diagram showing a processing example according to the first embodiment.
FIG. 6 is a flowchart showing a processing example according to the first embodiment.
FIG. 7 is a conceptual module configuration diagram of a configuration example of the second embodiment.
FIG. 8 is an explanatory diagram showing a processing example according to the second embodiment.
FIG. 9 is an explanatory diagram showing a processing example according to the second embodiment.
FIG. 10 is an explanatory diagram showing a processing example according to the second embodiment.
FIG. 11 is a conceptual module configuration diagram of a configuration example of the third embodiment.
FIG. 12 is an explanatory diagram showing a processing example according to the third embodiment.
FIG. 13 is an explanatory diagram showing a processing example according to the third embodiment.
FIG. 14 is a block diagram showing an example of the hardware configuration of a computer that realizes the present embodiment.

Hereinafter, examples of various preferred embodiments for realizing the present invention will be described with reference to the drawings.
<First Embodiment>
FIG. 1 shows a conceptual module configuration diagram for a configuration example of the first embodiment.
A module generally refers to a logically separable component such as software (the term "software" is interpreted to include computer programs) or hardware. The modules in this embodiment therefore refer not only to modules in a computer program but also to modules in a hardware configuration. Accordingly, this embodiment also serves as a description of a computer program for causing a computer to function as those modules (for example, a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), as well as of a system and a method. For convenience of explanation, the terms "store", "cause to store", and equivalent wording are used; when the embodiment is a computer program, these terms mean storing in a storage device or performing control so that storing in a storage device takes place. Modules may correspond one-to-one to functions, but in an implementation one module may consist of one program, a plurality of modules may consist of one program, or, conversely, one module may consist of a plurality of programs. A plurality of modules may be executed by one computer, or one module may be executed by a plurality of computers in a distributed or parallel environment. One module may contain another module. Hereinafter, "connection" is used not only for physical connection but also for logical connection (for example, data exchange, instructions, reference relationships between data, and login). "Predetermined" means determined before the processing in question; it covers not only determination before the processing according to this embodiment starts but also, even after that processing has started, determination in accordance with the situation or state at that time, or up to that time, as long as it precedes the processing in question. When there are a plurality of "predetermined values", they may all differ, or two or more of them (including, of course, all of them) may be the same. A statement of the form "if A, do B" means "determine whether A holds, and do B if it is determined that A holds", except where no determination of whether A holds is needed. When items are listed as in "A, B, C", the list is illustrative unless otherwise stated, and includes the case where only one item is selected (for example, only A).
A system or apparatus may be configured by connecting a plurality of computers, pieces of hardware, devices, and the like through communication means such as a network (a "network" includes one-to-one communication connections), or may be realized by a single computer, piece of hardware, device, or the like. "Apparatus" and "system" are used as synonyms. Of course, "system" does not include a mere social "mechanism" (that is, a social system) that is an artificial arrangement.
For each process performed by a module, or for each process when a module performs a plurality of processes, the target information is read from a storage device, the process is performed, and the result is written to the storage device. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk drive, RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

The information processing apparatus 100 according to the first embodiment has an authentication function that uses a user's voice and, as shown in the example of FIG. 1, includes a character string generation module 105, a display control module 110, a display device 115, a voice reception module 120, and a user authentication module 125.

The character string generation module 105 is connected to the display control module 110. The character string generation module 105 generates the character string to be displayed by the display control module 110 in accordance with a security level. For example, the length of the character string may be determined by the security level: a long character string when the security level is high, and a short one when it is low.
The security level may be determined according to the function the user will use; in that case, the user is asked to select the function before authentication. For example, a three-character string may be generated when the copy function is used, and a six-character string when the fax reception function is used. The security level may also be determined according to the amount of processing the user specifies; in that case, the user is asked to enter the amount before authentication. For example, a three-character string may be generated for one copy, and a six-character string for ten copies.
The character string generated by the character string generation module 105 may be a string of digits, a string of alphabetic characters, or a string of other characters such as katakana. For a digit string, for example, the character string generation module 105 may randomly select, from the characters 0 to 9, as many digits as the security level requires. Because the selection is random, the same character string is unlikely to be displayed again, which makes it difficult to pass authentication by replaying a recording of a previously authenticated voice. The character string generation module 105 may also store previously used character strings and generate only character strings that differ from them.
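The generation rule described for the character string generation module 105 can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the names `LENGTH_BY_LEVEL` and `generate_challenge`, the two level labels, and the retry limit are hypothetical. Only the 3- and 6-character lengths, the digits 0 to 9, the random selection, and the exclusion of previously used strings come from the description.

```python
import secrets

# Hypothetical mapping from security level to string length; the description
# gives only 3 and 6 characters as examples.
LENGTH_BY_LEVEL = {"low": 3, "high": 6}
DIGITS = "0123456789"


def generate_challenge(level, used, max_tries=1000):
    """Return a random digit string whose length depends on the security level.

    `used` is a set of previously issued strings. A string already in the set
    is never returned, and each newly issued string is added to the set.
    """
    length = LENGTH_BY_LEVEL[level]
    for _ in range(max_tries):
        candidate = "".join(secrets.choice(DIGITS) for _ in range(length))
        if candidate not in used:
            used.add(candidate)
            return candidate
    # Assumed behaviour: give up after max_tries collisions.
    raise RuntimeError("could not find an unused character string")
```

A caller would map the selected function (for example, copy or fax reception) to a level before calling `generate_challenge`; that mapping is left out because the description gives it only by example.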

The display control module 110 is connected to the character string generation module 105, the display device 115, and the user authentication module 125. For a character string including a plurality of characters, the display control module 110 performs control so that one or more characters in the character string are displayed.
The display control module 110 may perform control so that the one or more characters in the character string are displayed on the display device 115 in a plurality of separate steps. Here, "in a plurality of separate steps" includes hiding the previously displayed character or characters when the next character or characters are displayed. In other words, the display device 115 shows only one unit of "one or more characters" at a time.
The display control module 110 may also perform control so that the reading (kana) of each displayed character is shown on the display device 115, so that users read a character the same way even when it has multiple readings. For example, the displayed character "1" can be read as "ichi", "hitotsu", and so on; to have the user say "ichi", the reading "ichi" is displayed together with "1".
When the voice reception module 120 does not receive a voice within a predetermined time, or when authentication by the authentication (A) module 130 is not performed within that time, the display control module 110 may perform control so that the next character is displayed on the display device 115.
The display control module 110 may also perform control so that one or more characters in the character string generated by the character string generation module 105 are displayed on the display device 115. The examples below describe the case in which the characters of the character string generated by the character string generation module 105 are displayed on the display device 115 one at a time, in order.
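The one-character-at-a-time display with a reading and a timeout can be pictured with the short sketch below. It is an illustration only: `run_prompts`, the `show` and `listen` callbacks, the `READINGS` table, and the 5-second default are all hypothetical names and values, since the description specifies neither an interface nor a concrete time limit.

```python
# Illustrative readings shown next to each digit; only "1" -> "イチ" is taken
# from the description, the other entries are added for the example.
READINGS = {"1": "イチ", "2": "ニ", "3": "サン"}


def run_prompts(challenge, show, listen, timeout_s=5.0):
    """Display each character in turn and collect the user's utterance.

    show(ch, reading) is assumed to replace whatever was displayed before.
    listen(timeout_s) is assumed to return audio, or None if nothing was
    received within the time limit; in that case the loop simply moves on
    to the next character.
    """
    results = []
    for ch in challenge:
        show(ch, READINGS.get(ch, ""))
        audio = listen(timeout_s)
        results.append((ch, audio))
    return results
```

Passing the display and microphone as callbacks keeps the sketch independent of any particular hardware; the collected `(character, audio)` pairs are what a per-character authentication step would consume.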

The display device 115 is connected to the display control module 110. The display device 115 is, for example, a liquid crystal display or an organic EL display, and may also be a touch screen or the like that accepts user operations. Under the control of the display control module 110, the display device 115 displays one or more characters. The user reads the displayed character or characters aloud, and that voice is used to authenticate the user.

The voice reception module 120 is connected to the user authentication module 125. The voice reception module 120 receives the user's voice through, for example, a microphone. Specifically, it receives the voice of the user uttering the characters displayed on the display device 115 under the control of the display control module 110. The voice received by the voice reception module 120 serves as voiceprint data for authenticating the user.

ユーザー認証モジュール１２５は、認証（Ａ）モジュール１３０、認証（Ｂ）モジュール１３５を有しており、表示制御モジュール１１０、音声受付モジュール１２０と接続されている。ユーザー認証モジュール１２５は、ユーザーを認証する。例えば、ある装置を用いることができるユーザーを認証すること等がある。具体的には、図２の例に示す画像処理装置２００のように、情報処理装置１００を内蔵しており、画像処理装置２００を利用することができるユーザーを認証する。いわゆるログインのための認証が該当する。
ここでの認証は、音声を用いたユーザーの認証であり、いわゆる声紋認証である。そして、音声認識も行っている。 The user authentication module 125 has an authentication (A) module 130 and an authentication (B) module 135, and is connected to the display control module 110 and the voice reception module 120. The user authentication module 125 authenticates the user, for example, a user who is permitted to use a certain device. Specifically, when the information processing device 100 is built into a device such as the image processing device 200 shown in the example of FIG. 2, it authenticates a user who is permitted to use the image processing device 200. So-called login authentication is one such case.
The authentication here is authentication of the user by voice, that is, so-called voiceprint authentication. Speech recognition is also performed.

認証（Ａ）モジュール１３０は、音声受付モジュール１２０によって受け付けられた一文字又は複数文字の音声毎に認証する。
また、認証（Ａ）モジュール１３０は、表示制御モジュール１１０によって表示された文字毎に認証を行うようにしてもよい。具体的には、認証（Ａ）モジュール１３０は、一文字又は複数文字の音声毎に、その音声の認識を行う。その音声の認識結果が、表示制御モジュール１１０によって表示するように制御された「一文字又は複数文字」、つまり、表示装置１１５によって表示されている「一文字又は複数文字」と同じであるか否かを判断する。同じであった場合に、声紋認証を行う。認証（Ａ）モジュール１３０による認証失敗として、表示装置１１５によって表示されている「一文字又は複数文字」が異なる場合、声紋認証ができなかった場合、声紋認証はできたが、前回の声紋認証とは異なるユーザーとなった場合が該当する。なお、ここでの「声紋認証できた」とは、予め定められたユーザーの音声であることである。また、「前回の声紋認証」とは、１回のユーザー認証にあたって、表示装置１１５によって表示されている「一文字又は複数文字」が複数回ある場合における前回である。もちろんのことながら、１回のユーザー認証にあたって、最初の声紋認証においては、前回の声紋認証はない。もちろんのことながら、１回のユーザー認証にあたって、表示装置１１５によって表示されている「一文字又は複数文字」が複数回ある場合、全回の声紋認証の結果が同じユーザーの認証とならないと、ユーザー認証モジュール１２５による認証は成功しない。 Authentication (A) The module 130 authenticates each one-character or plural-character voice received by the voice reception module 120.
Further, the authentication (A) module 130 may perform authentication for each character displayed by the display control module 110. Specifically, for each voice of one or more characters, the authentication (A) module 130 first recognizes the voice. It then determines whether the recognition result is the same as the "one or more characters" that the display control module 110 controlled to be displayed, that is, the "one or more characters" displayed on the display device 115. If they are the same, voiceprint authentication is performed. Authentication by the authentication (A) module 130 fails in the following cases: the recognition result differs from the "one or more characters" displayed on the display device 115; voiceprint authentication could not be performed; or voiceprint authentication was performed but identified a user different from the one identified by the previous voiceprint authentication. Here, "voiceprint authentication was performed" means that the voice was determined to be that of a predetermined user. The "previous voiceprint authentication" is the preceding one in a case where "one or more characters" are displayed on the display device 115 a plurality of times within one user authentication. Naturally, the first voiceprint authentication in one user authentication has no previous voiceprint authentication. Likewise, when "one or more characters" are displayed a plurality of times within one user authentication, authentication by the user authentication module 125 does not succeed unless all voiceprint authentication results identify the same user.
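The per-character check described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the names CharResult and authenticate_character, and the two callbacks recognize (speech recognition) and identify_speaker (voiceprint authentication), are hypothetical stand-ins for whatever recognizer and learning model the authentication (A) module 130 actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CharResult:
    ok: bool              # True when this character was authenticated
    user: Optional[str]   # user identified by the voiceprint, if any
    reason: str           # why the check passed or failed


def authenticate_character(
    audio: bytes,
    displayed: str,
    previous_user: Optional[str],
    recognize: Callable[[bytes], str],                   # hypothetical recognizer
    identify_speaker: Callable[[bytes], Optional[str]],  # hypothetical voiceprint model
) -> CharResult:
    # 1. Speech recognition: the utterance must match the displayed character(s).
    if recognize(audio) != displayed:
        return CharResult(False, None, "recognition result differs from display")
    # 2. Voiceprint authentication: the voice must belong to a predetermined user.
    user = identify_speaker(audio)
    if user is None:
        return CharResult(False, None, "voiceprint authentication failed")
    # 3. Consistency: the same user as the previous voiceprint authentication.
    #    The first check in a session has no previous result (previous_user is None).
    if previous_user is not None and user != previous_user:
        return CharResult(False, user, "different user from previous authentication")
    return CharResult(True, user, "ok")
```

The caller would pass the user returned by one successful check as previous_user for the next displayed character.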

認証（Ａ）モジュール１３０は、図７の例を用いて後述する機械学習装置（Ａ）７００によって生成された学習モデルを用いて、認証を行うようにしてもよい。
また、認証（Ａ）モジュール１３０は、図１１の例を用いて後述する機械学習装置（Ｂ）１１００によって生成された学習モデルであって、表示制御モジュール１１１０によって表示された一文字又は複数文字に対応する学習モデルを用いて、認証を行うようにしてもよい。 The authentication (A) module 130 may perform authentication using the learning model generated by the machine learning device (A) 700, which will be described later using the example of FIG. 7.
Further, the authentication (A) module 130 may perform authentication using a learning model that is generated by the machine learning device (B) 1100, described later using the example of FIG. 11, and that corresponds to the one or more characters displayed by the display control module 1110.

認証（Ｂ）モジュール１３５は、認証（Ａ）モジュール１３０による複数の認証結果に対して予め定められた規則を適用することによって、音声を発したユーザーを認証する。
また、認証（Ｂ）モジュール１３５は、予め定められた回数の認証（Ａ）モジュール１３０による認証失敗があった場合は、認証失敗とするようにしてもよい。「予め定められた規則」の一例として、「予め定められた回数の認証（Ａ）モジュール１３０による認証失敗があった場合は、認証失敗とする」ことが定められている。この他に、「予め定められた規則」として、「認証（Ａ）モジュール１３０による認証失敗が続けてＸ回以上ある場合は、認証失敗とする」等としてもよい。 The authentication (B) module 135 authenticates the user who utters the voice by applying a predetermined rule to a plurality of authentication results by the authentication (A) module 130.
Further, the authentication (B) module 135 may determine that authentication has failed when authentication by the authentication (A) module 130 has failed a predetermined number of times. That is, one example of the "predetermined rule" is "if authentication by the authentication (A) module 130 fails a predetermined number of times, the authentication fails". Another possible "predetermined rule" is "if authentication by the authentication (A) module 130 fails X or more times in succession, the authentication fails".
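The two example rules can be written as small predicates over the list of per-character results. This is an illustrative sketch; the function names and the representation of results as booleans (True for a success by the authentication (A) module 130) are assumptions made here, not part of the embodiment.

```python
from typing import Sequence


def too_many_failures(results: Sequence[bool], limit: int) -> bool:
    """Rule 1: fail once the total number of failed checks reaches `limit`."""
    return sum(1 for ok in results if not ok) >= limit


def too_many_consecutive_failures(results: Sequence[bool], x: int) -> bool:
    """Rule 2: fail once `x` or more checks have failed in succession."""
    run = 0
    for ok in results:
        run = 0 if ok else run + 1  # a success resets the run of failures
        if run >= x:
            return True
    return False
```

For the results [True, False, True, False, False], rule 1 with a limit of 3 reports failure, while rule 2 with X = 3 does not, because the longest run of failures is 2.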

図２は、本実施の形態を利用したシステム構成例を示す説明図である。
図２（ａ）の例に示す画像処理装置２００は、情報処理装置１００を有している。ユーザーは、画像処理装置２００の複写、プリント等の機能を利用するために、情報処理装置１００によって認証される必要がある。ユーザーは、情報処理装置１００によって表示された文字を読み上げ、その音声をマイクで受け付けて認証を行う。つまり、ユーザーは、予め定められたパスワードを発声するのではなく、その場で表示された文字を読み上げることによって認証される。 FIG. 2 is an explanatory diagram showing an example of a system configuration using the present embodiment.
The image processing device 200 shown in the example of FIG. 2A has an information processing device 100. To use functions of the image processing device 200 such as copying and printing, the user needs to be authenticated by the information processing device 100. The user reads aloud the characters displayed by the information processing device 100, and the device receives that voice through a microphone and performs authentication. That is, the user is authenticated not by uttering a predetermined password but by reading aloud characters displayed on the spot.

図２（ｂ）の例では、画像処理装置２００内の情報処理装置１００、機械学習装置（Ａ）７００、機械学習装置（Ｂ）１１００は、通信回線２９０を介してそれぞれ接続されている。通信回線２９０は、無線、有線、これらの組み合わせであってもよく、例えば、通信インフラとしてのインターネット、イントラネット等であってもよい。
機械学習装置（Ａ）７００、機械学習装置（Ｂ）１１００は、ユーザーの音声の機械学習を行って、情報処理装置１００の認証（Ａ）モジュール１３０の機能を発揮する学習モデルを生成する。機械学習は、ニューラルネットワークをつくる「学習フェーズ」と、できあがったニューラルネットワークを使って正解を出す「予測フェーズ」の２つに分かれるが、学習フェーズは機械学習装置（Ａ）７００又は機械学習装置（Ｂ）１１００で行われ、予測フェーズは情報処理装置１００の認証（Ａ）モジュール１３０で行われる。つまり、機械学習装置（Ａ）７００又は機械学習装置（Ｂ）１１００による学習によって生成された学習モデルを、画像処理装置２００の情報処理装置１００に送信し、情報処理装置１００は、その学習モデルを認証（Ａ）モジュール１３０として用いる。 In the example of FIG. 2B, the information processing device 100, the machine learning device (A) 700, and the machine learning device (B) 1100 in the image processing device 200 are connected via a communication line 290, respectively. The communication line 290 may be wireless, wired, or a combination thereof, and may be, for example, the Internet as a communication infrastructure, an intranet, or the like.
The machine learning device (A) 700 and the machine learning device (B) 1100 perform machine learning on the users' voices to generate a learning model that provides the function of the authentication (A) module 130 of the information processing device 100. Machine learning is divided into two phases: a "learning phase", in which a neural network is built, and a "prediction phase", in which the completed neural network is used to produce answers. The learning phase is performed in the machine learning device (A) 700 or the machine learning device (B) 1100, and the prediction phase is performed in the authentication (A) module 130 of the information processing device 100. That is, the learning model generated by learning in the machine learning device (A) 700 or the machine learning device (B) 1100 is transmitted to the information processing device 100 of the image processing device 200, and the information processing device 100 uses that learning model as the authentication (A) module 130.

図３は、第１の実施の形態による処理例を示すフローチャートである。
以下に示す例では、「複数の文字を含む文字列」をパスワードと称する。パスワードの一例として、複数の数字によって構成されている場合を示す。また、表示装置１１５に表示する「文字列内の一文字又は複数文字」として、そのパスワードの数字を１桁ずつ順に表示する例を示す。 FIG. 3 is a flowchart showing a processing example according to the first embodiment.
In the example shown below, a "character string containing a plurality of characters" is referred to as a password. As an example of a password, a case where it is composed of a plurality of numbers is shown. Further, an example is shown in which the numbers of the password are displayed in order one digit at a time as "one character or a plurality of characters in the character string" to be displayed on the display device 115.

ステップＳ３０２では、パスワードを生成する。例えば、予め定められた桁数の乱数を用いて、パスワードを生成する。桁数は、任意に設定することができる。例えば、４桁等としてもよい。なお、乱数には疑似乱数を含めてもよい（以下、同様）。
ステップＳ３０４では、ユーザー認証画面にパスワードを１桁表示する。 In step S302, a password is generated. For example, a password is generated using a random number having a predetermined number of digits. The number of digits can be set arbitrarily. For example, it may be 4 digits or the like. Pseudo-random numbers may be included in the random numbers (the same applies hereinafter).
In step S304, one digit of the password is displayed on the user authentication screen.

ステップＳ３０６では、音声を受け付ける。ユーザーは、ユーザー認証画面に表示されている１桁の数字を読み上げる。
ステップＳ３０８では、タイムアウト時間が経過したか否かを判断し、経過した場合はステップＳ３０４へ戻り、それ以外の場合はステップＳ３１０へ進む。パスワードを構成する１桁の数字を表示した時からの時間を計時し、予め定められた時間を過ぎた場合を、タイムアウト時間が経過したと判断する。タイムアウト時間は、任意に設定することができる。例えば、２秒等としてもよい。 In step S306, the voice is received. The user reads out the one-digit number displayed on the user authentication screen.
In step S308, it is determined whether or not the timeout time has elapsed, and if it has elapsed, the process returns to step S304, and if not, the process proceeds to step S310. The time from the time when the one-digit number constituting the password is displayed is counted, and when the predetermined time has passed, it is determined that the timeout time has passed. The timeout time can be set arbitrarily. For example, it may be 2 seconds or the like.

ステップＳ３１０では、入力音声を判定し、ＯＫの場合はステップＳ３１２へ進み、ＮＧの場合はステップＳ３１６へ進む。前述したように、音声認識結果がユーザー認証画面に表示している数字と合致し、今回の認証結果のユーザーは前回の認証結果のユーザーと合致している場合がステップＳ３１２へ進む。
ステップＳ３１２では、認証条件に合致するか否かを判断し、合致する場合はステップＳ３１４へ進み、それ以外の場合はステップＳ３０４へ戻る。例えば、認証条件として、ステップＳ３０２で生成されたパスワードのうち予め定められた文字数以上で、ステップＳ３１０の判断でＯＫとなっていることを認証条件としてもよい。その一例として、ステップＳ３０２で生成されたパスワードの全部の数字で、ステップＳ３１０の判断でＯＫとなっていることを認証条件としてもよい。 In step S310, the input voice is determined, and if it is OK, the process proceeds to step S312, and if it is NG, the process proceeds to step S316. As described above, if the voice recognition result matches the number displayed on the user authentication screen and the user of the current authentication result matches the user of the previous authentication result, the process proceeds to step S312.
In step S312, it is determined whether or not the authentication condition is met; if it is met, the process proceeds to step S314, and otherwise the process returns to step S304. For example, the authentication condition may be that at least a predetermined number of characters of the password generated in step S302 have been judged OK in step S310. As one such example, the authentication condition may be that all the digits of the password generated in step S302 have been judged OK in step S310.

ステップＳ３１４では、認証成功とする。つまり、ログインが成功し、このユーザーは画像処理装置２００を使用することができるようになる。
ステップＳ３１６では、失敗可能回数に達したか否かを判断し、達した場合はステップＳ３１８へ進み、それ以外の場合はステップＳ３０４へ戻る。前述したように、失敗可能回数は、このフローチャートにおける処理における失敗の回数であってもよいし、失敗が連続した回数であってもよい。失敗可能回数は、任意に設定することができる。例えば、３回等としてもよい。
ステップＳ３１８では、認証失敗とする。つまり、ログインは不成功であり、このユーザーは画像処理装置２００を使用することができない。 In step S314, the authentication is successful. That is, the login is successful, and this user can use the image processing device 200.
In step S316, it is determined whether or not the number of possible failures has been reached, and if it is reached, the process proceeds to step S318, and if not, the process returns to step S304. As described above, the number of possible failures may be the number of failures in the processing in this flowchart, or the number of consecutive failures. The number of possible failures can be set arbitrarily. For example, it may be 3 times or the like.
In step S318, the authentication fails. That is, the login is unsuccessful and the user cannot use the image processing device 200.
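The flow of steps S302 to S318 can be sketched as a simple loop. This is a hedged illustration under several assumptions that are not in the source: the callback listen stands in for steps S306 to S310 and returns "ok", "ng", or "timeout" for the displayed digit; failures are counted in total rather than consecutively; and running out of digits before the authentication condition is met is treated as a failure, a case the flowchart does not describe.

```python
import secrets
from typing import Callable


def generate_password(digits: int = 4) -> str:
    """S302: generate a password from random digits."""
    return "".join(str(secrets.randbelow(10)) for _ in range(digits))


def run_authentication(
    password: str,
    listen: Callable[[str], str],  # hypothetical: "ok", "ng", or "timeout"
    required_ok: int,
    max_failures: int,
) -> bool:
    ok_count = 0
    failures = 0
    for digit in password:               # S304: display one digit
        outcome = listen(digit)          # S306: receive the voice
        if outcome == "timeout":         # S308: timed out, show the next digit
            continue
        if outcome == "ok":              # S310: input voice judged OK
            ok_count += 1
            if ok_count >= required_ok:  # S312: authentication condition met
                return True              # S314: authentication succeeds
        else:                            # S310: input voice judged NG
            failures += 1
            if failures >= max_failures:  # S316: number of possible failures reached
                return False              # S318: authentication fails
    return False  # assumption: digits exhausted without meeting the condition
```

A call such as run_authentication(generate_password(6), listen, required_ok=4, max_failures=3) corresponds to a six-digit password of which four digits must pass.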

図４は、第１の実施の形態による処理例を示す説明図である。
ステップＳ３０４での表示例を示すものである。画像処理装置２００に備え付けられている液晶ディスプレイ、有機ＥＬディスプレイ等の表示装置に表示する。
ユーザー認証画面４００には、パスワード表示領域４１０、残時間表示領域４２０を表示する。
パスワード表示領域４１０には、パスワード文字表示領域４１２、読み表示領域４１４を表示する。パスワード文字表示領域４１２には、パスワードの１桁の数字を表示する。読み表示領域４１４には、その数字の読みを表示する。数字「４」について、ユーザーによる発声を「ヨン」に統一させるようにしている。
残時間表示領域４２０は、パスワード文字表示領域４１２に数字を表示させた時からの経過時間を示すものである。この例では、時間が経過するとバーが左から右に伸びていき、右端に達した場合がステップＳ３０８で、タイムアウト時間が経過したことを示しており、ユーザーは経過時間がわかる。 FIG. 4 is an explanatory diagram showing a processing example according to the first embodiment.
A display example in step S304 is shown. It is displayed on a display device such as a liquid crystal display or an organic EL display provided in the image processing device 200.
The password display area 410 and the remaining time display area 420 are displayed on the user authentication screen 400.
The password character display area 412 and the reading display area 414 are displayed in the password display area 410. The password character display area 412 displays one digit of the password. The reading display area 414 displays the reading of that digit. For the digit "4", for example, displaying the reading unifies the user's utterance to "Yon".
The remaining time display area 420 indicates the time elapsed since the digit was displayed in the password character display area 412. In this example, the bar extends from left to right as time passes, and the bar reaching the right end indicates that the timeout time has elapsed in step S308, so the user can see the elapsed time.

図５は、第１の実施の形態による処理例を示す説明図である。
ユーザー認証画面４００における表示遷移の例を示している。この例では、４桁以上で声紋認証ができ、ステップＳ３１６における失敗可能回数を３回としている。なお、各画面の切り替えは、規定時間毎に切り替えてもよいし、その画面に表示している１桁の数字における認証の成功、又は、失敗が判明した時点で切り替えるようにしてもよい。 FIG. 5 is an explanatory diagram showing a processing example according to the first embodiment.
An example of the display transitions on the user authentication screen 400 is shown. In this example, user authentication succeeds when four or more digits pass voiceprint authentication, and the number of possible failures in step S316 is set to three. The screens may be switched at predetermined time intervals, or may be switched when the success or failure of authentication for the one digit displayed on the screen becomes known.

図５（ａ）の例では、パスワード「４１５６」でユーザー認証ができた例を示している。
ユーザー認証画面４００−ａ１のパスワード文字表示領域４１２に「４」、読み表示領域４１４に「ヨン」と表示する。声紋認証ができたので、ユーザー認証画面４００−ａ２を表示する。
ユーザー認証画面４００−ａ２のパスワード文字表示領域４１２に「１」、読み表示領域４１４に「イチ」と表示する。声紋認証ができ、ユーザー認証画面４００−ａ１での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ａ３を表示する。
ユーザー認証画面４００−ａ３のパスワード文字表示領域４１２に「５」、読み表示領域４１４に「ゴ」と表示する。声紋認証ができ、ユーザー認証画面４００−ａ２での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ａ４を表示する。
ユーザー認証画面４００−ａ４のパスワード文字表示領域４１２に「６」、読み表示領域４１４に「ロク」と表示する。声紋認証ができ、ユーザー認証画面４００−ａ３での声紋認証のユーザーと同じであり、４桁の声紋認証ができたので、認証成功とする。 The example of FIG. 5A shows an example in which user authentication can be performed with the password “4156”.
"4" is displayed in the password character display area 412 of the user authentication screen 400-a1, and "Yon" is displayed in the reading display area 414. Since the voiceprint authentication has been completed, the user authentication screen 400-a2 is displayed.
"1" is displayed in the password character display area 412 of the user authentication screen 400-a2, and "Ichi" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-a1, the user authentication screen 400-a3 is displayed.
"5" is displayed in the password character display area 412 of the user authentication screen 400-a3, and "Go" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-a2, the user authentication screen 400-a4 is displayed.
"6" is displayed in the password character display area 412 of the user authentication screen 400-a4, and "Roku" is displayed in the reading display area 414. Voiceprint authentication succeeded, the user is the same as the voiceprint authentication user on the user authentication screen 400-a3, and four digits have now passed voiceprint authentication, so the authentication is successful.

図５（ｂ）の例では、パスワード「４１５６９」でユーザー認証ができた例を示している。ただし、１回のタイムアウト時間の経過が発生した例を示している。
ユーザー認証画面４００−ｂ１のパスワード文字表示領域４１２に「４」、読み表示領域４１４に「ヨン」と表示する。声紋認証ができたので、ユーザー認証画面４００−ｂ２を表示する。
ユーザー認証画面４００−ｂ２のパスワード文字表示領域４１２に「１」、読み表示領域４１４に「イチ」と表示する。タイムアウト時間が経過したので、ユーザー認証画面４００−ｂ３を表示する。
ユーザー認証画面４００−ｂ３のパスワード文字表示領域４１２に「５」、読み表示領域４１４に「ゴ」と表示する。声紋認証ができ、ユーザー認証画面４００−ｂ１での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｂ４を表示する。
ユーザー認証画面４００−ｂ４のパスワード文字表示領域４１２に「６」、読み表示領域４１４に「ロク」と表示する。声紋認証ができ、ユーザー認証画面４００−ｂ３での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｂ５を表示する。
ユーザー認証画面４００−ｂ５のパスワード文字表示領域４１２に「９」、読み表示領域４１４に「キュウ」と表示する。声紋認証ができ、ユーザー認証画面４００−ｂ４での声紋認証のユーザーと同じであり、４桁の声紋認証ができたので、認証成功とする。 The example of FIG. 5B shows an example in which user authentication can be performed with the password "41569". However, an example is shown in which one time-out time has elapsed.
"4" is displayed in the password character display area 412 of the user authentication screen 400-b1, and "Yon" is displayed in the reading display area 414. Since the voiceprint authentication has been completed, the user authentication screen 400-b2 is displayed.
"1" is displayed in the password character display area 412 of the user authentication screen 400-b2, and "Ichi" is displayed in the reading display area 414. Since the timeout time has elapsed, the user authentication screen 400-b3 is displayed.
"5" is displayed in the password character display area 412 of the user authentication screen 400-b3, and "Go" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-b1, the user authentication screen 400-b4 is displayed.
"6" is displayed in the password character display area 412 of the user authentication screen 400-b4, and "Roku" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-b3, the user authentication screen 400-b5 is displayed.
"9" is displayed in the password character display area 412 of the user authentication screen 400-b5, and "Kyu" is displayed in the reading display area 414. Voiceprint authentication succeeded, the user is the same as the voiceprint authentication user on the user authentication screen 400-b4, and four digits have now passed voiceprint authentication, so the authentication is successful.

図５（ｃ）の例では、パスワード「４１５６９」でユーザー認証ができた例を示している。ただし、１回の認証失敗が発生した例を示している。
ユーザー認証画面４００−ｃ１のパスワード文字表示領域４１２に「４」、読み表示領域４１４に「ヨン」と表示する。声紋認証ができたので、ユーザー認証画面４００−ｃ２を表示する。
ユーザー認証画面４００−ｃ２のパスワード文字表示領域４１２に「１」、読み表示領域４１４に「イチ」と表示する。声紋認証ができ、ユーザー認証画面４００−ｃ１での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｃ３を表示する。
ユーザー認証画面４００−ｃ３のパスワード文字表示領域４１２に「５」、読み表示領域４１４に「ゴ」と表示する。声紋認証できなかったので、ユーザー認証画面４００−ｃ４を表示する。
ユーザー認証画面４００−ｃ４のパスワード文字表示領域４１２に「６」、読み表示領域４１４に「ロク」と表示する。声紋認証ができ、ユーザー認証画面４００−ｃ２での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｃ５を表示する。
ユーザー認証画面４００−ｃ５のパスワード文字表示領域４１２に「９」、読み表示領域４１４に「キュウ」と表示する。声紋認証ができ、ユーザー認証画面４００−ｃ４での声紋認証のユーザーと同じであり、４桁の声紋認証ができたので、認証成功とする。 The example of FIG. 5C shows an example in which user authentication can be performed with the password "41569". However, an example in which one authentication failure occurs is shown.
"4" is displayed in the password character display area 412 of the user authentication screen 400-c1, and "Yon" is displayed in the reading display area 414. Since the voiceprint authentication has been completed, the user authentication screen 400-c2 is displayed.
"1" is displayed in the password character display area 412 of the user authentication screen 400-c2, and "Ichi" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-c1, the user authentication screen 400-c3 is displayed.
"5" is displayed in the password character display area 412 of the user authentication screen 400-c3, and "Go" is displayed in the reading display area 414. Since voiceprint authentication could not be performed, the user authentication screen 400-c4 is displayed.
"6" is displayed in the password character display area 412 of the user authentication screen 400-c4, and "Roku" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-c2, the user authentication screen 400-c5 is displayed.
"9" is displayed in the password character display area 412 of the user authentication screen 400-c5, and "Kyu" is displayed in the reading display area 414. Voiceprint authentication succeeded, the user is the same as the voiceprint authentication user on the user authentication screen 400-c4, and four digits have now passed voiceprint authentication, so the authentication is successful.

図５（ｄ）の例では、パスワード「４１５６９７」でユーザー認証ができた例を示している。なお、この例は、図５（ｂ）、図５（ｃ）の例とは異なり、４文字連続で認証ができた場合に認証成功としており、１回のタイムアウト時間の経過が発生した例を示している。
ユーザー認証画面４００−ｄ１のパスワード文字表示領域４１２に「４」、読み表示領域４１４に「ヨン」と表示する。声紋認証ができたので、ユーザー認証画面４００−ｄ２を表示する。
ユーザー認証画面４００−ｄ２のパスワード文字表示領域４１２に「１」、読み表示領域４１４に「イチ」と表示する。タイムアウト時間が経過したので、ユーザー認証画面４００−ｄ３を表示する。
ユーザー認証画面４００−ｄ３のパスワード文字表示領域４１２に「５」、読み表示領域４１４に「ゴ」と表示する。声紋認証ができたので、ユーザー認証画面４００−ｄ４を表示する。
ユーザー認証画面４００−ｄ４のパスワード文字表示領域４１２に「６」、読み表示領域４１４に「ロク」と表示する。声紋認証ができ、ユーザー認証画面４００−ｄ３での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｄ５を表示する。
ユーザー認証画面４００−ｄ５のパスワード文字表示領域４１２に「９」、読み表示領域４１４に「キュウ」と表示する。声紋認証ができ、ユーザー認証画面４００−ｄ４での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｄ６を表示する。
ユーザー認証画面４００−ｄ６のパスワード文字表示領域４１２に「７」、読み表示領域４１４に「ナナ」と表示する。声紋認証ができ、ユーザー認証画面４００−ｄ５での声紋認証のユーザーと同じであり、連続して４桁の声紋認証ができたので、認証成功とする。 The example of FIG. 5D shows a case in which user authentication succeeded with the password "415697". Unlike the examples of FIGS. 5B and 5C, in this example authentication succeeds when four characters are authenticated in succession, and the example shows a case in which the timeout time elapsed once.
"4" is displayed in the password character display area 412 of the user authentication screen 400-d1, and "Yon" is displayed in the reading display area 414. Since the voiceprint authentication has been completed, the user authentication screen 400-d2 is displayed.
"1" is displayed in the password character display area 412 of the user authentication screen 400-d2, and "Ichi" is displayed in the reading display area 414. Since the timeout time has elapsed, the user authentication screen 400-d3 is displayed.
"5" is displayed in the password character display area 412 of the user authentication screen 400-d3, and "Go" is displayed in the reading display area 414. Since the voiceprint authentication has been completed, the user authentication screen 400-d4 is displayed.
"6" is displayed in the password character display area 412 of the user authentication screen 400-d4, and "Roku" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-d3, the user authentication screen 400-d5 is displayed.
"9" is displayed in the password character display area 412 of the user authentication screen 400-d5, and "Kyu" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-d4, the user authentication screen 400-d6 is displayed.
"7" is displayed in the password character display area 412 of the user authentication screen 400-d6, and "Nana" is displayed in the reading display area 414. Voiceprint authentication succeeded, the user is the same as the voiceprint authentication user on the user authentication screen 400-d5, and four digits in succession have now passed voiceprint authentication, so the authentication is successful.

図５（ｅ）の例では、パスワード「４１５６９」ではユーザー認証できなかった例を示している。
ユーザー認証画面４００−ｅ１のパスワード文字表示領域４１２に「４」、読み表示領域４１４に「ヨン」と表示する。声紋認証できなかったので、ユーザー認証画面４００−ｅ２を表示する。
ユーザー認証画面４００−ｅ２のパスワード文字表示領域４１２に「１」、読み表示領域４１４に「イチ」と表示する。声紋認証ができたので、ユーザー認証画面４００−ｅ３を表示する。
ユーザー認証画面４００−ｅ３のパスワード文字表示領域４１２に「５」、読み表示領域４１４に「ゴ」と表示する。声紋認証できなかったので、ユーザー認証画面４００−ｅ４を表示する。
ユーザー認証画面４００−ｅ４のパスワード文字表示領域４１２に「６」、読み表示領域４１４に「ロク」と表示する。声紋認証ができ、ユーザー認証画面４００−ｅ２での声紋認証のユーザーと同じであるので、ユーザー認証画面４００−ｅ５を表示する。
ユーザー認証画面４００−ｅ５のパスワード文字表示領域４１２に「９」、読み表示領域４１４に「キュウ」と表示する。声紋認証できず、その回数が３回に達したので、認証失敗とする。 In the example of FIG. 5 (e), the user cannot be authenticated with the password "41569".
"4" is displayed in the password character display area 412 of the user authentication screen 400-e1, and "Yon" is displayed in the reading display area 414. Since voiceprint authentication could not be performed, the user authentication screen 400-e2 is displayed.
"1" is displayed in the password character display area 412 of the user authentication screen 400-e2, and "Ichi" is displayed in the reading display area 414. Since the voiceprint authentication has been completed, the user authentication screen 400-e3 is displayed.
"5" is displayed in the password character display area 412 of the user authentication screen 400-e3, and "Go" is displayed in the reading display area 414. Since voiceprint authentication could not be performed, the user authentication screen 400-e4 is displayed.
"6" is displayed in the password character display area 412 of the user authentication screen 400-e4, and "Roku" is displayed in the reading display area 414. Since voiceprint authentication can be performed and the user is the same as the voiceprint authentication user on the user authentication screen 400-e2, the user authentication screen 400-e5 is displayed.
"9" is displayed in the password character display area 412 of the user authentication screen 400-e5, and "Kyu" is displayed in the reading display area 414. Voiceprint authentication could not be performed, and the number of failures has reached three, so the authentication fails.
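The five transitions of FIG. 5 can be replayed as sequences of per-screen outcomes. The sketch below is illustrative only: the function replay and the outcome labels "ok", "ng", and "timeout" are hypothetical, and the assumption that a timeout or a failure resets the run of successes under the "in succession" variant is inferred from the example of FIG. 5D rather than stated in the text.

```python
from typing import Sequence, Tuple


def replay(
    outcomes: Sequence[str],
    required: int = 4,       # digits that must pass voiceprint authentication
    max_failures: int = 3,   # number of possible failures (step S316)
    consecutive: bool = False,
) -> Tuple[str, int]:
    """Return the result and the 1-based number of the screen that decided it."""
    ok_run = 0
    failures = 0
    for screen, outcome in enumerate(outcomes, start=1):
        if outcome == "ok":
            ok_run += 1
            if ok_run >= required:
                return ("success", screen)
            continue
        if consecutive:
            ok_run = 0  # assumption: a timeout or failure breaks the run
        if outcome == "ng":
            failures += 1
            if failures >= max_failures:
                return ("failure", screen)
    return ("undecided", len(outcomes))


fig5 = {
    "a": replay(["ok", "ok", "ok", "ok"]),
    "b": replay(["ok", "timeout", "ok", "ok", "ok"]),
    "c": replay(["ok", "ok", "ng", "ok", "ok"]),
    "d": replay(["ok", "timeout", "ok", "ok", "ok", "ok"], consecutive=True),
    "e": replay(["ng", "ok", "ng", "ok", "ng"]),
}
```

Under these assumptions the replay decides on the same screens as the figures: the fourth screen for (a), the fifth for (b), (c), and (e), and the sixth for (d).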

図６は、第１の実施の形態による処理例を示すフローチャートである。図３に示す例では、パスワードを静的に決定していたが、図６に示す例では、パスワードを動的に生成している。図６の例に示すフローチャートは、図３の例に示すフローチャートにステップＳ６０２とステップＳ６２２を付加したものである。 FIG. 6 is a flowchart showing a processing example according to the first embodiment. In the example shown in FIG. 3, the password is determined statically, whereas in the example shown in FIG. 6, the password is generated dynamically. The flowchart shown in the example of FIG. 6 is obtained by adding steps S602 and S622 to the flowchart shown in the example of FIG. 3.

ステップＳ６０２では、パスワード生成ルールを作成する。パスワード生成ルールとして、例えば、予め定められた桁数の乱数を用いて、パスワードを生成するとしてもよいし、予め定められた関数を用いて、パスワードを生成するとしてもよい。桁数は、任意に設定することができる。例えば、４桁等としてもよい。桁数もパスワード生成ルールにしたがって可変としてもよい。
ステップＳ６０４では、パスワード生成ルールにしたがって、パスワードを生成する。 In step S602, a password generation rule is created. As a password generation rule, for example, a password may be generated by using a random number having a predetermined number of digits, or a password may be generated by using a predetermined function. The number of digits can be set arbitrarily. For example, it may be 4 digits or the like. The number of digits may also be variable according to the password generation rule.
In step S604, a password is generated according to the password generation rule.

ステップＳ６０６では、ユーザー認証画面にパスワードを１桁表示する。
ステップＳ６０８では、音声を受け付ける。 In step S606, one digit of the password is displayed on the user authentication screen.
In step S608, the voice is received.

ステップＳ６１０では、タイムアウト時間が経過したか否かを判断し、経過した場合はステップＳ６２２へ進み、それ以外の場合はステップＳ６１２へ進む。 In step S610, it is determined whether or not the timeout time has elapsed, and if it has elapsed, the process proceeds to step S622, and if not, the process proceeds to step S612.

ステップＳ６１２では、入力音声を判定し、ＯＫの場合はステップＳ６１４へ進み、ＮＧの場合はステップＳ６１８へ進む。
ステップＳ６１４では、認証条件に合致するか否かを判断し、合致する場合はステップＳ６１６へ進み、それ以外の場合はステップＳ６０６へ戻る。 In step S612, the input voice is determined, and if it is OK, the process proceeds to step S614, and if it is NG, the process proceeds to step S618.
In step S614, it is determined whether or not the authentication conditions are met, and if they match, the process proceeds to step S616, and if not, the process returns to step S606.

ステップＳ６１６では、認証成功とする。
ステップＳ６１８では、失敗可能回数に達したか否かを判断し、達した場合はステップＳ６２０へ進み、それ以外の場合はステップＳ６２２へ進む。 In step S616, the authentication is successful.
In step S618, it is determined whether or not the number of possible failures has been reached, and if it is reached, the process proceeds to step S620, and if not, the process proceeds to step S622.

ステップＳ６２０では、認証失敗とする。
ステップＳ６２２では、パスワード生成ルールにしたがって、パスワードを変更し、ステップＳ６０６へ戻る。 In step S620, the authentication fails.
In step S622, the password is changed according to the password generation rule, and the process returns to step S606.
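The difference from FIG. 3 is that a timeout or a failed check leads to step S622, where the password is changed before the next digit is displayed. The sketch below illustrates that control flow under assumptions not found in the source: listen is a hypothetical callback returning "ok", "ng", or "timeout"; the count of digits judged OK is kept across a password change; a used-up password is simply regenerated; and max_screens is an arbitrary safety bound. The seeded random module is used only to keep the sketch reproducible, and a real system would need a secure source of randomness.

```python
import random
from typing import Callable, Optional


def make_rule(digits: int = 4, seed: Optional[int] = None) -> Callable[[], str]:
    """S602: create a password generation rule (here: random digits)."""
    rng = random.Random(seed)
    return lambda: "".join(str(rng.randrange(10)) for _ in range(digits))


def run_dynamic(
    rule: Callable[[], str],
    listen: Callable[[str], str],  # hypothetical: "ok", "ng", or "timeout"
    required_ok: int = 4,
    max_failures: int = 3,
    max_screens: int = 50,         # assumption: bound on displayed digits
) -> bool:
    password, pos = rule(), 0            # S604: generate the password
    ok_count = failures = 0
    for _ in range(max_screens):
        if pos >= len(password):         # assumption: refill a used-up password
            password, pos = rule(), 0
        digit = password[pos]            # S606: display one digit
        pos += 1
        outcome = listen(digit)          # S608: receive the voice
        if outcome == "ok":              # S610 no timeout, S612 judged OK
            ok_count += 1
            if ok_count >= required_ok:  # S614: authentication condition met
                return True              # S616: authentication succeeds
            continue                     # back to S606
        if outcome == "ng":              # S612 judged NG
            failures += 1
            if failures >= max_failures:  # S618: limit reached
                return False              # S620: authentication fails
        password, pos = rule(), 0        # S622: change the password
    return False
```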

認証（Ａ）モジュール１３０の学習モデルの生成方法として、以下の２通りがある。
（１）全てのデータを一括で学習・推論する方法（第２の実施の形態に該当する）
・声紋データの認証のための学習をする場合に、文字毎に付加データを加える。
・推論時に取得した声紋データに、表示されている文字に対応する付加データを加えて推論を実施する。
（２）１文字ごとに学習・推論する方法（第３の実施の形態に該当する） There are the following two methods for generating the learning model of the authentication (A) module 130.
(1) Method of learning and inferring all data at once (corresponding to the second embodiment)
-Add additional data for each character when learning for voiceprint data authentication.
-Inference is performed by adding additional data corresponding to the displayed characters to the voiceprint data acquired at the time of inference.
(2) Method of learning / inferring character by character (corresponding to the third embodiment)
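For method (1), one way to picture "adding additional data for each character" is to append an encoding of the displayed character to the voiceprint feature vector, at both learning time and inference time. The source does not say how the additional data is encoded; the one-hot encoding, the digit alphabet, and the name with_character_tag below are assumptions made purely for illustration.

```python
from typing import List, Sequence

DIGITS = "0123456789"  # assumed alphabet of displayable characters


def with_character_tag(
    voice_features: Sequence[float],
    displayed: str,
    alphabet: str = DIGITS,
) -> List[float]:
    """Append a one-hot code for the displayed character to the voice features."""
    one_hot = [0.0] * len(alphabet)
    one_hot[alphabet.index(displayed)] = 1.0  # raises ValueError if unknown
    return list(voice_features) + one_hot
```

With this layout a single model receives both the voice and the character it is supposed to contain, whereas method (2) would instead keep one model per character.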

＜第２の実施の形態＞
図７は、第２の実施の形態の構成例についての概念的なモジュール構成図である。
機械学習装置（Ａ）７００は、文字列生成モジュール７０５、表示制御モジュール７１０、表示装置７１５、音声受付モジュール７２０、機械学習モジュール７２５、送信モジュール７４０を有している。 <Second embodiment>
FIG. 7 is a conceptual module configuration diagram for a configuration example of the second embodiment.
The machine learning device (A) 700 includes a character string generation module 705, a display control module 710, a display device 715, a voice reception module 720, a machine learning module 725, and a transmission module 740.

文字列生成モジュール７０５は、表示制御モジュール７１０と接続されている。文字列生成モジュール７０５は、情報処理装置１００の文字列生成モジュール１０５が生成する文字列を構成する個々の文字が含まれていればよい。例えば、前述したように、情報処理装置１００の文字列生成モジュール１０５が生成する数字列である場合は、文字列生成モジュール７０５が生成する文字列は、０〜９を含む文字列である。具体的には、０〜９を順に並べた文字列であってもよいし、情報処理装置１００における表示と同様にするために、ランダムに並べ替えた文字列であってもよい。なお、情報処理装置１００の文字列生成モジュール１０５が生成する文字列を構成する文字数は、複数である。 The character string generation module 705 is connected to the display control module 710. The character string generated by the character string generation module 705 only needs to contain the individual characters that make up the character strings generated by the character string generation module 105 of the information processing device 100. For example, as described above, when the character string generation module 105 of the information processing device 100 generates digit strings, the character string generated by the character string generation module 705 is a character string containing 0 to 9. Specifically, it may be a character string in which 0 to 9 are arranged in order, or, to match the display in the information processing device 100, a character string in which they are rearranged randomly. Note that a character string generated by the character string generation module 105 of the information processing device 100 consists of a plurality of characters.

The display control module 710 is connected to the character string generation module 705, the display device 715, and the learning data generation module 730 of the machine learning module 725. The display control module 710 has the same function as the display control module 110 of the information processing device 100 shown in the example of FIG. 1. That is, by performing display control equivalent to that used in the authentication process of the information processing device 100, it presents the user with the same environment when generating the learning model as during authentication. The display control module 710 displays one character or a plurality of characters to a single user multiple times.
The display device 715 is connected to the display control module 710. The display device 715 has the same function as the display device 115 of the information processing device 100 shown in the example of FIG. 1.

The voice reception module 720 is connected to the learning data generation module 730 of the machine learning module 725. The voice reception module 720 receives the user's voice through, for example, a microphone. For each predetermined character or set of characters, it receives the voices uttered by a plurality of predetermined users.
Here, a "predetermined user" is a user to be authenticated, specifically a user who is permitted to use the target device or service.

The machine learning module 725 includes a learning data generation module 730 and a learning module 735, and is connected to the transmission module 740. The machine learning module 725 generates a single learning model from the one or more characters displayed by the display control module 710 and the voice received by the voice reception module 720 in response to those characters.

The learning data generation module 730 is connected to the display control module 710, the voice reception module 720, and the learning module 735. The learning data generation module 730 generates, as learning data for authentication, data in which predetermined information for each character (or set of characters) is added to the voice. Here, the "predetermined information" is information defined for each "one character or plurality of characters" displayed by the display control module 710; any information that uniquely identifies that "one character or plurality of characters" will do. This yields learning data in which the difference is explicit even between characters with similar readings. For example, the reading "ichi" of "1" and the reading "hachi" of "8" are similar and difficult to distinguish, but, as described later with reference to the example of FIG. 9, adding "0x10" to the voice data of "ichi" and "0x80" to the voice data of "hachi" as the "predetermined information" makes the difference between the two explicit.
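As a rough sketch of this tagging step: the text only says the information is "added" to the voice data, so appending it as a single trailing byte is an assumption here, as are the names `additional_data` and `make_learning_sample`. The rule "digit times 0x10" is chosen because it reproduces the two values given above (0x10 for "1", 0x80 for "8").

```python
def additional_data(displayed: str) -> int:
    """Return the tag for a displayed digit, e.g. '1' -> 0x10, '8' -> 0x80."""
    if len(displayed) != 1 or displayed not in "0123456789":
        raise ValueError(f"expected a single digit, got {displayed!r}")
    return int(displayed) * 0x10


def make_learning_sample(voice: bytes, displayed: str) -> bytes:
    """Append the tag for the displayed digit to the raw voice data.

    Appending one byte is an illustrative choice; the source does not
    specify how the two pieces of data are combined.
    """
    return voice + bytes([additional_data(displayed)])


# Two utterances with identical (placeholder) voice bytes become distinct
# learning samples once the tag for the displayed digit is attached.
ichi = make_learning_sample(b"\x01\x02", "1")
hachi = make_learning_sample(b"\x01\x02", "8")
```

Because the tag depends only on what was displayed, similar-sounding readings end up as different inputs to the learner.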

The learning module 735 is connected to the learning data generation module 730. Using the learning data generated by the learning data generation module 730, the learning module 735 performs learning for voice-based authentication and generates a single learning model. Existing techniques may be used for this learning, for example a decision tree, a naive Bayes model, a decision list, a support vector machine, the maximum entropy method, or a conditional random field.

The transmission module 740 is connected to the machine learning module 725. The transmission module 740 transmits the learning model generated by the learning module 735 to the information processing device 100. On receiving it, the information processing device 100 sets the learning model as the authentication (A) module 130.

FIGS. 8 and 9 are explanatory diagrams showing a processing example according to the second embodiment.
To learn the voiceprint data, the machine learning device (A) 700 trains on the learning unit data 800, which is voiceprint data of four people, A, B, C, and D, each reading out the digits "0" to "9". That is, the 40 utterances that make up the learning unit data 800 shown in the example of FIG. 8 are learned in a single batch. In doing so, however, additional data is added to the voiceprint data of each digit. Specifically, as shown in the example of FIG. 9, the learning data is formed by adding "0x00" (additional data 910b) to the voiceprint data 910a of "0", "0x10" (additional data 911b) to the voiceprint data 911a of "1", "0x20" (additional data 912b) to the voiceprint data 912a of "2", "0x30" (additional data 913b) to the voiceprint data 913a of "3", "0x40" (additional data 914b) to the voiceprint data 914a of "4", "0x50" (additional data 915b) to the voiceprint data 915a of "5", "0x60" (additional data 916b) to the voiceprint data 916a of "6", "0x70" (additional data 917b) to the voiceprint data 917a of "7", "0x80" (additional data 918b) to the voiceprint data 918a of "8", and "0x90" (additional data 919b) to the voiceprint data 919a of "9". Note that "0x" indicates that the digits following it are hexadecimal.
Because learning is performed on the learning unit data 800 as a whole, only one learning model is generated.
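The pooling of 4 speakers by 10 digits into one tagged training set can be sketched as below. This is an illustration under stated assumptions: voiceprints are represented as short lists of floats, the tag is appended as one extra feature, each sample is labelled with its speaker, and the names (`SPEAKERS`, `tag_for`, `build_learning_unit`) are hypothetical. The feature values are synthetic placeholders, not real voice data.

```python
from typing import Dict, List, Tuple

SPEAKERS = ["A", "B", "C", "D"]
DIGITS = "0123456789"


def tag_for(digit: str) -> int:
    """Additional data per FIG. 9: '0' -> 0x00, '1' -> 0x10, ..., '9' -> 0x90."""
    return int(digit) * 0x10


def build_learning_unit(
    voiceprints: Dict[Tuple[str, str], List[float]]
) -> List[Tuple[List[float], str]]:
    """Pool every (speaker, digit) recording into one tagged training set."""
    samples = []
    for speaker in SPEAKERS:
        for digit in DIGITS:
            features = voiceprints[(speaker, digit)]
            # The tag is appended as an extra feature (an assumed encoding).
            samples.append((features + [float(tag_for(digit))], speaker))
    return samples


# Synthetic placeholder features standing in for real voiceprints.
fake_voiceprints = {
    (s, d): [float(i), float(int(d))]
    for i, s in enumerate(SPEAKERS)
    for d in DIGITS
}
unit_800 = build_learning_unit(fake_voiceprints)
```

A single learner trained on `unit_800` would correspond to the one learning model described above; the choice of learner is left open, as in the text.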

When a user is authenticated using the learning model generated by the machine learning device (A) 700, the additional data corresponding to the digit displayed on the display device 115 at that time is added to the voiceprint data received by the voice reception module 120 of the information processing device 100. For example, while "0" is displayed on the display device 115, "0x00" is added to the voiceprint data received by the voice reception module 120 during that period.

FIG. 10 is an explanatory diagram showing a processing example according to the second embodiment.
First, "Please pronounce the password" is displayed on the user authentication screen 400.
Next, "0 (zero)" is displayed on the user authentication screen 400. The user 1010 sees it and reads out "0" 1012a. The voice reception module 120 receives the voiceprint data 1020a. The additional data 1030a is then added to the voiceprint data 1020a to generate the composite data 1040a. Using the learning model 1050 generated by the machine learning device (A) 700, the composite data 1040a is authenticated as the recognition result "0" 1060a of the user 1010. The additional data 1030a is the additional data 910b corresponding to the "0" that was displayed on the user authentication screen 400 when the voiceprint data 1020a was received.
Next, "8 (hachi)" is displayed on the user authentication screen 400. The user 1010 sees it and reads out "8" 1012b. The voice reception module 120 receives the voiceprint data 1020b. The additional data 1030b is then added to the voiceprint data 1020b to generate the composite data 1040b. Using the learning model 1050 generated by the machine learning device (A) 700, the composite data 1040b is authenticated as the recognition result "8" 1060b of the user 1010. The additional data 1030b is the additional data 918b corresponding to the "8" that was displayed on the user authentication screen 400 when the voiceprint data 1020b was received.
Next, "7 (nana)" is displayed on the user authentication screen 400. The user 1010 sees it and reads out "7" 1012c. The voice reception module 120 receives the voiceprint data 1020c. The additional data 1030c is then added to the voiceprint data 1020c to generate the composite data 1040c. Using the learning model 1050 generated by the machine learning device (A) 700, the composite data 1040c is authenticated as the recognition result "7" 1060c of the user 1010. The additional data 1030c is the additional data 917b corresponding to the "7" that was displayed on the user authentication screen 400 when the voiceprint data 1020c was received.
Next, "3 (san)" is displayed on the user authentication screen 400. The user 1010 sees it and reads out "3" 1012d. The voice reception module 120 receives the voiceprint data 1020d. The additional data 1030d is then added to the voiceprint data 1020d to generate the composite data 1040d. Using the learning model 1050 generated by the machine learning device (A) 700, the composite data 1040d is authenticated as the recognition result "3" 1060d of the user 1010. The additional data 1030d is the additional data 913b corresponding to the "3" that was displayed on the user authentication screen 400 when the voiceprint data 1020d was received.
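The per-digit flow of FIG. 10 (display a digit, capture the voiceprint, attach the tag for the displayed digit, query the single model) might be sketched as follows. The function names, the float-list representation, and the stub model are all assumptions for illustration; a real learning model 1050 would replace `stub_model`.

```python
from typing import Callable, List


def tag_for(digit: str) -> int:
    """Additional data for a displayed digit ('0' -> 0x00, ..., '9' -> 0x90)."""
    return int(digit) * 0x10


def authenticate_sequence(
    displayed: str,
    capture: Callable[[str], List[float]],
    model: Callable[[List[float]], bool],
) -> List[bool]:
    """Return one authentication result per displayed digit."""
    results = []
    for digit in displayed:
        voiceprint = capture(digit)                      # voiceprint data 1020x
        composite = voiceprint + [float(tag_for(digit))]  # composite data 1040x
        results.append(model(composite))                 # recognition result 1060x
    return results


def stub_model(composite: List[float]) -> bool:
    """Toy stand-in for the learned model: accepts when tag matches the feature."""
    *features, tag = composite
    return tag == features[0] * 0x10


# A speaker who reads each displayed digit correctly (toy feature = digit value).
genuine = authenticate_sequence("0873", lambda d: [float(int(d))], stub_model)
# A replayed recording of "1" regardless of what is displayed.
replayed = authenticate_sequence("0873", lambda d: [1.0], stub_model)
```

The key point is that the tag comes from the screen, not from the audio, so an utterance that does not match the displayed digit yields a composite the model was never trained to accept.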

<Third embodiment>
FIG. 11 is a conceptual module configuration diagram of a configuration example of the third embodiment.
The machine learning device (B) 1100 includes a character string generation module 1105, a display control module 1110, a display device 1115, a voice reception module 1120, a machine learning module 1125, and a transmission module 1140.
The character string generation module 1105 is connected to the display control module 1110. The character string generation module 1105 has the same function as the character string generation module 705 of the machine learning device (A) 700 shown in the example of FIG. 7.
The display control module 1110 is connected to the character string generation module 1105, the display device 1115, and the learning data generation module 1130 of the machine learning module 1125. The display control module 1110 has the same function as the display control module 710 of the machine learning device (A) 700 shown in the example of FIG. 7.
The display device 1115 is connected to the display control module 1110. The display device 1115 has the same function as the display device 715 of the machine learning device (A) 700 shown in the example of FIG. 7.

The voice reception module 1120 is connected to the learning data generation module 1130 of the machine learning module 1125. The voice reception module 1120 receives the user's voice through, for example, a microphone. For each predetermined character or set of characters, it receives the voices uttered by a plurality of predetermined users.

The machine learning module 1125 includes a learning data generation module 1130, a learning module 1135a, a learning module 1135b, and a learning module 1135c, and is connected to the transmission module 1140. Using the one or more characters displayed by the display control module 1110 and the voice received by the voice reception module 1120 in response to those characters, the machine learning module 1125 generates a learning model for each character making up the displayed character or characters. In other words, it generates a plurality of learning models.

The learning data generation module 1130 is connected to the display control module 1110, the voice reception module 1120, the learning module 1135a, the learning module 1135b, and the learning module 1135c. The learning data generation module 1130 generates the voice for each predetermined character (or set of characters) as learning data for authentication.

The learning modules 1135a and so on are connected to the learning data generation module 1130. Using the learning data generated by the learning data generation module 1130, they perform learning for voice-based authentication and generate a learning model for each character (or set of characters). For example, when the character string generation module 1105 generates digit strings, one learning module is assigned to each of the characters 0 to 9. Specifically, the learning modules each generate a learning model for one character: the learning module 1135a generates a learning model for the digit "1", the learning module 1135b generates a learning model for the digit "2", and so on. Existing techniques may be used for this learning, for example a decision tree, a naive Bayes model, a decision list, a support vector machine, the maximum entropy method, or a conditional random field.

The transmission module 1140 is connected to the machine learning module 1125. The transmission module 1140 transmits the learning models generated by the learning modules 1135a and so on to the information processing device 100. On receiving them, the information processing device 100 sets the plurality of learning models as the authentication (A) module 130.

FIG. 12 is an explanatory diagram showing a processing example according to the third embodiment.
To learn the voiceprint data, the machine learning device (B) 1100 trains on the learning unit data 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, and 1209, each of which is voiceprint data of four people, A, B, C, and D, reading out one of the digits "0" to "9". That is, one learning model is generated from each of the learning unit data 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, and 1209, for a total of ten learning models.
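The split into ten learning units, one model per digit, can be sketched as below. The names (`split_by_digit`, `train_per_digit`, `mean_vector`) are hypothetical, and the "trainer" is a deliberately trivial stand-in (a per-digit mean vector) rather than any of the learning techniques the text lists. The feature values are synthetic placeholders.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str, List[float]]   # (speaker, digit, voiceprint features)
Unit = List[Tuple[str, List[float]]]    # one learning unit: (speaker, features)


def split_by_digit(samples: List[Sample]) -> Dict[str, Unit]:
    """Group recordings into one learning unit per digit (1200 ... 1209)."""
    units: Dict[str, Unit] = defaultdict(list)
    for speaker, digit, features in samples:
        units[digit].append((speaker, features))
    return dict(units)


def train_per_digit(samples: List[Sample],
                    train: Callable[[Unit], object]) -> Dict[str, object]:
    """Train one model per digit and return them keyed by digit."""
    return {digit: train(unit) for digit, unit in split_by_digit(samples).items()}


def mean_vector(unit: Unit) -> List[float]:
    """Trivial stand-in trainer: the mean feature vector of the unit."""
    vectors = [features for _, features in unit]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Synthetic placeholder recordings: 4 speakers x 10 digits.
samples = [(s, d, [float(i), float(int(d))])
           for i, s in enumerate("ABCD") for d in "0123456789"]
units = split_by_digit(samples)
models = train_per_digit(samples, mean_vector)
```

Compared with the single pooled model of the second embodiment, no additional data is needed here: the digit is encoded by which model is trained and later consulted.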

FIG. 13 is an explanatory diagram showing a processing example according to the third embodiment.
First, "Please pronounce the password" is displayed on the user authentication screen 400.
Next, "0 (zero)" is displayed on the user authentication screen 400. The user 1310 sees it and reads it out as voice (0) 1312a. The voice reception module 120 receives the voiceprint data 1320a. Then, using the voice model (0) 1350, the voiceprint data 1320a is authenticated as the recognition result (0) 1360a of the user 1310. The voice model (0) 1350 is the learning model corresponding to the "0" that was displayed on the user authentication screen 400 when the voiceprint data 1320a was received.
Next, "8 (hachi)" is displayed on the user authentication screen 400. The user 1310 sees it and reads it out as voice (8) 1312b. The voice reception module 120 receives the voiceprint data 1320b. Then, using the voice model (8) 1358, the voiceprint data 1320b is authenticated as the recognition result (8) 1360b of the user 1310. The voice model (8) 1358 is the learning model corresponding to the "8" that was displayed on the user authentication screen 400 when the voiceprint data 1320b was received.
Next, "7 (nana)" is displayed on the user authentication screen 400. The user 1310 sees it and reads it out as voice (7) 1312c. The voice reception module 120 receives the voiceprint data 1320c. Then, using the voice model (7) 1357, the voiceprint data 1320c is authenticated as the recognition result (7) 1360c of the user 1310. The voice model (7) 1357 is the learning model corresponding to the "7" that was displayed on the user authentication screen 400 when the voiceprint data 1320c was received.
Next, "3 (san)" is displayed on the user authentication screen 400. The user 1310 sees it and reads it out as voice (3) 1312d. The voice reception module 120 receives the voiceprint data 1320d. Then, using the voice model (3) 1353, the voiceprint data 1320d is authenticated as the recognition result (3) 1360d of the user 1310. The voice model (3) 1353 is the learning model corresponding to the "3" that was displayed on the user authentication screen 400 when the voiceprint data 1320d was received.
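In the FIG. 13 flow, the displayed digit selects which model judges the utterance. A minimal sketch, assuming each per-digit model is a callable returning accept or reject; the names and the stub models are hypothetical and stand in for the learned voice models (0) 1350 through (9).

```python
from typing import Callable, Dict, List

Model = Callable[[List[float]], bool]


def authenticate_with_per_digit_models(
    displayed: str,
    capture: Callable[[str], List[float]],
    models: Dict[str, Model],
) -> List[bool]:
    """Judge each utterance with the model for the digit shown on screen."""
    results = []
    for digit in displayed:
        model = models[digit]          # e.g. voice model (8) when "8" is shown
        voiceprint = capture(digit)    # voiceprint data 1320x
        results.append(model(voiceprint))
    return results


def make_stub_model(digit: str) -> Model:
    """Toy model: accepts only the placeholder voiceprint for its own digit."""
    expected = [float(int(digit))]
    return lambda voiceprint: voiceprint == expected


models = {d: make_stub_model(d) for d in "0123456789"}
genuine = authenticate_with_per_digit_models(
    "0873", lambda d: [float(int(d))], models)
wrong_digit = authenticate_with_per_digit_models(
    "0873", lambda d: [1.0], models)
```

Selecting the model by the displayed digit plays the same role here that the additional data plays in the second embodiment.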

A hardware configuration example of the information processing device 100, the machine learning device (A) 700, and the machine learning device (B) 1100 of the present embodiment will be described with reference to FIG. 14. The configuration shown in FIG. 14 is implemented by, for example, a personal computer, and shows a hardware configuration example that includes a data reading unit 1417 such as a scanner and a data output unit 1418 such as a printer.

The CPU (abbreviation of Central Processing Unit) 1401 is a control unit that executes processing in accordance with a computer program describing the execution sequence of the various modules described in the above embodiments, namely the character string generation module 105, the display control module 110, the user authentication module 125, the authentication (A) module 130, the authentication (B) module 135, the character string generation module 705, the display control module 710, the machine learning module 725, the learning data generation module 730, the learning module 735, the transmission module 740, the character string generation module 1105, the display control module 1110, the display device 1115, the machine learning module 1125, the learning data generation module 1130, the learning module 1135, the transmission module 1140, and so on.

The ROM (abbreviation of Read Only Memory) 1402 stores programs, calculation parameters, and the like used by the CPU 1401. The RAM (abbreviation of Random Access Memory) 1403 stores programs used in the execution of the CPU 1401, parameters that change as appropriate during that execution, and the like. These are connected to one another by a host bus 1404 composed of a CPU bus or the like.

The host bus 1404 is connected via a bridge 1405 to an external bus 1406 such as a PCI (abbreviation of Peripheral Component Interconnect/Interface) bus.

The keyboard 1408 and the pointing device 1409, such as a mouse, are devices operated by an operator. The display 1410, which is an example of the display device 115, the display device 715, and the display device 1115, is a liquid crystal display device, a CRT (abbreviation of Cathode Ray Tube), or the like, and displays various information as text or image information. A touch screen or the like that combines the functions of the pointing device 1409 and the display 1410 may also be used. In that case, the keyboard function may be realized without a physically connected keyboard such as the keyboard 1408, by drawing a keyboard in software on the screen (for example, a touch screen); this is also known as a software keyboard or screen keyboard.

The HDD (abbreviation of Hard Disk Drive) 1411 contains a hard disk (or, instead of a hard disk, a flash memory or the like), drives the hard disk, and records or reproduces programs executed by the CPU 1401 and information. The HDD 1411 stores the voice data received by the voice reception module 120, the learning models used for recognition, the rules, the result data of processing by the user authentication module 125, and the like. It also stores various other data, various computer programs, and so on.

The drive 1412 reads data or programs recorded on a mounted removable recording medium 1413, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and supplies the data or programs to the RAM 1403, which is connected via the interface 1407, the external bus 1406, the bridge 1405, and the host bus 1404. The removable recording medium 1413 can also be used as a data recording area.

The connection port 1414 is a port for connecting the external connection device 1415 and has a connection unit such as USB or IEEE 1394. A microphone or the like, which is an example of the voice reception module 120, the voice reception module 720, and the voice reception module 1120, is connected to the connection port 1414 as the external connection device 1415. The connection port 1414 is connected to the CPU 1401 and other components via the interface 1407, the external bus 1406, the bridge 1405, the host bus 1404, and so on. The communication unit 1416 is connected to a communication line and executes data communication processing with external devices. The data reading unit 1417 is, for example, a scanner, and executes document reading processing. The data output unit 1418 is, for example, a printer, and executes document data output processing.

Of the embodiments described above, those implemented by a computer program are realized by loading the computer program, which is software, into a system having this hardware configuration, so that the software and the hardware resources operate in cooperation.
The hardware configuration of the information processing device 100 and the like shown in FIG. 14 is only one configuration example. The present embodiment is not limited to the configuration shown in FIG. 14; any configuration capable of executing the modules described in the present embodiment may be used. For example, a GPU (abbreviation of Graphics Processing Unit, including GPGPU (abbreviation of General-Purpose computing on Graphics Processing Units)) may be used as the processor. Some modules may be implemented in dedicated hardware, for example an application-specific integrated circuit (a specific example being an ASIC (abbreviation of Application Specific Integrated Circuit)) or a reconfigurable integrated circuit (a specific example being an FPGA (abbreviation of Field-Programmable Gate Array)). Some modules may reside in an external system and be connected by a communication line, and a plurality of the systems shown in FIG. 14 may be connected to one another by communication lines and operate in cooperation. Furthermore, in addition to a personal computer, the configuration may in particular be incorporated into a portable information communication device (including a mobile phone, a smartphone, a mobile device, a wearable computer, and the like), an information appliance, a robot, a copier, a fax machine, a scanner, a printer, a multifunction device (an image processing device having two or more of the functions of a scanner, a printer, a copier, a fax machine, and the like), or similar equipment.

In the comparison processing described in the above embodiments, "greater than or equal to", "less than or equal to", "greater than", and "less than" are examples. As long as the combination does not become contradictory, they may be replaced with "greater than", "less than", "greater than or equal to", and "less than or equal to", respectively.
The various embodiments described above may be combined, and the techniques described in the background art may be adopted as the processing content of each module. For example, two types of learning model may be generated, one according to the second embodiment and one according to the third embodiment, and the authentication (A) module 130 may switch between the two. Specifically, the authentication (A) module 130 may perform authentication using the learning model according to the second embodiment and, if the rate of successful authentication is lower than a predetermined value, perform authentication using the learning model according to the third embodiment. The reverse order may also be used. Alternatively, the authentication (A) module 130 may authenticate the voice received by the voice reception module 120 using both the learning model according to the second embodiment and the learning model according to the third embodiment, and treat the user's authentication for that character as successful if the two results match. If the results do not match, the user's authentication for that character may be treated as failed.
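The two ways of combining the learning models described above (switching when the authentication rate is low, and requiring both results to match) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the authentication (A) module 130: the `Model` interface, the function names, the `min_rate` parameter, and the interpretation of the "rate" as the fraction of samples for which some user is recognised are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface (not from the source): a learning model maps
# (recorded audio, displayed character) to the identifier of the recognised
# user, or None when no user is recognised.
Model = Callable[[bytes, str], Optional[str]]
Sample = Tuple[bytes, str]  # (recorded audio, displayed character)


def recognition_rate(model: Model, samples: Sequence[Sample]) -> float:
    """Fraction of samples for which the model recognises some user (assumed metric)."""
    if not samples:
        return 0.0
    recognised = sum(1 for audio, ch in samples if model(audio, ch) is not None)
    return recognised / len(samples)


def choose_model(first: Model, second: Model,
                 samples: Sequence[Sample], min_rate: float) -> Model:
    """Switching variant: keep `first` unless its rate is below `min_rate`."""
    if recognition_rate(first, samples) < min_rate:
        return second
    return first


def both_agree(first: Model, second: Model, audio: bytes, ch: str) -> bool:
    """Agreement variant: succeed only if both models name the same user."""
    a = first(audio, ch)
    b = second(audio, ch)
    return a is not None and a == b
```

Passing the models in the opposite order to `choose_model` gives the "reverse" arrangement mentioned in the text.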

The program described above may be provided stored in a recording medium, or may be provided by a communication means. In that case, for example, the program described above may be regarded as an invention of "a computer-readable recording medium on which the program is recorded".
The "computer-readable recording medium on which a program is recorded" means a computer-readable recording medium on which a program is recorded and which is used for installing, executing, or distributing the program.
Examples of the recording medium include: digital versatile discs (DVD), such as "DVD-R, DVD-RW, DVD-RAM, etc.", which are standards established by the DVD Forum, and "DVD+R, DVD+RW, etc.", which are standards established by DVD+RW; compact discs (CD), such as read-only memory (CD-ROM), CD recordable (CD-R), and CD rewritable (CD-RW); Blu-ray (registered trademark) Discs; magneto-optical disks (MO); flexible disks (FD); magnetic tape; hard disks; read-only memory (ROM); electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); flash memory; random access memory (RAM); and SD (Secure Digital) memory cards.
The whole or a part of the program may be recorded on the recording medium and stored, distributed, or the like. The program may also be transmitted by communication, using a transmission medium such as a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, an extranet, or the like, a wireless communication network, or a combination of these, or it may be carried on a carrier wave.
Further, the program may be a part or the whole of another program, or may be recorded on a recording medium together with a separate program. It may also be divided and recorded across a plurality of recording media. It may be recorded in any form, such as compressed or encrypted, as long as it can be restored.

100 ... Information processing device
105 ... Character string generation module
110 ... Display control module
115 ... Display device
120 ... Voice reception module
125 ... User authentication module
130 ... Authentication (A) module
135 ... Authentication (B) module
200 ... Image processing device
290 ... Communication line
700 ... Machine learning device (A)
705 ... Character string generation module
710 ... Display control module
715 ... Display device
720 ... Voice reception module
725 ... Machine learning module
730 ... Learning data generation module
735 ... Learning module
740 ... Transmission module
1100 ... Machine learning device (B)
1105 ... Character string generation module
1110 ... Display control module
1115 ... Display device
1120 ... Voice reception module
1125 ... Machine learning module
1130 ... Learning data generation module
1135 ... Learning module
1135a ... Learning module
1135b ... Learning module
1135c ... Learning module
1140 ... Transmission module

Claims

An information processing device comprising:
a display control means for performing control so that, for a character string including a plurality of characters, one character or a plurality of characters in the character string is displayed;
a reception means for receiving the voice of a user who utters the characters displayed by the display control means;
a first authentication means for authenticating the voice for each one character or plurality of characters; and
a second authentication means for authenticating the user who uttered the voice by applying a predetermined rule to a plurality of authentication results obtained by the first authentication means.

The information processing device according to claim 1, wherein
the display control means performs control so that the one character or plurality of characters in the character string is displayed in a plurality of separate steps, and
the first authentication means performs authentication for each character displayed by the display control means.

The information processing device according to claim 2, wherein
the second authentication means determines that authentication has failed if authentication by the first authentication means fails a predetermined number of times.

The information processing device according to claim 1, wherein
the display control means performs control so as to display the reading (kana) of the characters to be displayed.

The information processing device according to claim 2, wherein,
if the reception means does not receive the voice within a predetermined time, or if the voice is not authenticated by the first authentication means, the display control means performs control so as to display the next character.

The information processing device according to claim 1, further comprising
a generation means for generating, in accordance with a security level, the character string to be displayed by the display control means, wherein
the display control means performs control so as to display one character or a plurality of characters in the character string generated by the generation means.

An information processing device comprising:
a reception means for receiving voices uttered by a plurality of predetermined users for a predetermined one character or plurality of characters;
a generation means for generating, as learning data for authentication, data in which predetermined information is added to the voice for each one character or plurality of characters; and
a learning model generation means for generating one learning model by performing learning for voice-based authentication using the learning data generated by the generation means.

The information processing device according to claim 1, wherein
the first authentication means performs authentication using the learning model generated by the information processing device according to claim 7.

An information processing device comprising:
a reception means for receiving voices uttered by a plurality of predetermined users for a predetermined one character or plurality of characters;
a generation means for generating, as learning data for authentication, a predetermined voice for each one character or plurality of characters; and
a learning model generation means for generating a learning model for each one character or plurality of characters by performing learning for voice-based authentication using the learning data generated by the generation means.

The information processing device according to claim 1, wherein
the first authentication means performs authentication using, from among the learning models generated by the information processing device according to claim 9, the learning model corresponding to the one character or plurality of characters displayed by the display control means.

An information processing program for causing a computer to function as:
a display control means for performing control so that, for a character string including a plurality of characters, one character or a plurality of characters in the character string is displayed;
a reception means for receiving the voice of a user who utters the characters displayed by the display control means;
a first authentication means for authenticating the voice for each one character or plurality of characters; and
a second authentication means for authenticating the user who uttered the voice by applying a predetermined rule to a plurality of authentication results obtained by the first authentication means.

An information processing program for causing a computer to function as:
a reception means for receiving voices uttered by a plurality of predetermined users for a predetermined one character or plurality of characters;
a generation means for generating, as learning data for authentication, data in which predetermined information is added to the voice for each one character or plurality of characters; and
a learning model generation means for generating one learning model by performing learning for voice-based authentication using the learning data generated by the generation means.

An information processing program for causing a computer to function as:
a reception means for receiving voices uttered by a plurality of predetermined users for a predetermined one character or plurality of characters;
a generation means for generating, as learning data for authentication, a predetermined voice for each one character or plurality of characters; and
a learning model generation means for generating a learning model for each one character or plurality of characters by performing learning for voice-based authentication using the learning data generated by the generation means.
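
For illustration only, the two-stage flow recited in the device claims (per-character authentication, a rule applied to the collected results, failure after a predetermined number of per-character failures, and moving to the next character on timeout) might be sketched as below. This is a minimal sketch under stated assumptions and does not define or limit the claims: the function name, the callback signatures, `timeout_s`, and `max_failures` are hypothetical, and counting a timeout as a failed character is an assumption not stated in the text.

```python
from typing import Callable, Iterable, Optional


def authenticate_user(
    characters: Iterable[str],
    display: Callable[[str], None],
    receive_voice: Callable[[float], Optional[bytes]],
    verify_character: Callable[[bytes, str], bool],
    timeout_s: float = 5.0,
    max_failures: int = 2,
) -> bool:
    """Show characters one at a time and apply a failure-count rule.

    `receive_voice` is assumed to return None when no voice arrives
    within `timeout_s` seconds.
    """
    failures = 0
    for ch in characters:
        display(ch)                       # display control
        audio = receive_voice(timeout_s)  # reception (None on timeout)
        # First authentication: one result per displayed character.
        ok = audio is not None and verify_character(audio, ch)
        if not ok:
            # Assumption: a timeout is counted like a failed character.
            failures += 1
            # Second authentication rule: fail after too many failures.
            if failures >= max_failures:
                return False
        # On success, timeout, or failure, continue with the next character.
    return True
```

In this sketch the predetermined rule is a simple failure count; other rules over the per-character results could be substituted in the same place.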