JP7423310B2

JP7423310B2 - Data processing device, data processing method and trained model

Info

Publication number: JP7423310B2
Application number: JP2019238682A
Authority: JP
Inventors: 隆長峯; 直幸高田
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2024-01-29
Anticipated expiration: 2039-12-27
Also published as: JP2021107979A

Description

The present invention relates to a data processing device and a trained model. More particularly, it relates to a data processing device, a data processing method, and a trained model that perform a predetermined data processing task on input data.

Patent Document 1 discloses a privacy-conscious, network-type authentication system that does not transmit the camera image as is. Instead, the local device that acquired the image adds noise to the face image. The noise does not affect matching but makes the image difficult to identify visually. The device then transmits the noise-added image to a server, which performs person verification using it.

In particular, the document discloses a case in which the server performs a data processing task such as person matching with a multilayer neural network. In that case, the local device generates the noise-added image under a specific condition on the noise components. They are set within a range where the product of the filter components of a convolutional layer in the network and the noise components can be regarded as zero.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-129750

Suppose the noise-added image is generated by first learning the filter components and then searching for noise components that satisfy the above condition. There is no guarantee that noise satisfying the condition will be found. For a nonlinear classifier such as a multilayer neural network, it can be difficult to find such noise analytically.

Accordingly, an object of the present invention is to provide a data processing device, a data processing method, and a trained model that perform a data processing task with a multilayer neural network accurately even when noise is added.

To achieve this object, a data processing device according to the present invention includes a learning unit and an input data processing unit. The learning unit learns two things: the filters used in each convolutional layer of a multilayer neural network that performs a predetermined data processing task on input data, and the noise data to be added to that input data. The input data processing unit adds the noise data learned by the learning unit to the input data to obtain noise-added data. It inputs this data to the multilayer neural network and obtains the result of the data processing task from the network's output. During learning, the learning unit inputs training data, to which results of the data processing task have been assigned in advance, to the multilayer neural network. It learns so that two conditions hold. First, the noise data is orthogonal to the filters used in a front-stage convolutional layer of the network. Second, the obtained result of the data processing task matches the result assigned in advance to the training data.

In the data processing device according to the present invention, the learning unit inputs training data, to which results of the data processing task have been assigned in advance, to the multilayer neural network. It learns so that the noise data is orthogonal to the filters used in a front-stage convolutional layer of the network and so that the obtained result matches the result assigned in advance to the training data. The input data processing unit then inputs noise-added data, obtained by adding the learned noise data to the input data, to the multilayer neural network. It obtains the result of the data processing task from the network's output.

Learning proceeds so that the noise data is orthogonal to the filters of the front-stage convolutional layer and the obtained result matches the result assigned in advance to the training data. As a result, the data processing task using the multilayer neural network can be performed accurately even when noise is added.

The learning unit may also learn the filters and the noise data so that a specific orthogonality holds. Consider convolving each filter of the filter group used in the front-stage convolutional layer over the noise data with a predetermined stride. Every region of the noise data that corresponds to a convolved region must then be orthogonal to that filter group.

The input data may be an image. In that case, the learning unit may learn so that the noise data is generated by arranging noise blocks side by side, each drawn from a noise block group. The group consists of a predetermined reference noise block and derived noise blocks obtained by shifting the reference noise block according to a predetermined shift pattern.

The learning unit may also learn so that every noise block in the noise block group is orthogonal to each of the filters.

The learning unit may also obtain the derived noise blocks by shifting the noise block with a shift pattern that corresponds to the predetermined stride.

The input data may be an image and the predetermined stride may be an integer multiple of the filter size. In that case, the learning unit may learn so that a predetermined noise block is orthogonal to the filters. The noise data may then consist of one or more learned noise blocks arranged side by side.

The input data may be an image. The noise-added data may then be obtained by dropping randomly selected pixels of the input data and then adding the noise data. In that case, the learning unit also drops randomly selected pixels of the training data before inputting it to the multilayer neural network for learning.

The input data may be an image. The noise-added data may then be obtained by adding random noise to the input data and then adding the noise data. In that case, the learning unit also adds random noise to the training data before inputting it to the multilayer neural network for learning.
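The two corruption variants in the preceding two paragraphs follow the same order. A random corruption (pixel dropout or random noise) is applied first, and only then is the learned noise image added. The same corruption is applied to training data during learning. The sketch below shows that order under several assumptions not stated in the text: a dropped pixel is encoded as 0, the random noise is i.i.d. Gaussian, and the helper names and parameters (`drop_rate`, `sigma`) are hypothetical.

```python
import numpy as np

def drop_random_pixels(image, drop_rate, rng):
    """Set a random subset of pixels to 0 (encoding a 'missing' pixel as 0 is an assumption)."""
    keep = rng.random(image.shape) >= drop_rate   # True = keep the pixel
    return image * keep

def add_random_noise(image, sigma, rng):
    """Add i.i.d. Gaussian noise (the noise distribution is an assumption)."""
    return image + rng.normal(0.0, sigma, size=image.shape)

def make_noise_added(image, noise_image, rng, mode, drop_rate=0.1, sigma=0.05):
    """Apply the random corruption first, then add the learned noise image."""
    if mode == "drop":
        corrupted = drop_random_pixels(image, drop_rate, rng)
    elif mode == "random":
        corrupted = add_random_noise(image, sigma, rng)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return corrupted + noise_image

rng = np.random.default_rng(0)
img = np.ones((6, 6))                 # stand-in for an input or training image
learned_noise = np.full((6, 6), 0.5)  # stand-in for the learned noise image
sample = make_noise_added(img, learned_noise, rng, "drop", drop_rate=0.5)
```

The same helper would be called both on training images and on input images at inference, so the network sees the same kind of corruption in both phases.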

In the data processing method according to the present invention, a learning unit learns the filters used in each convolutional layer of a multilayer neural network that performs a predetermined data processing task on input data. It also learns the noise data to be added to that input data. An input data processing unit adds the learned noise data to the input data to obtain noise-added data, inputs it to the multilayer neural network, and obtains the result of the data processing task from the network's output. The learning unit inputs training data, to which results of the data processing task have been assigned in advance, to the multilayer neural network. It learns so that the noise data is orthogonal to the filters used in a front-stage convolutional layer of the network and so that the obtained result matches the result assigned in advance to the training data.

The trained model according to the present invention is a multilayer neural network for performing a predetermined data processing task on input data. The result of the task is obtained from the network's output when noise-added data, produced by adding noise data to the input data, is input to it. The model has been trained in advance by inputting training data, to which results of the task have been assigned in advance, to the network. Training ensures two things: the noise data is orthogonal to the filters used in a front-stage convolutional layer of the network, and the obtained result matches the result assigned in advance to the training data.

The trained model according to the present invention is trained in advance by inputting training data, to which results of the predetermined data processing task have been assigned in advance, to the multilayer neural network. Training makes the noise data orthogonal to the filters of the front-stage convolutional layer and makes the obtained result match the result assigned in advance to the training data. At inference time, noise-added data is obtained by adding the noise data to the input data and is input to the network. The result of the data processing task is then obtained from the network's output.

The filter group used in the front-stage convolutional layer of the multilayer neural network is thus learned to be orthogonal to the noise data. As a result, the data processing task using the network can be performed accurately even when noise is added.

As described above, the data processing device, data processing method, and trained model of the present invention perform a data processing task with a multilayer neural network accurately even when noise is added.

FIG. 1 is a schematic diagram showing the configuration of a face authentication system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of input face data.
FIG. 3 is a diagram showing the structure of noise-added input face data.
FIG. 4 is a schematic diagram showing a noise block and a filter of the multilayer neural network.
FIG. 5 is a schematic diagram showing a face image before noise addition and a face image after noise addition.
FIG. 6 is a schematic diagram showing properties of the convolution of a filter with noise-added face data.
FIG. 7 is a diagram showing an example of a noise image.
FIG. 8 is a diagram showing an example of derived noise blocks.
FIG. 9 is a diagram showing the structure of registered face data.
FIG. 10 is a diagram showing the structure of authenticated registered face data.
FIG. 11 is a diagram explaining a method of performing image processing task learning and orthogonality constraint learning simultaneously.
FIG. 12 is a diagram showing the structure of authentication history data.
FIG. 13 is a diagram showing an example of a notification image.
FIG. 14 is a flowchart showing the operation of the learning process for face authentication according to the embodiment of the present invention.
FIG. 15 is a flowchart showing the operation of the detection process by the image processing device according to the embodiment of the present invention.
FIG. 16 is a flowchart showing the operation of the matching process by the face authentication device according to the embodiment of the present invention.
FIG. 17 is a flowchart showing the operation of the notification process by the notification device according to the embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. As an example, this embodiment applies the present invention to a network-type face authentication system.

<System configuration>
The configuration of the embodiment is described below with reference to FIG. 1, which shows the schematic configuration of a face authentication system 1000 to which the present invention is applied.

(Face authentication system 1000)
The face authentication system 1000 includes an imaging device 1100, a network 1200, an image processing device 1300, a face authentication device 1400, and a notification device 1500. The face authentication device 1400 is an example of the data processing device.

(Imaging device 1100)
The imaging device 1100 is a surveillance camera installed to monitor a predetermined area. It is mounted at a position from which it can capture the faces of people in the monitored area. Monitoring images captured by the imaging device 1100 are transmitted to the image processing device 1300.

(Network 1200)
The network 1200 is a communication line used to transmit and receive data among the image processing device 1300, the face authentication device 1400, and the notification device 1500. A LAN (Local Area Network) or a public line such as the Internet can serve as the network 1200. Messages on the network 1200 should preferably be protected, for example by encrypting them with known VPN technology.

(Image processing device 1300)
The image processing device 1300 consists of a CPU, an MPU, peripheral circuits, terminals, various memories, and the like. It applies image processing to images captured by the imaging device 1100 and transmits the results to the face authentication device 1400 and the notification device 1500 via the network 1200. The device is made up of an image processing unit 1310, a storage unit 1320, and a transmission/reception unit 1330, each described in detail below.

(Image processing unit 1310)
The image processing unit 1310 includes a face image acquisition means 1311 and a noise addition means 1312.

(Face image acquisition means 1311)
The face image acquisition means 1311 extracts a person's face image from the monitoring image captured by the imaging device 1100 and uses it as the input face image. It attaches a unique face image identifier and the capture time to the input face image to form input face data 200 with the structure shown in FIG. 2. It sends this data to the noise addition means 1312 and stores it in the storage unit 1320.

The face image identifier 210 uniquely identifies the input face image 220. For example, a 128-bit integer can serve as the face image identifier 210. Its initial value is 0, and the value is incremented each time an identifier is assigned to an input face image 220. A checksum or the like may be appended to the face image identifier 210 so that it cannot be illicitly guessed. When a monitoring image contains several people, each person's face image is extracted separately and given a different face image identifier 210. Each face image is then sent to the noise addition means 1312 and the storage unit 1320.
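The identifier scheme in the preceding paragraph is a 128-bit counter starting at 0, incremented per face image, with an optional checksum. The sketch below is hypothetical: the class name, the CRC32 choice, and the 40-hex-character layout are illustrative assumptions, not the patent's format. Note that CRC32 only detects accidental changes. If the goal is to make identifiers hard to guess or forge, a keyed MAC (for example with Python's standard `hmac` module) is the more conventional choice.

```python
import itertools
import zlib

class FaceImageIdAllocator:
    """Hypothetical allocator: a 128-bit counter starting at 0, with a CRC32 checksum appended."""

    MAX_ID = (1 << 128) - 1

    def __init__(self):
        self._counter = itertools.count(0)

    def next_id(self) -> str:
        value = next(self._counter)
        if value > self.MAX_ID:
            raise OverflowError("128-bit identifier space exhausted")
        raw = value.to_bytes(16, "big")        # 128-bit big-endian counter
        checksum = zlib.crc32(raw)             # 32-bit checksum over the counter bytes
        return raw.hex() + f"{checksum:08x}"   # 32 + 8 = 40 hex characters


def verify_id(identifier: str) -> bool:
    """Return True if the trailing CRC32 matches the counter part of the identifier."""
    if len(identifier) != 40:
        return False
    try:
        raw = bytes.fromhex(identifier[:32])
        checksum = int(identifier[32:], 16)
    except ValueError:
        return False
    return zlib.crc32(raw) == checksum

alloc = FaceImageIdAllocator()
first = alloc.next_id()    # e.g. the first face in a monitoring image
second = alloc.next_id()   # a second person in the same image gets a different identifier
```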

Many face image extraction methods have been proposed, and any suitable known method may be used. One option is to extract faces with a filter called a classifier that has been trained on face images. Another is to generate a binarized edge image of the input image and detect the elliptical shape of a face in it.

(Noise addition means 1312)
The noise addition means 1312 adds noise to the face image extracted by the face image acquisition means 1311 so that the subject is difficult to identify visually, producing noise-added face data 320. It attaches to this data the same face image identifier 310 as the face image output by the face image acquisition means 1311. It then sends the result to the transmission/reception unit 1330 as noise-added input face data 300 with the structure shown in FIG. 3. The noise-added face data 320 is an example of noise-added data.

The method of creating the noise-added face data 320 is described for one particular face matching method. That method uses a multilayer neural network 1440 containing convolutional layers, specifically a convolutional neural network (CNN), and takes the noise-added face data 320 as its input. As described later, the input data processing unit 1430 inputs the visually unidentifiable noise-added face data 320 to the multilayer neural network 1440. The network then outputs features for matching against the face image data registered in the storage unit 1420. The multilayer neural network 1440 after training is an example of the trained model.

In this case, as shown in FIG. 4, a noise block 3020 is prepared with the same structure (dimensions and number of elements) as the filter 3010 used in a front-stage convolutional layer of the multilayer neural network 1440. The noise block 3020 is set to be orthogonal to the filter 3010. Copies of it are tiled over the face image extracted by the face image acquisition means 1311 and added to it, which generates the noise-added image 3030. In other words, a noise image made of noise blocks 3020 arranged side by side is added to the input face image 220. The noise-added image 3030 can serve as noise-added face data 320 in which the subject is difficult to identify visually. Here, "the filter and the noise block are orthogonal" means the following: when the elements of each are written as a vector, the two vectors are orthogonal, so their inner product is zero. The noise image is an example of noise data.
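The construction in the preceding paragraph can be sketched for a single fixed filter, assuming NumPy and a single-channel image. The sketch draws a random block, removes its component along the filter (one Gram-Schmidt step) so their inner product is zero, tiles it over the image, and adds it. The helper names and the `scale` amplitude parameter are hypothetical. In the patent the noise is learned together with the filters rather than derived from a fixed filter like this.

```python
import numpy as np

def orthogonal_noise_block(filt: np.ndarray, rng: np.random.Generator, scale: float = 1.0) -> np.ndarray:
    """Random block with the same shape as `filt` whose inner product with `filt` is zero."""
    f = filt.ravel()
    r = rng.normal(size=f.shape)
    n = r - (r @ f) / (f @ f) * f          # remove the component along the filter vector
    n *= scale / np.linalg.norm(n)          # set the noise amplitude (hypothetical knob)
    return n.reshape(filt.shape)

def tile_noise(block: np.ndarray, image_shape: tuple) -> np.ndarray:
    """Tile `block` side by side to cover an image of `image_shape`, cropping any excess."""
    reps = (-(-image_shape[0] // block.shape[0]), -(-image_shape[1] // block.shape[1]))
    return np.tile(block, reps)[: image_shape[0], : image_shape[1]]

rng = np.random.default_rng(0)
filt = rng.normal(size=(3, 3))                  # stand-in for a front-stage filter 3010
block = orthogonal_noise_block(filt, rng, scale=5.0)
face = rng.random((12, 12))                     # stand-in for an input face image 220
noise_added = face + tile_noise(block, face.shape)
```

A real convolutional layer has many filters (and channels). Under the same idea, the block would need to lie in the orthogonal complement of all of them at once, which requires the block to have more elements than there are filters.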

FIG. 5 schematically shows a face image 400 before noise addition and a noise-added image 410. Compared with the face image 400, the noise-added image 410 is difficult to identify visually.

The following explains why noise-added face data 320 generated in this way, although difficult to identify visually, does not reduce the accuracy of face matching with the multilayer neural network 1440.

Convolution in the multilayer neural network 1440 outputs a map of inner products. Each entry is the inner product between the filter 3010 of a convolutional layer and one image block of the input image. Here the input image is the noise-added image 3030, that is, the visually unidentifiable noise-added face data 320. An image block is the region of the input image containing the elements used in the inner product at a particular convolution position. The filter is scanned over the input image with a predetermined stride (shift width), and the inner product with the image block at each convolution position is computed. The final output is an inner product map over the whole input image.
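The convolution described in the preceding paragraph can be written directly as an inner product map. As in most deep learning frameworks, "convolution" here means cross-correlation (the filter is not flipped). The sketch assumes NumPy, a single channel, no padding, and no bias. The function name is hypothetical.

```python
import numpy as np

def inner_product_map(image: np.ndarray, filt: np.ndarray, stride: int) -> np.ndarray:
    """Valid (no padding) strided convolution: inner product of `filt` with each image block."""
    kh, kw = filt.shape
    H, W = image.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Image block at convolution position (i, j), then its inner product with the filter.
            block = image[i * stride : i * stride + kh, j * stride : j * stride + kw]
            out[i, j] = np.sum(block * filt)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
print(inner_product_map(image, np.ones((2, 2)), stride=2))
```

With a 2×2 all-ones filter and stride 2, each output entry is the sum of one non-overlapping 2×2 block of the image.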

As shown in FIG. 6, the inner product is linear. Convolving the filter 3010 with the noise-added face data 3120 therefore equals the sum of two terms: "the convolution of the filter 3010 with the face image 3130 before noise addition" and "the convolution of the filter 3010 with the noise image 3140 of tiled noise blocks". Suppose the stride is an integer multiple of the size (width and height) of the filter 3010 and the filter 3010 is orthogonal to the noise block. Then the convolution of the filter 3010 with the noise image 3140 is zero at every convolution position, and only the convolution of the filter 3010 with the face image 3130 remains. That is a convolution with the face image without any noise blocks. The multilayer neural network 1440 therefore outputs the same convolution result (matching features) whether or not the noise is present. Performing matching with noise-added face data 320 made visually unidentifiable in this way causes no loss of matching accuracy.
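The linearity argument in the preceding paragraph can be checked numerically. The sketch assumes NumPy, a single channel, no padding, and a noise image aligned with the convolution grid. It uses a random stand-in filter and a block orthogonalized against it, then compares the inner product maps with and without the tiled noise at strides equal to one and two times the filter size.

```python
import numpy as np

def inner_product_map(image, filt, stride):
    """Valid strided convolution (cross-correlation) of a single-channel image."""
    kh, kw = filt.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    return np.array([[np.sum(image[i * stride:i * stride + kh, j * stride:j * stride + kw] * filt)
                      for j in range(out_w)] for i in range(out_h)])

rng = np.random.default_rng(1)
filt = rng.normal(size=(3, 3))
r = rng.normal(size=(3, 3))
block = r - np.sum(r * filt) / np.sum(filt * filt) * filt   # <block, filt> = 0
noise_image = np.tile(block, (4, 4))                          # 12 x 12 tiled noise image
face = rng.random((12, 12))                                   # stand-in face image

# Stride = filter size: every window coincides exactly with one tiled copy of the block.
clean = inner_product_map(face, filt, stride=3)
noisy = inner_product_map(face + noise_image, filt, stride=3)
noise_only = inner_product_map(noise_image, filt, stride=3)

# Stride = 2 x filter size: windows still start on block boundaries.
clean6 = inner_product_map(face, filt, stride=6)
noisy6 = inner_product_map(face + noise_image, filt, stride=6)
```

The layer's output feature map is unchanged, so everything downstream (bias, activation, later layers) receives identical inputs and produces identical outputs.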

When the stride is an integer multiple of the size of the filter 3010, every region of the noise image over which the filter 3010 is convolved contains the same noise pattern. It is then enough for the single noise block to be orthogonal to the filter 3010. If the stride is not an integer multiple of the filter size, however, some convolution positions in the noise image 3140 straddle the boundaries between noise blocks 3020. In that case, making the individual noise block 3020 orthogonal to the filter 3010 is no longer sufficient.

FIG. 7 illustrates this situation. FIG. 7(A) shows, as an example arrangement of noise blocks 3020, a noise image 3200 in which four kinds of noise blocks Na, Nb, Nc, and Nd are arranged side by side. Each of Na, Nb, Nc, and Nd is orthogonal to the filter 3010. When this noise image 3200 is convolved with the filter 3010 at stride 1, the convolution also covers shifted windows. For example, as shown in FIG. 7(B), it covers a derived noise block Ne that consists of parts of the elements of Na, Nb, Nc, and Nd. Na, Nb, Nc, and Nd are each individually orthogonal to the filter 3010, but this alone does not make the derived noise block Ne orthogonal to the filter 3010.
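The failure case in the preceding paragraph is easy to reproduce. The sketch assumes NumPy and random stand-in data. It makes four blocks each orthogonal to the filter, arranges them 2×2 as in FIG. 7(A), and takes the stride-1 window starting at (1, 1), which straddles all four blocks like Ne in FIG. 7(B). Each block's inner product with the filter is zero, but the straddling window's inner product generally is not.

```python
import numpy as np

def orthogonalize(r, f):
    """Remove from r its component along f, so that <result, f> = 0."""
    return r - np.sum(r * f) / np.sum(f * f) * f

rng = np.random.default_rng(2)
filt = rng.normal(size=(3, 3))

# Four different blocks, each individually orthogonal to the filter (cf. Na..Nd in FIG. 7(A)).
Na, Nb, Nc, Nd = (orthogonalize(rng.normal(size=(3, 3)), filt) for _ in range(4))
noise_image = np.block([[Na, Nb], [Nc, Nd]])          # 6 x 6 noise image

# With stride 1, the window starting at (1, 1) straddles all four blocks (cf. Ne in FIG. 7(B)).
Ne = noise_image[1:4, 1:4]

per_block = [float(np.sum(b * filt)) for b in (Na, Nb, Nc, Nd)]   # all ~0 by construction
derived = float(np.sum(Ne * filt))                               # generally not 0
```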

In such a case, the filter size and stride determine each region of the noise image 3200 (formed by arranging the noise blocks 3020) onto which the filter is convolved. The noise must be set so that every such region is orthogonal to the filter 3010.

Consider the noise image 3310 of FIG. 8(A), in which only the noise block Na is tiled. Here, the block corresponding to each convolved region can be obtained by shifting the noise block Na. For example, with a stride of 1, each region of the noise image 3310 covered by the scanning filter equals Na with its elements cyclically shifted in the row and column directions. Suppose a set of derived noise blocks is formed by cyclically shifting the elements of Na in the row and column directions. Then, at any convolution position in the noise image 3310, the convolved region coincides with one of these derived noise blocks. It therefore suffices to set each derived noise block, obtained by shifting the elements of Na in the row and column directions, to be orthogonal to the filter 3010.

The noise block Na from which the derived noise blocks are created is an example of a reference noise block. The group consisting of Na and its derived noise blocks is an example of a noise block group. Hereinafter, both the reference noise block and the derived noise blocks of a noise block group may simply be called noise blocks.
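The claim that every stride-1 region of a tiled noise image is a cyclic shift of the reference block can be checked directly. This is a minimal sketch, assuming square blocks, a filter of the same size as the block, and a hypothetical 3×3 reference block with distinct values. The helper names are illustrative only.

```python
def cyclic_shift(block, dr, dc):
    """Shift a square block by dr rows and dc columns with wrap-around."""
    n = len(block)
    return [[block[(r + dr) % n][(c + dc) % n] for c in range(n)] for r in range(n)]


def derived_blocks(block):
    """Reference block plus all of its row/column cyclic shifts."""
    n = len(block)
    return [cyclic_shift(block, dr, dc) for dr in range(n) for dc in range(n)]


def tile(block, reps):
    """Tile a square block reps x reps times into a larger noise image."""
    n = len(block)
    size = n * reps
    return [[block[r % n][c % n] for c in range(size)] for r in range(size)]


def windows(image, k, stride=1):
    """Every k x k region visited when scanning the image at the given stride."""
    size = len(image)
    return [
        [row[c:c + k] for row in image[r:r + k]]
        for r in range(0, size - k + 1, stride)
        for c in range(0, size - k + 1, stride)
    ]


# Hypothetical reference block; distinct values make the shifts distinguishable.
na = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
image = tile(na, 3)          # 9 x 9 noise image
shifts = derived_blocks(na)
scanned = windows(image, 3)  # 7 x 7 = 49 stride-1 positions
print(all(w in shifts for w in scanned))              # True
print(len({tuple(map(tuple, w)) for w in scanned}))   # 9
```

Although the scan visits 49 positions, only 9 distinct regions occur. Those 9 are exactly the derived noise blocks that must be orthogonal to the filter.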

In FIG. 8(A), the noise block Na is 3×3. FIG. 8(B) labels the elements of Na as Na1 to Na9 and shows convolution with the filter 3010 at a stride of 1. The variations 3330 of the derived noise block that correspond to the convolved regions of the noise image 3310 are shown in FIG. 8(C). They are the nine patterns obtained by shifting the elements of Na in the row and column directions. Therefore, starting from a single noise block 3020, one obtains the derived noise blocks by shifting its elements in the row and column directions, and sets each of them to be orthogonal to the filter 3010.
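The orthogonality requirement over all nine patterns can be written as a single check. This is a minimal sketch with hypothetical values.

- A cyclic shift never changes the sum of a block's elements. So any zero-sum block is orthogonal to an all-ones filter at every shift; this is a toy case, since learned filters are not all ones.
- A filter that picks out a single element fails the check.

```python
def cyclic_shift(block, dr, dc):
    """Shift a square block by dr rows and dc columns with wrap-around."""
    n = len(block)
    return [[block[(r + dr) % n][(c + dc) % n] for c in range(n)] for r in range(n)]


def inner(a, b):
    """Element-wise inner product of two equally sized 2-D lists."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def orthogonal_to_all_shifts(block, filt, tol=1e-9):
    """True if every one of the n*n cyclic shifts of block is orthogonal to filt."""
    n = len(block)
    return all(
        abs(inner(filt, cyclic_shift(block, dr, dc))) <= tol
        for dr in range(n) for dc in range(n)
    )


# Hypothetical 3 x 3 block whose elements sum to 0.
block = [[1, -2, 1], [0, 3, -3], [2, -1, -1]]
ones = [[1, 1, 1]] * 3                      # all-ones filter (rows are only read)
corner = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # picks out a single element

print(orthogonal_to_all_shifts(block, ones))    # True
print(orthogonal_to_all_shifts(block, corner))  # False
```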

The stride may also be 2 or more. In that case, derived noise blocks whose elements are shifted in the row and column directions according to the stride may be used.
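Which shifts are needed for a given stride can be computed directly. This sketch makes three assumptions:

- the filter is the same size as the block;
- the scan starts at an offset aligned with the tiling;
- the noise image is large enough that the offsets wrap around.

The function names are hypothetical.

```python
def required_offsets(block_size, stride):
    """Distinct cyclic offsets (per axis) at which a block-sized filter meets the
    tiled noise image when scanning from offset 0 at the given stride."""
    return sorted({(k * stride) % block_size for k in range(block_size)})


def required_shifts(block_size, stride):
    """(row, col) shift pairs whose derived blocks must be orthogonal to the filter."""
    offsets = required_offsets(block_size, stride)
    return [(dr, dc) for dr in offsets for dc in offsets]


print(len(required_shifts(3, 1)))  # 9
print(len(required_shifts(4, 2)))  # 4: offsets 0 and 2 only
print(required_offsets(3, 2))      # [0, 1, 2]: stride 2 still reaches every offset
print(required_shifts(3, 3))       # [(0, 0)]: stride equals block size
```

The offsets visited are the multiples of gcd(block_size, stride). A larger stride therefore reduces the number of derived blocks only when it shares a factor with the block size.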

One way to make the filter 3010 and the noise block 3020 orthogonal is as follows. During training of the multilayer neural network 1440, an additional constraint is imposed: the filter 3010 of the network must be orthogonal to the noise block 3020 added to its training images. The learned result is then applied.

For example, let wf denote the coefficients of the filter 3010 and wn the values of the noise block 3020. An orthogonality-constraint loss lossC, representing the degree of non-orthogonality, can then be defined as in equation (1) below. Here, i indexes the filter variations, j indexes the variations of the noise block 3020 according to the stride position, and "·" denotes the inner product. Minimizing this degree of non-orthogonality together with the task loss while training the multilayer neural network 1440 yields a noise block 3020 orthogonal to the filter 3010.

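(1)  [Equation image not reproduced in this text. Per the definitions above, lossC aggregates the inner products wf_i · wn_j over every filter variation i and every noise-block variation j.]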
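The exact form of equation (1) is not reproduced in this text. The sketch below therefore assumes one plausible reading: lossC is the sum, over all filters i and noise-block variations j, of the absolute inner product |wf_i · wn_j|. This form is zero exactly when every pair is orthogonal. The filter and noise values are hypothetical.

```python
def flatten(m):
    """Row-major flattening of a 2-D list into a weight vector."""
    return [x for row in m for x in row]


def loss_c(filters, noise_variations):
    """Assumed form of equation (1): sum over filters i and noise-block
    variations j of |w_f^(i) . w_n^(j)|. Zero only if every pair is orthogonal."""
    total = 0.0
    for f in filters:
        vf = flatten(f)
        for n in noise_variations:
            total += abs(sum(a * b for a, b in zip(vf, flatten(n))))
    return total


filters = [[[1, 1], [1, 1]], [[1, 0], [0, 0]]]     # hypothetical w_f^(i)
noises = [[[1, -1], [1, -1]], [[-1, 1], [-1, 1]]]  # hypothetical w_n^(j)
print(loss_c(filters, noises))      # 2.0: the corner filter is not orthogonal
print(loss_c(filters[:1], noises))  # 0.0
```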

In the method using derived noise blocks, the size of the noise block 3020 need not match the size of the filter 3010. The noise block 3020 may be smaller or larger than the filter 3010, and derived noise blocks can still be created by shifting its elements. However, convolution always operates on regions the size of the filter 3010. Training must therefore ensure that every filter-sized region cut out of the noise image, formed by tiling the noise blocks 3020, is orthogonal to the filter 3010.

For example, suppose the filter in FIGS. 8(A) and 8(B) were 4×4. One of the regions of the noise image 3310 onto which the filter is convolved would then be the 4×4 set of elements cut from the upper-left corner of FIG. 8(B). Training makes this region orthogonal to each filter in the filter group, under the constraint that elements carrying the same identifier within the region take the same value.
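The tying constraint can be made concrete by labeling each position of the tiled image with the reference-block element it copies. This sketch assumes that Na1 to Na9 are numbered row by row, which is an assumption about FIG. 8(B). With tied values, the inner product with a 4×4 filter reduces to a sum over identifiers of value × (summed filter weights at that identifier). Those summed weights are what an optimizer effectively sees for each shared parameter. The function names and the all-ones filter are hypothetical.

```python
def tiled_identifiers(block_size, rows, cols, prefix="Na"):
    """Label each position of the tiled noise image with the reference-block
    element it copies, numbering Na1..Na9 row by row (assumed ordering)."""
    return [
        [f"{prefix}{(r % block_size) * block_size + (c % block_size) + 1}" for c in range(cols)]
        for r in range(rows)
    ]


def tied_weight_sums(filt, ids):
    """Tied positions share one value, so the inner product with the filter is
    sum over identifiers of value[id] * (sum of filter weights at that id)."""
    sums = {}
    for f_row, id_row in zip(filt, ids):
        for w, name in zip(f_row, id_row):
            sums[name] = sums.get(name, 0) + w
    return sums


region = tiled_identifiers(3, 4, 4)  # 4 x 4 region from the upper-left corner
for row in region:
    print(" ".join(row))
# Na1 Na2 Na3 Na1
# Na4 Na5 Na6 Na4
# Na7 Na8 Na9 Na7
# Na1 Na2 Na3 Na1

weights = tied_weight_sums([[1] * 4 for _ in range(4)], region)  # hypothetical all-ones filter
print(weights["Na1"], weights["Na2"], weights["Na5"])  # 4 2 1
```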

(Transmitting/receiving unit 1330)
The transmitting/receiving unit 1330 transmits the noise-added input face data 300 created by the noise adding means 1312 to the transmitting/receiving unit 1410 of the face authentication device 1400 via the network 1200.

In addition, as described later, the transmitting/receiving unit 1330 receives an authenticated face image identifier transmitted from the transmitting/receiving unit 1410 of the face authentication device 1400. It reads the input face data 200 corresponding to that identifier from the storage unit 1320 and transmits it via the network 1200 to the receiving unit 1510 of the notification device 1500.

(Face authentication device 1400)
The face authentication device 1400 comprises a CPU, an MPU, peripheral circuits, terminals, various memories, and the like. It receives the noise-added input face data 300 transmitted by the image processing device 1300. Its input data processing unit 1430 then determines whether the noise-added input face data 300 is face data of a person whose face has been registered, by referring to registered face data stored in advance in the storage unit 1420. This face authentication task is an example of a data processing task.

The input data processing unit 1430 implements a pretrained multilayer neural network 1440. The noise-added face data 320 of the noise-added input face data 300 is input into the network to obtain one feature. Registered face data stored in advance in the storage unit 1420 is input into the same network to obtain another. By comparing the similarity of the two features, the unit determines whether the noise-added input face data 300 matches the registered face data.

The input data processing unit 1430 takes the face image identifier linked to noise-added input face data 300 judged a match in the face authentication task. It transmits that identifier, as an authenticated face image identifier, to the receiving unit 1510 of the notification device 1500 via the transmitting/receiving unit 1410 and the network 1200.

The units of the face authentication device 1400 are described in detail below: the transmitting/receiving unit 1410, the input data processing unit 1430, and the storage unit 1420.

(Transmitting/receiving unit 1410)
The transmitting/receiving unit 1410 receives, via the network 1200, the noise-added input face data 300 transmitted by the image processing device 1300 and outputs it to the input data processing unit 1430.

It also transmits the authenticated face image identifier output by the input data processing unit 1430 to the transmitting/receiving unit 1330 of the image processing device 1300 via the network 1200.

(Storage unit 1420)
The storage unit 1420 stores face data of persons whose faces have been registered in advance, as registered face data, each item assigned a registered face image identifier. As shown in FIG. 9, registered face data 600 consists of a registered face image identifier 610, a registered face image 620, and registered attribute information 630. At least one item of registered face data 600 is stored per registered person. When there are multiple registered persons, the storage unit 1420 stores multiple items of registered face data 600, each with a different registered face image identifier 610.

The registered face image identifier 610 uniquely identifies the registered face image 620; a 128-bit integer, for example, may be used. One method is to start from an initial value of 0 and increment the registered face image identifier 610 each time new registered face data 600 is created. A checksum or the like may be appended to the registered face image identifier 610 so that it cannot be fraudulently guessed.
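The sketch below shows one way to issue such identifiers: a 128-bit counter starting at 0, with a tag appended. The text says only "a checksum or the like". A plain checksum such as CRC detects typing errors, but anyone can recompute it, so it does not stop guessing. This sketch therefore swaps in a keyed HMAC-SHA256 tag, truncated to 8 hex digits.

The class name, the ID format `<32 hex digits>-<tag>`, and the key are all hypothetical.

```python
import hashlib
import hmac


class RegisteredFaceIdIssuer:
    """Hypothetical issuer of registered face image identifiers: a 128-bit
    counter starting at 0, plus a short keyed tag so IDs cannot be guessed
    without the secret key."""

    LIMIT = 1 << 128

    def __init__(self, key: bytes, start: int = 0):
        self._key = key
        self._next = start

    def _tag(self, value: int) -> str:
        digest = hmac.new(self._key, value.to_bytes(16, "big"), hashlib.sha256)
        return digest.hexdigest()[:8]

    def issue(self) -> str:
        value = self._next
        if value >= self.LIMIT:
            raise OverflowError("128-bit identifier space exhausted")
        self._next += 1  # increment for the next registration
        return f"{value:032x}-{self._tag(value)}"

    def verify(self, identifier: str) -> bool:
        try:
            hex_value, tag = identifier.split("-")
            value = int(hex_value, 16)
        except ValueError:
            return False
        if not 0 <= value < self.LIMIT:
            return False
        # Constant-time comparison of the presented tag with the expected one.
        return hmac.compare_digest(tag.encode(), self._tag(value).encode())


issuer = RegisteredFaceIdIssuer(b"example-key")  # hypothetical key
first = issuer.issue()
second = issuer.issue()
print(first.split("-")[0])    # 00000000000000000000000000000000
print(issuer.verify(second))  # True
```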

The registered face image 620 is the face image of a registered person. It is used both as input to the multilayer neural network 1440 in the face authentication task and as a notification image in the notification device 1500.

The registered attribute information 630 is attribute information associated with the registered person, such as name, gender, age, and affiliated organization.

(Input data processing unit 1430)
The input data processing unit 1430 compares the noise-added input face data 300 output by the image processing device 1300 with the registered face data 600 stored in advance in the storage unit 1420. It determines whether the noise-added input face data 300 matches any item of registered face data 600. If a match exists, the unit attaches an authenticated face image identifier 710 to the registered face image 620 and registered attribute information 630 of the matched registered face data 600, creating the authenticated registered face data 700 shown in FIG. 10. It then transmits this data to the receiving unit 1510 of the notification device 1500 via the transmitting/receiving unit 1410 and the network 1200.

To decide whether the noise-added input face data 300 matches the registered face data 600, the input data processing unit 1430 compares the noise-added face data 320 of the former with the registered face image 620 of the latter. The decision is made, for example, by thresholding the similarity of the features output from the multilayer neural network 1440.

Specifically, the input data processing unit 1430 proceeds in three steps:

1. It inputs the noise-added face data 320 of the noise-added input face data 300, output by the image processing device 1300, into the trained multilayer neural network 1440 to extract an input image feature for matching.
2. It inputs the registered face image 620 of the registered face data 600 stored in the storage unit 1420 into the same network to extract a registered image feature.
3. It compares the two features to decide whether they match.

The multilayer neural network 1440 is trained in advance on a large set of prepared training face images (learning of the data processing task). The training makes the similarity between features high when the pair belongs to the same person and low when it belongs to different persons.

Specifically, the input unit 1610 receives a large set of training face image data, each item labeled in advance with a person identifier indicating whose face it shows. The learning unit 1630 trains the multilayer neural network 1440 and stores the learned result in the storage unit 1420. The person identifier is an example of a learning result of the data processing task.

In addition to learning the data processing task, the learning unit 1630 also trains the filters 3010 of the early convolutional layer of the multilayer neural network 1440 to be orthogonal to the noise block 3020, as described for the noise adding means 1312 (orthogonality-constrained learning).

To learn the data processing task, a fully connected layer can be attached to a general deep learning model such as VGG16. The layer has as many outputs as there are person identifiers in the training face image data. The training targets are one-hot vectors whose element for the correct person identifier is 1. From these, a data-processing-task loss lossM is computed, and training proceeds so as to reduce lossM.

Specifically, the outputs of the fully connected layer are converted into probabilities with a softmax function, and lossM is defined as the cross-entropy between these probabilities and the training targets. Then, for each filter 3010, the filter's element values are updated so as to reduce lossM, based on the derivative of lossM with respect to those element values.
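The softmax and cross-entropy definition of lossM can be sketched in a few lines. This is a minimal standard-library illustration.

- With a one-hot target, the cross-entropy reduces to -log of the probability assigned to the correct person.
- Its derivative with respect to the fully connected layer outputs is softmax(logits) − one_hot. Backpropagation carries this derivative back to the filter elements.

The function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert fully connected layer outputs into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_m(logits, person_index):
    """Cross-entropy between softmax(logits) and the one-hot target for person_index."""
    return -math.log(softmax(logits)[person_index])


def loss_m_grad(logits, person_index):
    """d lossM / d logits = softmax(logits) - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if k == person_index else 0.0) for k, p in enumerate(probs)]


print(loss_m([0.0, 0.0, 0.0], 1))       # log(3), roughly 1.0986
print(loss_m_grad([0.0, 0.0, 0.0], 1))  # roughly [0.333, -0.667, 0.333]
```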

In orthogonality-constrained learning, the orthogonality-constraint loss lossC is computed according to equation (1), and training proceeds so as to reduce it. lossC represents the degree of non-orthogonality between the weight vector of each filter 3010 in the early convolutional layer of the multilayer neural network 1440 and the weight vector of the noise block 3020.

Alternatively, the index of non-orthogonality may be the cosine (cos θ) of the angle θ between two weight vectors: that of a filter 3010 in the early convolutional layer and that of the noise block 3020. In this case, lossC is the absolute difference between |cos θ| and the cosine of a right angle (zero), and training proceeds so as to reduce lossC.
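The cosine-based variant of lossC translates directly into code. This is a minimal sketch with hypothetical flattened 2×2 weight vectors.

Unlike a raw inner product, |cos θ| does not change when the noise is rescaled. This loss therefore cannot be reduced by simply shrinking the noise toward zero.

```python
import math


def cosine_orthogonality_loss(filter_vec, noise_vec):
    """lossC = | |cos(theta)| - 0 |, where 0 is the cosine of a right angle."""
    dot = sum(a * b for a, b in zip(filter_vec, noise_vec))
    norm = math.sqrt(sum(a * a for a in filter_vec)) * math.sqrt(sum(b * b for b in noise_vec))
    if norm == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    cos_theta = dot / norm
    return abs(abs(cos_theta) - 0.0)


print(cosine_orthogonality_loss([1, 0, 0, 0], [0, 1, 0, 0]))  # 0.0
print(cosine_orthogonality_loss([1, 1, 0, 0], [1, 0, 0, 0]))  # roughly 0.7071
print(cosine_orthogonality_loss([1, 1, 0, 0], [5, 0, 0, 0]))  # same value: scale-invariant
```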

Specifically, for each noise block 3020, the block's element values are updated so as to reduce the sum of lossC over all filters 3010. The update is based on the derivative of that sum with respect to the element values.
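The per-noise-block update can be illustrated with plain gradient descent while the filters are held fixed. This sketch swaps in a smooth squared variant, lossC = Σ_i (wf_i · wn)², instead of the absolute value, so the derivative is defined everywhere.

The filters and the initial noise block are hypothetical flattened 2×2 vectors.

```python
def dot(a, b):
    """Inner product of two flat weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def loss_c_sum(filters, noise):
    """Sum over all filters of a squared-inner-product lossC (assumed smooth variant)."""
    return sum(dot(f, noise) ** 2 for f in filters)


def update_noise_block(filters, noise, lr=0.05, steps=500):
    """Gradient descent on the noise block elements with the filters held fixed.
    d/dn sum_i (f_i . n)^2 = sum_i 2 (f_i . n) f_i."""
    n = list(noise)
    for _ in range(steps):
        grad = [0.0] * len(n)
        for f in filters:
            d = dot(f, n)
            for k, a in enumerate(f):
                grad[k] += 2.0 * d * a
        n = [x - lr * g for x, g in zip(n, grad)]
    return n


# Hypothetical flattened 2 x 2 filters and initial noise block.
filters = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 0.0, 0.0]]
noise = update_noise_block(filters, [1.0, 2.0, 3.0, 4.0])
print([round(x, 6) for x in noise])        # [-1.0, -1.0, 0.5, 1.5]
print(loss_c_sum(filters, noise) < 1e-12)  # True
```

With the filters fixed, this iteration converges to the projection of the initial noise onto the orthogonal complement of the filters' span. The noise therefore keeps its orthogonal component and is not driven to zero.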

The data-processing-task loss lossM and the orthogonality-constraint loss lossC can be combined with a balance coefficient α into an overall loss lossT, as in equation (2). Training to reduce lossT lets learning of the data processing task and orthogonality-constrained learning proceed simultaneously.

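(2)  [Equation image not reproduced in this text. Per the definitions above, lossT combines lossM and lossC using the balance coefficient α.]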

FIG. 11 illustrates this constrained learning.

Specifically, as shown in FIG. 11, the learning unit 1630 trains the multilayer neural network 1440 and the noise block 3020 so as to reduce the overall loss lossT. lossT combines two terms:

- the data-processing-task loss lossM, computed from the features obtained by feeding a large set of training face images into the network;
- the orthogonality-constraint loss lossC, computed from the weight vectors of the filters 3010 in the early convolutional layer and the weight vector of the noise block 3020.

For example, the following two steps are repeated alternately:

- (1111 in FIG. 11) For each filter 3010, update the filter's element values so as to reduce the sum of lossT over all noise blocks 3020, based on the derivative of that sum with respect to the filter's element values.
- (1112 in FIG. 11) For each noise block 3020, update its element values so as to reduce the sum of lossT over all filters 3010, based on the derivative of that sum with respect to the noise block's element values. Then, for each filter 3010, update its element values in the same way as in step 1111.
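The alternating schedule can be shown on a toy problem. This sketch rests on several assumptions, none confirmed by the text:

- Equation (2) is taken to be lossT = lossM + α·lossC, since its exact form is not reproduced here.
- The real task loss lossM is replaced by a toy quadratic ||f_i − t_i||², which pulls each filter toward a hypothetical target t_i.
- lossC uses the squared-inner-product variant.

Only the update order mirrors steps 1111 and 1112.

```python
def dot(a, b):
    """Inner product of two flat weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def alternate(filters, targets, noises, alpha=1.0, lr=0.1, rounds=2000):
    """Toy alternating minimization of
        lossT = sum_i ||f_i - t_i||^2 + alpha * sum_i sum_j (f_i . n_j)^2,
    where ||f_i - t_i||^2 stands in for the real task loss lossM."""
    filters = [list(f) for f in filters]
    noises = [list(n) for n in noises]

    def step_filters():
        # Filter update: derivative of lossT summed over all noise blocks.
        for i, f in enumerate(filters):
            grad = [2.0 * (a - t) for a, t in zip(f, targets[i])]
            for n in noises:
                d = dot(f, n)
                grad = [g + 2.0 * alpha * d * b for g, b in zip(grad, n)]
            filters[i] = [a - lr * g for a, g in zip(f, grad)]

    def step_noises():
        # Noise update: derivative of lossT summed over all filters
        # (only the lossC term depends on the noise blocks).
        for j, n in enumerate(noises):
            grad = [0.0] * len(n)
            for f in filters:
                d = dot(f, n)
                grad = [g + 2.0 * alpha * d * a for g, a in zip(grad, f)]
            noises[j] = [b - lr * g for b, g in zip(n, grad)]

    for _ in range(rounds):
        step_filters()  # step 1111: filters
        step_noises()   # step 1112: noise blocks ...
        step_filters()  # ... then filters again
    return filters, noises


filters, noises = alternate([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 1.0]])
print(abs(dot(filters[0], noises[0])) < 1e-6)  # True: filter and noise are orthogonal
```

In this toy case, the filter settles back on its target while the noise keeps a nonzero component orthogonal to it.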

(Notification device 1500)
The notification device 1500 comprises a CPU, an MPU, peripheral circuits, terminals, various memories, a display monitor, and the like. Its authentication history creation unit 1520 creates authentication history data from two inputs: the input face data 200 transmitted by the image processing device 1300 and the authenticated registered face data 700 transmitted by the face authentication device 1400. The history data is stored in the storage unit 1530 and displayed by the notification unit 1540.

(Receiving unit 1510)
The receiving unit 1510 receives the input face data 200 transmitted by the image processing device 1300 and the authenticated registered face data 700 transmitted by the face authentication device 1400, and outputs them to the authentication history creation unit 1520.

(Authentication history creation unit 1520)
The authentication history creation unit 1520 creates authentication history data from the input face data 200 and the authenticated registered face data 700 received via the receiving unit 1510, and stores it in the storage unit 1530.

FIG. 12 shows the structure of the authentication history data 800, which is created in two steps:

1. The face image identifier 210 of the input face data 200 is compared with the authenticated face image identifier 710 of the authenticated registered face data 700. The input face data 200 and authenticated registered face data 700 that share the same face image identifier are selected.
2. The input face image 220 and capture time 230 of the selected input face data 200 are concatenated with the registered face image 620 and registered attribute information 630 of the authenticated registered face data 700.

The result is authentication history data 800 consisting of an input face image 810, a capture time 820, a registered face image 830, and registered attribute information 840.
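The identifier join that builds the history records can be sketched with dataclasses. The class and field names are hypothetical stand-ins for items 200, 700, and 800. Images are kept as opaque bytes, and an authenticated record with no matching input face data is simply skipped, which is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class InputFaceData:                    # input face data 200 (field names hypothetical)
    face_image_id: str                  # face image identifier 210
    input_face_image: bytes             # input face image 220
    capture_time: datetime              # capture time 230


@dataclass
class AuthenticatedRegisteredFaceData:  # authenticated registered face data 700
    authenticated_face_image_id: str    # authenticated face image identifier 710
    registered_face_image: bytes        # registered face image 620
    registered_attributes: dict         # registered attribute information 630


@dataclass
class AuthenticationHistory:            # authentication history data 800
    input_face_image: bytes             # 810
    capture_time: datetime              # 820
    registered_face_image: bytes        # 830
    registered_attributes: dict         # 840


def build_history(inputs, authenticated):
    """Pair records that share a face image identifier and concatenate their fields."""
    by_id = {d.face_image_id: d for d in inputs}
    history = []
    for a in authenticated:
        d = by_id.get(a.authenticated_face_image_id)
        if d is None:
            continue  # no matching input face data; nothing to record
        history.append(AuthenticationHistory(
            input_face_image=d.input_face_image,
            capture_time=d.capture_time,
            registered_face_image=a.registered_face_image,
            registered_attributes=a.registered_attributes,
        ))
    return history


inputs = [InputFaceData("f-001", b"<input jpeg>", datetime(2019, 12, 27, 9, 0))]
auth = [
    AuthenticatedRegisteredFaceData("f-001", b"<registered jpeg>", {"name": "Registered Person A"}),
    AuthenticatedRegisteredFaceData("f-999", b"", {}),
]
rows = build_history(inputs, auth)
print(len(rows))  # 1
```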

(Storage unit 1530)
The storage unit 1530 stores the authentication history data 800 created by the authentication history creation unit 1520. The maximum number of stored items is determined from the capacity of the storage unit 1530. If the number of stored items exceeds this maximum, items are deleted in order from the oldest, judged by the capture time 820, until the number returns to the maximum.
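The retention rule can be sketched in a few lines. Deriving the limit from capacity with a fixed per-record size is an assumption, as are the dict-based records and the function names.

```python
def max_records(capacity_bytes, bytes_per_record):
    """Upper limit on stored history items, derived from storage capacity
    (a fixed per-record size is an assumption)."""
    return capacity_bytes // bytes_per_record


def trim_history(records, limit):
    """Keep at most `limit` records, deleting the oldest by capture time first.
    Returns the kept records ordered oldest to newest."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    ordered = sorted(records, key=lambda r: r["capture_time"])
    excess = max(0, len(ordered) - limit)
    return ordered[excess:]


history = [{"capture_time": t} for t in (5, 1, 4, 2, 3)]  # hypothetical records
kept = trim_history(history, max_records(3000, 1000))
print([r["capture_time"] for r in kept])  # [3, 4, 5]
```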

(Notification unit 1540)
The notification unit 1540 creates a notification image from the authentication history data 800 stored in the storage unit 1530 and displays it on a display monitor or the like that forms part of the notification unit 1540.

FIG. 13 shows an example of a notification image 900. In this example, four items from the authentication history data 800 are arranged side by side and rendered as an image: the input face image 810, registered face image 830, capture time 820, and registered attribute information 840, as notification information 910, 920, 930, and 940, respectively. This lets an operator visually confirm whether a registered person has been detected.

<Operation of the face authentication system>
The operation of the face authentication system 1000 to which the present invention is applied is described below with reference to the flowcharts in FIGS. 14 to 17. The description assumes that registered face data 600 has been stored in advance in the storage unit 1420 of the face authentication device 1400.

The learning process of the face authentication device 1400 shown in FIG. 14 is executed in advance. First, the input unit 1610 receives a large set of training face image data (step S1010). These are face images extracted from monitoring images of the monitored area, which are acquired from the imaging device 1100 of the face authentication system 1000. Each face image is labeled in advance with the registered face image identifier 610 of the registered face data 600 it should match.

Then, for each item of training face image data, the face image is input into the multilayer neural network 1440 to extract an image feature for training (step S1030).

Next, lossT is computed from the training image features and the noise block (step S1040). Based on the computed lossT, the noise block and the filter group of the multilayer neural network 1440 are updated so as to optimize lossT (step S1050).

It is then determined whether training of the noise block and the filter group of the multilayer neural network 1440 has converged (step S1060). For example, if steps S1030 to S1050 have been repeated the maximum number of times, training is judged to have converged and the process proceeds to step S1070. Otherwise, training is judged not to have converged and the process returns to step S1030.

Finally, the learned noise block and filter group of the multilayer neural network 1440 are stored in the storage unit 1420, and the learning process ends.

The transmitting/receiving unit 1410 then transmits the noise block data stored in the storage unit 1420 to the image processing device 1300 via the network 1200, where it is stored in the storage unit 1320.

The detection process of the image processing device 1300 shown in FIG. 15 is executed each time a monitoring image is acquired.

1. A monitoring image of the monitored area is acquired from the imaging device 1100 (step S1110).
2. The face image acquisition means 1311 of the image processing unit 1310 extracts face images from the monitoring image (step S1120).
3. The face image acquisition means 1311 determines whether at least one face image has been extracted (step S1130). If none was extracted, the detection process ends without further processing. Otherwise, it creates input face data 200 from the detected face images, stores it in the storage unit 1320, and the process proceeds to step S1140.

Steps S1140 to S1150 below are performed for each face image extracted by the face image acquisition means 1311.

The noise adding means 1312 uses the noise block data stored in the storage unit 1320 to add noise to the input face data 200 created by the face image acquisition means 1311, producing noise-added input face data 300 (step S1140). Specifically, a noise image formed by tiling the noise blocks is added to the input face image 220 of the input face data 200. The result is noise-added input face data 300 containing noise-added face data 320 that is difficult to identify visually. The created noise-added input face data 300 is then transmitted to the face authentication device 1400 (step S1150).
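Step S1140 can be sketched as tiling the noise block over the face image and adding it element-wise. The sketch also checks why this works: convolution is linear. When every filter-sized window of the tiled noise is orthogonal to the filter, the convolution output of the noisy image equals that of the clean image.

Pixel values are deliberately not clipped to 0–255 here. Clipping or quantizing would alter the added values and break the exact cancellation, and the text does not say how this is handled. The image, noise block, and all-ones filter are hypothetical.

```python
def tile_noise(block, height, width):
    """Tile the noise block over a height x width grid, cropping at the edges."""
    bh, bw = len(block), len(block[0])
    return [[block[r % bh][c % bw] for c in range(width)] for r in range(height)]


def add_noise(image, block):
    """Noise-added face data: input face image plus the tiled noise image."""
    noise = tile_noise(block, len(image), len(image[0]))
    return [[p + n for p, n in zip(img_row, noise_row)] for img_row, noise_row in zip(image, noise)]


def conv_valid(image, filt):
    """Stride-1 'valid' correlation, as computed by a CNN convolutional layer."""
    k = len(filt)
    h, w = len(image), len(image[0])
    return [
        [sum(filt[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(w - k + 1)]
        for r in range(h - k + 1)
    ]


# Hypothetical 4 x 4 grayscale patch, zero-sum 2 x 2 noise block, all-ones 2 x 2 filter.
face = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120], [130, 140, 150, 160]]
block = [[7, -7], [-7, 7]]
ones = [[1, 1], [1, 1]]

noisy = add_noise(face, block)
print(noisy[0])                                           # [17, 13, 37, 33]
print(conv_valid(noisy, ones) == conv_valid(face, ones))  # True
```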

When steps S1140 to S1150 have been completed for all face images, the image processing unit 1310 ends the detection process.

On receiving noise-added input face data 300, the face authentication device 1400 performs the matching process shown in FIG. 16. This process, described below, is executed each time one item of noise-added input face data 300 is received.

The face authentication device 1400 first matches the received noise-added input face data 300 against all registered face data 600 stored in advance in the storage unit 1420 (step S1210). Matching uses the trained multilayer neural network 1440 to compare the feature of the received data with the feature of each item of registered face data 600 and to compute their similarity.

The device then checks whether the maximum similarity obtained is at least a predetermined authentication threshold (step S1220). If so, the received noise-added input face data 300 and the registered face data 600 with the maximum similarity are judged to come from the same person. That registered face data 600 is transmitted to the notification device 1500 (step S1230), and the matching process ends.
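Steps S1210 to S1220 amount to taking the maximum similarity over all registered features and applying a threshold. This is a minimal sketch, with two assumptions:

- The text does not name the similarity measure, so cosine similarity is an assumed choice.
- The features are hypothetical 2-D vectors standing in for network outputs.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors (cosine is an assumed choice of measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authenticate(query_feature, registered_features, threshold):
    """Compare against every registered feature, take the maximum similarity,
    and accept only if it reaches the threshold.
    Returns (registered_id, similarity) or None."""
    best_id, best_sim = None, -math.inf
    for reg_id, feature in registered_features.items():
        sim = cosine_similarity(query_feature, feature)
        if sim > best_sim:
            best_id, best_sim = reg_id, sim
    if best_id is not None and best_sim >= threshold:
        return best_id, best_sim
    return None


registered = {"0001": [1.0, 0.0], "0002": [0.0, 1.0]}  # hypothetical features
result = authenticate([0.9, 0.1], registered, threshold=0.8)
print(result[0])                                            # 0001
print(authenticate([1.0, 1.0], registered, threshold=0.8))  # None
```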

The notification device 1500, on receiving the input face data 200 and the registered face data 600, performs the notification process shown in FIG. 17. The operation of the notification device 1500 in FIG. 17 described below is executed each time one pair of input face data 200 and registered face data 600 is received.

The notification device 1500 creates authentication history data 800 from the received input face data 200 and registered face data 600 (S1310), gives notice that the specific person associated with the authentication history data 800 has been detected (S1320), and ends the notification process. For example, the notification is made by displaying a notification image 900 related to the authentication history data 800 on a display unit (not shown) of the notification device 1500.

As described above, in the face authentication system 1000 according to the embodiment of the present invention, the image processing device 1300 transmits to the face authentication device 1400 noise-added face data obtained by adding a noise image to an input face image. The face authentication device 1400 matches the noise-added face data against registered face images using a multilayer neural network. For this purpose, the face authentication device 1400 has been trained in advance, by inputting training face image data into the convolutional neural network, so that (i) the filters used in the front-stage convolutional layer of the multilayer neural network are orthogonal to the noise image (when a filter is convolved with a predetermined stride, to the region of the noise image corresponding to each convolved region), and (ii) the loss value, calculated as the cross entropy between the network's probability outputs for the registered persons and one-hot teacher data in which only the output element corresponding to the same person is 1, becomes small. As a result, the face authentication task can be performed accurately with the multilayer neural network even though noise has been added to the input face data.
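The two training goals can be combined into one objective. The text does not say how orthogonality enters the optimization, so the following is only a sketch under stated assumptions: filters and the noise-image regions they see are flattened to vectors, orthogonality is encouraged with a squared-inner-product penalty, and that penalty is added to the one-hot cross entropy with a hypothetical `weight`.

```python
import math
from typing import Sequence


def cross_entropy_one_hot(logits: Sequence[float], target: int) -> float:
    """Cross entropy between softmax(logits) and a one-hot vector whose 1 is at `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def orthogonality_penalty(filters: Sequence[Sequence[float]],
                          noise_windows: Sequence[Sequence[float]]) -> float:
    """Sum of squared inner products; 0 only if every filter is orthogonal to every window."""
    return sum(
        sum(f_i * n_i for f_i, n_i in zip(f, n)) ** 2
        for f in filters
        for n in noise_windows
    )


def total_loss(logits, target, filters, noise_windows, weight: float = 1.0) -> float:
    """Task loss plus a weighted orthogonality term (the weighting scheme is an assumption)."""
    return cross_entropy_one_hot(logits, target) + weight * orthogonality_penalty(filters, noise_windows)
```

Minimizing this sum pushes both terms toward their minima together; in a real system the gradients would come from an autodiff framework rather than from these scalar helpers.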

Furthermore, the image processing device 1300 transmits to the face authentication device 1400 noise-added input face data 300 in which the subject of the face image is difficult to identify visually. This data is generated by adding, to a person image extracted from a captured image, a noise image based on noise blocks 3020 that have been trained in advance to be orthogonal to the filters 3010 of the front-stage convolutional layer of the multilayer neural network 1440 used for matching in the face authentication device 1400. Matching with the multilayer neural network 1440 is therefore performed without loss of accuracy. Moreover, when the image processing device 1300 receives from the face authentication device 1400 an authenticated face image identifier corresponding to the noise-added input face data 300, it transmits the input face data 200 corresponding to that identifier to the notification device 1500, so that face image data of the detection target can be obtained while respecting the privacy of unrelated persons. In other words, whether a person appearing in a surveillance image is a person whose face has been registered in advance is determined from noise-added information that is difficult to identify visually, and highly identifying information such as face images is recorded and reported only for persons whose faces have been registered. Although face authentication using face images is performed in this embodiment, the invention is not limited to this; person authentication may instead be performed using a person image showing the person region, taking into account the similarity not only of the face but also of physique, clothing, and the like.
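The privacy-preserving exchange described above can be pictured as local bookkeeping on the image processing device 1300: originals stay on the device, only noise-added data and an identifier leave it, and an original is released only for an identifier reported back as authenticated. The class below is a hypothetical sketch; its names, the integer identifiers, and the choice to release each original at most once are all assumptions, not details from the text.

```python
import itertools
from typing import Callable, Dict, Optional, Tuple

Image = list


class LocalImageProcessor:
    """Hypothetical bookkeeping for the image processing device 1300 (illustrative names)."""

    def __init__(self, add_noise: Callable[[Image], Image]):
        self._add_noise = add_noise
        self._originals: Dict[int, Image] = {}  # input face data 200, kept locally only
        self._next_id = itertools.count()

    def prepare(self, face: Image) -> Tuple[int, Image]:
        """Store the original locally; return (identifier, noise-added data) for the server."""
        face_id = next(self._next_id)
        self._originals[face_id] = face
        return face_id, self._add_noise(face)

    def on_authenticated(self, face_id: int) -> Optional[Image]:
        """Release the original (e.g., to the notification device) for an authenticated ID."""
        return self._originals.pop(face_id, None)  # assumption: each original released once
```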

In addition, for the network-type face authentication system 1000, which transmits person images captured by a locally installed imaging device 1100 over a network to the face authentication device 1400 in order to authenticate or detect specific persons, a data transmission scheme that protects the privacy of the persons photographed can be realized.

<Modifications>
Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments. For example, in this embodiment the image processing device 1300 transmits the input face data 200 corresponding to the authenticated face image identifier to the notification device 1500, but the invention is not limited to this; the data may be transmitted to another device, such as the face authentication device 1400.

Also, in this embodiment the noise addition means 1312 adds a noise image based on the noise block 3020 to the input face image 220 created by the face image acquisition means 1311, but random pixel dropping or random noise addition may first be applied to the input face image 220 before the noise block 3020 is added. For example, randomly selected pixels of the input face image 220 may be dropped before the noise image made of tiled noise blocks 3020 is added, or random noise may be added to the input face image 220 before that noise image is added.
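The order of operations (randomize first, then add the learned noise block) can be sketched as below. This is a sketch under assumptions: a "dropped" pixel is modeled as being set to 0, the random noise is zero-mean Gaussian, and the block is tiled without gaps; none of these specifics, nor the helper names, come from the text.

```python
import random
from typing import List

Matrix = List[List[float]]


def drop_random_pixels(image: Matrix, drop_ratio: float, rng: random.Random,
                       fill: float = 0.0) -> Matrix:
    """Replace a randomly chosen `drop_ratio` fraction of pixels with `fill`."""
    height, width = len(image), len(image[0])
    n_drop = round(drop_ratio * height * width)
    dropped = set(rng.sample(range(height * width), n_drop))
    return [[fill if y * width + x in dropped else image[y][x] for x in range(width)]
            for y in range(height)]


def add_random_noise(image: Matrix, sigma: float, rng: random.Random) -> Matrix:
    """Add independent zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]


def add_tiled_block(image: Matrix, block: Matrix) -> Matrix:
    """Add the learned noise block, tiled from the top-left corner without gaps."""
    bh, bw = len(block), len(block[0])
    return [[v + block[y % bh][x % bw] for x, v in enumerate(row)]
            for y, row in enumerate(image)]


def randomize_then_add_block(image, block, rng, drop_ratio=0.0, sigma=0.0):
    """Optional random dropping and random noise first, then the learned noise block."""
    out = drop_random_pixels(image, drop_ratio, rng) if drop_ratio > 0 else image
    out = add_random_noise(out, sigma, rng) if sigma > 0 else out
    return add_tiled_block(out, block)
```

Passing a seeded `random.Random` keeps the randomization reproducible, which is convenient when the same drop ratio or noise strength must be mirrored in training.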

In that case, matching accuracy on images that have undergone random pixel dropping or random noise addition should preferably be ensured when the multilayer neural network 1440 is trained. That is, random pixel dropping or random noise addition is applied to the training face image data, the training face image data is input into the multilayer neural network 1440 to output matching probabilities for the registered persons, and the network is trained so that the loss value, calculated as the cross entropy with one-hot teacher data in which only the output element corresponding to the same person is 1, becomes small. The rate of random pixel dropping applied to the input face image 220 (the ratio of dropped pixels to the total number of pixels), or the strength of the random noise added to it, is preferably about the same as that applied to the training image data. Applying random pixel dropping or random noise addition to the input face image 220 in this way adds further noise to the input face image 220 and also hides characteristics of the noise block 3020 itself, such as its periodicity, making the values of the noise block harder to analyze.

Also, in this embodiment the noise addition means 1312 adds the noise block 3020 to the input face image 220 created by the face image acquisition means 1311, but noise blocks 3020 of several different sizes may be trained simultaneously in the same way, and the noise images in which each of those noise blocks is arranged may be superimposed and added to the input face image 220. Because the noise blocks of every size have been trained to be orthogonal to the filters 3010 of the multilayer neural network 1440, and an inner product with a sum of noise images is the sum of the individual inner products, the effect of the noise is eliminated even when the noise images are superimposed, and the data processing task is not affected. This increases the variety of noise images beyond the individual noise blocks contained in one noise block group. Furthermore, combining noise images created from noise blocks of different sizes makes the apparent noise period much longer, camouflaging it and making the characteristics of the noise harder to infer visually. For example, consider two noise block sizes, 5×5 and 3×3. Individually, their noise periods are 5 and 3, respectively, but in the composite noise image obtained by superimposing and adding them, the noise period becomes 5 × 3 = 15, and the number of possible pairs making up the composite noise image increases to 25 patterns (5×5) × 9 patterns (3×3) = 225 patterns.
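The period arithmetic can be checked on a single row of each tiled noise image. In general the period of a superposition is the least common multiple of the block periods; for coprime sizes such as 5 and 3 this equals the product (15), while sizes that share a factor give less (4 and 6 give 12, not 24). The sketch below uses integer stand-in values chosen only so that every combination is distinguishable; they are not learned noise values.

```python
from math import gcd
from typing import List, Sequence


def combined_period(*periods: int) -> int:
    """Period of a sum of periodic patterns: the least common multiple of the periods."""
    result = 1
    for p in periods:
        result = result * p // gcd(result, p)
    return result


def tile_row(pattern: Sequence[int], length: int) -> List[int]:
    """Repeat a 1-D pattern to the given length (one row of a tiled noise image)."""
    return [pattern[i % len(pattern)] for i in range(length)]


def smallest_period(seq: Sequence[int]) -> int:
    """Smallest p such that seq[i] == seq[i + p] for every valid i."""
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)


# One row from a 5-wide block and one from a 3-wide block, superimposed.
row_5 = tile_row([1, 2, 3, 4, 5], 30)
row_3 = tile_row([10, 20, 30], 30)
composite = [a + b for a, b in zip(row_5, row_3)]
```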

The noise image as a whole may also be multiplied by an arbitrary weight before being added to the input face image 220. Because each region of the noise image corresponding to a region convolved by a filter has been trained to be orthogonal to the filters 3010 of the multilayer neural network 1440, the noise image remains orthogonal after being multiplied by any weight (the inner product is scaled by the same factor and therefore stays zero). The effect of the noise is thus eliminated, and the data processing task is not affected. Likewise, when noise images generated from each of several types of noise blocks are superimposed and added to the input face image 220, each noise image may be multiplied by an arbitrary weight before being superimposed. The weight applied to a noise image adjusts how difficult visual identification becomes after noise addition. An arbitrary weight value may therefore be set on the basis of a predetermined S/N ratio, or determined with the S/N ratio as a reference. The weight values may also be set randomly, or a different weight value may be set for each extracted face image region.
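A weight can be derived from a target S/N ratio and applied without breaking orthogonality, since ⟨f, w·n⟩ = w·⟨f, n⟩. The sketch assumes the S/N ratio is a mean-power ratio expressed in decibels, which the text does not state; `weight_for_snr` is a hypothetical helper.

```python
import math
from typing import Sequence


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two flattened regions."""
    return sum(x * y for x, y in zip(a, b))


def mean_power(x: Sequence[float]) -> float:
    """Mean of squared values."""
    return sum(v * v for v in x) / len(x)


def weight_for_snr(signal: Sequence[float], noise: Sequence[float], snr_db: float) -> float:
    """Weight w such that 10 * log10(P(signal) / P(w * noise)) equals snr_db."""
    return math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))


filt = [1.0, 1.0, 1.0, 1.0]
noise_window = [1.0, -1.0, 1.0, -1.0]  # orthogonal to filt
w = weight_for_snr([2.0, 2.0, 2.0, 2.0], noise_window, snr_db=20.0)
scaled = [w * v for v in noise_window]  # still orthogonal: inner(filt, scaled) is 0
```

A lower target S/N ratio yields a larger weight and hence noise that hides the face more strongly.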

Furthermore, in this embodiment the noise addition means 1312 adds the noise block 3020 to the input face image 220 created by the face image acquisition means 1311. When the noise block 3020 is trained, a constraint may be added so that the magnitudes of its element values and their spatial variation fall within a predetermined range. This keeps any particular element of the noise block from standing out, that is, makes the periodicity of the noise block less noticeable.
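One way to express such a constraint is as an extra penalty added to the training loss. The sketch below is an assumption throughout: it measures "spatial variation" as the difference between horizontally and vertically adjacent elements and uses a squared hinge, so the penalty is zero whenever every value and every neighbour jump stays inside its bound.

```python
from typing import List

Matrix = List[List[float]]


def range_penalty(block: Matrix, max_abs: float, max_step: float) -> float:
    """Squared-hinge penalty for elements beyond +/-max_abs and neighbour jumps beyond max_step."""
    h, w = len(block), len(block[0])
    penalty = 0.0
    for y in range(h):
        for x in range(w):
            penalty += max(0.0, abs(block[y][x]) - max_abs) ** 2
            if x + 1 < w:  # horizontal neighbour
                penalty += max(0.0, abs(block[y][x + 1] - block[y][x]) - max_step) ** 2
            if y + 1 < h:  # vertical neighbour
                penalty += max(0.0, abs(block[y + 1][x] - block[y][x]) - max_step) ** 2
    return penalty
```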

Also, in this embodiment face authentication using face images is performed as the data processing task, but the task is not limited to this; attribute information such as sex, age, physique, and clothing may instead be estimated from face images or person region images.

Also, in this embodiment a face image is detected in the surveillance camera image, noise is added to that face image, and image processing is performed on it. Alternatively, noise may be added to the entire surveillance camera image, and image processing such as face region detection, human pose estimation, or crowd density estimation may be performed on the noise-added image.

The learning unit 1630 may also be provided in a learning device separate from the face authentication device 1400, and the multilayer neural network 1440 trained by that learning device may be stored in the face authentication device 1400.

Also, although the case where the input data is an image has been described as an example, the invention is not limited to this, and the input data may be data other than images. For example, the input data may be audio data, and the data processing task may be speech recognition or speaker recognition. In this case, noise is added by adding, to spectral data obtained from the audio data, noise data in which noise blocks are arranged, and speech recognition or speaker recognition is performed with a multilayer neural network containing convolutional layers. Alternatively, noise is added to the audio signal by adding noise data in which noise blocks are arranged along the time axis, features are obtained with a multilayer neural network containing convolutional layers, and speech recognition is performed on those features using an RNN (Recurrent Neural Network).
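The time-axis variant follows the same principle in one dimension. This sketch assumes a front-stage 1-D convolution whose stride equals the noise block length, so each filter window sees exactly one copy of the block; the kernel, block, and samples are made-up integers (integers keep the comparison exact), not trained values.

```python
from typing import List, Sequence


def tile_1d(block: Sequence[int], length: int) -> List[int]:
    """Arrange the noise block repeatedly along the time axis."""
    return [block[t % len(block)] for t in range(length)]


def conv1d_valid(signal: Sequence[int], kernel: Sequence[int], stride: int) -> List[int]:
    """Plain 1-D cross-correlation (as computed in CNN layers) without padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


kernel = [1, 1, 1, 1]         # stand-in for a learned front-stage filter
noise_block = [1, -1, 1, -1]  # orthogonal to the kernel: 1 - 1 + 1 - 1 = 0
audio = [3, 1, 4, 1, 5, 9, 2, 6]
noisy_audio = [a + n for a, n in zip(audio, tile_1d(noise_block, len(audio)))]
```

Because each window's contribution from the noise is an inner product with the block, `conv1d_valid(noisy_audio, kernel, 4)` equals `conv1d_valid(audio, kernel, 4)`.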

Also, the case where noise is added by adding a noise image in which the noise blocks are tiled without gaps starting from the upper left has been described as an example, but the invention is not limited to this. For example, if the stride of the filter group used in the front-stage convolutional layer of the multilayer neural network exceeds the filter size, a gap of (stride minus filter size) arises between successive filter positions. For instance, with a 3×3 filter and a stride of 4, there is a gap of 4 - 3 = 1 between the positions at which the filter is applied during the convolution. Because no convolution is computed over these gaps, noise may be added by adding a noise image in which the noise blocks are spaced apart by the gap. In this case, the values in the gaps between the noise blocks are set to a predetermined value (for example, 0) or to randomly set pixel values. With a stride that allows the noise blocks to be spaced apart in this way, convolution operations spanning several noise blocks are avoided, so when the noise block is trained it is not necessary to generate derived noise blocks by shifting the elements of the noise block.
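The spaced layout for the 3×3 filter and stride 4 example can be sketched as follows. It assumes a square noise block of the same size as the filter and a default gap value of 0, as in the example above; the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def layout_with_gaps(block: Matrix, height: int, width: int, stride: int,
                     gap_value: float = 0.0) -> Matrix:
    """Place a k x k block at every filter position (multiples of `stride`); fill the gaps."""
    k = len(block)
    noise = [[gap_value] * width for _ in range(height)]
    for top in range(0, height - k + 1, stride):
        for left in range(0, width - k + 1, stride):
            for dy in range(k):
                for dx in range(k):
                    noise[top + dy][left + dx] = block[dy][dx]
    return noise


def window(image: Matrix, top: int, left: int, k: int) -> Matrix:
    """The k x k region a filter sees when its top-left corner is at (top, left)."""
    return [row[left:left + k] for row in image[top:top + k]]
```

On a 7×7 image, the filter windows start at rows and columns 0 and 4, each sees exactly one copy of the block, and row 3 and column 3 hold only gap values.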

Also, the case where the input image and the noise image are aligned before being added has been described as an example, but the invention is not limited to this. For example, the noise image may be shifted by an offset value in at least one of the X-axis and Y-axis directions before being added to the input image. This increases the variety of noise-added images even for the same pair of input image and noise image. However, the part of the noise-added image corresponding to the offset contains a region to which no noise has been added (noise-free region), and orthogonality with the filters is no longer guaranteed there. Consequently, when the noise-added image is input into the multilayer neural network and convolved in the front-stage convolutional layer, the convolution results for regions where noise-free and noise-added areas are mixed may degrade the performance of the data processing task. To avoid this, the filters of the convolutional layer may, for example, be applied starting from the position shifted by the offset, so that convolution is performed only in the region to which the noise image has been added. For this purpose, the data processing task should preferably be trained with the same offset shift, so that its accuracy is high under the same offset conditions. This processing eliminates the influence of the offset.
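The offset scheme can be sketched as below, assuming the noise block size, the filter size, and the stride are all equal, and that the noise-free rows and columns before the offset hold 0. Filter windows are started at the offset so that every window lies inside the noise-added region; all names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def shifted_noise(block: Matrix, height: int, width: int, oy: int, ox: int) -> Matrix:
    """Tile `block` starting at (oy, ox); rows above oy and columns left of ox stay noise-free."""
    b = len(block)
    return [[block[(y - oy) % b][(x - ox) % b] if y >= oy and x >= ox else 0.0
             for x in range(width)] for y in range(height)]


def filter_origins(height: int, width: int, k: int, stride: int,
                   oy: int = 0, ox: int = 0) -> List[Tuple[int, int]]:
    """Top-left corners of k x k filter windows, starting at the offset instead of (0, 0)."""
    return [(t, c) for t in range(oy, height - k + 1, stride)
            for c in range(ox, width - k + 1, stride)]
```

Starting at `(0, 0)` instead would produce a window that mixes noise-free and noise-added pixels, which is exactly the case the text warns about.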

As described above, those skilled in the art can make various changes within the scope of the present invention to suit the embodiment being implemented.

1000 Face authentication system
1100 Imaging device
1110 Face image
1120 Person
1200 Network
1300 Image processing device
1310 Image processing unit
1311 Face image acquisition means
1312 Noise addition means
1320 Storage unit
1330 Transmission/reception unit
1400 Face authentication device
1410 Transmission/reception unit
1420 Storage unit
1430 Input data processing unit
1440 Multilayer neural network
1610 Input unit
1630 Learning unit
3010 Filter
3020 Noise block
3030 Noise-added image
3200, 3310 Noise image

Claims

1. A data processing device comprising:
a learning unit that learns filters used in each convolution layer of a multilayer neural network that performs a predetermined data processing task on input data, and noise data to be added to the input data; and
an input data processing unit that inputs, into the multilayer neural network, noise-added data obtained by adding the noise data learned by the learning unit to the input data, and obtains a result of the data processing task based on the output of the multilayer neural network,
wherein the learning unit inputs, into the multilayer neural network, training data to which results of the data processing task have been assigned in advance, and performs learning such that the noise data is orthogonal to the filters used in a front-stage convolution layer included in the multilayer neural network, and such that the result of the data processing task obtained by the multilayer neural network matches the result of the data processing task assigned in advance to the training data.

2. The data processing device according to claim 1, wherein the learning unit learns the filters and the noise data such that each filter in a filter group used in the front-stage convolution layer included in the multilayer neural network is orthogonal to the region of the noise data corresponding to each region convolved when that filter is convolved with a predetermined stride.

3. The data processing device according to claim 2, wherein
the input data is an image, and
the learning unit performs learning such that the noise data is generated by arranging any of a noise block group consisting of a predetermined reference noise block and derived noise blocks obtained by shifting the reference noise block according to a predetermined shift pattern.

4. The data processing device according to claim 3, wherein the learning unit performs learning such that every noise block in the noise block group is orthogonal to each of the filters.

5. The data processing device according to claim 4, wherein the learning unit obtains the derived noise blocks by shifting the noise block using the shift pattern corresponding to the stride.
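One reading of claims 3 to 5, consistent with the remark above that derived blocks are unnecessary when convolutions never span several blocks, is that a filter window straddling tiled copies of a block sees a cyclic shift of that block, and a stride-dependent shift pattern enumerates exactly the shifts that can occur. The sketch below encodes that interpretation; treating the shift pattern as cyclic shifts by multiples of the stride, and requiring the filter to be no larger than the block, are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Matrix = List[List[float]]


def cyclic_shift(block: Matrix, sy: int, sx: int) -> Matrix:
    """Element (y, x) of the result is block[(y + sy) % b][(x + sx) % b]."""
    b = len(block)
    return [[block[(y + sy) % b][(x + sx) % b] for x in range(b)] for y in range(b)]


def derived_blocks(block: Matrix, stride: int) -> Dict[Tuple[int, int], Matrix]:
    """All shifts a stride-`stride` filter can meet when the block is tiled without gaps."""
    b = len(block)
    shifts = {((i * stride) % b, (j * stride) % b) for i in range(b) for j in range(b)}
    return {s: cyclic_shift(block, *s) for s in sorted(shifts)}


def all_orthogonal(filters: Iterable[Matrix], blocks: Iterable[Matrix]) -> bool:
    """True if every k x k filter has zero inner product with the top-left k x k of every block."""
    blocks = list(blocks)
    for f in filters:
        k = len(f)
        for blk in blocks:
            if sum(f[y][x] * blk[y][x] for y in range(k) for x in range(k)) != 0:
                return False
    return True
```

With a 3×3 block and stride 2 all nine shifts occur; with stride 3 only the reference block itself does.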

6. The data processing device according to claim 2, wherein
the input data is an image,
the stride is an integer multiple of the size of the filter,
the learning unit performs learning such that a predetermined noise block is orthogonal to the filter, and
the noise data is obtained by arranging one or more noise blocks obtained through the learning.

7. The data processing device according to any one of claims 1 to 6, wherein
the input data is an image,
the noise-added data is obtained by dropping randomly determined pixels among the pixels of the input data and then adding the noise data, and
the learning unit drops randomly determined pixels among the pixels of the training data before inputting the training data into the multilayer neural network for learning.

8. The data processing device according to claim 1, wherein
the input data is an image,
the noise-added data is obtained by adding random noise to the input data and then adding the noise data, and
the learning unit adds random noise to the training data before inputting the training data into the multilayer neural network for learning.

9. A data processing method comprising:
a step in which a learning unit learns filters used in each convolution layer of a multilayer neural network that performs a predetermined data processing task on input data, and noise data to be added to the input data; and
a step in which an input data processing unit inputs, into the multilayer neural network, noise-added data obtained by adding the noise data learned by the learning unit to the input data, and determines a result of the data processing task based on the output of the multilayer neural network,
wherein the learning unit inputs, into the multilayer neural network, training data to which results of the data processing task have been assigned in advance, and performs learning such that the noise data is orthogonal to the filters used in a front-stage convolution layer included in the multilayer neural network, and such that the result of the data processing task obtained by the multilayer neural network matches the result of the data processing task assigned in advance to the training data.