JP2021089493A

JP2021089493A - Information processing apparatus and learning method thereof

Info

Publication number: JP2021089493A
Application number: JP2019218346A
Authority: JP
Inventors: Masami Kato; Katsuhiko Mori; Osamu Nomura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2021-06-10
Anticipated expiration: 2039-12-02
Also published as: JP7398938B2

Abstract

To enable more robust pattern recognition against diverse variations in the data to be processed.
SOLUTION: An information processing apparatus connectable to a sensing device includes: setting means for setting a data acquisition condition in the sensing device; first processing means for performing hierarchical feature extraction processing, using a first neural network (NN), on data obtained by the sensing device; and second processing means for generating, using a feature map in an intermediate layer of the first NN, regression data indicating a data acquisition condition to be used for subsequent data acquisition by the sensing device. The setting means sets the data acquisition condition indicated by the regression data in the sensing device.
SELECTED DRAWING: Figure 3

Description

The present invention relates to pattern recognition processing using a neural network.

Hierarchical computation methods, typified by the convolutional neural network (hereinafter abbreviated as CNN), are attracting attention as a way to achieve pattern recognition that is robust to variations in the recognition target. For example, Non-Patent Document 1 discloses various applications and implementations of pattern recognition methods based on deep learning.

As a method for coping with large variations in the imaging environment of a recognition target (lighting, state of the subject, etc.), Patent Document 1 discloses a method of improving the face detection probability in an image by changing the imaging conditions of the imaging device at predetermined intervals. Patent Document 2 discloses a method of controlling the gain and exposure time of an imaging device based on the result of face detection, and re-acquiring image data under conditions suitable for attribute recognition processing of the detected person.

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-127999
Patent Document 2: Japanese Unexamined Patent Publication No. 2017-098746

Non-Patent Document 1: Yann LeCun, Koray Kavukcuoglu and Clement Farabet, "Convolutional Networks and Applications in Vision", Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010

However, the method described in Patent Document 1 changes the imaging conditions at predetermined intervals, so an appropriate image cannot always be acquired. The method described in Patent Document 2 has the problem that it is difficult to determine, in advance, a table of changes to the optimum imaging conditions covering the diverse variations of the imaging environment. Furthermore, it cannot handle cases in which the optimum imaging conditions differ from region to region within the same frame.

The present invention has been made in view of these problems, and an object of the present invention is to provide a technique that enables more robust pattern recognition against diverse variations in the data to be processed.

To solve the above problems, the information processing apparatus according to the present invention has the following configuration. That is, it is an information processing apparatus connectable to a sensing device, comprising:
setting means for setting a data acquisition condition in the sensing device;
first processing means for executing hierarchical feature extraction processing, using a first neural network (NN), on data obtained by the sensing device; and
second processing means for generating, using a feature map in an intermediate layer of the first NN, regression data indicating a data acquisition condition to be used in subsequent data acquisition by the sensing device,
wherein the setting means sets the data acquisition condition indicated by the regression data in the sensing device.

According to the present invention, it is possible to provide a technique that enables more robust pattern recognition against diverse variations in the data to be processed.

FIG. 1 is an operation flowchart of the learning process in the first embodiment.
FIG. 2 is a diagram showing the schematic configuration of an image processing system and a learning apparatus.
FIG. 3 is a diagram explaining the processing in a recognition processing unit.
FIG. 4 is a diagram showing the detailed configuration of a hierarchical feature extraction processing unit.
FIG. 5 is a diagram explaining the operation timing in a pattern recognition apparatus.
FIG. 6 is a diagram explaining the structure of a stacked device.
FIG. 7 is a diagram showing the detailed configuration of the pattern recognition apparatus.
FIG. 8 is a diagram explaining a regression map.
FIG. 9 is a diagram explaining the learning process of each network.
FIG. 10 is a diagram showing a specific example of the learning process in the first embodiment.
FIG. 11 is a diagram showing the relationship between the gain control value of a sensor and its output signal.
FIG. 12 is a diagram showing a specific example of the learning process in a second embodiment.
FIG. 13 is an operation flowchart of the learning process in a third embodiment.

Embodiments will be described in detail below with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although the embodiments describe a plurality of features, not all of them are necessarily essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, identical or similar components are given the same reference numbers, and duplicate descriptions are omitted.

(First Embodiment)
As a first embodiment of the information processing apparatus according to the present invention, an image processing system using a pattern recognition apparatus will be described below as an example.

<System and Apparatus Configuration>
FIG. 2 is a diagram showing the schematic configuration of an image processing system and a learning apparatus. FIG. 2(a) shows a configuration example of an image processing system using the pattern recognition apparatus 201. The system has a function of detecting the region of a specific object in image data. FIG. 2(b) shows a configuration example of the learning apparatus. The learning result (weighting coefficients) of the learning apparatus is used by the pattern recognition apparatus 201. Although the image processing system and the learning apparatus are described here as separate apparatuses, they may be configured as a single integrated apparatus.

The image processing system includes the pattern recognition apparatus 201, a CPU (Central Processing Unit) 205, a ROM (Read Only Memory) 206, a RAM (Random Access Memory) 207, and a DMAC (Direct Memory Access Controller) 208. The pattern recognition apparatus 201 includes an imaging device 202, a recognition processing unit 203, and a RAM 204.

The imaging device 202 comprises an optical system, a photoelectric conversion device, a driver circuit/AD converter, and the like. A CCD (Charge-Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, for example, can be used as the photoelectric conversion device. The recognition processing unit 203 controls the imaging device 202 and executes predetermined recognition processing on the acquired image data. The RAM 204 is used as a working buffer for the computations of the recognition processing unit 203. Here, to reduce data transmission delay, the pattern recognition apparatus 201 is described as including both the imaging device 202 and the recognition processing unit 203; however, they may be configured separately as long as the recognition processing unit can be connected to the imaging device.

The CPU 205 controls the entire image processing system. The ROM 206 stores instructions and parameter data that define the operation of the CPU 205. The RAM 207 is the memory required for the operation of the CPU 205. The DMAC 208 controls data transfer and the like between the pattern recognition apparatus 201 and the RAM 207. The data bus 209 is the data transfer path between the devices.

The pattern recognition apparatus 201 executes imaging and recognition processing according to instructions from the CPU 205 and stores the results in the RAM 207. The CPU 205 uses the recognition results to provide various applications.

The learning apparatus includes an arithmetic unit 210, an interface device 212, and a storage device 213, and can be realized by, for example, a general-purpose computer. The arithmetic unit 210 includes computer devices such as a CPU and memory, and executes the learning process described later with reference to FIG. 1. The storage device 213 is a large-capacity data storage device such as a hard disk drive, and stores the programs executed by the arithmetic unit 210 as well as the image data, teacher data, and the like used for learning. The interface device 212 is an interface for retrieving the data obtained by learning, such as a communication interface or an interface for a portable storage device. The learning result of the learning apparatus is retrieved via the interface device 212 and stored in the ROM 206 or the like of the image processing system.

FIG. 7 is a diagram showing the detailed configuration of the pattern recognition apparatus 201, in which the configuration of the recognition processing unit 203 is described in more detail.

The feature extraction processing unit 701 repeatedly executes hierarchical feature extraction while holding the intermediate results of the hierarchical computation in the memory 703, and uses the extracted features to output the recognition result and the control data.

The imaging device 704 corresponds to the imaging device 202 and comprises an optical system, a photoelectric conversion device, a driver circuit/AD converter, and the like. The imaging control processing unit 705 controls the operation of the imaging device 704 (imaging conditions, etc.) according to the control data provided by the feature extraction processing unit 701. Specifically, the imaging conditions include the gain applied to the signal after photoelectric conversion, the accumulation time (exposure time) of the photoelectric conversion device (photodiode, etc.), the A/D conversion characteristics, and the like. The imaging control processing unit 705 is configured to control these imaging conditions block by block over the sensor surface. For example, recent advances in semiconductor stacking technology make it possible to stack control logic on the sensor surface, which enables readout control in units of blocks or individual pixels.

FIG. 6 is a diagram illustrating the configuration of a stacked device. FIG. 6(a) schematically shows the physical configuration of the stacked device. In this example, a sensor layer 61 on which photoelectric conversion elements are mounted, a logic layer 62 on which readout control logic is mounted, and a memory layer 63 on which a large-scale memory and its controller are mounted are stacked on one another. The sensor layer 61 corresponds to the imaging device 704, the logic layer 62 to the imaging control processing unit 705, and the memory layer 63 to the memory 703 and the like. Signals are transmitted between the layers through vias or the like.

FIG. 6(b) schematically shows the configuration of the logic layer 62. Here, n × n control circuits for controlling the photoelectric conversion elements of the sensor layer 61 are arranged in the logic layer 62. A control circuit ct(n, n) controls the readout of one or more photoelectric conversion elements of the sensor layer 61 located at the corresponding position. With this configuration, the readout conditions (gain, exposure time, etc.) can therefore be controlled individually for each of the n × n blocks; that is, the imaging characteristics can be controlled for each of the n × n regions of the image.
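As a minimal sketch of block-wise readout control, assuming that the pixel array is divided evenly among the n × n control circuits and that the readout condition can be modelled as a simple multiplicative gain, the following Python code finds which control circuit governs a given pixel and applies a per-block gain. The functions `circuit_for_pixel` and `apply_block_gain` are hypothetical illustrations, not part of the patent.

```python
from typing import List, Tuple

Grid = List[List[float]]  # indexed as grid[row][column]


def circuit_for_pixel(x: int, y: int, width: int, height: int, n: int) -> Tuple[int, int]:
    """Return (i, j): column and row of the control circuit covering pixel (x, y).

    Assumes the width x height pixel array is split into an n x n grid of blocks.
    """
    return (x * n // width, y * n // height)


def apply_block_gain(image: Grid, gains: Grid) -> Grid:
    """Multiply each pixel by the gain of the block (control circuit) that contains it.

    `gains` is an n x n grid holding one value per control circuit (hypothetical model).
    """
    n = len(gains)
    height, width = len(image), len(image[0])
    out: Grid = []
    for y in range(height):
        row = []
        for x in range(width):
            i, j = circuit_for_pixel(x, y, width, height, n)
            row.append(image[y][x] * gains[j][i])
        out.append(row)
    return out
```

For example, a 4 × 4 image of ones with gains `[[1.0, 2.0], [3.0, 4.0]]` yields 1.0 in the top-left 2 × 2 block and 4.0 in the bottom-right one. Real hardware applies the condition during readout rather than afterwards; the multiplication here only models its effect.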

In the first embodiment, the feature extraction processing unit 701 is also assumed to be implemented in the logic layer 62 and the memory layer 63. Stacking it on the sensor layer 61 in this way allows the control data to be fed back with less delay. When the imaging environment or the target changes quickly, it is desirable to control the imaging device 704 with as little frame delay as possible, so stacked mounting on the sensor layer 61 is preferable.

FIG. 3 is a diagram illustrating the processing in the recognition processing unit 203. The recognition processing unit 203 includes a recognition network 302 and a sensor control network 313, which are logical processing structures of the feature extraction processing unit 701. The recognition network 302 is a computational network that uses a CNN to recognize the position of a predetermined object in the imaging target 301 captured by the imaging device 704. The sensor control network 313 is a computational network that uses a CNN to extract information for controlling the imaging conditions of the imaging device.

Here, the recognition network 302 is shown as a five-layer CNN. The arithmetic processes 303 to 307 each consist of a convolution operation, an activation function operation, a pooling operation, and the like, and are concretely realized by the configuration shown in FIG. 4, described later.

The feature maps 308 to 312 are referred to as intermediate layers (feature maps 308 to 311) or the final layer (feature map 312) of the CNN processing, and correspond to the results of the arithmetic processes 303 to 307, respectively. The feature maps 308 to 312 are stored in the memory 703. They are two-dimensional data obtained by applying feature extraction to the image data output by the imaging device.

The two-dimensional CNN processing of image data will now be described in detail. When the kernel (filter coefficient matrix) of the convolution operation has size columnSize × rowSize and the previous layer has L feature maps, one feature map is calculated by the multiply-accumulate operation of the following equation (1):

$$\mathrm{output}(x, y) = \sum_{l=1}^{L} \sum_{row=-rowSize/2}^{rowSize/2} \sum_{column=-columnSize/2}^{columnSize/2} \mathrm{input}(x + column,\, y + row) \times \mathrm{weight}(column,\, row) \qquad (1)$$

where
input(x, y): reference pixel value at two-dimensional coordinates (x, y)
output(x, y): calculation result at two-dimensional coordinates (x, y)
weight(column, row): weight coefficient at coordinates (x + column, y + row)
L: number of feature maps in the previous layer
columnSize, rowSize: horizontal and vertical sizes of the two-dimensional convolution kernel

In CNN processing, the multiply-accumulate operation of equation (1) is repeated while a plurality of convolution kernels are scanned pixel by pixel, and the feature map is obtained by applying a nonlinear transformation (activation) to the final multiply-accumulate result. The generated feature map may also be reduced by pooling before being referenced by the next layer. Each of the feature maps 308 to 312 comprises a plurality of maps within one layer, and maps of features with different characteristics are generated for different groups of weight coefficients.

FIG. 4 is a diagram showing the detailed configuration of the feature extraction processing unit 701, which is a concrete implementation of the arithmetic processes 303 to 307.

The reference data buffer 401 is a circuit that fetches from memory all or part of the previous layer's feature map data (input(x, y) in equation (1)), which serves as the reference data for the convolution operation, and buffers it.

The multiplier 402 and the cumulative adder 403 are circuits that execute the operation of equation (1).

The coefficient data buffer 404 is a circuit that transfers all or part of the weight coefficient data obtained in advance by learning (weight(column, row) in equation (1)) from the memory 703 in predetermined units and buffers it.

The activation processing circuit 405 is a circuit that applies a nonlinear function such as ReLU (Rectified Linear Unit, rectifier) to the convolution result (output(x, y)) of equation (1).

The pooling processing circuit 406 is a circuit that reduces a feature map using a spatial filter such as a maximum-value filter. When pooling is not performed, the output of the activation processing circuit 405 is stored in the memory 703; when pooling is performed, the output of the pooling processing circuit 406 is stored in the memory 703. The data stored here becomes the feature map of the current layer.

When the calculation of the current layer's feature map is complete, that feature map is used as the previous-layer feature map, and the feature map of the next layer is calculated in the same manner. In this way, feature maps of a plurality of layers are calculated while the feature maps stored in the memory 703 are referenced in sequence. A control unit 102, not shown in FIG. 4, controls the operation of each component in FIG. 4, thereby realizing hierarchical feature extraction processing (CNN processing).
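The layer-by-layer flow just described consists of four steps: multiply-accumulate, activation, optional max pooling, and storing the result so it can serve as the next layer's input. The minimal Python sketch below illustrates this flow, with a dictionary standing in for the memory 703. The layer description format (`kernels`, `pool`) and all function names are hypothetical. The hardware streams data through the buffers 401 and 404 rather than processing whole maps at once.

```python
from typing import Dict, List, Optional

Map = List[List[float]]


def conv_relu(inputs: List[Map], kernels: List[Map]) -> Map:
    """Multiply-accumulate over all input maps (zero padding), then ReLU."""
    h, w = len(inputs[0]), len(inputs[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for inp, k in zip(inputs, kernels):
                kh, kw = len(k), len(k[0])
                for r in range(kh):
                    for c in range(kw):
                        yy, xx = y + r - kh // 2, x + c - kw // 2
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += inp[yy][xx] * k[r][c]
            out[y][x] = max(acc, 0.0)              # activation (ReLU)
    return out


def max_pool(m: Map, size: int) -> Map:
    """Reduce a map with a size x size maximum filter applied at stride `size`."""
    return [[max(m[y + dy][x + dx] for dy in range(size) for dx in range(size))
             for x in range(0, len(m[0]) - size + 1, size)]
            for y in range(0, len(m) - size + 1, size)]


def run_layers(image: Map, layers: List[dict]) -> Dict[int, List[Map]]:
    """Compute layer after layer, keeping every layer's maps in `memory`.

    Each layer dict holds 'kernels' (for each output map, a list with one kernel
    per input map) and an optional 'pool' size. This format is hypothetical.
    """
    memory: Dict[int, List[Map]] = {0: [image]}
    for i, layer in enumerate(layers, start=1):
        prev = memory[i - 1]                       # previous layer's feature maps
        maps = [conv_relu(prev, ks) for ks in layer["kernels"]]
        pool: Optional[int] = layer.get("pool")
        if pool:
            maps = [max_pool(m, pool) for m in maps]
        memory[i] = maps                           # current layer's feature maps
    return memory
```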

By repeating feature extraction across multiple layers in this way, a CNN achieves recognition processing that is robust to variations in the recognition target. The CNN determines whether a desired pattern is present in the final-layer operation 307, based on the feature extraction results of the preceding layers. The final-layer feature map 312 represents the recognition result 320; it is, for example, a reliability map that expresses the existence probability of an object in the image as two-dimensional information. The final-layer operation 307 may also be implemented by a fully connected neural network or a linear discriminator instead of the convolution operation described above.
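The patent states only that the final-layer map 312 can be a reliability map of object existence probability. One common way to turn such a map into object positions is to keep local maxima above a threshold, and the sketch below shows that approach purely as an assumed post-processing step. The threshold of 0.5 and the 3 × 3 neighbourhood are illustrative choices, not values from the source.

```python
from typing import List, Tuple


def detect_peaks(reliability: List[List[float]],
                 threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (x, y, value) for every local maximum at or above `threshold`.

    A position counts as a local maximum if no value in its 3 x 3 neighbourhood
    (clipped at the map border) is larger. Results are sorted by value, highest first.
    """
    h, w = len(reliability), len(reliability[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = reliability[y][x]
            if v < threshold:
                continue
            neighbours = [reliability[yy][xx]
                          for yy in range(max(0, y - 1), min(h, y + 2))
                          for xx in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(neighbours):
                peaks.append((x, y, v))
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks
```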

The feature maps 308 to 311 of the respective layers represent feature extraction results for the input data. In general, lower layers (those close to the layer that receives the data to be processed) represent low-level features such as edges, while upper layers (those close to the recognition result) represent more abstract features. The characteristics of each feature map depend on the pattern recognition target and the learning method.

Next, the sensor control network 313 will be described. The sensor control network 313 is a computational network that regresses data for sensor control; that is, a CNN is used to determine the data acquisition conditions of the sensor. The arithmetic processes 314 and 315 are the same kind of processing as the arithmetic processes 303 to 307 and are executed by the circuit shown in FIG. 4.

The sensor control network 313 regresses the control signal from the feature map 308 of a lower layer of the recognition network 302; that is, it reuses features obtained in the course of the recognition network's computation. Sharing feature maps with the recognition network is expected to improve regression performance and make learning easier, and it also reduces the overall computational cost. Furthermore, because the sensor control network 313 here uses network processing (CNN) similar to that of the recognition network, the control data can be generated by the feature extraction processing unit 701 itself. An advantage is therefore that no dedicated circuit is required.

The feature maps 316 and 317 are feature maps in the sensor control network 313. The final-layer operation 315 generates a feature map 317 (hereinafter, regression map 317) that regresses the control data for the imaging conditions. The regression map 317 is control data that specifies imaging conditions corresponding to spatial positions on the image sensor; for example, it specifies the gain or exposure time of the image sensor region corresponding to each position in the map. A single regression map 317 suffices when there is one type of control target controlled by a scalar value. When there are multiple control conditions, or when the control parameters are vector data, there are multiple regression maps.
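Several control conditions may be regressed, for example one map for gain and another for exposure time. In that case each sensor block receives one value from each map. The sketch below assumes all maps share the same n × n size and uses hypothetical condition names. It regroups a set of regression maps into per-block condition records that an imaging controller could consume.

```python
from typing import Dict, List

Grid = List[List[float]]


def per_block_conditions(maps: Dict[str, Grid]) -> List[List[Dict[str, float]]]:
    """Regroup named n x n regression maps into an n x n grid of condition dicts.

    Raises ValueError if the maps are missing or do not all have the same size.
    """
    sizes = {(len(m), len(m[0])) for m in maps.values()}
    if len(sizes) != 1:
        raise ValueError("all regression maps must have the same size")
    rows, cols = sizes.pop()
    return [[{name: m[r][c] for name, m in maps.items()} for c in range(cols)]
            for r in range(rows)]
```

A single scalar map degenerates to records with one key, which matches the single-map case described above.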

FIG. 8 is a diagram illustrating a regression map; specifically, it schematically shows an example of the regression map 317 corresponding to the control logic shown in FIG. 6(b). rg(n, n) corresponds to the pixel data of the regression map; that is, the regression map here has size n × n. The regression values are represented by shading and correspond, for example, to the gain applied to the data after photoelectric conversion.

The imaging control processing unit 705 controls the photoelectric conversion elements according to the control signal data regressed by the sensor control network 313 and acquires image data suitable for recognition processing. The image data obtained here differs from image data intended for people to view, understand, and appreciate; it is image data suited to improving the accuracy of recognition processing.

In the sensor control network 313, the arithmetic process 314 includes pooling, which reduces the size of the feature map. The regression map is therefore smaller than the image output by the sensor, and the readout conditions are controlled for each block of multiple pixels. The pooling ratio and related parameters are determined in consideration of the block size that the imaging control processing unit 705 can control.
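The relationship between sensor resolution, controllable block size, and the pooling that the control branch must apply can be checked with simple arithmetic. The sketch below makes two assumptions. First, the feature map 308 feeding the sensor control network is downsampled from the sensor image by an integer stride (`feature_stride`, a hypothetical parameter). Second, every dimension divides evenly. The numbers in the usage note are illustrative, not taken from the patent.

```python
from typing import Tuple


def control_branch_geometry(sensor_w: int, sensor_h: int,
                            block_w: int, block_h: int,
                            feature_stride: int) -> Tuple[int, int, int, int]:
    """Return (map_w, map_h, pool_x, pool_y) for the sensor control branch.

    map_w x map_h is the regression-map size (one element per controllable block);
    pool_x, pool_y is the total pooling the branch must add on top of the stride
    of its input feature map so that one map element covers exactly one block.
    """
    if sensor_w % block_w or sensor_h % block_h:
        raise ValueError("sensor size must be a multiple of the block size")
    if block_w % feature_stride or block_h % feature_stride:
        raise ValueError("block size must be a multiple of the feature-map stride")
    return (sensor_w // block_w, sensor_h // block_h,
            block_w // feature_stride, block_h // feature_stride)
```

For a 1920 × 1080 sensor with 120 × 120-pixel blocks and a stride-2 input feature map, this gives a 16 × 9 regression map and a total pooling factor of 60 in each direction.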

FIG. 5 is a timing chart explaining the operation timing of pattern recognition processing in the pattern recognition apparatus 201. The horizontal axis represents time, and the chart illustrates when the recognition network (recognition network 302), the control network (sensor control network 313), and the condition setting are each executed. It shows recognition processing executed continuously on image data of three temporally consecutive frames (first to third frames).

At timing 501, the first frame is captured and the recognition network processes the first frame; in parallel, at timing 504, the control network executes its processing for the subsequent second frame. At timing 507, the operating conditions of the imaging device (gain, exposure time, etc.) for the second frame are set.

At timing 502, the second frame is captured and the recognition network processes the second frame; in parallel, at timing 505, the control network executes its processing for the subsequent third frame. At timing 508, the operating conditions of the imaging device (gain, exposure time, etc.) for the third frame are set.

At timing 503, the third frame is captured and the recognition network processes the third frame; in parallel, at timing 506, the control network executes its processing for the subsequent fourth frame (not shown). At timing 509, the operating conditions of the imaging device (gain, exposure time, etc.) for the fourth frame (not shown) are set.

In this way, the control network successively sets capture conditions suited to recognition as the state of the captured scene changes. Images are then captured under those conditions, and the recognition network performs pattern recognition on them.
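The frame-to-frame interplay in FIG. 5 can be sketched as a simple loop. This is a sequential sketch under stated assumptions. `capture`, `recognize`, and `control` are hypothetical stand-ins for the imaging device, the recognition network 302, and the sensor control network 313. The control step here receives the captured data and the current gain instead of an intermediate feature map. On the device itself, recognition and control for a frame run in parallel. The only point illustrated is that the condition regressed while frame k is processed is the one applied when frame k+1 is captured.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    scenes: List[float],
    capture: Callable[[float, float], float],
    recognize: Callable[[float], float],
    control: Callable[[float, float], float],
    initial_gain: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (gain used, recognition result) for each frame, in order."""
    gain = initial_gain
    log: List[Tuple[float, float]] = []
    for scene in scenes:
        data = capture(scene, gain)      # capture frame k under the current condition
        result = recognize(data)         # recognition network on frame k
        next_gain = control(data, gain)  # control network regresses the condition for frame k+1
        log.append((gain, result))
        gain = next_gain                 # condition setting before frame k+1 is captured
    return log


# Usage: a toy scene of constant brightness 0.2 and a hypothetical controller aiming at 0.5.
log = run_pipeline(
    [0.2, 0.2, 0.2],
    capture=lambda scene, g: scene * g,
    recognize=lambda d: d,
    control=lambda d, g: g * 0.5 / d,
)
print(log)
```

Frame 1 is captured with the initial gain. Every later frame uses the gain regressed from the frame before it, which mirrors the one-frame lag shown between timings 501/504/507 and 502.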

<Operation of learning device>
Next, the learning processing of the recognition network and the control network in the learning device will be described. As described above, the learning results (weighting coefficients) obtained by the learning device are used by the pattern recognition device 201.

FIG. 9 is a diagram illustrating the learning processing of the recognition network and the control network. Here, learning means determining the weighting coefficients of the neural networks of the recognition network 302 and the sensor control network 313 so that the pattern recognition processing achieves better (or the best) performance. In FIG. 9, the arithmetic processes 303 to 307 and 314 to 315 of FIG. 3 are omitted.

The sensor model 901 is needed to generate the data used to train each network. For example, the sensor model 901 takes image data 301 (two-dimensional data in image format) captured by a different imaging device and generates pseudo sensor data 902 (pseudo data). This data simulates the sensor's output according to the control conditions and the characteristics of the sensor. In other words, the pseudo sensor data 902 is image data that simulates what the sensor would output if the scene it captured were the content of the image data 301.

The sensor model 901 can be constructed theoretically from the physical characteristics of the sensor. Alternatively, it may be created by a learning-based method such as a GAN (Generative Adversarial Network) trained on data obtained from the actual sensor. The sensor model 901 also simulates how the output varies with the control signal that governs the sensor's readout conditions. For example, when a high gain value is set as the control signal, the simulated output is correspondingly high.

That is, the sensor model 901 defines two relationships: between the image data 301 captured by another imaging device and the pseudo sensor data 902, and between the control signal and the pseudo sensor data. The sensor model is assumed to have been created before the recognition network and the sensor control network are trained.
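As a minimal illustration of what the sensor model 901 maps, the sketch below assumes an idealized sensor whose output is proportional to the gain and clips at full scale. The function name, the linear response, and the clipping are assumptions made for illustration, not the characteristics of any particular sensor. The sketch reproduces the behavior described above: a higher gain control value yields a higher simulated output.

```python
from typing import List


def sensor_model(image: List[float], gain: float, full_scale: float = 1.0) -> List[float]:
    """Hypothetical sensor model: convert image data 301 into pseudo sensor data 902.

    Each pixel is scaled by the gain (the control condition) and clipped to
    [0, full_scale] to mimic saturation of the readout.
    """
    return [min(max(gain * p, 0.0), full_scale) for p in image]


image_301 = [0.1, 0.4, 0.8]
low = sensor_model(image_301, gain=1.0)
high = sensor_model(image_301, gain=2.0)
print(low, high)
```

A model learned with a GAN would replace this closed-form function. The interface (image data and control value in, pseudo sensor data out) would stay the same.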

The memory 903 holds the intermediate results of the recognition network that are needed to train the sensor control network 313. The teacher data 905 is prepared in advance and paired with the image data 301. Here, the teacher data is the data distribution expected as the recognition result 320. For example, when detecting faces in an image, the teacher data is map data in the form of a normal (Gaussian) distribution that peaks at the center of the face. The process 904 calculates the difference between the recognition result 320 and the teacher data 905.
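A teacher map of the kind described for face detection can be sketched as follows. The sketch assumes a single face, an isotropic Gaussian with peak value 1.0, and a hand-chosen `sigma`. All three are illustrative assumptions. For several faces, one common choice is to take the element-wise maximum of several such maps.

```python
import math
from typing import List


def gaussian_teacher_map(height: int, width: int, cx: float, cy: float, sigma: float) -> List[List[float]]:
    """Hypothetical teacher data 905: a normal-distribution bump peaking (value 1.0) at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


teacher = gaussian_teacher_map(5, 7, cx=3, cy=2, sigma=1.0)
peak = max((v, y, x) for y, row in enumerate(teacher) for x, v in enumerate(row))
print(peak)
```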

FIG. 9(a) illustrates the main processing when training the recognition network. The image data 301 is converted by the sensor model 901 to obtain pseudo sensor data 902. Pattern recognition is executed on the generated pseudo sensor data 902 to obtain the recognition result 320. The difference between the recognition result 320 and the teacher data 905 is used as error data, and the recognition network is trained on it by backpropagation. That is, the weighting coefficients of the convolution operations that generate the feature maps 308 to 312 are updated one after another.

FIG. 9(b) illustrates the main processing when training the sensor control network 313, which generates the sensor control signal. During this training, the error information is back-propagated. This error information is the difference between the recognition result 320 and the teacher data 905. The recognition network 302 is not trained during this step (its coefficients are fixed). The error information for training the sensor control network 313 is then obtained through the inverse function of the sensor model. To calculate the correct value for sensor control, the sensor model 901 holds two things: the data it generated as pseudo sensor data, and table information that realizes the inverse function.

The sensor control network is trained by backpropagation using two inputs: the obtained error information, and the data of the feature map 308 produced during recognition (stored in the memory 903). That is, the coefficients of the convolution operations that generate the feature maps 316 to 317 are updated one after another. The backpropagation itself uses a conventionally known method. What characterizes this processing is that the recognition network is held fixed, and the error of the recognition result reaches the sensor control network 313 through the inverse function of the sensor model. This inverse function is determined in advance according to the characteristics of the sensor.

FIG. 10 shows a specific example of the operations in the learning processing of the first embodiment. FIGS. 10(a) and 10(b) show the pattern recognition and learning operation patterns used when training the recognition network. FIGS. 10(c) and 10(d) show the corresponding operation patterns used when training the sensor control network. In the example of FIG. 10, the neural network of the recognition network 302 has two nodes (nodes 1003 and 1004), and the neural network of the sensor control network 313 has one node (node 1005).

For simplicity, the description uses a neural network with an MLP (Multi Layer Perceptron) configuration instead of a CNN. Viewed as a CNN, this corresponds to a convolution kernel size of 1x1. The sensor model is shown here divided into a sensor imaging model 1001 and a sensor control model 1002. In FIG. 10, the recognition result for the image data 301 of the training data set is the recognition result 320. The teacher data 905 and the arithmetic processing 904 are the same as in FIG. 9.
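To make the FIG. 10 topology concrete, here is a scalar sketch. It has one input value $s$ from the pseudo sensor data, recognition nodes $n_1$ and $n_2$ with ReLU activations (as assumed later in the text), and a sensor-control node $r$ that reads $n_1$. The linear activation on $r$ and the class and attribute names are assumptions. The document does not specify the activation of node $r$.

```python
from dataclasses import dataclass
from typing import Tuple


def relu(v: float) -> float:
    return max(v, 0.0)


@dataclass
class ToyFig10Network:
    """Scalar (1x1-kernel) stand-in for the networks of FIG. 10 (hypothetical)."""
    w1: float  # weight into recognition node n_1 (node 1003)
    w2: float  # weight into recognition node n_2 (node 1004), whose output is y
    wr: float  # weight into sensor-control node r (node 1005)

    def forward(self, s: float) -> Tuple[float, float, float]:
        n1 = relu(self.w1 * s)  # lower-layer feature shared by both networks
        y = relu(self.w2 * n1)  # recognition output at one position of result 320
        r = self.wr * n1        # sensor-control output; linear activation assumed
        return n1, y, r


net = ToyFig10Network(w1=0.5, w2=2.0, wr=3.0)
print(net.forward(4.0))
```

Because $r$ reads $n_1$ rather than the raw input, the control node reuses the feature the recognition path has already computed. This is the sharing the document credits with lowering the overall computational cost.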

FIG. 1 is a flowchart of the learning processing in the first embodiment. In S101, the learning device performs initialization. Specifically, it carries out various initialization processes, such as initializing the sensor model (the sensor imaging model 1001 and the sensor control model 1002).

In S102, the learning device selects the training data to use for the learning processing. For example, it selects image data 301 and the corresponding teacher data 905 from the training data set stored in the storage device 213 and reads them into a memory (not shown) of the arithmetic unit 210.

In S103, the learning device converts the image data according to the control conditions of the sensor model. Specifically, the image data 301 is converted according to the conditions set in the sensor control model 1002 (for example, sensor control conditions such as sensitivity, gain, and exposure time) to generate pseudo sensor data.

In S104, the learning device executes a predetermined pattern recognition process on the pseudo sensor data generated in S103 (operation pattern 1006), for example detecting faces in an image. Recognition is performed on the pseudo sensor data 902 converted by the sensor imaging model 1001, using the computation nodes $n_1$ and $n_2$ of the neural network, and yields the recognition result 320. Because recognition runs over the two-dimensional image data 301 in raster order, the recognition result 320 is also a two-dimensional map.

In S105, the learning device trains the pattern recognition processing based on the recognition result 320 obtained in S104 (operation pattern 1007). The weighting coefficients of the recognition network are learned by backpropagation using the teacher data selected in S102. With the difference between the recognition result 320 and the teacher data 905 taken as the error, the coefficient $W_{n2}$ for node $n_2$ (node 1004) is updated first, followed by the coefficient $W_{n1}$ for node $n_1$ (node 1003). The coefficient $W_r$ for node $r$ (node 1005) is not updated in S105.

A specific example of learning by backpropagation follows. In backpropagation, the coefficients $W_1$ and $W_2$ (that is, $W_{n1}$ and $W_{n2}$) are adjusted to minimize the error between the recognition result 320 and the teacher data 905 at each image position.

Let $y$ be the output value at a given pixel position in the recognition result 320, and $y_t$ the teacher data value at that position. The error $E$ between the teacher data and the output value is defined by the following Equation (2). For simplicity, coordinate notation is omitted here.

Let $n_1$ denote the output of node $n_1$ and $\alpha$ the learning coefficient. The coefficient $W_2$ is then updated to $W'_2$ by the following Equation (3).

If the nonlinear function of nodes $n_1$ and $n_2$ is the ReLU function $f_{ReLU}()$, then $y = f_{ReLU}(W_2 \times n_1)$. When $W_2 \times n_1 > 0$, the derivative of $f_{ReLU}$ is 1, so Equation (3) becomes the following Equation (4).

Next, let $s$ be the output of the sensor model at the corresponding pixel position and $\alpha$ the learning coefficient. $W_1$ is then updated to $W'_1$ by the following Equation (5).

Here, $n_1 = f_{ReLU}(W_1 \times s)$. Therefore, when $W_1 \times s > 0$, Equation (5) becomes the following Equation (6), from which the updated $W'_1$ can be calculated.
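The recognition-network update can be sketched for the scalar network of FIG. 10. Equations (2) to (6) themselves are not reproduced in this text, so the sketch makes two assumptions: the common squared-error form $E = \frac{1}{2}(y - y_t)^2$, and plain gradient descent. Under these assumptions, when both ReLU units are active, the updates are $W'_2 = W_2 - \alpha (y - y_t) n_1$ and $W'_1 = W_1 - \alpha (y - y_t) W_2 s$. The document's exact expressions may differ in scaling or sign convention.

```python
def relu(v: float) -> float:
    return max(v, 0.0)


def recognition_step(w1: float, w2: float, s: float, y_t: float, alpha: float):
    """One backpropagation step for n_1 = ReLU(w1*s), y = ReLU(w2*n_1).

    Assumes E = 0.5 * (y - y_t)**2. The ReLU derivative is taken as 1 where the
    pre-activation is positive and 0 otherwise. The sensor-control weight is not touched.
    """
    n1 = relu(w1 * s)
    y = relu(w2 * n1)
    dE_dy = y - y_t
    active2 = w2 * n1 > 0
    active1 = w1 * s > 0
    grad_w2 = dE_dy * n1 if active2 else 0.0
    grad_w1 = dE_dy * w2 * s if (active2 and active1) else 0.0  # uses w2 before its update
    return w1 - alpha * grad_w1, w2 - alpha * grad_w2


# Usage: repeat the step on one (s, y_t) pair and watch y approach the teacher value.
w1, w2 = 0.5, 1.0
for _ in range(100):
    w1, w2 = recognition_step(w1, w2, s=2.0, y_t=2.0, alpha=0.05)
print(w1, w2, relu(w2 * relu(w1 * 2.0)))
```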

In S106, the learning device runs pattern recognition again, this time with the weighting coefficients learned (updated) in S105, and outputs the recognition result 320. At the same time, it stores the computation result $n_1$ of node $n_1$ (node 1003) in the memory 903 (operation pattern 1008).

In S107, the learning device trains the sensor control network 313 based on the result of the pattern recognition in S106. The training uses the error $E'$ between the teacher data 905 selected in S102 and the recognition result 320. This error $E'$ is measured on the pattern recognition result computed in S106, after the recognition network has been updated.

First, with the coefficients of the recognition network 302 held fixed, the error $E'$ is back-propagated. Back-propagating through the recognition network 302 gives the error $E_s$ ($= E' \times W'_2 \times W'_1$). From $E_s$ and the sensor control value $r$ stored in the sensor model, the correct value $r_t$ for sensor control is estimated. The estimate uses the inverse function $f_{rev}$(back-propagated error, control value) of the sensor model 901, that is, $r_t = f_{rev}(E_s, r)$ (Equation (7)).

The function $f_{rev}$ is the inverse of the sensor model (the sensor imaging model 1001 and the sensor control model 1002). It back-calculates the correct value $r_t$ of the control parameter from the sensor-data error $E_s$ and the current control value $r$.

FIG. 11 shows the relationship between the sensor's gain control value and its output signal. More specifically, it schematically shows the inverse function of the sensor model, taking gain control as the example. The straight line 1101 is the function used to realize the inverse function: it expresses the relationship between the sensor's gain control value and its output signal. Although it is drawn as a linear function, in practice it is an arbitrary function determined by analysis or experiment and is held as an approximate function or as table information. The straight line 1101 corresponds to the composition of the inverse function of the sensor imaging model 1001 and that of the sensor control model 1002.

Point 1102 indicates the gain control value $r$ and the output signal it produces. Point 1103 indicates where the correct gain control value is obtained according to the sensor output error $E_s$. The correct gain control value $r_t$ (point 1103) is obtained from two inputs. The first is the control value $r$ (point 1102) used when the pseudo sensor data was generated, which is stored in memory within the model. The second is the sensor output error signal $E_s$ back-propagated from the recognition network.
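The lookup from point 1102 to point 1103 can be sketched with a sampled version of line 1101 held as a table. The sketch makes several assumptions, none of them taken from the document. The table is monotonically increasing in both columns, and values between samples are interpolated linearly. The target output is taken to be the current output minus $E_s$: a positive $E_s$ means a larger sensor output would increase the error, so the output should move down.

```python
from bisect import bisect_left
from typing import List, Tuple


def interp(x: float, xs: List[float], ys: List[float]) -> float:
    """Piecewise-linear interpolation over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # xs[i-1] < x <= xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def f_rev(es: float, r: float, table: List[Tuple[float, float]]) -> float:
    """Hypothetical inverse sensor model: estimate the correct control value r_t.

    table holds (gain control value, output signal) samples of line 1101,
    with both columns increasing.
    """
    gains = [g for g, _ in table]
    outputs = [o for _, o in table]
    current_output = interp(r, gains, outputs)    # point 1102
    target_output = current_output - es           # assumed: step the output against the error
    return interp(target_output, outputs, gains)  # point 1103


table_1101 = [(1.0, 0.1), (2.0, 0.2), (4.0, 0.4), (8.0, 0.8)]
print(f_rev(es=-0.2, r=2.0, table=table_1101))  # output too low, so a larger gain is proposed
```

Clamping at the ends of the table keeps $r_t$ inside the range the sensor can actually be set to.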

Using the obtained correct gain control value $r_t$, the weighting coefficient $W_r$ of node $r$ (node 1005) is updated by backpropagation (operation pattern 1009). More specifically, the update is based on the output data of node $n_1$ (node 1003) stored in the memory 903.

With $\beta$ as the learning coefficient, the weighting coefficient $W_r$ of the sensor control network 313 is updated by the following Equation (8).

Let $E_r$ be the error data for training the sensor control network 313, $r$ the output value of node $r$ of the sensor control network 313, and $r_t$ the correct sensor control value. Equation (9) then holds.

Therefore, Equation (8) can be rewritten as the following Equation (10).

Accordingly, combining Equations (7) and (10), the coefficient of the sensor control network can be updated by the following Equation (11).
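Equations (8) to (11) can be sketched for the scalar node $r$ of FIG. 10. As with the recognition update, the exact expressions are not reproduced in this text. The sketch assumes $E_r = \frac{1}{2}(r - r_t)^2$ and a linear node $r = W_r n_1$, so the update becomes $W'_r = W_r - \beta (r - r_t) n_1$. Here $r_t$ comes from the inverse sensor model and $n_1$ is read from the memory 903. The function and parameter names are hypothetical.

```python
def control_step(wr: float, n1_from_memory: float, r_t: float, beta: float) -> float:
    """One gradient step on E_r = 0.5 * (r - r_t)**2 with r = wr * n1 (linear node assumed).

    n1_from_memory stands for the node n_1 output stored in memory 903.
    The recognition weights are not touched here.
    """
    r = wr * n1_from_memory
    return wr - beta * (r - r_t) * n1_from_memory


# Usage: with a fixed stored feature and target, r converges to r_t.
wr = 1.0
for _ in range(50):
    wr = control_step(wr, n1_from_memory=2.0, r_t=4.0, beta=0.1)
print(wr, wr * 2.0)
```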

The above processing is executed for all positions in the image data 301, or for a selected set of them. That is, the coefficient $W'_r$ is learned so that the network generates a map 317 of regression information that controls the sensor appropriately.

In S108, the learning device regresses the sensor control parameters using the sensor control network with the updated weighting coefficient $W'_r$.

In S109, the learning device sets the regressed parameters as the control parameters of the sensor model. That is, in S103 for the next image data (the next loop), the image data 301 is converted into pseudo sensor data 902 using the control parameters set here. The sensor is controlled in units of subregions whose size is determined by the size of the regression data map 318 that corresponds to the control parameters.

In S110, the learning device determines whether a predetermined end condition is satisfied. If it is, the process proceeds to S111; otherwise, it returns to S102. The predetermined end condition is, for example, completion of the learning processing for a plurality of pre-specified image data.
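The S101 to S111 loop can be sketched end to end on the scalar network of FIG. 10. Everything here is an assumption made for illustration:

- The sensor model is a linear gain, so its inverse is a simple rescaling rather than the table of FIG. 11.
- The losses are squared errors.
- Node $r$ is linear.
- The end condition of S110 is a fixed number of epochs.
- The regressed gain is clamped to an assumed range.

The sketch mirrors the step order of FIG. 1, not any production implementation.

```python
from typing import Dict, List, Tuple


def relu(v: float) -> float:
    return max(v, 0.0)


def train_first_embodiment(
    data: List[Tuple[float, float]],
    epochs: int = 10,
    alpha: float = 0.05,
    beta: float = 0.05,
    g_min: float = 0.25,
    g_max: float = 4.0,
) -> Dict[str, object]:
    # S101: initialization (weights and the toy sensor model's gain are arbitrary)
    w1, w2, wr, gain = 0.5, 0.5, 1.0, 1.0
    losses: List[float] = []
    for _ in range(epochs):                       # S110: end condition = fixed epoch count
        for pixel, y_t in data:                   # S102: select a sample and its teacher value
            s = gain * pixel                      # S103: toy sensor model, output proportional to gain
            # S104-S105: recognition pass and backprop into w2, w1 (wr untouched)
            n1 = relu(w1 * s)
            err = relu(w2 * n1) - y_t
            a2, a1 = w2 * n1 > 0, w1 * s > 0
            g2 = err * n1 if a2 else 0.0
            g1 = err * w2 * s if (a2 and a1) else 0.0
            w1, w2 = w1 - alpha * g1, w2 - alpha * g2
            # S106: rerun recognition with the updated weights; keep n1 (memory 903)
            n1 = relu(w1 * s)
            err = relu(w2 * n1) - y_t
            losses.append(0.5 * err * err)
            # S107: push E' back through the frozen recognition net to get Es ...
            es = err * w2 * w1 if (w2 * n1 > 0 and w1 * s > 0) else 0.0
            # ... then invert the proportional sensor model to get r_t (assumption)
            r_t = gain * (s - es) / s if s > 0 else gain
            wr -= beta * (wr * n1 - r_t) * n1
            # S108-S109: regress the next control value and set it on the sensor model
            gain = min(max(wr * n1, g_min), g_max)
    # S111: hand the learned weights over to the recognition device
    return {"w1": w1, "w2": w2, "wr": wr, "gain": gain, "losses": losses}


result = train_first_embodiment([(0.2, 0.3), (0.5, 0.6), (0.8, 0.9)])
print(result["w1"], result["w2"], result["wr"], result["gain"])
```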

In S111, the learning device extracts the learning results, namely the weighting coefficients of the recognition network 302 and the sensor control network 313. The obtained weighting coefficients are stored in the RAM 204 of the pattern recognition device 201. This enables the pattern recognition device 201 to perform pattern recognition more appropriately.

As described above, in the first embodiment the sensor control network 313 is trained using the image data and the sensor model 901, which includes the sensor control model 1002. As a result, the pattern recognition device 201 can perform pattern recognition that is more robust to the diverse variations in the data being processed.

Further, the sensor control network 313 regresses the control signal using the feature map 308 of a lower layer of the recognition network 302. In other words, the sensor control network 313 shares with the recognition network 302 the features computed during the recognition network's processing. In the pattern recognition device 201, this is expected to improve regression performance and make training easier, while also reducing the overall computational cost.

(Second Embodiment)
The second embodiment describes a mode in which part of the recognition network 302 is also trained while the sensor control network 313 is being trained. In the first embodiment, the recognition network 302 is not trained while the sensor control network 313 is trained, but the learning method is not limited to this.

<Operation of learning device>
FIG. 12 shows a specific example of the learning processing in the second embodiment. More specifically, it shows the operation pattern for training the control network, which corresponds to the operation pattern 1009 of the first embodiment. The other processing is the same as in the first embodiment (FIGS. 1 and 10), so its description is omitted.

As in the first embodiment, the sensor model is shown divided into a sensor imaging model 1201 and a sensor control model 1202. The nodes 1203 to 1204 belong to the recognition network 302, and the node 1205 belongs to the sensor control network 313.

As described above, in the second embodiment the coefficient $W'_1$ of the recognition network 302 is updated to $W''_1$ while the sensor control network 313 is trained. More specifically, $W'_r$ is updated by Equation (10) as in the first embodiment, and $W'_1$ is updated by the following Equation (12).

Through this learning processing, the features output by node $n_1$ (node 1203) also become well suited to the sensor control network 313.
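The second-embodiment update can be sketched by letting the control error also reach $W_1$. Under the same assumptions as a scalar treatment of FIG. 12, with $E_r = \frac{1}{2}(r - r_t)^2$ and a linear node $r = W_r\,\mathrm{ReLU}(W_1 s)$, the gradient with respect to $W_1$ is $(r - r_t) W_r s$ when $W_1 s > 0$. The separate `beta_w1` parameter is a hypothetical addition. It shows the text's suggestion that a small learning coefficient in Equation (12) limits the effect on recognition. Setting it to 0 reduces the sketch to the first embodiment.

```python
def relu(v: float) -> float:
    return max(v, 0.0)


def joint_control_step(w1: float, wr: float, s: float, r_t: float, beta: float, beta_w1: float):
    """Update wr and, through the shared node n_1, also w1 from the control error.

    Both gradients use the pre-update weights. w1 only moves where its ReLU is active.
    """
    n1 = relu(w1 * s)
    dr = wr * n1 - r_t                  # dE_r/dr for E_r = 0.5 * (r - r_t)**2
    new_wr = wr - beta * dr * n1
    new_w1 = w1 - beta_w1 * dr * wr * s if w1 * s > 0 else w1
    return new_w1, new_wr


w1, wr = joint_control_step(0.5, 1.0, s=2.0, r_t=3.0, beta=0.1, beta_w1=0.1)
print(w1, wr, wr * relu(w1 * 2.0))
```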

As described above, the second embodiment enables more robust pattern recognition than the first embodiment.

As in the first embodiment (FIG. 1), the recognition network 302 and the sensor control network 313 can also be trained alternately (co-evolutionary learning). Alternating lets suitable coefficients be learned for each network, which reduces the impact that training the sensor control network 313 has on the performance of the recognition network 302. The impact can be reduced further by setting the learning coefficient $\beta$ in Equation (12) to a small value.

(Third Embodiment)
The third embodiment describes a mode in which the recognition network 302 and the sensor control network 313 are trained independently of each other. The first and second embodiments train the two networks alternately, but the learning method is not limited to this.

<Operation of learning device>
FIG. 13 is a flowchart of the learning processing in the third embodiment. Steps S1301 to S1305 and S1310 to S1313 are the same as S101 to S105 and S106 to S109 in FIG. 1, respectively, so their description is omitted.

In S1306, the learning device determines whether a predetermined end condition is satisfied. If it is, the process proceeds to S1307; otherwise, it returns to S1302. The predetermined end condition is, for example, completion of the learning processing for a plurality of pre-specified image data. In S1307, the learning device extracts the learning result, which here is the weighting coefficients of the recognition network 302.

In S1308, as in S1302, the learning device selects the training data to use for the learning processing. In S1309, as in S1303, the learning device converts the image data according to the control conditions of the sensor model to generate pseudo sensor data.

In S1314, the learning device determines whether a predetermined end condition is satisfied. If it is, the process proceeds to S1315; otherwise, it returns to S1308. The predetermined end condition is, for example, completion of the learning processing for a plurality of pre-specified image data. In S1315, the learning device extracts the learning result, which here is the weighting coefficients of the sensor control network 313.

As described above, in the third embodiment the recognition network 302 and the sensor control network 313 are trained separately. With this configuration, the sensor control network 313 can be trained without affecting the already-trained recognition network 302.
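The two-phase schedule of FIG. 13 can be sketched on the scalar network of FIG. 10. Phase 1 trains only $W_1$ and $W_2$ with the gain held at its initial value. Phase 2 freezes them and trains only $W_r$, which in turn changes the gain. The linear sensor model, the squared-error losses, the clamp range, and the fixed epoch counts are all illustrative assumptions. The usage checks the property the text emphasizes: running the control phase leaves the recognition weights untouched.

```python
from typing import List, Tuple


def relu(v: float) -> float:
    return max(v, 0.0)


def train_separately(
    data: List[Tuple[float, float]],
    recog_epochs: int,
    control_epochs: int,
    alpha: float = 0.05,
    beta: float = 0.05,
) -> Tuple[float, float, float, float]:
    w1, w2, wr, gain = 0.5, 0.5, 1.0, 1.0
    # Phase 1 (S1301-S1307): train only the recognition net; the gain stays fixed.
    for _ in range(recog_epochs):
        for pixel, y_t in data:
            s = gain * pixel
            n1 = relu(w1 * s)
            err = relu(w2 * n1) - y_t
            a2, a1 = w2 * n1 > 0, w1 * s > 0
            g2 = err * n1 if a2 else 0.0
            g1 = err * w2 * s if (a2 and a1) else 0.0
            w1, w2 = w1 - alpha * g1, w2 - alpha * g2
    # Phase 2 (S1308-S1315): recognition weights are frozen; only wr (and the gain) change.
    for _ in range(control_epochs):
        for pixel, y_t in data:
            s = gain * pixel
            n1 = relu(w1 * s)
            err = relu(w2 * n1) - y_t
            es = err * w2 * w1 if (w2 * n1 > 0 and w1 * s > 0) else 0.0
            r_t = gain * (s - es) / s if s > 0 else gain  # proportional sensor model (assumption)
            wr -= beta * (wr * n1 - r_t) * n1
            gain = min(max(wr * n1, 0.25), 4.0)           # assumed clamp on the control range
    return w1, w2, wr, gain


data = [(0.2, 0.3), (0.5, 0.6), (0.8, 0.9)]
recognition_only = train_separately(data, recog_epochs=10, control_epochs=0)
both_phases = train_separately(data, recog_epochs=10, control_epochs=5)
print(recognition_only, both_phases)
```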

(Modification Examples)
The above embodiments use, as the example of the recognition network, the task of detecting a specific pattern in an image, but the present invention is not limited to this. It can be applied to various recognition tasks, such as recognizing the attributes of a recognition target or understanding the content of an image. It is also applicable beyond recognition, to various image processing tasks such as geometric transformation of images, brightness/color correction, noise removal, and format conversion, where it can be expected to improve the quality of the generated images.

The above embodiments use a two-dimensional image sensor as the example, but the invention is not limited to this. For example, it can be applied to various sensors that differ in data dimensionality or modality. It can also be applied to systems that use various other sensing devices, such as those producing audio data or radio-wave sensor data.

The above embodiments describe the sensor control network 313 controlling gain, but the present invention is not limited to this. For example, it can be applied to the control of various other readout parameters such as exposure time, frame rate, sensitivity, and resolution.

The above embodiments learn the connection coefficients (weighting coefficients) of the neural networks, but the approach may also be applied to methods that learn the network configuration at the same time, such as NeuroEvolution.

The above embodiments use backpropagation as the learning method, but the present invention is not limited to this. For example, various other metaheuristic methods such as genetic algorithms can be applied. In that case, the present invention can be applied even when it is difficult to define the inverse function of the sensor model that error backpropagation requires.
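As a sketch of the metaheuristic alternative, the code below evolves a scalar control weight using only forward evaluations, so no inverse sensor model is needed. It is a simple evolutionary algorithm with truncation selection and Gaussian mutation and no crossover. It stands in plainly for the genetic algorithms the text mentions and is not a full GA. `loss_fn` is a hypothetical callback. In this document's setting it would run the sensor model forward with the candidate's control output and return the frozen recognition network's error. The quadratic used below is an illustrative stand-in.

```python
import random
from typing import Callable, Tuple


def evolve_control_weight(
    loss_fn: Callable[[float], float],
    pop_size: int = 20,
    generations: int = 30,
    sigma: float = 0.3,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Evolve a scalar weight using only loss evaluations (no gradients, no inverse model).

    Returns (best weight, best loss in the initial population, final best loss).
    """
    rng = random.Random(seed)
    pop = [rng.uniform(-2.0, 2.0) for _ in range(pop_size)]
    initial_best_loss = min(loss_fn(w) for w in pop)
    n_parents = max(1, pop_size // 4)
    for _ in range(generations):
        pop.sort(key=loss_fn)
        parents = pop[:n_parents]  # elitism: the best candidates survive unchanged
        children = [rng.choice(parents) + rng.gauss(0.0, sigma)  # Gaussian mutation
                    for _ in range(pop_size - n_parents)]
        pop = parents + children
    best = min(pop, key=loss_fn)
    return best, initial_best_loss, loss_fn(best)


# Stand-in loss: pretend the frozen recognition net performs best when the control weight is 1.3.
best, loss0, loss1 = evolve_control_weight(lambda w: (w - 1.3) ** 2)
print(best, loss0, loss1)
```

Because the parents survive each generation, the best loss never gets worse. This makes the search safe to stop at any generation, much like the end condition of S110.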

The above embodiments have the imaging control processing unit 705 control the capture conditions block by block, but the invention is not limited to this. Control may be performed per pixel, or the entire image may be controlled collectively. When the entire image is controlled collectively, the control data may be calculated by passing the final-layer feature map data of the sensor control network 313 through a linear classifier. Alternatively, the result of applying global pooling to the final-layer feature map may be used as the control data.

In the above-described embodiment, the sensor control network 313 regresses the control signal from the feature map 308 in a lower layer of the recognition network 302, but the present invention is not limited to this. A feature map of an upper layer may be used, or feature maps of individual layers may be selected and used. The hierarchical structure of the recognition network 302 and the sensor control network 313 (the number of layers and the number of feature maps in each layer) may take any configuration suited to the recognition target, the control target, and so on. Furthermore, an independent sensor control network may be configured that takes the output of the imaging device 704 as its input instead of the feature maps of the recognition network 302. Even in that case, however, the recognition network 302 is used when training the sensor control network 313.
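Selecting which layer feeds the sensor control network amounts to keeping every intermediate feature map during the forward pass and concatenating the chosen ones. The sketch below assumes toy 1-D "layers" (`relu_double`, `pair_pool`) purely for illustration; they are hypothetical stand-ins, not the layers of the recognition network 302.

```python
from typing import Callable, Sequence

Layer = Callable[[list[float]], list[float]]


def relu_double(v: list[float]) -> list[float]:
    # Toy lower layer: scale by 2 and rectify.
    return [max(0.0, 2.0 * a) for a in v]


def pair_pool(v: list[float]) -> list[float]:
    # Toy upper layer: average adjacent pairs, halving the resolution.
    return [(v[i] + v[i + 1]) / 2.0 for i in range(0, len(v) - 1, 2)]


def forward_with_taps(layers: Sequence[Layer], x: list[float]) -> list[list[float]]:
    """Run the recognition layers in order and keep every intermediate feature map."""
    maps = []
    for layer in layers:
        x = layer(x)
        maps.append(x)
    return maps


def control_input(maps: list[list[float]], selected: Sequence[int]) -> list[float]:
    """Concatenate the feature maps of the selected layers (0 = lowest layer)."""
    out: list[float] = []
    for i in selected:
        out.extend(maps[i])
    return out
```

With `maps = forward_with_taps([relu_double, pair_pool], [1.0, -1.0, 3.0, 5.0])`, selecting `[0]` yields the lower-layer map `[2.0, 0.0, 6.0, 10.0]`, selecting `[1]` yields the upper-layer map `[1.0, 8.0]`, and `[0, 1]` combines both.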

In the above-described embodiment, the pattern recognition reliability and the control conditions are generated in the final layer of the hierarchical feature extraction process, but the present invention is not limited to this. For example, recognition and control data generation may be performed by directly referring to the feature maps of an intermediate layer.

(Other Examples)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the following claims are appended to make the scope of the invention public.

701 feature extraction processing unit; 702 control unit; 703 memory; 704 imaging device; 705 imaging control processing unit; 706 image correction processing unit; 302 recognition network; 313 sensor control network

Claims

An information processing apparatus connectable to a sensing device, comprising:
setting means for setting a data acquisition condition in the sensing device;
first processing means for executing hierarchical feature extraction processing, using a first neural network (NN), on data obtained by the sensing device; and
second processing means for using a feature map in an intermediate layer of the first NN to generate regression data indicating a data acquisition condition to be used in subsequent data acquisition by the sensing device,
wherein the setting means sets the data acquisition condition indicated by the regression data in the sensing device.

The information processing apparatus according to claim 1, wherein the second processing means uses a second NN to generate the regression data.

The information processing apparatus according to claim 2, wherein at least one of the first NN and the second NN is a convolutional neural network (CNN).

The information processing apparatus according to any one of claims 1 to 3, wherein the second processing means generates the regression data using a feature map in a lower layer of the first NN.

The information processing apparatus according to any one of claims 1 to 4, wherein the regression data specifies the data acquisition condition in units of subregions of the sensing device.

The information processing apparatus according to any one of claims 1 to 5, wherein the sensing device is an imaging device, and the data is image data.

A method of learning the weighting coefficients of the second NN in the information processing apparatus according to claim 2, the method comprising:
a generation step of generating, using a sensor model according to the characteristics of the sensing device, pseudo data that simulates the data output from the sensing device, based on learning data and the data acquisition condition;
a first step of learning the weighting coefficients of the first NN using the pseudo data; and
a second step of learning the weighting coefficients of the second NN using the pseudo data and the weighting coefficients of the first NN learned in the first step.

The method according to claim 7, wherein the first step includes:
a first recognition step of executing recognition processing using the first NN with the pseudo data as input; and
a first learning step of learning the weighting coefficients of the first NN by back-propagating, through the first NN, the error between the recognition result of the first recognition step and teacher data prepared in advance corresponding to the learning data.

The method according to claim 7 or 8, wherein the second step includes:
a second recognition step of executing recognition processing using the first NN with the pseudo data as input; and
a second learning step of learning the weighting coefficients of the second NN using the result of back-propagating, through the first NN, the error between the recognition result of the second recognition step and the teacher data prepared in advance corresponding to the learning data, together with the inverse function of the sensor model.

The method according to claim 9, wherein in the second learning step, some of the weighting coefficients included in the first NN are fixed.

The method according to any one of claims 7 to 10, wherein the first step and the second step are executed alternately on a plurality of pieces of learning data.

A program for causing a computer to execute the method according to any one of claims 7 to 11.
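The learning method of the claims (pseudo data from a sensor model, a first step that trains the recognizer, and a second step that trains the control using the error back-propagated through the recognizer together with the inverse of the sensor model, executed alternately) can be illustrated with scalar stand-ins. Everything concrete below is a hypothetical assumption for the sketch: a smooth saturating sensor `tanh(gain * x)`, a one-weight linear recognizer `w * s + b` in place of the first NN, a single log-gain parameter `c` in place of the second NN, and teacher values equal to the scene values. In the second step the whole recognizer is held fixed, which is the extreme case of fixing some of its weighting coefficients.

```python
import math


def sensor_model(x, gain):
    # Hypothetical smooth, saturating sensor: s = tanh(gain * x).
    return math.tanh(gain * x)


def sensor_inverse(s, gain):
    # Inverse of the sensor model: recovers the scene value x from the output s.
    return math.atanh(s) / gain


def loss(c, w, b, data):
    # data: (learning value x, teacher value t) pairs; the gain is exp(c).
    gain = math.exp(c)
    return sum((w * sensor_model(x, gain) + b - t) ** 2 for x, t in data) / len(data)


def first_step(c, w, b, data, lr=0.1):
    """Learn the recognizer weights on pseudo data with the control held fixed."""
    gain = math.exp(c)
    gw = gb = 0.0
    for x, t in data:
        s = sensor_model(x, gain)  # pseudo data
        err = 2.0 * (w * s + b - t) / len(data)
        gw += err * s
        gb += err
    return w - lr * gw, b - lr * gb


def control_grad(c, w, b, data):
    """dL/dc from the error back-propagated through the recognizer and the sensor inverse."""
    gain = math.exp(c)
    g = 0.0
    for x, t in data:
        s = sensor_model(x, gain)
        dl_ds = 2.0 * (w * s + b - t) * w / len(data)  # back-propagated through the recognizer
        x_hat = sensor_inverse(s, gain)  # scene value recovered via the inverse model
        ds_dc = (1.0 - s * s) * x_hat * gain  # derivative of tanh(exp(c) * x) w.r.t. c
        g += dl_ds * ds_dc
    return g


def second_step(c, w, b, data, lr=0.1, c_range=(-2.0, 2.0)):
    """Learn the control parameter with the recognizer held fixed (clamped to keep s < 1)."""
    c = c - lr * control_grad(c, w, b, data)
    return min(max(c, c_range[0]), c_range[1])


def train(data, iterations=200):
    """Alternate the first and second steps over the learning data."""
    c, w, b = 0.0, 0.0, 0.0
    for _ in range(iterations):
        w, b = first_step(c, w, b, data)
        c = second_step(c, w, b, data)
    return c, w, b
```

Usage: `c, w, b = train([(x, x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)])`. Clamping `c` keeps `tanh` strictly inside (-1, 1) for inputs in [-1, 1], so `math.atanh` stays defined. If the sensor model had no usable inverse, this second step would be replaced by a derivative-free search such as a genetic algorithm.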