JP7398938B2

JP7398938B2 - Information processing device and its learning method

Info

Publication number: JP7398938B2
Application number: JP2019218346A
Authority: JP
Inventors: 政美加藤; 克彦森; 修野村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2023-12-15
Anticipated expiration: 2039-12-02
Also published as: JP2021089493A

Description

The present invention relates to pattern recognition processing using a neural network.

Hierarchical computation methods, typified by convolutional neural networks (hereinafter abbreviated as CNNs), are attracting attention as a way to achieve pattern recognition that is robust against variations in the recognition target. For example, Non-Patent Document 1 discloses various application and implementation examples of pattern recognition methods based on deep learning.

In addition, as methods for coping with large variations in the imaging environment of the recognition target (lighting, subject state, and so on), Patent Document 1 discloses a method that changes the imaging conditions of the imaging device at predetermined intervals to improve the face detection probability in the image. Patent Document 2 discloses a method that controls the gain and exposure time of an imaging device based on the result of face detection and re-acquires image data under conditions suited to attribute recognition of the detected person.

Patent Document 1: Japanese Patent Application Publication No. 2014-127999
Patent Document 2: Japanese Patent Application Publication No. 2017-098746

Non-Patent Document 1: Yann LeCun, Koray Kavukcuoglu and Clement Farabet, "Convolutional Networks and Applications in Vision", Proc. International Symposium on Circuits and Systems (ISCAS'10), IEEE, 2010

However, because the method described in Patent Document 1 changes the imaging conditions at predetermined intervals, it cannot always acquire appropriate images. The method described in Patent Document 2 has the problem that it is difficult to determine in advance a table of imaging-condition changes that is optimal for the wide variety of changes in the imaging environment. Furthermore, cases in which the optimal imaging conditions differ from region to region within the same frame image cannot be handled.

The present invention has been made in view of these problems, and its object is to provide a technique that enables pattern recognition that is more robust against various variations in the data to be processed.

To solve the above problems, an information processing device according to the present invention has the following configuration. That is, it is an information processing device connectable to a sensing device, comprising:
setting means for setting data acquisition conditions in the sensing device;
first processing means for performing hierarchical feature extraction processing, using a first neural network (NN), on data obtained by the sensing device; and
second processing means for generating, using a feature map in an intermediate layer of the first NN, regression data indicating data acquisition conditions to be used by the sensing device in subsequent data acquisition;
wherein the setting means sets the data acquisition conditions indicated by the regression data in the sensing device.

According to the present invention, it is possible to provide a technique that enables pattern recognition that is more robust against various variations in the data to be processed.

FIG. 1 is an operation flowchart of the learning process in the first embodiment.
FIG. 2 is a diagram showing the schematic configuration of the image processing system and the learning device.
FIG. 3 is a diagram illustrating the processing in the recognition processing unit.
FIG. 4 is a diagram showing the detailed configuration of the hierarchical feature extraction processing unit.
FIG. 5 is a diagram illustrating the operation timing of the pattern recognition device.
FIG. 6 is a diagram illustrating the configuration of a stacked device.
FIG. 7 is a diagram showing the detailed configuration of the pattern recognition device.
FIG. 8 is a diagram illustrating a regression map.
FIG. 9 is a diagram illustrating the operation of the learning process for each network.
FIG. 10 is a diagram showing a specific example of the learning process in the first embodiment.
FIG. 11 is a diagram showing the relationship between the sensor gain control value and the output signal.
FIG. 12 is a diagram showing a specific example of the learning process in the second embodiment.
FIG. 13 is an operation flowchart of the learning process in the third embodiment.

Embodiments are described in detail below with reference to the accompanying drawings. The following embodiments do not limit the claimed invention. Although multiple features are described in the embodiments, not all of them are necessarily essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, the same or similar components are denoted by the same reference numerals, and redundant description is omitted.

(First embodiment)
As a first embodiment of the information processing device according to the present invention, an image processing system using a pattern recognition device is described below as an example.

<System and device configuration>
FIG. 2 is a diagram showing the schematic configuration of the image processing system and the learning device. FIG. 2(a) shows a configuration example of an image processing system using the pattern recognition device 201. The system has a function of detecting the region of a specific object in image data. FIG. 2(b) shows a configuration example of the learning device. The learning results (weight coefficients) obtained by the learning device are used by the pattern recognition device 201. Although the image processing system and the learning device are described here as separate devices, they may be configured as a single integrated device.

The image processing system includes the pattern recognition device 201, a CPU (Central Processing Unit) 205, a ROM (Read Only Memory) 206, a RAM (Random Access Memory) 207, and a DMAC (Direct Memory Access Controller) 208. The pattern recognition device 201 includes an imaging device 202, a recognition processing unit 203, and a RAM 204.

The imaging device 202 includes an optical system, a photoelectric conversion device, a driver circuit/AD converter, and the like. A CCD (Charge-Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, for example, may be used as the photoelectric conversion device. The recognition processing unit 203 controls the imaging device 202 and executes predetermined recognition processing on the acquired image data. The RAM 204 is used as a working buffer for the computations of the recognition processing unit 203. Here, to reduce data transmission delay, the pattern recognition device 201 is described as including both the imaging device 202 and the recognition processing unit 203; however, the two may be separate units as long as the recognition processing unit can be connected to the imaging device.

The CPU 205 controls the entire image processing system. The ROM 206 stores instructions and parameter data that define the operation of the CPU 205. The RAM 207 is the memory required for the operation of the CPU 205. The DMAC 208 controls data transfer between the pattern recognition device 201 and the RAM 207, among other transfers. The data bus 209 is the data transfer path between the devices.

The pattern recognition device 201 executes imaging and recognition processing according to instructions from the CPU 205 and stores the results in the RAM 207. The CPU 205 uses the recognition results to provide various applications.

The learning device includes an arithmetic unit 210, an interface device 212, and a storage device 213, and can be implemented by, for example, a general-purpose computer. The arithmetic unit 210 includes computer components such as a CPU and memory, and executes the learning process described later with reference to FIG. 1. The storage device 213 is a large-capacity data storage device such as a hard disk drive, and stores the programs executed by the arithmetic unit 210, the image data and teacher data used for learning, and so on. The interface device 212 is used to retrieve the data obtained through learning; it is, for example, a communication interface or an interface for a portable storage device. The learning results of the learning device are retrieved via the interface device 212 and stored in the ROM 206 or the like of the image processing system.

FIG. 7 is a diagram showing the detailed configuration of the pattern recognition device 201, illustrating the configuration of the recognition processing unit 203 in more detail.

The feature extraction processing unit 701 repeatedly executes hierarchical feature extraction while holding the intermediate results of the hierarchical computation in the memory 703, and uses the extracted features to output recognition results and control data.

The imaging device 704 corresponds to the imaging device 202 and includes an optical system, a photoelectric conversion device, a driver circuit/AD converter, and the like. The imaging control processing unit 705 controls the operation (imaging conditions and so on) of the imaging device 704 according to the control data provided by the feature extraction processing unit 701. Specifically, the imaging conditions include the gain applied to the signal after photoelectric conversion, the accumulation time (exposure time) of the photoelectric conversion elements (photodiodes, etc.), the A/D conversion characteristics, and the like. The imaging control processing unit 705 is configured to control these imaging conditions block by block over the sensor surface. For example, recent advances in semiconductor stacking technology make it possible to stack control logic on the sensor surface, which enables readout control on a per-block or per-pixel basis.

FIG. 6 is a diagram illustrating the configuration of a stacked device. FIG. 6(a) schematically shows the physical configuration of the stacked device. In this example, a sensor layer 61 carrying the photoelectric conversion elements, a logic layer 62 carrying the readout control logic, and a memory layer 63 carrying a large-capacity memory and its controller are stacked. The sensor layer 61 corresponds to the imaging device 704, the logic layer 62 to the imaging control processing unit 705, and the memory layer 63 to the memory 703 and the like. Signals are transmitted between the layers through vias or the like.

FIG. 6(b) schematically shows the configuration of the logic layer 62. Here, n×n control circuits for controlling the photoelectric conversion elements of the sensor layer 61 are arranged in the logic layer 62. Each control circuit (ct(1, 1) to ct(n, n)) controls the readout of one or more photoelectric conversion elements of the sensor layer 61 at the corresponding position. With this configuration, readout conditions (gain, exposure time, etc.) can be controlled individually for each of the n×n blocks; in other words, the imaging characteristics can be controlled separately for each of the n×n regions of the image.
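The effect of per-block readout control can be sketched in a few lines of Python. This is a minimal illustration, not the device's logic: it assumes the image height and width divide evenly by the n×n grid and models each control circuit as applying one multiplicative gain to every pixel of its block. The function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def expand_block_conditions(block_gain: Grid, height: int, width: int) -> Grid:
    """Spread an n x n grid of per-block gains over an H x W pixel array.

    Hypothetical helper: block (i, j) stands in for one control circuit.
    Assumes height and width divide evenly by the grid size.
    """
    n_rows, n_cols = len(block_gain), len(block_gain[0])
    if height % n_rows or width % n_cols:
        raise ValueError("image size must be divisible by the block grid")
    block_h, block_w = height // n_rows, width // n_cols
    return [[block_gain[y // block_h][x // block_w] for x in range(width)]
            for y in range(height)]


def apply_block_gain(raw: Grid, block_gain: Grid) -> Grid:
    """Multiply each pixel by the gain of the block it belongs to."""
    height, width = len(raw), len(raw[0])
    gain = expand_block_conditions(block_gain, height, width)
    return [[raw[y][x] * gain[y][x] for x in range(width)] for y in range(height)]


# Usage: a uniform 4 x 4 signal read out with a 2 x 2 grid of gains.
flat = [[1.0] * 4 for _ in range(4)]
out = apply_block_gain(flat, [[1.0, 2.0], [3.0, 4.0]])
```

Each quadrant of `out` carries the gain of its own block, which is exactly the kind of region-dependent imaging condition that a single global setting cannot express.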

In the first embodiment, it is assumed that the feature extraction processing unit 701 is also implemented in the logic layer 62, the memory layer 63, or the like. Stacking it on the sensor layer 61 in this way allows control data to be fed back with less delay. When the imaging environment or the subject changes quickly, it is desirable to control the imaging device 704 with as little frame delay as possible, so stacked mounting on the sensor layer 61 is preferable.

FIG. 3 is a diagram illustrating the processing in the recognition processing unit 203. The recognition processing unit 203 includes a recognition network 302 and a sensor control network 313, which form the logical processing structure of the feature extraction processing unit 701. The recognition network 302 is a computation network that uses a CNN to recognize the position of a predetermined object in the imaging target 301 captured by the imaging device 704. The sensor control network 313 is a computation network that uses a CNN to extract information for controlling the imaging conditions of the imaging device.

Here, the recognition network 302 is configured as a five-layer CNN. Arithmetic processes 303 to 307 each consist of convolution, activation function, and pooling operations, among others, and are implemented by the configuration shown in FIG. 4, described later.

Feature maps 308 to 312 correspond to the results of arithmetic processes 303 to 307, respectively; feature maps 308 to 311 are called intermediate layers of the CNN computation, and feature map 312 is called the final layer. Feature maps 308 to 312 are stored in the memory 703. They are two-dimensional data obtained by applying feature extraction to the image data output by the imaging device.

The two-dimensional CNN computation on image data is now described in detail. When the kernel (filter coefficient matrix) of the convolution has size columnSize × rowSize and the previous layer has L feature maps, one feature map is computed by the product-sum operation in Equation (1) below.

$$\mathrm{output}(x, y) = \sum_{l=1}^{L} \sum_{\mathrm{row}=-\mathrm{rowSize}/2}^{\mathrm{rowSize}/2} \sum_{\mathrm{column}=-\mathrm{columnSize}/2}^{\mathrm{columnSize}/2} \mathrm{input}(x+\mathrm{column},\, y+\mathrm{row}) \times \mathrm{weight}(\mathrm{column},\, \mathrm{row}) \qquad (1)$$

where
input(x, y): reference pixel value at two-dimensional coordinates (x, y)
output(x, y): computation result at two-dimensional coordinates (x, y)
weight(column, row): weight coefficient applied at coordinates (x+column, y+row)
L: number of feature maps in the previous layer
columnSize, rowSize: horizontal and vertical sizes of the two-dimensional convolution kernel

In CNN computation, the product-sum operation of Equation (1) is repeated while multiple convolution kernels are scanned pixel by pixel, and the final product-sum result is nonlinearly transformed (activation processing) to obtain the feature map. The generated feature map may also be reduced by pooling before being referenced by the next layer. Each of the feature maps 308 to 312 consists of multiple maps within one layer, and different weight coefficient sets produce maps of features with different characteristics.

FIG. 4 is a diagram showing the detailed configuration of the feature extraction processing unit 701, which is the concrete implementation of arithmetic processes 303 to 307.

The reference data buffer 401 is a circuit that fetches from memory all or part of the previous layer's feature map data (input(x, y) in Equation (1)), which serves as the reference data for the convolution, and buffers it.

The multiplier 402 and the cumulative adder 403 are circuits that execute the computation of Equation (1).

The coefficient data buffer 404 is a circuit that transfers, in predetermined units, all or part of the weight coefficient data obtained in advance by learning (weight(column, row) in Equation (1)) from the memory 703 and buffers it.

The activation processing circuit 405 is a circuit that applies a nonlinear function such as a ReLU (Rectified Linear Unit) to the convolution result (output(x, y)) of Equation (1).

The pooling processing circuit 406 is a circuit that reduces the feature map using a spatial filter such as a maximum value filter. When pooling is not performed, the result of the activation processing circuit 405 is stored in the memory 703; when pooling is performed, the result of the pooling processing circuit 406 is stored in the memory 703. The data stored here becomes the feature map of the current layer.
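As a rough software analogue of the pooling processing circuit, the following Python sketch applies a maximum filter over non-overlapping windows. The 2×2 window, the stride equal to the window size, and the dropping of incomplete border windows are assumptions; the text only states that a spatial filter such as a maximum value filter is used.

```python
from typing import List

Map = List[List[float]]


def max_pool(feature_map: Map, size: int = 2) -> Map:
    """Reduce a feature map with a non-overlapping size x size maximum filter.

    Rows/columns that do not fill a whole window are dropped (an assumption;
    the text does not specify border handling).
    """
    height, width = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[y + dy][x + dx] for dy in range(size) for dx in range(size))
         for x in range(0, width - size + 1, size)]
        for y in range(0, height - size + 1, size)
    ]


fm = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool(fm)  # [[6, 8], [14, 16]]
```

Each 2×2 pooling stage halves both dimensions, which is how the sensor control network can shrink its maps toward the block resolution of the sensor.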

When the feature map of the current layer has been computed, it is used as the previous-layer feature map, and the feature map of the next layer is computed in the same way. In this way, feature maps of multiple layers are computed while the feature maps stored in the memory 703 are referenced in turn. A control unit 102 (not shown in FIG. 4) controls the operation of each component in FIG. 4, thereby realizing the hierarchical feature extraction processing (CNN computation).
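The layer-by-layer loop with all intermediate results kept in memory can be modeled in software as below. This is a hypothetical sketch: `run_hierarchy` and the toy `scale` layer are illustrative names, and a Python list stands in for the memory 703. The point it shows is that every intermediate layer's maps remain addressable after the final layer has run.

```python
from typing import Callable, List

Map = List[List[float]]
Layer = Callable[[List[Map]], List[Map]]


def run_hierarchy(image: Map, layers: List[Layer]) -> List[List[Map]]:
    """Run the layers in order, keeping every layer's maps in 'memory'.

    memory[0] holds the input image and memory[k] the maps produced by
    layer k; each layer reads only the previous layer's maps.
    """
    memory: List[List[Map]] = [[image]]
    for layer in layers:
        memory.append(layer(memory[-1]))
    return memory


def scale(factor: float) -> Layer:
    """Toy stand-in for a real layer: multiply every value by 'factor'."""
    return lambda maps: [[[v * factor for v in row] for row in m] for m in maps]


memory = run_hierarchy([[1.0, 2.0]], [scale(2.0), scale(3.0)])
# memory[1] (the first intermediate layer) is still available after the
# last layer finishes, so another network can read it without recomputation.
```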

By repeating feature extraction across multiple layers in this way, a CNN achieves recognition that is robust to variations in the identification target. Based on the feature extraction results of each layer, the CNN determines whether a desired pattern is present in the final-layer computation 307. The final-layer feature map 312 represents the recognition result 320; it is, for example, a confidence map that expresses the probability that the object is present at each position in the image as two-dimensional information. The final-layer computation 307 may also be implemented with a fully connected neural network or a linear discriminator instead of the convolution described above.
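One simple way to read an object position out of a confidence map such as feature map 312 is to take the global maximum and compare it with a threshold. The sketch below assumes a single target object and a threshold of 0.5; both are illustrative choices not specified in the text, and multi-object detection would need a local-maximum search instead.

```python
from typing import List, Optional, Tuple


def detect_peak(confidence: List[List[float]], threshold: float = 0.5
                ) -> Optional[Tuple[int, int, float]]:
    """Return (y, x, score) of the highest-confidence position, or None.

    Assumes a single target object; the threshold is a hypothetical choice.
    """
    score, y, x = max((v, r, c) for r, row in enumerate(confidence)
                      for c, v in enumerate(row))
    return (y, x, score) if score >= threshold else None
```

For example, `detect_peak([[0.1, 0.2], [0.9, 0.3]])` returns `(1, 0, 0.9)`, while a map whose maximum stays below the threshold yields `None`.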

The feature maps 308 to 311 of each layer represent feature extraction results for the input data. In general, lower layers (closer to the layer that receives the data to be processed) represent low-level features such as edges, and upper layers (closer to the recognition result) represent highly abstract features. The characteristics of each feature map depend on the pattern recognition target and the learning method.

Next, the sensor control network 313 is described. The sensor control network 313 is a computation network that regresses data for sensor control; that is, it uses a CNN to determine the data acquisition conditions of the sensor. Arithmetic processes 314 and 315 are similar to arithmetic processes 303 to 307 and are executed by the circuit shown in FIG. 4.

The sensor control network 313 regresses the control signal from the lower-layer feature map 308 of the recognition network 302; that is, it reuses features obtained during the recognition network's computation. Sharing feature maps with the recognition network is expected to improve regression performance and make learning easier, while also reducing the overall computational cost. Furthermore, because the sensor control network 313 uses the same kind of network computation (CNN) as the recognition network, the control data can be generated with the feature extraction processing unit 701 itself; the advantage is that no dedicated circuit is required.

Feature maps 316 to 317 are the feature maps of the sensor control network 313. The final-layer computation 315 produces feature map 317 (hereinafter, regression map 317), which holds the regressed control data for the imaging conditions. The regression map 317 is control data that specifies imaging conditions for each spatial position of the image sensor, for example the gain or exposure time of the sensor region corresponding to each position in the map. A single regression map 317 suffices when there is one type of control target controlled by a scalar value. When there are multiple control conditions, or when the control parameters are vector data, there are multiple regression maps.
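When several regression maps are produced (for example, one for gain and one for exposure time), each block's values must be gathered into one set of sensor settings. The following Python sketch shows one way to do this. The parameter names, the clamping to a supported range, and the `limits` values are all hypothetical; the text does not describe how regressed values are bounded.

```python
from typing import Dict, List, Tuple

Map = List[List[float]]


def regression_maps_to_settings(
    maps: List[Map], names: List[str], limits: Dict[str, Tuple[float, float]]
) -> List[List[Dict[str, float]]]:
    """Turn C regression maps (each n x n) into per-block sensor settings.

    maps[k] is the regression map for control parameter names[k]. Values are
    clamped to the sensor's supported range (hypothetical 'limits').
    """
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    settings = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            block = {}
            for name, reg_map in zip(names, maps):
                low, high = limits[name]
                block[name] = min(high, max(low, reg_map[i][j]))
            row.append(block)
        settings.append(row)
    return settings


gain_map = [[1.5, 9.0]]
exposure_map = [[0.010, 0.020]]  # seconds (illustrative values)
settings = regression_maps_to_settings(
    [gain_map, exposure_map], ["gain", "exposure"],
    {"gain": (1.0, 8.0), "exposure": (0.001, 0.033)})
# settings[0][1]["gain"] is clamped to 8.0
```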

FIG. 8 is a diagram illustrating the regression map; specifically, it schematically shows an example of a regression map 317 corresponding to the control logic shown in FIG. 6(b). rg(n, n) denotes the pixel data of the regression map, so the regression map here has size n×n. The regression values are shown as shades and correspond to, for example, the gain applied to the data after photoelectric conversion.

The imaging control processing unit 705 controls the photoelectric conversion elements according to the control signal data regressed by the sensor control network 313 and acquires image data suited to recognition processing. Unlike image data intended for people to view, understand, and appreciate, the image data obtained here is suited to improving the accuracy of recognition processing.

In the sensor control network 313, arithmetic process 314 includes pooling, which reduces the size of the feature map. The regression map is therefore smaller than the image output by the sensor; that is, the readout conditions are controlled per block of multiple pixels. The pooling ratio and related parameters are determined in consideration of the block size that the imaging control processing unit 705 can control.
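The relation between pooling and the controllable block size is simple arithmetic: the total downsampling from sensor pixels to regression-map pixels must equal the block width. The Python sketch below counts the 2×2 pooling stages needed in the control branch. The 1024-pixel sensor width, 64-pixel blocks, and the downsampling already present at feature map 308 (`input_stride`) are illustrative assumptions, not values from the text.

```python
from typing import Tuple


def control_pooling_stages(sensor_width: int, block_width: int,
                           input_stride: int = 1, pool: int = 2) -> Tuple[int, int]:
    """Return (number of pooling stages, regression-map width).

    input_stride: downsampling already applied before the branch point
    (e.g. the stride of feature map 308). All sizes must divide evenly.
    """
    if sensor_width % block_width or block_width % input_stride:
        raise ValueError("sizes must divide evenly")
    remaining = block_width // input_stride
    stages = 0
    while remaining > 1:
        if remaining % pool:
            raise ValueError("remaining factor is not a power of the pool size")
        remaining //= pool
        stages += 1
    return stages, sensor_width // block_width
```

With these assumed numbers, `control_pooling_stages(1024, 64)` needs six 2×2 stages and yields a 16-wide regression map; if feature map 308 is already downsampled by 4, only four stages remain.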

FIG. 5 is a timing chart illustrating the operation timing of pattern recognition processing in the pattern recognition device 201. The horizontal axis represents time, and the chart shows example timings at which the recognition network (recognition network 302), the control network (sensor control network 313), and condition setting are executed. Here, recognition is executed continuously on the image data of three temporally consecutive frames (first to third frames).

At timing 501, the first frame is captured and the recognition network processes it; in parallel, at timing 504, the control network computes the conditions for the subsequent second frame. At timing 507, the operating conditions of the imaging device (gain, exposure time, etc.) for the second frame are set.

At timing 502, the second frame is captured and the recognition network processes it; in parallel, at timing 505, the control network computes the conditions for the subsequent third frame. At timing 508, the operating conditions of the imaging device (gain, exposure time, etc.) for the third frame are set.

At timing 503, the third frame is captured and the recognition network processes it; in parallel, at timing 506, the control network computes the conditions for the subsequent fourth frame (not shown). At timing 509, the operating conditions of the imaging device (gain, exposure time, etc.) for the fourth frame (not shown) are set.

In this way, the control network successively sets imaging conditions suited to recognition as the state of the imaging target changes, and the recognition network performs pattern recognition on the images captured under those conditions.
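The one-frame feedback loop of FIG. 5 can be modeled sequentially as below: frame t is captured with the conditions computed while frame t-1 was processed. This is a toy sketch under loud assumptions. Scenes and images are single brightness values, `capture`, `recognize`, and `control` are hypothetical stand-ins, and, unlike the sensor control network, the toy controller also reads the current gain. On the device, recognition of frame t and the control computation for frame t+1 run in parallel.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    scenes: List[float],
    capture: Callable[[float, float], float],
    recognize: Callable[[float], Tuple[bool, float]],
    control: Callable[[float, float], float],
    initial_gain: float,
) -> List[Tuple[float, bool]]:
    """Capture each frame with the gain regressed from the previous frame."""
    gain = initial_gain
    log = []
    for scene in scenes:
        image = capture(scene, gain)          # capture with current conditions
        result, features = recognize(image)   # recognition network
        log.append((gain, result))
        gain = control(features, gain)        # conditions for the next frame
    return log


def capture(scene: float, gain: float) -> float:
    return min(1.0, scene * gain)             # toy sensor saturating at 1.0


def recognize(image: float) -> Tuple[bool, float]:
    return image >= 0.5, image                # "detected" if bright enough


def control(features: float, gain: float) -> float:
    return gain * 0.8 / max(features, 1e-6)   # steer the level toward 0.8


log = run_pipeline([0.3, 0.3, 0.3], capture, recognize, control, 1.0)
```

In this toy run the first frame is too dark to detect anything, but the gain chosen from it lets the second and third frames succeed, mirroring the one-frame latency shown in FIG. 5.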

<Operation of learning device>
Next, the learning processing of the recognition network and the control network in the learning device is described. As described above, the learning results (weight coefficients) of the learning device are used by the pattern recognition device 201.

FIG. 9 is a diagram illustrating the operation of the learning process for the recognition network and the control network. Learning here means determining the weight coefficients of the neural networks of the recognition network 302 and the sensor control network 313 so that pattern recognition achieves better (or the best) performance. In FIG. 9, arithmetic processes 303 to 307 and 314 to 315 of FIG. 3 are omitted.

Sensor model 901 is the sensor model needed to generate the data used to train each network. For example, sensor model 901 is used to generate, from image data 301 (two-dimensional, image-format data) captured by a different imaging device, pseudo sensor data 902 (pseudo data) that simulates the sensor's output data according to the control conditions and the sensor's characteristics. In other words, pseudo sensor data 902 is image data simulating what the sensor would output if the scene in image data 301 were the object being photographed.

Sensor model 901 can be created theoretically from the physical characteristics of the sensor. Alternatively, it may be learned from data obtained with an actual sensor, using a method such as a GAN (Generative Adversarial Network). Sensor model 901 also simulates how the output varies with the control signal that sets the sensor's readout conditions: for example, if a high gain value is set as the control signal, the modeled output is correspondingly high.

That is, sensor model 901 defines both the relationship between image data 301 captured by another imaging device and pseudo sensor data 902, and the relationship between the control signal and the pseudo sensor data. The sensor model is assumed to be created before the recognition network and the sensor control network are trained.
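As a rough illustration of these two relationships, the following sketch maps image data to pseudo sensor data under a gain and exposure setting. The linear, saturating response and the names `sensor_model`, `exposure`, and `full_scale` are assumptions for illustration only; an actual sensor model 901 would be derived from the sensor's physics or learned, for example with a GAN.

```python
from typing import List

def sensor_model(image: List[List[float]], gain: float, exposure: float = 1.0,
                 full_scale: float = 1.0) -> List[List[float]]:
    """Toy sensor model: image data -> pseudo sensor data for one control setting.

    Assumes a response linear in gain * exposure that saturates at full_scale;
    a realistic model would also add noise, quantization, and so on.
    """
    k = gain * exposure                          # effect of the control signal
    return [[min(max(k * v, 0.0), full_scale)    # clip to the sensor's output range
             for v in row] for row in image]
```

A higher gain produces a higher output until the sensor saturates, matching the behavior described above.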

Memory 903 holds the intermediate results of the recognition network that are needed to train sensor control network 313. Teacher data 905 is prepared in advance as a pair with image data 301; it is the distribution expected for recognition result 320. For example, when detecting faces in an image, the teacher data is a map following a normal (Gaussian) distribution that peaks at the center of the face. Process 904 calculates the difference between recognition result 320 and teacher data 905.
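A teacher map of the kind described, peaking at the face center, and the per-pixel difference computed by process 904 might look like the sketch below. The function names, the unit peak height, and the `sigma` parameter are illustrative assumptions.

```python
import math
from typing import List

def gaussian_teacher_map(height: int, width: int, cy: float, cx: float,
                         sigma: float) -> List[List[float]]:
    """Teacher map with a Gaussian peak of 1.0 at the face center (cy, cx)."""
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def error_map(result: List[List[float]],
              teacher: List[List[float]]) -> List[List[float]]:
    """Per-pixel difference between the recognition result and the teacher map."""
    return [[r - t for r, t in zip(r_row, t_row)]
            for r_row, t_row in zip(result, teacher)]
```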

FIG. 9(a) illustrates the main processing for training the recognition network. Image data 301 is converted by sensor model 901 to obtain pseudo sensor data 902. Pattern recognition is executed on the generated pseudo sensor data 902 to obtain recognition result 320. The recognition network is then trained by backpropagation, using the difference between recognition result 320 and teacher data 905 as the error; that is, the weighting coefficients of the convolution operations that generate feature maps 308 to 312 are updated in turn.

FIG. 9(b) illustrates the main processing for training sensor control network 313, which generates the sensor control signals. During this training, the error information (the difference between recognition result 320 and teacher data 905) is back-propagated, while recognition network 302 is not trained (its coefficients are fixed). The error is then passed through the inverse function of the sensor model to obtain the error information used to train sensor control network 313. To calculate the correct value for sensor control, sensor model 901 holds the data it generated as pseudo sensor data and table information that implements the inverse function.

Using the obtained error information and the data of feature map 308 obtained in the recognition process (stored in memory 903), the sensor control network is trained by backpropagation; that is, the coefficients of the convolution operations that generate feature maps 316 to 317 are updated in turn. The backpropagation itself follows conventional methods. What distinguishes this process is that the recognition network is fixed and the recognition error is given to sensor control network 313 through the inverse function of the sensor model, which is determined in advance from the characteristics of the sensor.

FIG. 10 shows a specific example of operation in the learning process of the first embodiment. FIGS. 10(a) and 10(b) show the operation patterns of pattern recognition and learning when training the recognition network; FIGS. 10(c) and 10(d) show the corresponding patterns when training the sensor control network. In the example of FIG. 10, the neural network of recognition network 302 has two nodes (nodes 1003 and 1004) and that of sensor control network 313 has one node (node 1005).

To simplify the explanation, a neural network with an MLP (Multi-Layer Perceptron) configuration is used here instead of a CNN; viewed as a CNN, this corresponds to convolutions with a 1×1 kernel. The sensor model is shown split into a sensor imaging model 1001 and a sensor control model 1002. In FIG. 10, recognition result 320 is the recognition result for image data 301 of the training data set, and teacher data 905 and calculation process 904 are the same as in FIG. 9.

FIG. 1 is a flowchart of the learning process in the first embodiment. In S101, the learning device performs initialization, including initialization of the sensor models (sensor imaging model 1001 and sensor control model 1002).

In S102, the learning device selects the training data to be used. For example, it selects image data 301 and the corresponding teacher data 905 from the training data set stored in storage device 213 and reads them into a memory (not shown) of arithmetic device 210.

In S103, the learning device converts the image data according to the control conditions of the sensor model: image data 301 is converted according to the conditions set in sensor control model 1002 (for example, sensitivity, gain, and exposure time) to generate pseudo sensor data.

In S104, the learning device performs a predetermined pattern recognition process, such as face detection, on the pseudo sensor data generated in S103 (operation pattern 1006). Computation nodes n₁ and n₂ of the neural network process the pseudo sensor data 902 produced by sensor imaging model 1001 and yield recognition result 320. Because recognition is performed over the two-dimensional image data 301 in raster order, recognition result 320 is also a two-dimensional map.
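Because the toy network of FIG. 10 uses 1×1 kernels, processing in raster order amounts to running the same two-node computation at every pixel. A minimal sketch, assuming ReLU nodes and scalar weights `w1` and `w2` (hypothetical names for W₁ and W₂); the intermediate map of n₁ is returned as well, since the later steps store it in memory 903:

```python
from typing import List, Tuple

def relu(v: float) -> float:
    return v if v > 0.0 else 0.0

def recognize_map(pseudo: List[List[float]], w1: float, w2: float
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    """Apply the two-node network at every pixel in raster order.

    Returns (recognition map y, intermediate map n1).
    """
    y_map, n1_map = [], []
    for row in pseudo:                  # rows, top to bottom
        y_row, n1_row = [], []
        for s in row:                   # columns, left to right
            n1 = relu(w1 * s)           # node n1
            y = relu(w2 * n1)           # node n2 (output)
            y_row.append(y)
            n1_row.append(n1)
        y_map.append(y_row)
        n1_map.append(n1_row)
    return y_map, n1_map
```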

In S105, the learning device trains the pattern recognition process based on recognition result 320 obtained in S104 (operation pattern 1007). The weighting coefficients of the recognition network are learned by backpropagation using the teacher data selected in S102: with the difference between recognition result 320 and teacher data 905 as the error, coefficient W_n2 (hereafter W₂) for node n₂ (node 1004) and coefficient W_n1 (hereafter W₁) for node n₁ (node 1003) are updated in turn. Coefficient W_r for node r (node 1005) is not updated in S105.

A specific example of learning by backpropagation follows. Backpropagation adjusts coefficients W₁ and W₂ so as to minimize the error between recognition result 320 and teacher data 905 at each image position.

Let y be the output value at a given pixel position in recognition result 320 and y_t the teacher data value at that position, and define the error E between them by Equation (2). Pixel coordinates are omitted from the notation for simplicity.

Denoting the output of node n₁ also by n₁, and the learning coefficient by α, coefficient W₂ is updated to W′₂ by Equation (3).

If the nonlinear function of nodes n₁ and n₂ is the ReLU function f_ReLU(), then y = f_ReLU(W₂ × n₁). Since the derivative of f_ReLU is 1 when W₂ × n₁ > 0, Equation (3) then becomes Equation (4).

Next, with s the output of the sensor model at the corresponding pixel position and α the learning coefficient, W₁ is updated to W′₁ by Equation (5).

Here n₁ = f_ReLU(W₁ × s). Therefore, when W₁ × s > 0, Equation (5) becomes Equation (6), from which the updated W′₁ can be calculated.
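Equations (2) to (6) are referenced but not reproduced here. Assuming the common squared-error form E = ½(y − y_t)², the updates reduce to W′₂ = W₂ − α(y − y_t)n₁ and W′₁ = W₁ − α(y − y_t)W₂s in the active ReLU region. The sketch below implements that assumed form for one pixel; the original equations may differ, for example in the constant factor or in whether W₂ or W′₂ appears in the W₁ update.

```python
from typing import Tuple

def relu(v: float) -> float:
    return v if v > 0.0 else 0.0

def recognition_step(s: float, y_t: float, w1: float, w2: float,
                     alpha: float) -> Tuple[float, float, float]:
    """One backprop step at one pixel; returns (new w1, new w2, loss before update).

    Assumes E = 0.5 * (y - y_t)**2 and ReLU derivative 1 when active, else 0.
    """
    n1 = relu(w1 * s)
    y = relu(w2 * n1)
    err = y - y_t                                  # dE/dy
    d_out = err if w2 * n1 > 0.0 else 0.0          # through the output ReLU
    grad_w2 = d_out * n1                           # dE/dW2
    d_n1 = d_out * w2 if w1 * s > 0.0 else 0.0     # through node n1 (pre-update W2)
    grad_w1 = d_n1 * s                             # dE/dW1
    return w1 - alpha * grad_w1, w2 - alpha * grad_w2, 0.5 * err ** 2
```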

In S106, the learning device runs pattern recognition again with the recognition network using the weighting coefficients learned (updated) in S105, and outputs recognition result 320. At the same time, it stores the output n₁ of node n₁ (node 1003) in memory 903 (operation pattern 1008).

In S107, the learning device trains sensor control network 313 based on the result of the pattern recognition in S106, using the error E′ between recognition result 320 and the teacher data 905 selected in S102. E′ is the error of the pattern recognition result computed in S106, that is, after the recognition network has been updated.

First, the coefficients of recognition network 302 are fixed and the error E′ is back-propagated. From the error Es (= E′ × W′₂ × W′₁) obtained by back-propagating through recognition network 302, and the sensor control value r stored in the sensor model, the correct sensor control value r_t is estimated using the inverse function f_rev(back-propagated error, control value) of sensor model 901 (Equation (7)).

The function f_rev is the inverse of the sensor model (sensor imaging model 1001 and sensor control model 1002): it back-calculates the correct control parameter value r_t from the sensor-data error Es and the current control value r.

FIG. 11 shows the relationship between the sensor gain control value and the output signal; more specifically, it schematically illustrates the inverse function of the sensor model, taking gain control as an example. Straight line 1101 represents the function used to realize the inverse function, relating the sensor gain control value to the output signal. Although drawn as a straight line, it can in practice be any function determined by theoretical analysis or experiment, held as an approximating function or as table information. Line 1101 corresponds to the combination of the inverse functions of sensor imaging model 1001 and sensor control model 1002.

Point 1102 represents the gain control value r and the corresponding output signal. Point 1103 is the point at which the correct gain control value is determined from the sensor output error Es. The correct gain control value r_t (point 1103) is obtained from point 1102, which was recorded in the model's memory when the pseudo sensor data was generated, together with the sensor output error Es back-propagated from the recognition network.
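The table-based inverse lookup suggested by FIG. 11 can be sketched as follows. This assumes a monotonic gain-to-output table and interprets r_t as the gain whose output equals the current output minus Es (that is, moving against the error); `interp`, `make_f_rev`, and the sample table are hypothetical.

```python
import bisect
from typing import Callable, List

def interp(x: float, xs: List[float], ys: List[float]) -> float:
    """Piecewise-linear interpolation over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)               # xs[i-1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def make_f_rev(gains: List[float],
               outputs: List[float]) -> Callable[[float, float], float]:
    """Build f_rev(Es, r) from a monotonic gain -> output table (line 1101)."""
    def f_rev(es: float, r: float) -> float:
        current_output = interp(r, gains, outputs)    # point 1102
        target_output = current_output - es           # correct the output by Es
        return interp(target_output, outputs, gains)  # point 1103: inverse lookup
    return f_rev

# Hypothetical linear table: output = 0.1 * gain.
f_rev = make_f_rev([1.0, 2.0, 4.0, 8.0], [0.1, 0.2, 0.4, 0.8])
```

Storing the table rather than a closed-form inverse lets the same code handle any monotonic, experimentally measured response.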

Using the obtained correct gain control value r_t, the weighting coefficient W_r of node r (node 1005) is updated by backpropagation (operation pattern 1009), based on the output data of node n₁ (node 1003) stored in memory 903.

With β as the learning coefficient, the weighting coefficient W_r of sensor control network 313 is updated by Equation (8).

Let E_r be the error data for training sensor control network 313, r the output value of node r of sensor control network 313, and r_t the correct sensor control value; then Equation (9) holds.

Equation (8) can therefore be rewritten as Equation (10).

Combining Equations (7) and (10), the coefficient of the sensor control network can then be updated by Equation (11).
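Under the assumptions E_r = ½(r − r_t)² and a linear control node r = W_r × n₁, Equation (10) would take the form W′_r = W_r − β(r − r_t)n₁, and Equation (11) follows by substituting r_t = f_rev(Es, r). A minimal sketch of that assumed update (the function name is hypothetical):

```python
def control_step(n1: float, w_r: float, r_t: float, beta: float) -> float:
    """One update of W_r toward the correct control value r_t.

    Assumes r = W_r * n1 and E_r = 0.5 * (r - r_t)**2, so dE_r/dW_r = (r - r_t) * n1.
    """
    r = w_r * n1                       # regressed control value (node r)
    return w_r - beta * (r - r_t) * n1
```

Repeating the step with a fixed n₁ and r_t drives W_r × n₁ toward r_t.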

The above processing is performed for all positions, or for a selected set of positions, in image data 301. In this way, coefficient W′_r is learned so that the network generates map 317, which corresponds to regression information for controlling the sensor appropriately.

In S108, the learning device regresses the sensor control parameters using the sensor control network with the updated weighting coefficient W′_r.

In S109, the learning device sets the regressed parameters as the control parameters of the sensor model. In S103 for the next image data (the next loop iteration), image data 301 is converted into pseudo sensor data 902 using these control parameters. The sensor is controlled per partial region, with the region size determined by the size of regression data map 318, which corresponds to the control parameters.

In S110, the learning device determines whether a predetermined termination condition, for example completion of learning for a prespecified set of image data, is satisfied. If so, the process proceeds to S111; otherwise, it returns to S102.
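The loop S101 to S110 can be condensed into a toy, single-pixel sketch. It is not the patented implementation: it assumes the sensor model s = gain × x (so that f_rev(Es, r) = r − Es/x, with x > 0), squared-error losses, a linear control node, and hypothetical initial weights.

```python
from typing import List, Tuple

def relu(v: float) -> float:
    return v if v > 0.0 else 0.0

def train_alternating(samples: List[Tuple[float, float]], epochs: int,
                      alpha: float = 0.1, beta: float = 0.1
                      ) -> Tuple[float, float, float, float]:
    """Returns (W1, W2, W_r, gain) after alternating training on (x, y_t) pairs."""
    w1, w2, w_r = 0.5, 0.5, 2.0          # S101: hypothetical initial weights
    gain = 1.0                           # S101: initial sensor-model control value
    for _ in range(epochs):              # S110: stop after a fixed number of passes
        for x, y_t in samples:           # S102: select image data and teacher data
            s = gain * x                 # S103: pseudo sensor data
            # S104-S105: recognize, then update W1 and W2 (W_r untouched)
            n1 = relu(w1 * s)
            y = relu(w2 * n1)
            d_out = (y - y_t) if w2 * n1 > 0.0 else 0.0
            d_n1 = d_out * w2 if w1 * s > 0.0 else 0.0
            w2 -= alpha * d_out * n1
            w1 -= alpha * d_n1 * s
            # S106: recognize again with updated weights; keep n1 (memory 903)
            n1 = relu(w1 * s)
            y = relu(w2 * n1)
            # S107: back-propagate E' through the frozen recognition weights
            es = (y - y_t) * w2 * w1     # Es = E' x W'2 x W'1 (ReLUs assumed active)
            r_t = gain - es / x          # correct control value via the inverse model
            w_r -= beta * (w_r * n1 - r_t) * n1
            # S108-S109: regress the control value and set it in the sensor model
            gain = w_r * n1
    return w1, w2, w_r, gain
```

With one sample (x = 1.0, y_t = 1.0), the first pass raises W₁ and W₂ because the recognition output is too low, and the back-propagated error then pushes the regressed gain above its initial value of 1.0.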

In S111, the learning device retrieves the learning results, namely the weighting coefficients of recognition network 302 and sensor control network 313. These coefficients are stored in RAM 204 of pattern recognition device 201, enabling it to perform pattern recognition more appropriately.

As described above, in the first embodiment sensor control network 313 is trained using image data and sensor model 901, which includes sensor control model 1002. This enables pattern recognition device 201 to perform pattern recognition that is more robust against diverse variations in the data being processed.

Furthermore, sensor control network 313 regresses the control signal from feature map 308 in a lower layer of recognition network 302; that is, it shares with recognition network 302 the features computed in the recognition process. This is expected to improve regression performance and ease training in pattern recognition device 201, while also reducing the overall computational cost.

(Second embodiment)
The second embodiment describes a mode in which part of recognition network 302 is also trained while sensor control network 313 is trained. The first embodiment did not train recognition network 302 during the training of sensor control network 313, but the learning method is not limited to that.

<Operation of learning device>
FIG. 12 shows a specific example of the learning process in the second embodiment, namely the operation pattern for training the control network, which corresponds to operation pattern 1009 of the first embodiment. The other processing is the same as in the first embodiment (FIGS. 1 and 10) and is not described again.

As in the first embodiment, the sensor model is shown split into a sensor imaging model 1201 and a sensor control model 1202. Nodes 1203 and 1204 belong to recognition network 302, and node 1205 belongs to sensor control network 313.

As described above, in the second embodiment coefficient W′₁ of recognition network 302 is updated to W″₁ while sensor control network 313 is trained. Specifically, W′_r is updated by Equation (10) as in the first embodiment, and W′₁ is updated by Equation (12).

Through this learning, the features output by node n₁ (node 1203) also become well suited to sensor control network 313.
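Equation (12) is not reproduced here. Assuming the same squared control error and n₁ = f_ReLU(W₁ × s), one plausible form is W″₁ = W′₁ − β(r − r_t)W_r s, i.e., the control error is also propagated into W₁. A sketch of that assumed joint update (hypothetical function name):

```python
from typing import Tuple

def control_step_joint(s: float, w1: float, w_r: float, r_t: float,
                       beta: float) -> Tuple[float, float]:
    """Update both W1 and W_r from the control error; returns (new W1, new W_r).

    Assumes n1 = ReLU(W1 * s), r = W_r * n1, E_r = 0.5 * (r - r_t)**2.
    """
    n1 = max(w1 * s, 0.0)
    r = w_r * n1
    err = r - r_t                                       # dE_r/dr
    grad_wr = err * n1                                  # dE_r/dW_r
    grad_w1 = err * w_r * s if w1 * s > 0.0 else 0.0    # dE_r/dW1 through node n1
    return w1 - beta * grad_w1, w_r - beta * grad_wr
```

A small β limits how far W₁ moves per step, which is how the text suggests protecting recognition performance.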

As explained above, the second embodiment enables more robust pattern recognition than the first embodiment.

As in the first embodiment (FIG. 1), recognition network 302 and sensor control network 313 can also be trained alternately (co-evolutionary learning). Coefficients suited to each network can then be learned, which reduces the impact that training sensor control network 313 has on the performance of recognition network 302. Setting the learning coefficient β in Equation (12) to a small value reduces this impact further.

(Third embodiment)
The third embodiment describes a mode in which recognition network 302 and sensor control network 313 are trained independently. The first and second embodiments trained them alternately, but the learning method is not limited to that.

<Operation of learning device>
FIG. 13 is a flowchart of the learning process in the third embodiment. S1301 to S1305 and S1310 to S1313 are the same as S101 to S105 and S106 to S109 in FIG. 1, respectively, and are not described again.

In S1306, the learning device determines whether a predetermined termination condition, for example completion of learning for a prespecified set of image data, is satisfied. If so, the process proceeds to S1307; otherwise, it returns to S1302. In S1307, the learning device retrieves the learning results, namely the weighting coefficients of recognition network 302.

In S1308, as in S1302, the learning device selects the training data to be used. In S1309, as in S1303, it converts the image data according to the control conditions of the sensor model to generate pseudo sensor data.

In S1314, the learning device determines whether a predetermined termination condition, for example completion of learning for a prespecified set of image data, is satisfied. If so, the process proceeds to S1315; otherwise, it returns to S1308. In S1315, the learning device retrieves the learning results, namely the weighting coefficients of sensor control network 313.

As explained above, in the third embodiment recognition network 302 and sensor control network 313 are trained separately. With this configuration, sensor control network 313 can be trained without affecting the already trained recognition network 302.
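The two-phase schedule of FIG. 13 can be sketched with the same kind of toy model. Everything here is an assumption for illustration: a linear sensor s = gain × x with f_rev(Es, r) = r − Es/x (x > 0), squared errors, and hypothetical initial weights.

```python
from typing import List, Tuple

def relu(v: float) -> float:
    return v if v > 0.0 else 0.0

def train_separately(samples: List[Tuple[float, float]], rec_epochs: int,
                     ctrl_epochs: int, alpha: float = 0.1, beta: float = 0.1
                     ) -> Tuple[float, float, float, float]:
    """Phase 1 trains W1, W2; phase 2 trains only W_r. Returns (W1, W2, W_r, gain)."""
    w1, w2, w_r, gain = 0.5, 0.5, 2.0, 1.0      # hypothetical initial values
    # Phase 1 (S1302-S1306): recognition network only, fixed sensor setting
    for _ in range(rec_epochs):
        for x, y_t in samples:
            s = gain * x
            n1 = relu(w1 * s)
            y = relu(w2 * n1)
            d_out = (y - y_t) if w2 * n1 > 0.0 else 0.0
            d_n1 = d_out * w2 if w1 * s > 0.0 else 0.0
            w2 -= alpha * d_out * n1
            w1 -= alpha * d_n1 * s
    # S1307: W1 and W2 are final from here on
    # Phase 2 (S1308-S1314): control network only, recognition weights frozen
    for _ in range(ctrl_epochs):
        for x, y_t in samples:
            s = gain * x
            n1 = relu(w1 * s)
            y = relu(w2 * n1)
            es = (y - y_t) * w2 * w1             # error at the sensor output
            r_t = gain - es / x                  # correct control value (inverse model)
            w_r -= beta * (w_r * n1 - r_t) * n1
            gain = w_r * n1                      # regress and set the new control value
    return w1, w2, w_r, gain
```

Calling it with `ctrl_epochs=0` and with `ctrl_epochs > 0` yields identical W₁ and W₂, which is the property stated above.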

(Modified example)
The embodiments above describe, as an example of the recognition network, the task of detecting a specific pattern in an image, but the present invention is not limited to this. It can be applied to various recognition tasks, such as recognizing attributes of a recognition target or understanding the content of an image. It is also applicable beyond recognition, to various image processing tasks such as geometric transformation, brightness/color correction, noise removal, and format conversion, where an improvement in the quality of the generated images can be expected.

Although the embodiments describe a two-dimensional image sensor, the present invention is not limited to this. It can be applied to various sensors with different data dimensions and modalities, and to systems using various sensing devices, such as those producing audio data or radio-wave sensor data.

Although the embodiments describe the case where sensor control network 313 controls gain, the present invention is not limited to this; it can also be applied to controlling various other readout parameters, such as exposure time, frame rate, sensitivity, and resolution.

Although the embodiments describe learning the connection coefficients (weighting coefficients) of a neural network, the present invention may also be applied to methods that learn the network configuration at the same time, such as NeuroEvolution.

Although the embodiments use backpropagation as the learning method, the present invention is not limited to this; various other metaheuristic methods, such as genetic algorithms, can be applied. In that case, the invention can be applied even when it is difficult to define the inverse function of the sensor model that error backpropagation requires.
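As one example of a gradient-free alternative, the sketch below uses a (1+λ) evolution strategy, a simpler relative of genetic algorithms (Gaussian mutation and elitist selection, no crossover), to tune a single control weight. It needs only forward evaluations of a loss through the sensor model, so no inverse function is required. The function name, its parameters, and the toy loss in the usage line are assumptions.

```python
import random
from typing import Callable

def evolve(loss: Callable[[float], float], init: float, generations: int = 50,
           offspring: int = 8, sigma: float = 0.2, seed: int = 0) -> float:
    """(1+lambda) evolution strategy over one scalar weight, e.g. W_r."""
    rng = random.Random(seed)                       # seeded for reproducibility
    best, best_loss = init, loss(init)
    for _ in range(generations):
        children = [best + rng.gauss(0.0, sigma) for _ in range(offspring)]  # mutation
        child = min(children, key=loss)             # best of the lambda offspring
        child_loss = loss(child)                    # forward evaluation only
        if child_loss < best_loss:                  # elitist (plus) selection
            best, best_loss = child, child_loss
    return best

# Toy loss: gain = W_r * 0.5 (fixed n1), recognition output = 0.25 * gain, target 1.0.
w_r = evolve(lambda w: (0.25 * (w * 0.5) - 1.0) ** 2, init=2.0)
```

Because selection is elitist, the returned weight is never worse than the starting point under the given loss.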

Although the embodiments describe the case where imaging control processing unit 705 controls the imaging conditions block by block, the present invention is not limited to this. Control may be performed per pixel, or for the entire image at once. In the latter case, the control data may be calculated by passing the final-layer feature map of sensor control network 313 through a linear discriminator, or may be the result of applying global pooling to that feature map.
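The two control granularities can be sketched as follows: expanding a block-level regression map into per-pixel settings, and reducing a final-layer feature map to a single whole-image value. Global average pooling is an assumption here; the text says only "global pooling", so max pooling would be an equally valid reading. The function names are hypothetical.

```python
from typing import List

def expand_block_map(reg_map: List[List[float]], block_h: int,
                     block_w: int) -> List[List[float]]:
    """Per-block control: each regression-map cell sets a block_h x block_w region."""
    return [[reg_map[y // block_h][x // block_w]
             for x in range(len(reg_map[0]) * block_w)]
            for y in range(len(reg_map) * block_h)]

def global_average_pool(feature_map: List[List[float]]) -> float:
    """Whole-image control: one value from the final-layer feature map."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)
```

Per-pixel control corresponds to the limiting case of 1 × 1 blocks.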

In the embodiments, sensor control network 313 regresses the control signal using feature map 308 in a lower layer of recognition network 302, but the present invention is not limited to this. Feature maps of higher layers may be used, or feature maps may be selected from any of the layers. The hierarchical structures of recognition network 302 and sensor control network 313 (the number of layers and the number of feature maps per layer) may be configured in any way suited to the recognition target, control target, and so on. Furthermore, an independent sensor control network may be built that takes the output of imaging device 704 as input instead of the feature maps of recognition network 302; even then, recognition network 302 is used when training sensor control network 313.

The embodiments describe the case where the reliability of pattern recognition and the control conditions are generated in the final layer of the hierarchical feature extraction process, but the present invention is not limited to this. For example, recognition and control data generation may be realized by directly referring to feature maps in intermediate layers.

(Other Embodiments)
The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of those functions.

The invention is not limited to the above-described embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. The following claims are therefore appended to make the scope of the invention public.

701 Feature extraction processing unit; 702 Control unit; 703 Memory; 704 Imaging device; 705 Imaging control processing unit; 706 Image correction processing unit; 302 Recognition network; 313 Sensor control network

Claims

1. An information processing device connectable to a sensing device, comprising:
setting means for setting data acquisition conditions in the sensing device;
first processing means for performing hierarchical feature extraction processing, using a first neural network (NN), on data obtained by the sensing device; and
second processing means for generating, using a feature map in an intermediate layer of the first NN, regression data indicating data acquisition conditions to be used in subsequent data acquisition by the sensing device,
wherein the setting means sets the data acquisition conditions indicated by the regression data in the sensing device.

2. The information processing device according to claim 1, wherein the second processing means generates the regression data using a second NN.

3. The information processing device according to claim 2, wherein at least one of the first NN and the second NN is a convolutional neural network (CNN).

4. The information processing device according to any one of claims 1 to 3, wherein the second processing means generates the regression data using a feature map in a lower layer of the first NN.

5. The information processing device according to claim 1, wherein the data acquisition conditions in the regression data are controlled in units of partial areas of the sensing device.

6. The information processing device according to claim 1, wherein the sensing device is an imaging device and the data is image data.

7. A method for learning weighting coefficients of the second NN in the information processing device according to claim 2, the method comprising:
a generation step of generating, using a sensor model based on characteristics of the sensing device, pseudo data that simulates data output from the sensing device, based on learning data and data acquisition conditions;
a first step of learning weighting coefficients of the first NN using the pseudo data; and
a second step of learning weighting coefficients of the second NN using the pseudo data and the weighting coefficients of the first NN learned in the first step.

8. The method according to claim 7, wherein the first step comprises:
a first recognition step of performing recognition processing using the first NN with the pseudo data as input; and
a first learning step of learning the weighting coefficients of the first NN by back-propagating, through the first NN, the error between the recognition result of the first recognition step and teacher data prepared in advance for the learning data.

9. The method according to claim 7 or 8, wherein the second step comprises:
a second recognition step of performing recognition processing using the first NN with the pseudo data as input; and
a second learning step of learning the weighting coefficients of the second NN using the inverse function of the sensor model and the result of back-propagating, through the first NN, the error between the recognition result of the second recognition step and teacher data prepared in advance for the learning data.

10. The method according to claim 9, wherein, in the second learning step, some of the weighting coefficients of the first NN are fixed.

11. The method according to claim 7, wherein the first step and the second step are performed alternately over a plurality of pieces of learning data.

12. A program for causing a computer to execute the method according to any one of claims 7 to 11.
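The learning method of claims 7 to 11 can be illustrated with a deliberately tiny, scalar Python sketch. Every concrete choice here is an assumption made for illustration, not the patent's implementation: the sensor model is a pure gain (`y = gain * x`), the "first NN" and "second NN" are single linear units, the loss is a squared error, and the way the inverse sensor model is used (stepping the pseudo data against the back-propagated gradient, then mapping the corrected value back to a target gain) is one plausible reading of claim 9. All class and function names are hypothetical. The whole first NN is held fixed during the second step, a special case of claim 10, which fixes only some of its weights.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


def sensor_model(x: float, gain: float) -> float:
    """Hypothetical sensor model: sensor output = scene value * gain."""
    return gain * x


def inverse_sensor_model(y: float, x: float) -> float:
    """Inverse of sensor_model with respect to the gain (x must be non-zero)."""
    return y / x


@dataclass
class FirstNN:
    """Toy recognition network: h = a * y (intermediate), out = w * h + b."""
    a: float = 1.0
    w: float = 1.0
    b: float = 0.0

    def forward(self, y: float) -> Tuple[float, float]:
        h = self.a * y
        return h, self.w * h + self.b


@dataclass
class SecondNN:
    """Toy sensor control network: regresses a gain from the intermediate feature h."""
    u: float = 0.0
    c: float = 1.0

    def forward(self, h: float) -> float:
        return self.u * h + self.c


def first_step(nn1: FirstNN, x: float, t: float, gain: float, lr: float) -> float:
    """Generate pseudo data, recognize it, and back-propagate the error through nn1.
    Returns the loss measured before the update."""
    y = sensor_model(x, gain)
    h, out = nn1.forward(y)
    err = out - t  # dL/dout for L = 0.5 * (out - t) ** 2
    grad_a, grad_w, grad_b = err * nn1.w * y, err * h, err
    nn1.a -= lr * grad_a
    nn1.w -= lr * grad_w
    nn1.b -= lr * grad_b
    return 0.5 * err ** 2


def second_step(nn1: FirstNN, nn2: SecondNN, x: float, t: float, gain: float,
                lr: float, eta: float) -> float:
    """Back-propagate the error through the fixed nn1 down to the pseudo data,
    map a corrected pseudo-data value back to a target gain with the inverse
    sensor model, and regress nn2 toward it. Returns the target gain."""
    y = sensor_model(x, gain)
    h, out = nn1.forward(y)
    grad_y = (out - t) * nn1.w * nn1.a  # dL/dy; nn1 is not updated here
    target_gain = inverse_sensor_model(y - eta * grad_y, x)
    err = nn2.forward(h) - target_gain
    nn2.u -= lr * err * h
    nn2.c -= lr * err
    return target_gain


def train(samples: List[Tuple[float, float]], epochs: int = 20, lr: float = 0.01,
          eta: float = 0.5, seed: int = 0) -> Tuple[FirstNN, SecondNN]:
    """Alternate the first and second steps over the learning data (claim 11)."""
    rng = random.Random(seed)
    nn1, nn2 = FirstNN(), SecondNN()
    for _ in range(epochs):
        for x, t in samples:
            gain = rng.uniform(0.5, 2.0)  # randomly drawn acquisition condition
            first_step(nn1, x, t, gain, lr)
            second_step(nn1, nn2, x, t, gain, lr, eta)
    return nn1, nn2
```

For example, with default weights, a scene value of 0.5, a teacher value of 2.0, and a gain of 2.0, the recognition output (1.0) is too low, so the back-propagated gradient pushes the pseudo data upward and the inverse sensor model turns that into a higher target gain (3.0) for the second NN to learn.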