JP2021089476A

JP2021089476A - Information processing device, imaging device, control method, and program

Info

Publication number: JP2021089476A
Application number: JP2019218083A
Authority: JP
Inventors: 浩崇進藤; Hirotaka Shindo
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2021-06-10

Abstract

To provide an information processing device capable of executing accurate inference on image data of various brightness levels. SOLUTION: An information processing device 100 includes a learning model 20, which is a neural network that executes inference processing on image data by an arithmetic operation using weight values 240; an exposure specifying unit 250 that specifies the brightness of the image data; and a weighting coefficient determining unit 260 that determines, on the basis of the specified brightness, a weighting coefficient α by which the weight values 240 are multiplied. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing device having a learning model that executes inference processing, and to an imaging device, a control method, and a program.

Processing for recognizing a subject included in image data (object recognition processing) is in use. When subject recognition is executed on dark image data, recognition accuracy is low because feature amounts cannot be acquired appropriately. At the same time, there is a need for recognition processing on image data of various brightness levels, including dark image data.

In recent years, machine learning methods have been widely used in the field of object recognition processing. In general, an input image used in the learning stage of machine learning is subjected to preprocessing such as offset processing and gain processing according to the bit width of the image. However, scaling an image that is dark overall and low in contrast does not increase the amount of information it carries, so it is difficult to improve recognition accuracy in this way.

In an environment with abundant information processing resources, it is possible to suppress digit loss by using a learning model with a wide calculation bit width, or to handle various brightness levels by using a plurality of learning models selected according to the brightness of the image. In embedded systems and the like, however, resources are not abundant, so inference processing should be executed accurately using limited resources.

Patent Document 1 proposes a technique for improving inference accuracy by reducing the vanishing gradient caused by the activation function of a deep neural network.

Patent Document 1: JP-A-2019-67062

In the technique of Patent Document 1, which reduces the vanishing gradient by multiplication with weight parameters, each neuron is provided with a plurality of weight parameters, which increases the required information processing resources such as circuit scale. Moreover, the document does not address setting parameters according to the input data.

Furthermore, in an environment where information processing resources are limited, it is difficult to adopt an approach that switches among a plurality of learning models to handle various brightness levels.

In view of the above circumstances, an object of the present invention is to provide an information processing device, an imaging device, a control method, and a program capable of performing highly accurate inference on image data of various brightness levels.

In order to achieve the above object, an information processing device of the present invention includes: a learning model that is a neural network executing inference processing on image data by calculation using weight values; exposure specifying means for specifying the brightness of the image data; and weighting coefficient determining means for determining, based on the specified brightness, a weighting coefficient by which the weight values are to be multiplied.

According to the present invention, highly accurate inference can be performed on image data of various brightness levels.

FIG. 1 is a block diagram showing the configuration of the information processing device according to the first embodiment of the present invention.
FIG. 2 is an explanatory diagram of the learning model (inference device) according to the first embodiment of the present invention.
FIG. 3 is a diagram explaining the adjustment of the weights included in the learning model according to the first embodiment of the present invention.
FIG. 4 is an explanatory diagram illustrating inference processing (object recognition processing) according to the prior art.
FIG. 5 is an explanatory diagram illustrating inference processing (object recognition processing) in the first embodiment of the present invention.
FIG. 6 is a flowchart of the processing for determining the weighting coefficient in the first embodiment of the present invention.
FIG. 7 is a diagram showing the relationship between brightness and the weighting coefficient in the first embodiment of the present invention.
FIG. 8 is an explanatory diagram illustrating inference processing (object recognition processing) in the second embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. Each embodiment described below is merely one example of a configuration in which the present invention can be realized. Each of the following embodiments can be modified or changed as appropriate according to the configuration of the device to which the present invention is applied and to various conditions. Therefore, the scope of the present invention is not limited by the configurations described in the following embodiments. For example, a configuration that combines a plurality of the configurations described in the embodiments can be adopted as long as they do not contradict one another.

<First Embodiment>
FIG. 1 is a block diagram showing the configuration of an information processing device 100 according to the first embodiment of the present invention. In outline, the first embodiment improves the accuracy of estimation by the inference device provided in the information processing device 100 by adjusting the weighting coefficient of the neural network according to the brightness of the input image.

The information processing device 100 includes a CPU 101, a ROM 102, a RAM 103, an HDD 104, an input unit 105, a display unit 106, and a system bus 107. The information processing device 100 may be an imaging device that has an imaging optical system and an image sensor and generates image data, or it may be a terminal (a personal computer or the like) that acquires image data from the outside via a network or the like.

The CPU 101 is a processor capable of executing various arithmetic processes, and functions as a control unit that controls the elements of the information processing device 100 in an integrated manner.

The ROM 102 is a non-volatile storage medium composed of, for example, an element such as a flash memory or an EEPROM, and stores a program used for controlling the information processing device 100.

The RAM 103 is a volatile storage medium and functions as a working memory used by the CPU 101 for calculation.

The HDD 104 is the internal storage of the information processing device 100 and stores various data and control information. The HDD 104 may also store a program used for controlling the information processing device 100.

The input unit 105 is an interface through which instructions from the user and data (for example, image data) from other devices are input.

The display unit 106 displays various information acquired and generated by the operation of the information processing device 100, and is composed of, for example, a liquid crystal display.

The system bus 107 is a transmission path that interconnects the above-mentioned elements of the information processing device 100.

The learning model 20, the functional blocks, and the various processes according to the present embodiment described below are realized by the CPU 101 loading a program stored in the ROM 102 or the HDD 104 into the RAM 103 and executing it. The various processes according to the present embodiment are started, for example, when a user operates the input unit 105 to instruct the CPU 101, and the results of the processes are output to the display unit 106.

FIG. 2 is an explanatory diagram of the learning model 20 (inference device) according to the first embodiment of the present invention. The learning model 20 is a neural network (hereinafter sometimes abbreviated as NN) having an input layer 210, an intermediate layer 220, and an output layer 230, and is a trained model obtained by supervised learning. The trained model of the present embodiment is assumed to have been trained, for example according to a convolutional neural network algorithm, with image data as the input data and, as the teacher data, information indicating the type of subject (person, animal, tree, etc.) appearing in the image. The learning model 20 is defined by a program expressing the algorithm described below and by the parameters used during inference processing.

The input layer 210 is a layer including a plurality of nodes u1 and u2 to which image data is input, and outputs the image data to the intermediate layer 220. More specifically, data obtained by converting the pixel values in the image data into a matrix is input to the input layer 210. The nodes u1 and u2 of the input layer 210 each output the input data to the intermediate layer 220.

The intermediate layer 220 is a layer including a plurality of neurons v1 and v2 (nodes). It executes arithmetic processing (product-sum operation, non-linear operation by an activation function, etc.) on the input data supplied from the input layer 210 and outputs the result to the output layer 230. That is, the intermediate layer 220 multiplies the input data by the weight values 240 (uv11, uv12, uv21, uv22) set for each edge, an edge being a path between the input layer 210 and the intermediate layer 220, converts the resulting weighted sum with the activation function, and outputs it to the output layer 230. For example, a sigmoid function or a ReLU function is used as the activation function in the intermediate layer 220. The learning model 20 may include a plurality of intermediate layers. That is, the learning model 20 may be configured as a deep neural network.

The output layer 230 is a layer including a plurality of nodes y1 and y2 that execute arithmetic processing (product-sum operation, non-linear operation by an activation function, etc.) on the input data supplied from the intermediate layer 220 and output the result. More specifically, the output layer 230 multiplies the input data by the weight values 240 (vy11, vy12, vy21, vy22) set for each edge, an edge being a path between the intermediate layer 220 and the output layer 230, converts the weighted sum with the activation function, and outputs the result. The output layer 230 may convert the output values into probability values before outputting them. When node y1 is the output representing a person and node y2 is the output representing a tree, node y1 outputs the probability that the main subject in the input image data is a person, and node y2 outputs the probability that the main subject is a tree. For example, a softmax function is used as the activation function in the output layer 230.
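The three-layer structure described above can be sketched as a small forward pass. This is a minimal illustration, not the patented implementation: the weight values, the input values, and the index convention (`w_uv[i][j]` as the weight on the edge from input node i+1 to intermediate neuron j+1) are assumptions chosen for the example. The sigmoid and softmax functions are the activation functions the text names as examples.

```python
import math


def sigmoid(x):
    """Logistic activation used here for the intermediate layer."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    """Convert raw output values into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sums(inputs, weights):
    """Product-sum operation: one weighted sum per destination node."""
    n_out = len(weights[0])
    return [sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
            for j in range(n_out)]


def forward(u, w_uv, w_vy):
    """Input layer -> intermediate layer (sigmoid) -> output layer (softmax)."""
    v = [sigmoid(s) for s in weighted_sums(u, w_uv)]
    return softmax(weighted_sums(v, w_vy))


# Hypothetical weight values, standing in for uv11..uv22 and vy11..vy22.
W_UV = [[0.8, -0.4], [0.3, 0.9]]
W_VY = [[1.2, -0.7], [-0.5, 1.1]]

probs = forward([0.6, 0.2], W_UV, W_VY)
print(probs)  # [P(person), P(tree)] under the assumed weights
```

The two output values play the roles of nodes y1 and y2: each is a probability, and together they sum to 1.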

The above-mentioned weight values 240 (uv11, uv12, uv21, uv22, vy11, vy12, vy21, vy22) are adjusted by the exposure specifying unit 250 and the weighting coefficient determining unit 260.

The exposure specifying unit 250 detects the brightness L of the image data input to the input layer 210 of the learning model 20 and outputs it to the weighting coefficient determining unit 260. In general, "brightness" is a value that can be expressed in various units such as luminous flux (lumen), luminous intensity (candela), and illuminance (lux). The exposure specifying unit 250 of the present embodiment specifies, for each piece of image data, one index value L indicating the "brightness" of that image data.

The weighting coefficient determining unit 260 determines the weighting coefficient by which the weight values 240 are to be multiplied (that is, it adjusts the weight values 240) based on the brightness L of the image data detected by the exposure specifying unit 250. As shown in FIG. 2, all of the weight values 240 may be multiplied by the same weighting coefficient α, or, as shown in FIG. 3, a plurality of weighting coefficients α, β, γ, ... may be applied selectively to different weight values 240. For example, as shown in FIG. 3(a), the weight values 240 may be multiplied by weighting coefficients α and β that differ from layer to layer. As shown in FIG. 3(b), different weighting coefficients α and β may be applied to different edges within a layer. As shown in FIG. 3(c), there may be layers (edges) to which no weighting coefficient is applied.
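The three ways of applying coefficients described for FIG. 2 and FIG. 3 can be sketched as follows. The function names, the dictionary layout (`"uv"` and `"vy"` as layer keys), and all numeric values are hypothetical and serve only to illustrate the idea. A layer with no entry in the coefficient dictionary is left unchanged, which corresponds to the case where no coefficient is applied.

```python
def scale_weights(weights, coeff):
    """Multiply a weight matrix by one coefficient or by a per-edge matrix."""
    if isinstance(coeff, (int, float)):
        return [[w * coeff for w in row] for row in weights]
    # Per-edge case: coeff has the same shape as the weight matrix.
    return [[w * c for w, c in zip(w_row, c_row)]
            for w_row, c_row in zip(weights, coeff)]


def apply_coefficients(layers, coeffs):
    """Return adjusted copies of all layers; layers without an entry are untouched."""
    return {name: scale_weights(w, coeffs.get(name, 1.0))
            for name, w in layers.items()}


# Hypothetical weight values for the two sets of edges.
layers = {
    "uv": [[0.5, 0.25], [1.0, 2.0]],
    "vy": [[4.0, 0.5], [0.25, 1.0]],
}

same_alpha = apply_coefficients(layers, {"uv": 2.0, "vy": 2.0})   # one alpha for all
per_layer = apply_coefficients(layers, {"uv": 2.0, "vy": 4.0})    # alpha, beta per layer
per_edge = apply_coefficients(layers, {"uv": [[2.0, 4.0], [1.0, 1.0]]})  # per edge; "vy" untouched
```

Returning adjusted copies instead of modifying the stored weights keeps the trained values intact, so a different coefficient can be applied to the next image.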

Image recognition in the first embodiment of the present invention will be described with reference to FIGS. 4 and 5. FIG. 4 shows image recognition according to the prior art, and FIG. 5 shows image recognition according to the configuration of the present embodiment. Both figures contrast the recognition processing for bright images 401 and 501 (images whose brightness index value L is relatively large) with that for dark images 402 and 502 (images whose brightness index value L is relatively small). As shown in FIGS. 4 and 5, the main subject in this example is a "person".

The learning model 20 in this example outputs inference values indicating what the main subject shown in the input image data is. The nodes (neurons) included in the output layer 230 each correspond to an object to be inferred. The output layer 230 of the learning model 20 includes, for example, a node y1 that outputs the probability that the main subject is a person and a node y2 that outputs the probability that the main subject is a tree. For simplicity of explanation, the learning model 20 of this example is illustrated as a general three-layer NN, but the learning model 20 may instead be configured as a convolutional neural network (CNN), which is well suited to image processing including image recognition.

With the prior-art learning model 20' shown in FIG. 4, the subject can be recognized accurately when bright image data is input, but may not be recognized accurately when dark image data is input.

For example, in FIG. 4(a), where bright image data 401 is input to the learning model 20', the learning model 20' outputs a 99% probability that the main subject is a person and a 1% probability that the main subject is a tree. That is, the main subject shown in the image data is accurately recognized as a person. In FIG. 4(b), on the other hand, where dark image data 402 is input to the learning model 20', the learning model 20' outputs a 33% probability that the main subject is a person and a 67% probability that the main subject is a tree. That is, the main subject shown in the image data is not accurately recognized as a person. The reason is as follows.

In dark image data, the pixel values tend to be small. Therefore, when dark image data is input to the learning model 20', digit loss occurs during the inference calculation in the NN, and values are often lost (become 0) partway through the calculation. As a result, the accuracy of inference by the learning model 20' decreases. This loss of values (data) is more pronounced when the calculation bit width of the learning model 20' is less than or equal to the bit width of the input image data.

Therefore, in the present embodiment, as described above, the weighting coefficient determining unit 260 determines the weighting coefficient α based on the brightness L of the input image data specified by the exposure specifying unit 250, and the weight values 240 included in the learning model 20 are multiplied by the weighting coefficient α. As described later, the maximum value of the calculation bit width of the NN of the learning model 20 is taken into account when determining the weighting coefficient α. This adjustment by the weighting coefficient α suppresses digit loss (data loss) in the digital data (calculated values) during inference processing on the input image data, and thereby improves the recognition accuracy of the subject.
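The digit loss described above, and the effect of scaling the weights, can be illustrated with integer arithmetic. The source does not specify the arithmetic format of the NN, so everything below is an assumption made for the example: products are truncated by an 8-bit right shift, and the pixel values, weight values, and coefficient α = 16 are hypothetical.

```python
def fixed_point_mac(pixels, weights, shift=8):
    """Multiply-accumulate in which each product is truncated by a right shift.

    The truncation models a limited calculation bit width (an assumption);
    products smaller than 2**shift become 0, which is the digit loss.
    """
    return sum((p * w) >> shift for p, w in zip(pixels, weights))


WEIGHTS = [20, 35, 12, 50]          # hypothetical integer weight values
BRIGHT = [200, 180, 220, 190]       # large pixel values (bright image)
DARK = [3, 2, 4, 3]                 # small pixel values (dark image)

ALPHA = 16                          # hypothetical weighting coefficient
scaled_weights = [w * ALPHA for w in WEIGHTS]

bright_sum = fixed_point_mac(BRIGHT, WEIGHTS)        # 86: information survives
dark_sum = fixed_point_mac(DARK, WEIGHTS)            # 0: every product is truncated away
dark_scaled = fixed_point_mac(DARK, scaled_weights)  # 19: scaling keeps the values above zero

print(bright_sum, dark_sum, dark_scaled)
```

With the unscaled weights, every product for the dark input falls below the truncation threshold and the sum collapses to 0. Multiplying the weights by α beforehand keeps the same products representable.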

For example, in FIG. 5(a), where bright image data 501 is input to the learning model 20, the weighting coefficient α is set to a value that does not affect inference (for example, 1), and the same result as in FIG. 4(a) is output. In FIG. 5(b), where dark image data 502 is input to the learning model 20, the learning model 20 outputs an 83% probability that the main subject is a person and a 17% probability that the main subject is a tree. That is, compared with the prior art shown in FIG. 4, the technique of the present embodiment shown in FIG. 5 markedly improves recognition accuracy when dark image data is input to the learning model 20.

The determination of the weighting coefficient α in the first embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a flowchart of the process in which the exposure specifying unit 250 specifies the brightness L and the process in which the weighting coefficient determining unit 260 determines the weighting coefficient α. In outline, the weighting coefficient α is determined based on the brightness L of the input image data and the calculation bit width of the learning model 20. This flow is executed, for example, after the image data to be inferred is input and before image recognition by the learning model 20.

In step S601, the exposure specifying unit 250 specifies the brightness L of the image data input to the learning model 20. The brightness L may be acquired by analyzing the input image data, or based on information from a sensor (exposure meter, illuminance sensor, etc.) at the time of imaging. The brightness L may also be specified by a combination of the two.
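One way to obtain a brightness index by analyzing the input image data is to average a luma value over all pixels. The source does not say which analysis is used, so this is only an assumed example. The function name and sample pixels are hypothetical, and the 0.299/0.587/0.114 weights are the ITU-R BT.601 luma coefficients.

```python
def mean_luma(pixels):
    """Average BT.601 luma of an iterable of (R, G, B) tuples (0-255 each)."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return total / count


# Hypothetical sample pixels for a bright and a dark image.
bright_image = [(220, 210, 200), (240, 235, 230), (200, 190, 185)]
dark_image = [(12, 10, 9), (20, 18, 15), (8, 8, 7)]

print(mean_luma(bright_image), mean_luma(dark_image))
```

A sensor-based value or a combination of the two, as the text allows, would replace or supplement this estimate.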

The brightness L in this example is, for example, an exposure value (EV value). The exposure value is a relative value whose reference (EV0) is the state in which the ISO sensitivity is ISO 100, the aperture is F1.0, and the shutter speed is 1 second. When the exposure value increases by one step (for example, from EV1 to EV2), the brightness doubles. The exposure value and the illuminance are related by the following equation (1). For example, when the exposure value is 0 (EV0), the illuminance is 2.5 lux.
(illuminance (lux)) = 2.5 × 2^{(exposure value)} ... (1)
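Equation (1) can be checked with a short conversion helper. The function names are hypothetical; the inverse function is simply equation (1) solved for the exposure value.

```python
import math


def ev_to_lux(ev):
    """Equation (1): illuminance in lux for a given exposure value."""
    return 2.5 * 2.0 ** ev


def lux_to_ev(lux):
    """Equation (1) solved for the exposure value."""
    return math.log2(lux / 2.5)


print(ev_to_lux(0))   # 2.5 lux at EV0
print(ev_to_lux(1))   # 5.0 lux: one step up doubles the illuminance
print(ev_to_lux(10))  # 2560.0 lux
```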

In step S602, the weighting coefficient determining unit 260 calculates a coefficient C based on the brightness L specified by the exposure specifying unit 250. More specifically, the calculation proceeds as follows.

First, the weighting coefficient determining unit 260 calculates a brightness ratio A (= L / R), which is the ratio of the brightness L of the input image data to a reference value R calculated in advance in the learning stage of the learning model 20. The reference value R is a statistical representative value (mean, median, mode, etc.) of the brightness L of the plurality of image data input to the learning model 20 in the learning stage. Because the brightness L is acquired for each piece of image data, the brightness ratio A changes when the image data changes.
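The calculation of the reference value R and the brightness ratio A can be sketched as below. The function names and the sample brightness values are hypothetical. The example also assumes that brightness is expressed on a positive linear scale such as lux, so that the ratio L / R is well defined.

```python
import statistics

# The statistical representative values the text lists as candidates for R.
_REPRESENTATIVES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def reference_brightness(training_brightness, kind="mean"):
    """Reference value R computed once from the training images' brightness."""
    return _REPRESENTATIVES[kind](training_brightness)


def brightness_ratio(brightness, reference):
    """Brightness ratio A = L / R for one input image."""
    if reference <= 0:
        raise ValueError("reference brightness must be positive")
    return brightness / reference


# Hypothetical brightness values (lux) of the images used in training.
training = [400.0, 800.0, 1200.0, 800.0]
R = reference_brightness(training, "mean")   # 800.0
A = brightness_ratio(100.0, R)               # 0.125 for a dark input image
print(R, A)
```

R is fixed after training, while A is recomputed for each input image.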

Next, the weighting coefficient determining unit 260 calculates the coefficient C from the calculated brightness ratio A according to the following equation (2).
C = ((1 / sqrt(A)) × (b / (1 / sqrt(Aev))) - (b / (1 / sqrt(Aev)) - d)) × c - e ... (2)

In equation (2), the value Aev is the brightness corresponding to the imaging limit of the image sensor that acquired the image data (the limit brightness), and is the minimum unit of brightness in the image data. Because the value Aev is determined by the specifications of the image sensor, it is a predetermined value common to all image data acquired by the same imaging device. The value b is the maximum value of the calculation bit width used in the NN of the learning model 20 (for example, 256 = 2^8, that is, 8 bits). The correction coefficients c, d, and e are parameters for adjusting, for example, the degree of change of the coefficient C with respect to the brightness ratio A (the slope of the graph), and can be adjusted by the manufacturer or the user of the information processing device 100 according to the inference results.

FIG. 7 is an explanatory diagram showing the relationship between the brightness ratio A and the coefficient C (weighting coefficient α) in the first embodiment of the present invention. In the example of FIG. 7, the value Aev (limit brightness) is assumed to be 0.0078125 (= 1/2^7), and the value b (the maximum value of the calculation bit width of the NN) is assumed to be 256 (= 2^8, that is, 8 bits). When the brightness ratio A is equal to the value Aev corresponding to the limit brightness, calculation according to equation (2) yields a coefficient C (weighting coefficient α) of 256, which is equal to the maximum value b of the calculation bit width of the NN. When the brightness ratio A changes, the coefficient C (weighting coefficient α) changes as shown in the table of FIG. 7(a) and the graph of FIG. 7(b). In outline, the smaller the brightness ratio A, the larger the coefficient C (weighting coefficient α).
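Equation (2) can be evaluated directly. The source does not state the values of the correction coefficients c, d, and e behind FIG. 7, so the defaults below (c = 1, d = 1, e = 0) are assumptions chosen only for illustration, as is the function name. With c = 1 and e = 0, setting d = b × sqrt(Aev) is one choice that gives C = b = 256 at A = Aev; this is a hypothetical fit, not a value taken from the source.

```python
import math


def coefficient_c(a, a_ev, b, c=1.0, d=1.0, e=0.0):
    """Equation (2): coefficient C for brightness ratio a.

    a_ev is the limit brightness and b the maximum of the calculation bit
    width; c, d, e are the correction coefficients (defaults are assumed).
    """
    if a <= 0 or a_ev <= 0:
        raise ValueError("a and a_ev must be positive")
    base = b / (1.0 / math.sqrt(a_ev))
    return ((1.0 / math.sqrt(a)) * base - (base - d)) * c - e


A_EV = 1 / 2 ** 7   # 0.0078125, as in the FIG. 7 example
B = 256             # maximum of the calculation bit width (8 bits)

c_at_reference = coefficient_c(1.0, A_EV, B)           # reduces to c * d - e
d_fit = B * math.sqrt(A_EV)                             # hypothetical choice of d
c_at_limit = coefficient_c(A_EV, A_EV, B, d=d_fit)      # approximately 256

for ratio in (1.0, 0.25, 0.0625, A_EV):
    print(ratio, coefficient_c(ratio, A_EV, B))         # C grows as the ratio shrinks
```

With any positive c, the term 1 / sqrt(A) makes C increase as the brightness ratio A decreases, which matches the trend described for FIG. 7.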

Equation (2) described above calculates the coefficient C, based on the value Aev (limit brightness) corresponding to the detection limit (imaging limit) of the image sensor, so that a wider portion of the calculation bit width available in the learning model 20 is used in the NN calculation. Accordingly, any formula that can calculate the coefficient C (weighting coefficient α) in this manner may be used in step S602. For example, in equation (2) the coefficient C takes its maximum value when the brightness ratio A is equal to the value Aev, but the correction coefficients c, d, and e may be adjusted so that the coefficient C takes its maximum value at a different value of the brightness ratio A. Further, as described above with reference to FIG. 3, different correction coefficients c, d, and e may be applied for each layer or each edge of the NN to calculate separate coefficients C (weighting coefficients α, β, γ, ...).

In step S603, the weighting coefficient determining unit 260 sets the coefficient C calculated in step S602 in the learning model 20 as the weighting coefficient α of the NN.

According to the configuration of the present embodiment described above, the weights of the learning model 20 are multiplied by a weighting coefficient determined according to the brightness of the image data input to the learning model 20 in the inference stage, so highly accurate inference can be performed on image data of various brightness levels.

In particular, because the weighting coefficient is determined in view of the maximum value of the calculation bit width of the learning model 20, digit loss during calculation is suppressed even when dark image data, that is, image data with small pixel values, is input to the learning model 20, and the accuracy of inference can be maintained.

<Second Embodiment>
The second embodiment of the present invention is described below. In each of the embodiments illustrated below, elements whose actions and functions are equivalent to those of the premise example or the first embodiment are denoted by the reference numerals used in the above description, and their descriptions are omitted as appropriate.

In the first embodiment, a single brightness L is specified for each item of image data and used to adjust the weighting coefficient of the learning model 20. In the second embodiment, a brightness L is specified for each of a plurality of regions included in each item of image data and used to adjust the weighting coefficients of the learning model 80.

The learning model 80, the functional blocks, and the various processes according to the present embodiment described below are realized by the CPU 101 of the information processing apparatus 100 loading a program stored in the ROM 102 or the HDD 104 into the RAM 103 and executing it. The various processes according to the present embodiment are started, for example, when a user operates the input unit 105 to instruct the CPU 101, and the results of the processes are output to the display unit 106.

FIG. 8 is an explanatory diagram illustrating the inference processing (object recognition processing) in the second embodiment of the present invention. Like the learning model 20 of the first embodiment, the learning model 80 of the present embodiment outputs an inference value that infers what the main subject shown in the input image data is. The image data 801 in FIG. 8 includes a dark portion created by the shadow of the person who is the main subject. The image data 801 is divided into a plurality of regions, and the regions are input respectively to the plurality of nodes of the input layer of the learning model 80 and to the exposure specifying unit 850.

The exposure specifying unit 850 detects the brightness L of the image data input to the learning model 80 for each region and outputs it to the weighting coefficient determining unit 860. Based on the brightness L of each region of the image data detected by the exposure specifying unit 850, the weighting coefficient determining unit 860 determines the weighting coefficients α, β, γ, ... by which the weight values corresponding to each region are to be multiplied (that is, it adjusts the weight values).
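Per-region brightness detection of this kind could be sketched as follows. The sketch assumes that brightness L is the mean pixel value of a rectangular tile, which this section does not state; the function name `region_brightness` and the sample image are hypothetical.

```python
def region_brightness(image, rows, cols):
    """Split a 2-D list of pixel values into rows x cols tiles and
    return the mean pixel value of each tile.

    Assumption: mean pixel value stands in for the brightness L.
    Pixels left over when the size is not evenly divisible are ignored.
    """
    tile_h = len(image) // rows
    tile_w = len(image[0]) // cols
    result = []
    for r in range(rows):
        row_means = []
        for c in range(cols):
            tile = [image[y][x]
                    for y in range(r * tile_h, (r + 1) * tile_h)
                    for x in range(c * tile_w, (c + 1) * tile_w)]
            row_means.append(sum(tile) / len(tile))
        result.append(row_means)
    return result


# 4x4 image whose left half is bright and right half is in shadow (assumed data).
image = [[200, 200, 20, 20] for _ in range(4)]
print(region_brightness(image, 2, 2))  # one brightness value per region
```

Each value in the result would then be passed to the coefficient calculation for the weights attached to the corresponding input region.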

For example, as shown in FIG. 3(c), the weighting coefficient α by which the weight values uv11 and uv12 for the data output by the node u1 are multiplied and the weighting coefficient β by which the weight values uv21 and uv22 for the data output by the node u2 are multiplied are set separately. That is, the weighting coefficient α is determined according to the brightness L1 of the region of the image data 801 input to the node u1, and the weighting coefficient β is determined according to the brightness L2 of the region of the image data 801 input to the node u2.
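The separate coefficients for the nodes u1 and u2 amount to scaling each row of the weight matrix by its own factor. The following is a minimal sketch under stated assumptions: `coefficient` is a hypothetical stand-in for equation (2) (a reference brightness divided by the region brightness, capped at a maximum), not the formula of the embodiment, and the reference value, cap, and sample weights are illustrative.

```python
def coefficient(brightness, reference=128.0, max_coeff=8.0):
    """Hypothetical stand-in for equation (2): grows as the region gets darker.

    `reference` and `max_coeff` are assumed values, not from the source.
    """
    if brightness <= 0:
        return max_coeff
    return min(reference / brightness, max_coeff)


def scale_weights(weights, brightnesses):
    """weights[i][j] is the weight on the edge from input node i to node j.

    Row i is multiplied by the coefficient derived from the brightness of
    the region fed to input node i.
    """
    return [[coefficient(b) * w for w in row]
            for row, b in zip(weights, brightnesses)]


# Rows hold (uv11, uv12) and (uv21, uv22); L1 = 32 is dark, L2 = 128 is normal.
weights = [[0.5, -0.25],
           [0.1, 0.2]]
print(scale_weights(weights, [32.0, 128.0]))
```

Here the dark region yields α = 4.0 and the normally exposed region yields β = 1.0, so only the weights on the edges leaving u1 are enlarged.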

According to the configuration of the present embodiment described above, the same technical effects as those of the first embodiment are obtained. In addition, because the brightness L is specified for each region of the image data to determine the weighting coefficients α, β, γ, ..., highly accurate inference can be performed even on image data that contains both bright and dark regions.

<Modifications>
Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

In the above embodiments, the learning models 20 and 80 recognize the main subject of the image data, but the learning models 20 and 80 may be used for other inferences. For example, the learning models 20 and 80 may estimate the scene (landscape, portrait, etc.) corresponding to the image data.

In the above embodiments, the exposure specifying units 250 and 850 and the weighting coefficient determining units 260 and 860 are configured as software functional blocks, but they may instead be configured in hardware by an electrical configuration (a circuit or the like) capable of executing the processing described above.

The image data subject to inference may be image data that has already been captured, or may be real-time image data output by the image sensor of an imaging apparatus.

The present invention can also be realized by processing in which a program that realizes one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors of a computer of the system or apparatus read and execute the program. The present invention can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

The computer described above may have one or more processors or circuits. A plurality of separate computers, or a network of a plurality of separate processors or circuits, may read and execute the computer-executable instructions.

Instead of the CPU 101 of the above embodiments, the information processing apparatus 100 may adopt any processor or circuit. For example, the following elements may be used as the processor or circuit.

- Micro processing unit (MPU)
- Graphics processing unit (GPU)
- Application-specific integrated circuit (ASIC)
- Field-programmable gate array (FPGA)
- Digital signal processor (DSP)
- Data flow processor (DFP)
- Neural processing unit (NPU)

20 Learning model
80 Learning model
100 Information processing apparatus
250 Exposure specifying unit (exposure specifying means)
260 Weighting coefficient determining unit (weighting coefficient determining means)
850 Exposure specifying unit (exposure specifying means)
860 Weighting coefficient determining unit (weighting coefficient determining means)

Claims

An information processing apparatus comprising:
a learning model, which is a neural network that executes inference processing on image data by computation using weight values;
an exposure specifying means for specifying the brightness of the image data; and
a weighting coefficient determining means for determining, based on the specified brightness, a weighting coefficient by which the weight values are multiplied.

The information processing apparatus according to claim 1, wherein the calculation bit width in the calculation executed by the learning model is equal to or less than the bit width of the image data.

The information processing apparatus according to claim 1 or 2, wherein the weighting coefficient determining means increases the weighting coefficient as the brightness of the image data decreases.

The information processing apparatus according to any one of claims 1 to 3, wherein the learning model has a plurality of layers each including a plurality of nodes, and
the weighting coefficient determining means changes, based on the brightness, the weight values corresponding to at least one of the layers.

The information processing apparatus according to any one of claims 1 to 4, wherein the weighting coefficient determining means determines the weighting coefficient based on the ratio of the brightness to a representative value of the brightness of a plurality of image data input to the learning model in the learning stage.

The information processing apparatus according to any one of claims 1 to 5, wherein the weighting coefficient determining means determines the weighting coefficient based on an imaging limit of the imaging apparatus that acquired the image data subject to the inference processing.

The information processing apparatus according to any one of claims 1 to 6, wherein the exposure specifying means specifies the brightness for each of a plurality of regions included in the image data, and
the weighting coefficient determining means determines a plurality of the weighting coefficients based respectively on the plurality of brightnesses specified for the plurality of regions.

An imaging apparatus comprising:
the information processing apparatus according to any one of claims 1 to 7; and
an imaging means including an image sensor that acquires the image data.

A control method for an information processing apparatus having a learning model, which is a neural network that executes inference processing on image data by computation using weight values, the control method comprising:
specifying the brightness of the image data; and
determining, based on the specified brightness, a weighting coefficient by which the weight values are multiplied.

A program that causes a computer to function as the learning model and each of the means of the information processing apparatus according to any one of claims 1 to 7.