JP6994950B2

JP6994950B2 - Image recognition system and neural network learning method

Info

Publication number: JP6994950B2
Application number: JP2018001267A
Authority: JP
Inventors: 育郎佐藤; 竜馬新原
Original assignee: Denso Corp; Denso IT Laboratory Inc
Current assignee: Denso Corp; Denso IT Laboratory Inc
Priority date: 2018-01-09
Filing date: 2018-01-09
Publication date: 2022-02-04
Anticipated expiration: 2038-01-09
Also published as: JP2019121225A

Description

The present invention relates to an image recognition system that uses a neural network, and to a method for learning the neural network in such an image recognition system.

In systems that use an in-vehicle camera to detect target objects such as other vehicles and pedestrians and alert the driver to their presence, and in systems that perform automatic control according to the presence of such target objects, accurately determining the distance to the target object is an important elemental technology.

As a conventional technique, there is a method that performs neural-network-based pattern recognition on a still image obtained by a monocular camera and estimates a physical quantity such as the distance to a target object.

However, conventional methods sometimes cannot determine the physical quantity accurately. In particular, the above monocular-camera method has already matured considerably, and no significant improvement can be expected.

One reason why conventional methods are not accurate enough is that the neural network has been optimized considering only the appearance of a single still image. Physical quantities obtained with such a neural network may be appropriate for each still image, but when the obtained quantities are viewed as a time series they often deviate from physical knowledge. For example, they may indicate that the distance to the target object changes greatly in a short time.

The present invention has been made in view of these problems, and its object is to provide an image recognition system that can estimate physical quantities more accurately, and a method for learning a neural network in such an image recognition system.

According to one aspect of the present invention, there is provided a neural network learning method for an image recognition system that estimates, using a neural network, observations regarding the appearance of an object contained in each of a plurality of images and calculates a physical quantity of the object from the estimated observations, wherein the weights in the neural network are optimized using true values for the observations and prior knowledge about the physical quantity.

It is desirable to optimize the weights in the neural network so as to minimize the sum of a first cost term indicating the estimation accuracy of the observations and a second cost term indicating the calculation accuracy of the physical quantity.

In this case, the second cost term may take a smaller value the closer the calculated physical quantity is to a value based on prior knowledge.
Alternatively, the second cost term may take a smaller value the closer the distribution of the calculated physical quantity is to a distribution based on prior knowledge.

The physical quantity may be acceleration or jerk.

The image recognition system may estimate a disturbance from each of the plurality of images, calculate the physical quantity of the object from the estimated observations and the estimated disturbance, and optimize the weights in the neural network using prior knowledge about the disturbance as well.

In that case, it is desirable to optimize the weights in the neural network so as to minimize the sum of the first cost term indicating the estimation accuracy of the observations, the second cost term indicating the calculation accuracy of the physical quantity, and a third cost term indicating the estimation accuracy of the disturbance.

In this case, the third cost term may take a smaller value the closer the estimated disturbance is to a value based on prior knowledge.
Alternatively, the third cost term may take a smaller value the closer the distribution of the estimated disturbance is to a distribution based on prior knowledge.

The disturbance may include at least one of the inclination of the road surface, a deviation in the mounting height of the camera that captures each of the plurality of images, and the pitch angle of the camera.

The observation regarding the appearance of the object may be a rectangle enclosing the object in each of the plurality of images.

According to another aspect of the present invention, there is provided an image recognition system that estimates, using a neural network, observations regarding the appearance of an object contained in each of a plurality of images and calculates a physical quantity of the object from the estimated observations, wherein the weights in the neural network are optimized using true values for the observations and prior knowledge about the physical quantity.

The estimation accuracy of physical quantities is improved.

A diagram schematically showing an example of an image recognition system.
A schematic diagram showing the schematic configuration of the image recognition system.
A diagram explaining the model of observations, disturbances, and physical quantities in the first embodiment.
A block diagram showing an example of the internal configuration of the physical quantity estimation unit 2 according to the first embodiment.
A diagram explaining the physical quantity estimation unit 2 during learning.
A graph comparing the acceleration estimates of the present method with those of the conventional method.
A graph comparing the distance estimates of the present method with those of the conventional method.
A diagram explaining the model of observations and physical quantities in the second embodiment.
A block diagram showing an example of the internal configuration of the physical quantity estimation unit 2 according to the second embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.

(First Embodiment)
FIG. 1 is a diagram schematically showing an example of an image recognition system. In this example, physical quantities of a target vehicle 200 contained in an image captured by a camera 1 mounted on an own vehicle 100 are estimated from that image using a convolutional neural network that has been trained in advance. The output physical quantities are used for purposes such as collision risk determination.

The physical quantities are, for example, the size, distance, speed, and acceleration of the target vehicle 200. They are obtained by first estimating the position of the target vehicle 200 in the image and then applying a geometric calculation based on a preset model to the estimation result.

Here, the position of the target vehicle 200 in the image can be learned by supervised learning using a large number of images whose true values are known in advance (for example, images to which true values have been assigned manually), and can be estimated with high accuracy by conventional techniques.

On the other hand, true values of physical quantities such as the size, distance, speed, and acceleration of the target vehicle 200 are not easy to obtain, so supervised learning in advance is difficult. As a result, the errors in these physical quantities can become large when they are affected by disturbances. Acceleration in particular is obtained as the second derivative of distance, so its error tends to become especially large even for slight disturbances.

Therefore, this embodiment presents a learning method that can accurately detect dynamic physical quantities such as size, distance, speed, and acceleration even in the presence of disturbances. Strictly speaking, "speed" is the relative speed of the target vehicle 200 with respect to the own vehicle 100, but in this specification it is simply called speed. The same applies to acceleration and the like.

FIG. 2 is a schematic diagram showing the schematic configuration of the image recognition system. The image recognition system includes a camera 1, which is an example of a sensor, a physical quantity estimation unit 2, and a time-series filter 3.

As shown in FIG. 1, the camera 1 is a monocular camera mounted on the own vehicle 100 and captures a specific range ahead of the vehicle. The captured images form a moving image, but they are input to the physical quantity estimation unit 2 as still images at a plurality of consecutive times. In this embodiment, the "plurality of times" is assumed to be five consecutive times (t-2 to t+2).

The physical quantity estimation unit 2 estimates physical quantities of the target vehicle 200 from the images at the plurality of times using a convolutional neural network. The physical quantities in this embodiment are the size of the target vehicle 200 (more specifically, its height and width), the distance to the target vehicle 200, and the speed and acceleration of the target vehicle 200; the details are described later with reference to FIG. 3. A specific configuration example of the physical quantity estimation unit 2 is described later with reference to FIG. 4A.

The time-series filter 3 is, for example, a Kalman filter, and corrects the output of the physical quantity estimation unit 2. In this embodiment, not only the distance but also the speed and acceleration are input to the time-series filter 3, which improves the accuracy of the correction.

FIG. 3 is a diagram explaining the model of observations, disturbances, and physical quantities in the first embodiment. The optical axis of the camera 1 is taken as the Z axis (pointing right on the page), the vertical direction as the Y axis (pointing down on the page, with vertically downward as positive), and the direction orthogonal to the Z and Y axes as the X axis (perpendicular to the page). The focal position (known) of the camera 1 is taken as the origin. In the absence of disturbances, the road surface is parallel to the Z axis and the camera 1 is mounted at a height H (known) above the road surface. The road surface is assumed to have no unevenness.

In this embodiment, the observations Γ_T, Γ_B, Γ_R, Γ_L, the disturbances α, ΔH, and the physical quantities Dy, Dx, Z, dZ/dt, d²Z/dt² are defined as follows.

First, the observations Γ_T, Γ_B, Γ_R, Γ_L, which are indices of the appearance of the target vehicle 200 in the image, are described. When the target vehicle 200 is enclosed by a rectangle in the image, Γ_T is the ray, that is, the straight line passing through the camera 1 and the midpoint of the upper edge of the rectangle. Similarly, Γ_B, Γ_R, and Γ_L are the rays passing through the camera 1 and the midpoints of the lower, right, and left edges of the rectangle, respectively. That is, the following equations hold.
Y = Γ_T * Z ... (1)
Y = Γ_B * Z ... (2)

The observations Γ_T, Γ_B, Γ_R, Γ_L indicate the rectangle enclosing the target vehicle 200 in the image; in other words, they indicate the position of the target vehicle 200 in the image. In the following, Γ_T, Γ_B, Γ_R, and Γ_L are collectively written as Γ.

Next, the disturbances α and ΔH are described. The disturbance α is the inclination of the road surface, which can arise from, for example, pitching of the own vehicle 100. The disturbance ΔH is the height deviation of the camera 1, more precisely the difference from the mounting height H of the camera 1, which can arise from, for example, the suspension of the own vehicle 100. In this case, the true road surface taking the disturbances α and ΔH into account is expressed as follows.
Y = (H + ΔH) + α * Z ... (3)

In addition, the pitch angle may be considered as a disturbance. The pitch angle is the rotational movement of the camera 1 in the XZ plane and is caused by pitching of the own vehicle 100.

Next, the physical quantities Z, dZ/dt, d²Z/dt², Dx, and Dy, which are the values ultimately to be obtained, are described.

The physical quantity Z is the distance from the camera 1 to the target vehicle 200 and is calculated from equations (2) and (3) above.
Z = (H + ΔH) / (Γ_B - α) ... (4)
The actual distance is obtained by multiplying the distance Z of equation (4) by the focal length f (a known internal parameter) of the camera 1.
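As a minimal sketch of equation (4), the following Python function intersects the bottom ray of equation (2) with the road surface of equation (3). The function name, argument names, and numeric values are hypothetical and chosen only for illustration; they do not come from the specification.

```python
def distance_to_target(gamma_b, alpha, delta_h, mount_height):
    """Equation (4): Z = (H + dH) / (Gamma_B - alpha)."""
    denom = gamma_b - alpha
    if denom <= 0.0:
        # The bottom ray would not meet the road surface in front of the camera.
        raise ValueError("gamma_b must exceed alpha")
    return (mount_height + delta_h) / denom


# Illustrative values only: H = 1.5, bottom-ray slope 0.0625, no disturbance.
print(distance_to_target(0.0625, 0.0, 0.0, 1.5))      # 24.0
# Same observation with a road slope of 0.03125: the distance doubles.
print(distance_to_target(0.0625, 0.03125, 0.0, 1.5))  # 48.0
```

The second call illustrates why the disturbances matter: with an unchanged observation Γ_B, a small unmodelled road slope changes the computed distance by a large factor.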

The physical quantities dZ/dt and d²Z/dt² are the speed and acceleration of the target vehicle 200, respectively, and are calculated by discrete differentiation of the distance Z.
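The specification does not say which discrete differentiation scheme is used. The sketch below assumes central differences with a fixed sampling interval dt; the function name and the test sequence are hypothetical.

```python
def central_differences(z, dt):
    """Return (velocities, accelerations) at the interior samples of z.

    Assumes uniformly spaced samples; central differences are an
    illustrative choice, not a scheme stated in the specification.
    """
    if len(z) < 3:
        raise ValueError("need at least three samples")
    interior = range(1, len(z) - 1)
    vel = [(z[i + 1] - z[i - 1]) / (2.0 * dt) for i in interior]
    acc = [(z[i + 1] - 2.0 * z[i] + z[i - 1]) / (dt * dt) for i in interior]
    return vel, acc


# Five samples of Z(t) = t^2, standing in for Z(t-2) ... Z(t+2).
vel, acc = central_differences([0.0, 1.0, 4.0, 9.0, 16.0], dt=1.0)
print(vel)  # [2.0, 4.0, 6.0]
print(acc)  # [2.0, 2.0, 2.0]
```

Under this assumption, five distance samples yield only three velocity and three acceleration values, which is consistent with the derivatives not being available at every input time.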

The physical quantity Dy is the height of the target vehicle 200 (vehicle height) and is calculated from equations (1), (2), and (4) above.
Dy = (Γ_B - Γ_T) * (H + ΔH) / (Γ_B - α) ... (5)
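Equation (5) can be sketched in the same way. It is the vertical gap between the top and bottom rays evaluated at the distance of equation (4). The function name and the numbers are hypothetical.

```python
def vehicle_height(gamma_t, gamma_b, alpha, delta_h, mount_height):
    """Equation (5): Dy = (Gamma_B - Gamma_T) * (H + dH) / (Gamma_B - alpha)."""
    denom = gamma_b - alpha
    if denom <= 0.0:
        raise ValueError("gamma_b must exceed alpha")
    return (gamma_b - gamma_t) * (mount_height + delta_h) / denom


# Illustrative values only: top ray slightly above the optical axis (Y is
# positive downward, so the top-ray slope is negative).
print(vehicle_height(-0.03125, 0.0625, 0.0, 0.0, 1.5))  # 2.25
```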

The physical quantity Dx is the width of the target vehicle 200 (vehicle width) and is calculated in the same manner as the vehicle height Dy, taking the pitch angle into account.

As described above, the physical quantities Dx, Dy, Z, dZ/dt, and d²Z/dt² in this embodiment can all be calculated directly from the observations Γ and the disturbances α and ΔH.

FIG. 4A is a block diagram showing an example of the internal configuration of the physical quantity estimation unit 2 according to the first embodiment. The physical quantity estimation unit 2 includes an observation estimation unit 21, a disturbance estimation unit 22, and a physical quantity calculation unit 23. In FIG. 4A the observation estimation unit 21 and the disturbance estimation unit 22 are shown separately for the sake of explanation, but they may be implemented as a single convolutional neural network. Some or all of the units may be realized by a computer processor executing a predetermined program.

The observation estimation unit 21 estimates the observations Γ(t-2) to Γ(t+2) for the five time steps from the images at times t-2 to t+2, using a convolutional neural network (for example, a pattern recognizer) that has undergone the prior learning described later. It is a so-called black box.

The disturbance estimation unit 22 estimates the disturbances α(t-2) to α(t+2) and ΔH(t-2) to ΔH(t+2) for the five time steps from the images at times t-2 to t+2, using a convolutional neural network that has undergone the prior learning described later.

The physical quantity calculation unit 23 applies equations (4), (5), and so on to the observations Γ(t-2) to Γ(t+2) estimated by the observation estimation unit 21 and the disturbances α(t-2) to α(t+2) and ΔH(t-2) to ΔH(t+2) estimated by the disturbance estimation unit 22, and calculates the physical quantities Z(t-2) to Z(t+2), Dy(t-2) to Dy(t+2), and Dx(t-2) to Dx(t+2) for the five time steps. The physical quantity calculation unit 23 further calculates the speed dZ(t)/dt and the acceleration d²Z(t)/dt² by discrete differentiation of the distance Z. Because the speed dZ(t)/dt and the acceleration d²Z(t)/dt² are obtained by discrete differentiation, they are not available for all five time steps.

The physical quantity calculation unit 23 does not need to perform estimation with a convolutional neural network; it is a white box that "calculates" the physical quantities by geometric calculation.

Next, the prior learning of the convolutional neural networks in the observation estimation unit 21 and the disturbance estimation unit 22 is described. A convolutional neural network is composed of multiple convolution layers and fully connected layers. Each convolution layer applies filters to the output of the preceding convolution layer (for the first convolution layer, the input images at the plurality of times) to perform convolution. Weights are set in the filters, and prior learning means optimizing these weights.

FIG. 4B is a diagram explaining the physical quantity estimation unit 2 during learning. In this embodiment, cost terms J1, J2, and J3 are set for the observation estimation unit 21, the disturbance estimation unit 22, and the physical quantity calculation unit 23, respectively. The weights are then optimized so that the cost function J, which is the sum of these terms, is minimized.

For the observation estimation unit 21, the true value of the rectangle enclosing the target vehicle 200 (that is, the observation Γ) is easy to obtain in advance, so supervised learning is desirable. Accordingly, the images for the five time steps and, for each of them, the true value of the rectangular region (that is, the observation Γ) are input to the observation estimation unit 21 as teacher data. A cost term J1 indicating the estimation accuracy of the observations is then defined. For example, the sum of the squared differences between the true values of the observations Γ and the estimated observations Γ can be used as the cost term J1.
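The squared-difference example of J1 can be sketched as follows. The data layout (one tuple of four ray values Γ_T, Γ_B, Γ_R, Γ_L per time step) and the function name are assumptions made for illustration.

```python
def cost_j1(gamma_est, gamma_true):
    """Sum of squared differences over all time steps and all four rays."""
    if len(gamma_est) != len(gamma_true):
        raise ValueError("number of time steps differs")
    total = 0.0
    for est, true in zip(gamma_est, gamma_true):
        total += sum((e - t) ** 2 for e, t in zip(est, true))
    return total


# One time step, with only the bottom ray off by 0.5 (illustrative values).
estimated = [(0.0, 1.0, 0.5, -0.5)]
annotated = [(0.0, 0.5, 0.5, -0.5)]
print(cost_j1(estimated, annotated))  # 0.25
```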

An estimator (not shown) may be provided upstream of the observation estimation unit 21 to estimate the rectangle (observation) of the target vehicle 200 by treating each of the images for the five time steps as a still image, and the result may be input to the fully connected layer of the observation estimation unit 21 as an initial value Γ0. In this case, the observation estimation unit 21 estimates the observation Γ by correcting the initial value Γ0, which makes the method applicable even when the image contains a plurality of vehicles. If no such estimator is provided and the image contains a plurality of vehicles, a true value may be set for each region of the image.

For the disturbance estimation unit 22, the true values of the disturbances α and ΔH are difficult to obtain in advance, so unsupervised learning is performed. Accordingly, no teacher data is input to the disturbance estimation unit 22. Instead, this embodiment uses the following prior knowledge about the disturbances α and ΔH.

The road surface inclination α is assumed to follow a normal distribution with mean 0 and standard deviation σ_α. The mean is set to 0 because pitching of the own vehicle 100 and changes in road surface shape can appear on both the positive and the negative side. Considering realistic road inclinations, a standard deviation σ_α on the order of 1/100 is reasonable, and it is evident from experience that it is not on the order of 1/10 or more.

The height deviation ΔH of the camera 1 is assumed to follow a normal distribution with mean 0 and standard deviation σ_H. The mean is set to 0 because ΔH is defined as the deviation from the mounting height of the camera 1. The standard deviation σ_H should be set to reflect the designer's prior knowledge; a value on the order of 1 cm is reasonable, and it is evident from experience that it is not on the order of 1 m or more.

A cost term J2 indicating the estimation accuracy of the disturbances is then defined. For example, the cost term J2 is set so that it takes a smaller value the closer the distributions of the estimated disturbances α and ΔH over the five time steps are to the normal distributions based on the above prior knowledge. More specifically, the sum of the logarithms of the Kullback-Leibler distance (KL distance) between the distributions of the disturbances α and ΔH and the normal distributions based on the above prior knowledge can be used as the cost term J2. As another example, the cost term J2 may be set so that it takes a smaller value the closer the estimated disturbances α and ΔH over the five time steps are to the mean based on prior knowledge (here, 0). More specifically, the sum of the squares or absolute values of the estimated disturbances α and ΔH (since the mean is 0) may be used as the cost term J2.
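The specification does not state how the distribution of the estimated disturbances is represented. One possible reading, shown here purely as an assumption, is to fit a Gaussian to a batch of estimates and use the closed-form KL divergence to the zero-mean prior. The function name is hypothetical, and the logarithm mentioned in the text is left out of the sketch because the divergence can be exactly zero.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl_to_prior(samples, prior_sigma):
    """KL( N(m, s^2) || N(0, prior_sigma^2) ), with m and s^2 fitted to samples.

    Fitting a Gaussian to the batch is an illustrative assumption.
    """
    m = fmean(samples)
    var = pvariance(samples, mu=m)
    if var <= 0.0:
        raise ValueError("samples must not all be identical")
    return (math.log(prior_sigma / math.sqrt(var))
            + (var + m * m) / (2.0 * prior_sigma ** 2) - 0.5)


# Estimated slopes whose spread matches a prior with sigma = 0.01: small cost.
matching = gaussian_kl_to_prior([-0.01, 0.01, -0.01, 0.01], prior_sigma=0.01)
# Estimated slopes ten times wider than the prior: larger cost.
too_wide = gaussian_kl_to_prior([-0.1, 0.1, -0.1, 0.1], prior_sigma=0.01)
print(matching < too_wide)  # True
```

A term of this kind could be evaluated once for α with σ_α and once for ΔH with σ_H, and the two results summed.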

For the physical quantity calculation unit 23, the true values of the physical quantities are difficult to obtain in advance, so unsupervised learning is performed. Accordingly, no teacher data is input to the physical quantity calculation unit 23. Instead, this embodiment uses the following prior knowledge about the physical quantities d²Z/dt², Dy, and Dx.

The acceleration d²Z/dt² is assumed to follow a normal distribution with mean 0 and standard deviation σ_Z2. The mean is set to 0 because acceleration and deceleration are considered to occur with the same frequency. Considering realistic accelerations of the own vehicle 100 and the target vehicle 200, a standard deviation σ_Z2 on the order of 0.01 G (G is the gravitational acceleration) is reasonable, and it is evident from experience that it is not on the order of 0.1 G or more.

The vehicle height Dy and the vehicle width Dx are assumed to follow normal distributions whose means are the time-series averages (the height and width of the target vehicle 200 in question) and whose standard deviations are σ_Dy and σ_Dx. The standard deviations σ_Dy and σ_Dx should be set to reflect the designer's prior knowledge; values on the order of 1 cm are reasonable, and it is evident from experience that they are not on the order of 1 m or more.

A cost term J3 indicating the calculation accuracy of the physical quantities is then defined. For example, the cost term J3 is set so that it takes a smaller value the closer the distributions of the calculated physical quantity d²Z/dt² and of the physical quantities Dy and Dx over the five time steps are to the normal distributions based on the above prior knowledge. More specifically, the sum of the logarithms of the Kullback-Leibler distance (KL distance) between the distributions of the physical quantities d²Z/dt², Dy, and Dx and the normal distributions based on the above prior knowledge can be used as the cost term J3. As another example, the standard deviations σ_Z2, σ_Dy, and σ_Dx may be regarded as 0, and the cost term J3 may be set so that it takes a smaller value the closer the calculated physical quantities d²Z/dt², Dy, and Dx are to the means based on prior knowledge. More specifically, the sum of the squares or absolute values of the differences between the calculated physical quantities d²Z/dt², Dy, and Dx and their respective means may be used as the cost term J3. Alternatively, if true values of acceleration can be obtained on a test course by measurement with high-precision GPS or millimeter-wave radar, supervised learning may be performed for images that have true values and unsupervised learning for images that do not.
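The simpler squared-deviation variant of J3 can be sketched as follows. Acceleration is penalized against its prior mean of 0, and height and width against their own time-series averages. The function name and the unweighted sum are assumptions; the specification does not state how the terms are scaled relative to one another.

```python
def cost_j3(accels, heights, widths):
    """Squared deviations from the prior means (illustrative, unweighted).

    accels:  accelerations d2Z/dt2, prior mean 0
    heights: vehicle heights Dy over the time steps, prior mean = their average
    widths:  vehicle widths Dx over the time steps, prior mean = their average
    """
    def spread(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    return sum(a * a for a in accels) + spread(heights) + spread(widths)


# A physically consistent track: no acceleration, constant vehicle size.
print(cost_j3([0.0, 0.0, 0.0], [1.5] * 5, [1.75] * 5))   # 0.0
# A jittery track is penalized even though no true values were supplied.
print(cost_j3([1.0, -1.0, 0.0], [1.0, 2.0], [1.75] * 2))  # 2.5
```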

Similarly, for the distance Z and the speed dZ/dt, supervised learning may be performed if true values are available, and unsupervised learning based on prior knowledge may be performed if they are not.

The sum of the cost terms J1 to J3 above is taken as the cost function, and the weights are optimized so that the cost function is minimized. Specifically, mini-batch stochastic gradient descent, a standard method in neural network learning, can be applied. A mini-batch is a set of randomly sampled learning samples used in each iteration of the optimization. In this embodiment, the images for five consecutive time steps constitute one learning sample. The mini-batch size (the number of learning samples in one mini-batch) is sufficiently large.
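The optimization loop can be illustrated with a toy problem. This is not the network of the embodiment: it is a one-weight model whose per-sample cost has a supervised data term (playing the role of J1) plus a prior term that pulls the weight toward 0 (playing the role of J2 and J3). All names, data, and hyperparameters are hypothetical.

```python
import random


def make_grad(prior_strength):
    """Gradient of (w*x - y)^2 + prior_strength * w^2 with respect to w."""
    def grad(weights, sample):
        x, y = sample
        w = weights[0]
        return [2.0 * (w * x - y) * x + 2.0 * prior_strength * w]
    return grad


def minibatch_sgd(weights, grad_fn, samples, batch_size, lr, steps, seed=0):
    """Plain mini-batch stochastic gradient descent on a list of weights."""
    rng = random.Random(seed)
    w = list(weights)
    for _ in range(steps):
        batch = rng.sample(samples, batch_size)  # random mini-batch
        grads = [0.0] * len(w)
        for sample in batch:
            for i, g in enumerate(grad_fn(w, sample)):
                grads[i] += g / batch_size       # average over the batch
        w = [wi - lr * gi for wi, gi in zip(w, grads)]
    return w


data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # noise-free, y = 2x
w_data_only = minibatch_sgd([0.0], make_grad(0.0), data, 2, 0.01, 500)
w_with_prior = minibatch_sgd([0.0], make_grad(1.0), data, 2, 0.01, 500)
print(round(w_data_only[0], 6))            # 2.0
print(w_with_prior[0] < w_data_only[0])    # True
```

With the prior term switched on, the learned weight settles below the purely data-driven value, which mirrors how the prior-knowledge cost terms shift the network away from solutions that fit the observations but are physically implausible.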

In the observation estimation unit 21, the internal weights are optimized not only so that the estimated observations Γ approach the true values but also so that the distribution of the physical quantities calculated by the physical quantity calculation unit 23 follows the prior knowledge. Similarly, in the disturbance estimation unit 22, the internal weights are optimized not only so that the estimated disturbances α and ΔH follow the prior knowledge but also so that the distribution of the physical quantities calculated by the physical quantity calculation unit 23 follows the prior knowledge.

As a result, the observation estimation unit 21 becomes able to estimate an observation Γ close to the true value, the disturbance estimation unit 22 becomes able to estimate disturbances α and ΔH that follow the probability distributions based on the prior knowledge reasonably well, and the physical quantity calculation unit 23 becomes able to calculate physical quantities d²Z/dt², Dy, and Dx that follow the probability distributions based on the prior knowledge reasonably well. This means that physically implausible estimation results, such as a road surface slope α of about 1/10 or an acceleration d²Z/dt² of about 0.1 G, become less likely, and physically meaningful results are obtained.

FIG. 5 is a graph comparing the acceleration estimated by the present method with the acceleration estimated by the conventional method. The horizontal axis is the true distance to the target, and the vertical axis is the root mean square (RMS) of the estimated acceleration, in units of gravitational acceleration G. Note that true values, not estimated values, are given for the observation Γ.

With the conventional method, the absolute value of the acceleration is abnormally large (normal acceleration should be about 0.2 G at most), and clearly erroneous values are obtained frequently. This shows that accurate physical quantities are not necessarily obtained even when the observation Γ is ideal.

With the present method, on the other hand, the acceleration is sufficiently lower than with the conventional method even when the distance to the target is long, and results closer to reality are obtained.

FIG. 6 is a graph comparing the distance estimated by the present method with the distance estimated by the conventional method. The horizontal axis is the true distance to the target, and the vertical axis is the root mean square error (RMSE) between the estimated distance and the true distance.
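The two evaluation metrics used on the vertical axes of FIG. 5 and FIG. 6 can be written out as a short sketch. The function names are illustrative assumptions; the definitions are the standard ones for root mean square and root mean square error.

```python
import math


def rms(values):
    """Root mean square of a sequence of values (e.g. accelerations in G)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rmse(estimates, truths):
    """Root mean square error between estimated and true values."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    return rms([e - t for e, t in zip(estimates, truths)])
```

To reproduce a curve of the kind shown in the figures, the samples would first be grouped by true distance and the metric evaluated per group; the grouping itself is not specified in the text.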

With the conventional method, the error is large, and it grows as the distance to the target increases. With the present method, on the other hand, the distance is obtained more accurately than with the conventional method.

As described above, in the first embodiment, the acceleration estimated from a plurality of consecutive images is learned using prior knowledge. The prior knowledge is therefore reflected, and the obtained physical quantities become more accurate and physically closer to reality. In addition, disturbances for which it is difficult to prepare true values in advance are also learned using prior knowledge, which further improves the estimation accuracy.

(Second embodiment)
The second embodiment described next does not consider disturbances. The following description focuses on the differences from the first embodiment.

FIG. 7 is a diagram illustrating the model of the observations and physical quantities in the second embodiment. The coordinates are defined in the same way as in FIG. 3, but inclination of the road surface is not considered.

In the present embodiment, in addition to the observations Γ_T, Γ_B, Γ_R, and Γ_L of the first embodiment, an observation Γ_C is defined as follows. The observation Γ_C is a camera-height ray: on the image, it represents the straight line passing through the camera 1 and the point on the rectangle surrounding the target vehicle 200 that lies vertically above the road surface by the mounting height H of the camera 1. That is, the following equation holds.
Y = Γ_C * Z ... (6)

Even in this case, the distance Z and the vehicle height Dy, which are physical quantities, are "calculated" as follows.
Z = H / (Γ_B - Γ_C) ... (7)
Dy = (Γ_B - Γ_T) * H / (Γ_B - Γ_C) ... (8)
The vehicle width Dx, the velocity dZ/dt, and the acceleration d²Z/dt² are calculated in the same manner.
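Equations (7) and (8), together with one possible way of obtaining the velocity and acceleration from five consecutive distances, can be sketched as follows. The function names are illustrative, and the finite-difference scheme (forward difference for the velocity, central second difference for the acceleration) is an assumption; the text does not specify how the time derivatives are discretized.

```python
import math


def distance_and_height(gamma_t, gamma_b, gamma_c, camera_height):
    """Distance Z by equation (7) and vehicle height Dy by equation (8)."""
    denom = gamma_b - gamma_c
    if denom == 0:
        raise ValueError("gamma_b and gamma_c must differ")
    z = camera_height / denom
    dy = (gamma_b - gamma_t) * z
    return z, dy


def finite_differences(z_series, dt):
    """Velocity and acceleration from distances sampled every dt seconds.

    Assumed scheme: forward first difference, central second difference.
    """
    velocity = [(b - a) / dt for a, b in zip(z_series, z_series[1:])]
    acceleration = [
        (z_series[i + 1] - 2.0 * z_series[i] + z_series[i - 1]) / dt ** 2
        for i in range(1, len(z_series) - 1)
    ]
    return velocity, acceleration
```

With hypothetical values H = 1.2, Γ_T = -0.015, Γ_B = 0.06, and Γ_C = 0, equation (7) gives Z = 1.2 / 0.06 = 20 and equation (8) gives Dy = 0.075 * 20 = 1.5. Because Z is the reciprocal of a small difference of observations, small errors in Γ_B or Γ_C are amplified at long range and again by the second difference, which is consistent with the remark in the text that the acceleration tends to have a large estimation error.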

FIG. 8 is a block diagram showing an example of the internal configuration of the physical quantity estimation unit 2 according to the second embodiment. Because disturbances are not considered in the present embodiment, the disturbance estimation unit 22 shown in FIG. 4 need not be provided. The operations of the observation estimation unit 21 and the physical quantity calculation unit 23 are substantially the same as in the first embodiment. That is, in the observation estimation unit 21, supervised learning is performed using the true value of the observation Γ_C as well. In the physical quantity calculation unit 23, unsupervised learning using prior knowledge is performed.

As described above, in the second embodiment as well, the acceleration estimated from a plurality of consecutive images is learned using prior knowledge. The prior knowledge is therefore reflected and the estimation accuracy improves even without considering disturbances.

In each of the embodiments described above, the physical quantity handled by the physical quantity calculation unit 23 is not limited to the acceleration d²Z/dt². Learning at least one of the other physical quantities (the size of the target vehicle 200 such as the vehicle height Dy and the vehicle width Dx, the distance Z, the velocity dZ/dt, and the jerk d³Z/dt³ or higher-order discrete derivatives) using prior knowledge also improves the estimation accuracy of the physical quantities. However, it is particularly desirable to learn and estimate, using prior knowledge, the acceleration d²Z/dt², which is useful for purposes such as collision risk assessment and whose estimation error tends to be large.

The embodiments described above are presented so that a person having ordinary knowledge in the technical field to which the present invention belongs can carry out the invention. Various modifications of the above embodiments can naturally be made by those skilled in the art, and the technical idea of the present invention can be applied to other embodiments. Accordingly, the present invention is not limited to the described embodiments and should be given the broadest scope consistent with the technical idea defined by the claims.

1 Camera
2 Physical quantity estimation unit
3 Time series filter
21 Observation estimation unit
22 Disturbance estimation unit
23 Physical quantity calculation unit
100 Own vehicle
200 Target vehicle

Claims

A method for learning a neural network in an image recognition system that uses the neural network to estimate an observation regarding the appearance of an object included in each of a plurality of images and calculates a physical quantity of the object from the estimated observations,
wherein weights in the neural network are optimized, using true values for the observation and prior knowledge about the physical quantity, so that the sum of a first cost term indicating the estimation accuracy of the observation and a second cost term indicating the calculation accuracy of the physical quantity is minimized.

The method for learning a neural network according to claim 1, wherein the second cost term takes a smaller value as the calculated physical quantity is closer to a value based on the prior knowledge.

The method for learning a neural network according to claim 1, wherein the second cost term takes a smaller value as the distribution of the calculated physical quantity is closer to a distribution based on the prior knowledge.

A method for learning a neural network in an image recognition system that uses the neural network to estimate an observation regarding the appearance of an object included in each of a plurality of images and calculates a physical quantity of the object from the estimated observations, wherein
the image recognition system estimates a disturbance from each of the plurality of images and calculates the physical quantity of the object from the estimated observation and the estimated disturbance, and
weights in the neural network are optimized using true values for the observation, prior knowledge about the physical quantity, and prior knowledge about the disturbance.

The method for learning a neural network according to claim 4, wherein the weights in the neural network are optimized so that the sum of a first cost term indicating the estimation accuracy of the observation, a second cost term indicating the calculation accuracy of the physical quantity, and a third cost term indicating the estimation accuracy of the disturbance is minimized.

The method for learning a neural network according to claim 5, wherein the third cost term takes a smaller value as the estimated disturbance is closer to a value based on the prior knowledge.

The method for learning a neural network according to claim 5, wherein the third cost term takes a smaller value as the distribution of the estimated disturbance is closer to a distribution based on the prior knowledge.

13. How to learn a neural network.

The method for learning a neural network according to any one of claims 1 to 8, wherein the physical quantity is jerk or jerk.

The method for learning a neural network according to any one of claims 1 to 9, wherein the observation regarding the appearance of the object is a rectangle surrounding the object in the plurality of images.

An image recognition system that uses a neural network to estimate an observation regarding the appearance of an object included in each of a plurality of images and calculates a physical quantity of the object from the estimated observations,
wherein weights in the neural network are optimized, using true values for the observation and prior knowledge about the physical quantity, so that the sum of a first cost term indicating the estimation accuracy of the observation and a second cost term indicating the calculation accuracy of the physical quantity is minimized.

An image recognition system that uses a neural network to estimate an observation regarding the appearance of an object included in each of a plurality of images and calculates a physical quantity of the object from the estimated observations, wherein
the image recognition system estimates a disturbance from each of the plurality of images and calculates the physical quantity of the object from the estimated observation and the estimated disturbance, and
weights in the neural network are optimized using true values for the observation, prior knowledge about the physical quantity, and prior knowledge about the disturbance.