JP2021148730A - Position estimation method, position estimation device, and program

Info

Publication number: JP2021148730A
Application number: JP2020051545A
Authority: JP
Inventors: 崚伊藤; Shun Ito
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2021-09-27

Abstract

To provide a position estimation method and the like capable of accurately estimating the position of an object even when using image data captured by a monocular camera. SOLUTION: A position estimation method includes the steps of: acquiring image data including an object from a monocular camera 20 (S101); converting first coordinates of the object in the image data into second coordinates in the real world (S103); calculating the size of the object based on the second coordinates (S104); determining whether or not the size is included in a predetermined range according to the object (S105); and, when the size is not included in the predetermined range (No in S105), converting the first coordinates again into third coordinates in the real world based on a representative value within the predetermined range (S108). SELECTED DRAWING: Figure 8

Description

The present disclosure relates to a position estimation method, a position estimation device, and a program.

In recent years, the number of vehicles equipped with collision damage reduction brakes has been increasing in order to prevent accidents while driving, and this number is expected to increase further. To realize such collision damage reduction brakes, object detection devices are known that detect objects around a vehicle using image data captured by an in-vehicle camera or the like. Because the travel of the vehicle is controlled based on the result of detection by the object detection device, high detection accuracy is desired.

Patent Document 1 discloses a three-dimensional object recognition device that estimates the position of a three-dimensional object from a plurality of pieces of image data (two-dimensional image data) obtained by imaging an object whose three-dimensional shape is known. The three-dimensional object recognition device holds the three-dimensional shape as wireframe data and generates a virtual real image by pasting a captured real image onto the wireframe data.

Patent Document 1: Japanese Patent No. 3456029

However, with the technique of Patent Document 1, the position of an object may not be estimated accurately when it is estimated based on image data captured by a monocular camera.

Therefore, the present disclosure relates to a position estimation method, a position estimation device, and a program capable of accurately estimating the position of an object even when using image data captured by a monocular camera.

A position estimation method according to one aspect of the present disclosure includes: acquiring image data including an object from a monocular camera; converting first coordinates of the object in the image data into second coordinates in the real world; calculating a size of the object based on the second coordinates; determining whether or not the size is included in a predetermined range corresponding to the object; and, when the size is not included in the predetermined range, converting the first coordinates again into third coordinates in the real world based on a representative value within the predetermined range.

A position estimation device according to one aspect of the present disclosure includes: an acquisition unit that acquires image data including an object from a monocular camera; a first conversion unit that converts first coordinates of the object in the image data into second coordinates in the real world; a calculation unit that calculates a size of the object based on the second coordinates; a determination unit that determines whether or not the size is included in a predetermined range corresponding to the object; and a second conversion unit that, when the size is not included in the predetermined range, converts the first coordinates again into third coordinates based on a representative value within the predetermined range.

A program according to one aspect of the present disclosure is a program for causing a computer to execute the above position estimation method.

According to the position estimation method and the like according to one aspect of the present disclosure, the position of an object can be accurately estimated even when image data captured by a monocular camera is used.

FIG. 1A is a first diagram for explaining estimation of the position of an object by a position estimation device according to a comparative example.
FIG. 1B is a second diagram for explaining estimation of the position of an object by the position estimation device according to the comparative example.
FIG. 2 is a block diagram showing a functional configuration of a vehicle including a position estimation device according to Embodiment 1.
FIG. 3 is a diagram showing a detection result of an object by a detection unit according to Embodiment 1.
FIG. 4 is a diagram for explaining coordinate conversion by a first conversion unit according to Embodiment 1.
FIG. 5 is a diagram for explaining a predetermined range according to Embodiment 1.
FIG. 6 is a diagram showing estimation results of the size of an object in past frames according to Embodiment 1.
FIG. 7 is a diagram showing an example of position information output by the position estimation device according to Embodiment 1.
FIG. 8 is a flowchart showing the operation of the position estimation device according to Embodiment 1.
FIG. 9 is a table including standard values of the size of each object, stored in a storage unit according to Embodiment 2.
FIG. 10 is a flowchart showing the operation of a position estimation device according to Embodiment 2.

(Background to the present disclosure)
In recent years, various studies have been conducted on object detection devices that detect objects around a vehicle using image data captured by an in-vehicle camera or the like. For example, estimating the position of an object based on image data captured by a monocular camera (see FIG. 3, described later) is being studied. Estimating the position of the object with a monocular camera makes the estimation possible even if the vehicle does not have a plurality of cameras. That is, the position of the object can be estimated at lower cost.

Estimating the position of an object based on image data captured by a monocular camera will be described with reference to FIGS. 1A and 1B. FIG. 1A is a first diagram for explaining estimation of the position of an object by a position estimation device according to a comparative example. FIG. 1A shows a case where a pedestrian U in contact with a road L (the ground) is in front of a vehicle 10 equipped with a monocular camera 20. The vehicle 10 is also in contact with the road L. In other words, FIG. 1A shows a case where the pedestrian U is in contact with the same plane as the vehicle 10. The pedestrian U is an example of an object.

As shown in FIG. 1A, the monocular camera 20 of the vehicle 10 images the pedestrian U in front of the vehicle. A position estimation device (not shown) included in the vehicle 10 estimates the position of the pedestrian U based on the image data captured by the monocular camera 20. Here, the position estimation device estimates the position of the pedestrian U on the premise that the pedestrian U appearing in the captured image data is on the same plane as the vehicle 10. Specifically, the position estimation device converts the coordinates of the pedestrian U on the image data into real-world coordinates on the premise that the pedestrian U and the vehicle 10 are on the same plane. Such a premise is used when a position is estimated using the monocular camera 20. The coordinates on the image data are coordinates in the camera coordinate system, and the real-world coordinates are coordinates in a Cartesian coordinate system.

In the case of FIG. 1A, the state in which the pedestrian U is in contact with the road L matches the premise of the coordinate conversion in the position estimation device, so the position estimation device can accurately estimate the position of the pedestrian U. The position estimation device can estimate that the pedestrian U is at position P1. In this case, the pedestrian U in contact with the road L is imaged on the image plane of the monocular camera 20. The image plane is the imaging surface of the monocular camera 20.

Next, a case where the pedestrian U and the vehicle 10 are not on the same plane will be described. FIG. 1B is a second diagram for explaining estimation of the position of an object by the position estimation device according to the comparative example. FIG. 1B shows a case where a pedestrian U who is not in contact with the road L (the ground) is in front of the vehicle 10 equipped with the monocular camera 20. Specifically, FIG. 1B shows the pedestrian U jumping.

In the case of FIG. 1B, the state in which the pedestrian U is not in contact with the road L does not match the premise of the coordinate conversion in the position estimation device, so the position estimation device cannot accurately estimate the position of the pedestrian U. That is, the position estimation device estimates the position of the pedestrian U incorrectly. This is because the position estimation device estimates the position of the pedestrian U on the premise that the pedestrian U and the vehicle 10 are on the same plane even when they are not. The position estimation device estimates that the pedestrian U, who is actually at position P1, is at position P2, which is farther away than position P1 (see the pedestrian M shown in FIG. 1B). In this case, the position estimation device also estimates the size of the pedestrian U to be larger than the actual size (see the pedestrian M shown in FIG. 1B).
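The direction of this error can be illustrated with a simple pinhole-camera derivation. This is only a sketch under assumptions that do not appear in the source: the optical axis is horizontal, the camera sits at height h above the road, f is the focal length in pixels, c_y is the image row of the principal point, image rows v increase downward, and the pedestrian has true distance Z, true height H, and feet at height j above the road (0 ≤ j < h).

```latex
\begin{aligned}
v_{\mathrm{foot}} &= c_y + \frac{f\,(h - j)}{Z}, \qquad
v_{\mathrm{top}} = c_y + \frac{f\,(h - j - H)}{Z} \\
\hat{Z} &= \frac{f\,h}{v_{\mathrm{foot}} - c_y} = Z\,\frac{h}{h - j} \\
\hat{H} &= \frac{(v_{\mathrm{foot}} - v_{\mathrm{top}})\,\hat{Z}}{f} = H\,\frac{h}{h - j}
\end{aligned}
```

Under these assumptions, the flat-ground estimates of distance and height are both scaled by the same factor h / (h − j), which equals 1 when the feet are on the road and exceeds 1 when the pedestrian is in the air. For example, with h = 1.2 m and j = 0.3 m the factor is 4/3, so a pedestrian 1.7 m tall at 10 m would be estimated as about 2.27 m tall at about 13.3 m.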

Further, even if the pedestrian U is in contact with the road L, when a tire of the vehicle 10 is riding over an object such as a manhole, the pedestrian U in contact with the road L and the vehicle 10 riding over the object are not on the same plane. Therefore, when a tire of the vehicle 10 is riding over an object such as a manhole, the position estimation device cannot accurately estimate the position of the pedestrian U, just as in the case where the pedestrian U is not in contact with the road L.

As described above, a position estimation device that estimates the position of the pedestrian U using image data from the monocular camera 20 may estimate the position of the pedestrian U relative to the own vehicle incorrectly when the relative positional relationship between the pedestrian U and the own vehicle changes. For example, the position estimation device may be unable to accurately estimate the position of the object when the object is making an irregular motion such as jumping, or when the vehicle 10 is behaving irregularly, for example with a tire riding over an object. The technique of Patent Document 1 described above may likewise estimate the position of the pedestrian U incorrectly. The inventor of the present application therefore diligently studied position estimation methods and the like capable of accurately estimating the position of an object even when using image data captured by a monocular camera, and devised the position estimation method and the like described below.

When coordinate conversion is performed based on image data captured by a monocular camera, accurate second coordinates cannot be obtained if the first coordinates are converted into the second coordinates based on image data captured while the object is making a motion such as jumping or while the vehicle is riding over an object. The second coordinates end up farther away than the actual position. That is, the accurate position of the object cannot be estimated. Moreover, when the position of the object is inaccurate in this way, the size of the object cannot be estimated accurately either.

Therefore, in the position estimation method according to one aspect of the present disclosure, when the size is not included in the predetermined range, the coordinate conversion is performed again based on the representative value. Because the third coordinates are obtained by coordinate conversion based on a representative value corresponding to the object, they can be closer to the real-world coordinates of the object than the second coordinates are. Therefore, the position of the object can be accurately estimated even when image data captured by a monocular camera is used.
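The flow of this method can be sketched as follows. This is a minimal illustration and not the conversion used in the embodiments: it assumes a pinhole camera with a horizontal optical axis, takes the size of the object to be its height, and re-converts by choosing the distance at which an object of the representative height would produce the observed pixel height. The names `Camera`, `flat_ground_distance`, and `estimate`, and all parameter values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Camera:
    """Hypothetical pinhole camera parameters (assumed known from calibration)."""
    focal_px: float   # focal length in pixels
    cy: float         # image row of the principal point
    height_m: float   # camera height above the road, in metres


def flat_ground_distance(v_foot: float, cam: Camera) -> float:
    """Distance to the foot point, assuming it lies on the road plane."""
    dv = v_foot - cam.cy
    if dv <= 0:
        raise ValueError("foot row must lie below the principal-point row")
    return cam.focal_px * cam.height_m / dv


def estimate(v_top: float, v_foot: float, cam: Camera,
             size_range: Tuple[float, float],
             representative_m: float) -> Tuple[float, float, bool]:
    """Return (distance_m, height_m, reconverted) for one detection frame."""
    pixel_height = v_foot - v_top
    if pixel_height <= 0:
        raise ValueError("v_foot must be below v_top in the image")
    # First conversion: image coordinates -> real world (second coordinates).
    distance = flat_ground_distance(v_foot, cam)
    # Size of the object computed from the second coordinates.
    height = pixel_height * distance / cam.focal_px
    low, high = size_range
    if low <= height <= high:
        return distance, height, False
    # Size is implausible: convert again using the representative size
    # (third coordinates). This particular formula is an assumption.
    distance = cam.focal_px * representative_m / pixel_height
    return distance, representative_m, True
```

With hypothetical values of a 1000-pixel focal length, a principal-point row of 500, and a camera height of 1.2 m, a standing pedestrian whose frame spans rows 450 to 620 is placed at about 10 m without re-conversion. The same pedestrian jumping, with a frame spanning rows 420 to 590, fails a 1.0 m to 2.0 m range check and is re-converted to about 10 m rather than the flat-ground result of about 13.3 m.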

Further, for example, the representative value may be a standard value set in advance according to the object.

As a result, the first coordinates are converted into the third coordinates based on a standard value corresponding to the object, so the position of the object can be estimated more accurately than when the coordinates are converted using a fixed standard value regardless of the object.

Further, for example, the standard value may be set for each class of object, and when the size is not included in the predetermined range, the standard value corresponding to the class to which the object belongs may be acquired as the representative value, and the first coordinates may be converted again into the third coordinates based on the acquired representative value.

As a result, the first coordinates can be converted into the third coordinates using the standard value for each label (class), so the position of the object can be estimated even more accurately.
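A per-class standard value can be held in a simple lookup table, as in the sketch below. The class names follow the examples given in this document ("person", "car"), but the numeric values and the names `STANDARD_HEIGHT_M` and `standard_value` are placeholders chosen for illustration and are not taken from the source.

```python
# Hypothetical per-class standard heights in metres (placeholder values).
STANDARD_HEIGHT_M = {
    "person": 1.7,
    "car": 1.5,
}


def standard_value(label: str) -> float:
    """Return the standard size registered for the class of a detected object."""
    try:
        return STANDARD_HEIGHT_M[label]
    except KeyError:
        # Fail loudly rather than fall back to a value meant for another class.
        raise ValueError(f"no standard value registered for class {label!r}") from None
```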

Further, for example, the representative value may be determined based on time-series data of the size of the object, the time-series data being based on image data acquired in the past.

As a result, time-series data for the object itself is used, so the representative value can be set more appropriately. The position of the object can therefore be estimated even more accurately.
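One possible way to derive a representative value from such time-series data is to take the median of the past size estimates that fell within the predetermined range. The source does not specify which statistic is used, so the median, the function name `representative_from_history`, and the fallback argument are all assumptions made for this sketch.

```python
from statistics import median
from typing import Sequence, Tuple


def representative_from_history(history: Sequence[float],
                                size_range: Tuple[float, float],
                                fallback: float) -> float:
    """Median of past in-range sizes, or `fallback` if none are available."""
    low, high = size_range
    # Ignore past estimates that were themselves outside the plausible range.
    valid = [size for size in history if low <= size <= high]
    return median(valid) if valid else fallback
```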

Further, for example, when the size is included in the predetermined range, the size of the object may be updated based on the size in the current frame and the size of the object in past frames.

As a result, a more accurate size of the object can be obtained. For example, when the coordinate conversion is performed again using that size, the position of the object can be estimated still more accurately.
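Such an update could, for example, be a running mean over the frames in which the size was within the predetermined range. The source does not state the update rule, so the incremental mean below and the name `update_size` are assumptions for illustration only.

```python
from typing import Tuple


def update_size(previous_mean: float, frames_seen: int,
                current_size: float) -> Tuple[float, int]:
    """Fold the current frame's size into a running mean over in-range frames.

    Returns the updated mean and the updated frame count.
    """
    if frames_seen < 0:
        raise ValueError("frames_seen must be non-negative")
    count = frames_seen + 1
    # Incremental mean: avoids storing every past size.
    new_mean = previous_mean + (current_size - previous_mean) / count
    return new_mean, count
```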

Further, a position estimation device according to one aspect of the present disclosure includes: an acquisition unit that acquires image data including an object from a monocular camera; a first conversion unit that converts first coordinates of the object in the image data into second coordinates in the real world; a calculation unit that calculates a size of the object based on the second coordinates; a determination unit that determines whether or not the size is included in a predetermined range corresponding to the object; and a second conversion unit that, when the size is not included in the predetermined range, converts the first coordinates again into third coordinates based on a representative value within the predetermined range. Further, a program according to one aspect of the present disclosure is a program for causing a computer to execute the above position estimation method.

This provides the same effects as the position estimation method described above.

These general or specific aspects may be realized by a system, a method, an integrated circuit, a computer program, or a non-transitory recording medium such as a computer-readable CD-ROM, or by any combination of systems, methods, integrated circuits, computer programs, and recording media. The program may be stored in the recording medium in advance, or may be supplied to the recording medium via a wide area communication network including the Internet.

Hereinafter, embodiments will be specifically described with reference to the drawings.

Each of the embodiments described below shows a general or specific example. The numerical values, shapes, components, arrangement positions and connection forms of the components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present disclosure. For example, a numerical value is not an expression with only a strict meaning, but an expression that also covers a substantially equivalent range, for example a difference of about several percent. Among the components in the following embodiments, components not recited in the independent claims are described as optional components. Each figure is a schematic view and is not necessarily drawn precisely. In the figures, the same components are denoted by the same reference numerals.

(Embodiment 1)
Hereinafter, a vehicle including the position estimation device according to the present embodiment will be described with reference to FIGS. 2 to 8.

[1-1. Vehicle configuration]
First, the configuration of the vehicle according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a block diagram showing a functional configuration of a vehicle 10 including a position estimation device 30 according to the present embodiment.

As shown in FIG. 2, the vehicle 10 includes a monocular camera 20 and the position estimation device 30.

The monocular camera 20 is mounted on the vehicle 10 and images the surroundings of the vehicle 10. The monocular camera 20 is, for example, a small in-vehicle camera (in-vehicle monocular camera) attached at the front of the vehicle 10 near the center of the vehicle width. The monocular camera 20 may be attached, for example, to the ceiling near the windshield inside the vehicle. In this case, the imaging range of the monocular camera 20 is a range in front of the vehicle 10 that includes a part of the bonnet. The monocular camera 20 may also be attached, for example, to the front end of the bonnet. The monocular camera 20 may also be attached so that it can image the area behind or beside the vehicle 10.

The monocular camera 20 is not particularly limited, and a known monocular camera can be used. The monocular camera 20 is, for example, a general visible-light camera that images light at wavelengths in the visible region, but may be a camera capable of acquiring infrared information. The monocular camera 20 may also be, for example, a camera that images at a wide angle, such as a fisheye camera having a fisheye lens. The monocular camera 20 may be a monochrome camera that captures monochrome images or a color camera that captures color images.

The monocular camera 20 outputs the captured image data to the position estimation device 30. The monocular camera 20 is an example of an imaging device. The image data is two-dimensional image data.

The position estimation device 30 estimates the position of an object based on the image data acquired from the monocular camera 20. The position estimation device 30 is a three-dimensional position estimation device that estimates the three-dimensional position of the object in the real world based on the image data. The position estimation device 30 includes a detection unit 40, a control unit 50, and a storage unit 60.

The detection unit 40 detects an object to be detected based on the image data acquired from the monocular camera 20. An example in which the object to be detected by the detection unit 40 is a person (for example, the pedestrian U) is described below, but the object is not limited to a person. The detection unit 40 functions as an acquisition unit that acquires image data including the pedestrian U from the monocular camera 20.

The detection unit 40 detects a person using a trained model that has been trained to take image data as input and output a detection frame for each person appearing in the image data. The trained model is generated, for example, by supervised machine learning. Specifically, the trained model is generated by machine learning using image data in which a person appears as training data and the detection frame of that person (the coordinates of the detection frame) as ground-truth data. The trained model is generated, for example, by deep learning using a neural network. In other words, the detection unit 40 can be regarded as a deep-learning person detection unit. The detection unit 40 outputs a detection frame list, which is a list of the detected detection frames, to the control unit 50. The detection frame list is an example of a detection result.

The detection frame list includes the number of detection frames detected by the detection unit 40, the coordinate information of each detection frame on the image data, and the reliability of each detection frame. The reliability takes a value from 0 to 1, for example. The detection unit 40 is trained so that it can detect a pedestrian U who is walking or jumping. That is, the detection unit 40 is trained so that it can detect both a pedestrian U in contact with the road L and a pedestrian U not in contact with the road L. The detection frame list may also include the class of each detection frame. The class indicates the category of an object that the detection unit 40 can detect. The classes are determined as appropriate according to the equipment on which the position estimation device 30 is mounted. When the position estimation device 30 is mounted on the vehicle 10, the classes are "person", "car", and the like, and the class of the pedestrian U is, for example, "person". The equipment is a moving body such as the vehicle 10, but may be a device used in a fixed position or a device used in a stationary installation.

The number of detection frames included in the detection frame list may be zero. In this case, the detection frame list includes information indicating that the number of detection frames is zero.

Here, the detection frame will be described with reference to FIG. 3. FIG. 3 is a diagram showing a detection result of an object by the detection unit 40 according to the present embodiment.

As shown in FIG. 3, the detection unit 40 outputs, to the control unit 50, a detection frame list including the coordinate information of a detection frame F of the pedestrian U, based on image data D acquired from the monocular camera 20. When the pedestrian U is in contact with the road L, the detection unit 40 outputs to the control unit 50 a detection frame list including the coordinate information of the detection frame F surrounding the pedestrian U in contact with the road L. When the pedestrian U is not in contact with the road L, the detection unit 40 outputs to the control unit 50 a detection frame list including the coordinate information of the detection frame F surrounding the pedestrian U who is not in contact with the road L (for example, a jumping pedestrian U). The coordinate information includes, for example, the coordinates Pb1 (u1, v1) and Pb2 (u2, v2) of diagonally opposite corners of the detection frame F.
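The contents of the detection frame list described above can be modeled with a small data structure. The sketch below is illustrative only: the type and field names (`DetectionFrame`, `DetectionFrameList`, `confidence`, `label`, `pixel_height`) are hypothetical, and the source does not specify how the list is actually represented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DetectionFrame:
    """One detection frame F, given by two diagonally opposite corners."""
    u1: float
    v1: float
    u2: float
    v2: float
    confidence: float            # reliability, expected to lie in [0, 1]
    label: Optional[str] = None  # optional class such as "person" or "car"

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    @property
    def pixel_height(self) -> float:
        """Vertical extent of the frame in pixels."""
        return abs(self.v2 - self.v1)


@dataclass
class DetectionFrameList:
    """Detection result for one image; may legitimately be empty."""
    frames: List[DetectionFrame] = field(default_factory=list)

    @property
    def count(self) -> int:
        """Number of detection frames, zero when nothing was detected."""
        return len(self.frames)
```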

Referring again to FIG. 2, the control unit 50 estimates the position of the object based on the detection frame list. The control unit 50 according to the present embodiment determines whether or not the pedestrian U is in contact with the road L, and estimates the position of the pedestrian U based on the determination result. The control unit 50 includes a first conversion unit 51, a calculation unit 52, a determination unit 53, and a second conversion unit 54.

第１変換部５１は、検知部４０から検知枠リストを取得すると、検知枠リストに含まれる検知枠それぞれの座標を変換する。具体的には、第１変換部５１は、検知枠それぞれの座標を、画像データ上の座標から実世界（実空間）における座標に変換する。第１変換部５１は、カメラ座標系における歩行者Ｕの座標を、直交座標系における歩行者Ｕの座標に変換するとも言える。画像データ上の座標は第１座標の一例であり、第１変換部５１が座標変換して得られる実世界上の座標は第２座標の一例である。 When the first conversion unit 51 acquires the detection frame list from the detection unit 40, the first conversion unit 51 converts the coordinates of each detection frame included in the detection frame list. Specifically, the first conversion unit 51 converts the coordinates of each detection frame from the coordinates on the image data to the coordinates in the real world (real space). It can be said that the first conversion unit 51 converts the coordinates of the pedestrian U in the camera coordinate system into the coordinates of the pedestrian U in the Cartesian coordinate system. The coordinates on the image data are an example of the first coordinates, and the coordinates in the real world obtained by the coordinate conversion by the first conversion unit 51 are an example of the second coordinates.

第１変換部５１が座標変換を行う方法は特に限定されないが、例えば、以下の方法により行われる。図４は、本実施の形態に係る第１変換部５１の座標変換を説明するための図である。図４に示す座標Ｐａ（Ｘ、Ｙ、Ｚ）は、歩行者Ｕの実世界での座標であり、直交座標系における３次元点（３次元座標）を示す。また、ｘ軸、ｙ軸及びｚ軸は、互いに直交しており、３次元直交座標系を構成する。ｚ軸は単眼カメラ２０の光軸（ｏｐｔｉｃａｌａｘｉｓ）と平行な方向の軸であり、ｘ軸及びｙ軸のそれぞれは、ｚ軸と直交する軸である。ｘｙ平面と光軸との交点が、直交座標系における原点である。 The method by which the first conversion unit 51 performs coordinate conversion is not particularly limited, but is performed by, for example, the following method. FIG. 4 is a diagram for explaining the coordinate conversion of the first conversion unit 51 according to the present embodiment. The coordinates Pa (X, Y, Z) shown in FIG. 4 are the coordinates of the pedestrian U in the real world, and indicate three-dimensional points (three-dimensional coordinates) in the Cartesian coordinate system. Further, the x-axis, y-axis and z-axis are orthogonal to each other, forming a three-dimensional Cartesian coordinate system. The z-axis is an axis in a direction parallel to the optical axis of the monocular camera 20, and each of the x-axis and the y-axis is an axis orthogonal to the z-axis. The intersection of the xy plane and the optical axis is the origin in the Cartesian coordinate system.

また、Ｘｃ軸、Ｙｃ軸及びＺｃ軸は、互いに直交しており、カメラ座標系を構成する。Ｚｃ軸は単眼カメラ２０の光軸と平行な方向の軸であり、Ｘｃ軸及びＹｃ軸のそれぞれは、Ｚｃ軸と直交する軸である。ＸｃＹｃ平面と光軸との交点が、カメラ座標系における原点Ｆｃである。 Further, the Xc axis, the Yc axis, and the Zc axis are orthogonal to each other and form a camera coordinate system. The Zc axis is an axis in a direction parallel to the optical axis of the monocular camera 20, and each of the Xc axis and the Yc axis is an axis orthogonal to the Zc axis. The intersection of the XcYc plane and the optical axis is the origin Fc in the camera coordinate system.

座標Ｐａ及び単眼カメラ２０の原点Ｆｃを結んだ直線と画像平面とが交わる点の座標を座標Ｐｂ（ｕｃ、ｖｃ）とする。座標Ｐｂ（ｕｃ、ｖｃ）は、カメラ座標系での座標である。 The coordinates of the point where the straight line connecting the coordinate Pa and the origin Fc of the monocular camera 20 and the image plane intersect are defined as the coordinates Pb (uc, vc). The coordinates Pb (uc, vc) are the coordinates in the camera coordinate system.

このとき、座標Ｐｂ（ｕｃ、ｖｃ）は、以下の式１及び式２により算出することができる。 At this time, the coordinates Pb (uc, vc) can be calculated by the following equations 1 and 2.

uc = (X × fx) / Z + cx (Equation 1)
vc = (Y × fy) / Z + cy (Equation 2)

ここで、ｆｘ及びｆｙはピクセル単位で表される焦点距離であり、ｆｘはｘ軸方向の焦点距離であり、ｆｙはｙ軸方向の焦点距離である。また、ｃｘ及びｃｙは画像中心（ｐｒｉｎｃｉｐａｌｐｏｉｎｔ）である。画像中心は、単眼カメラ２０の光軸と画像平面とが交わる点である。 Here, fx and fy are focal lengths expressed in pixel units, fx is a focal length in the x-axis direction, and fy is a focal length in the y-axis direction. Further, cx and cy are image centers (principal points). The center of the image is the point where the optical axis of the monocular camera 20 and the image plane intersect.
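Equations 1 and 2 can be illustrated with a minimal sketch. The `Intrinsics` container, the function name `project`, and the numeric values in the test are assumptions made for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Camera parameters named as in Equations 1 and 2 (illustrative container)."""
    fx: float  # focal length in the x-axis direction, in pixels
    fy: float  # focal length in the y-axis direction, in pixels
    cx: float  # x coordinate of the image center (principal point)
    cy: float  # y coordinate of the image center (principal point)


def project(X: float, Y: float, Z: float, k: Intrinsics) -> tuple[float, float]:
    """Map a real-world point Pa (X, Y, Z) to image coordinates Pb (uc, vc)."""
    if Z <= 0:
        raise ValueError("depth Z must be positive (point in front of the camera)")
    uc = (X * k.fx) / Z + k.cx  # Equation 1
    vc = (Y * k.fy) / Z + k.cy  # Equation 2
    return uc, vc
```

Note that the depth Z is divided out in both equations, which is why a single image cannot recover Z without an additional assumption such as a known object size.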

Using the coordinates Pb1 (u1, v1) of the upper left corner and the coordinates Pb2 (u2, v2) of the lower right corner of the rectangular frame on the image data shown in FIG. 3, the area Simg of the rectangular frame on the image data can be calculated by the following Equation 3.

Simg = |u1 − u2| × |v1 − v2| (Equation 3)

Let the real-world coordinates corresponding to the coordinates Pb1 on the image data be Pa1 (X1, Y1, Z1), and the real-world coordinates corresponding to the coordinates Pb2 on the image data be Pa2 (X2, Y2, Z2). Assuming that the depths are equal, that is, Z1 = Z2 = Z, the area Sworld of the detection frame in the real world is calculated by the following Equation 4.

Sworld = |X1 − X2| × |Y1 − Y2|
= |Z(u1 − cx) / fx − Z(u2 − cx) / fx| × |Z(v1 − cy) / fy − Z(v2 − cy) / fy|
= (Z² × Simg) / (fx × fy) (Equation 4)

When the object is a person (for example, an adult), the area Sworld can be calculated by assuming a width of about 50 cm and a height of about 170 cm. The depth Z, which is the only remaining unknown in Equation 4, can then be calculated. The depth Z is the distance from the image plane of the monocular camera 20 to the object in the direction parallel to the optical axis. The coordinates Pa1 and Pa2 can then be calculated from the value of the depth Z and the coordinates Pb1 and Pb2.
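As a rough sketch of this step, Equation 4 can be solved for Z, and Equations 1 and 2 can be inverted to recover real-world coordinates. The function names and the sample box below are hypothetical; only the 50 cm by 170 cm figure comes from the description.

```python
import math


def depth_from_area(u1, v1, u2, v2, fx, fy, s_world):
    """Solve Equation 4 for Z: Z = sqrt(Sworld * fx * fy / Simg)."""
    s_img = abs(u1 - u2) * abs(v1 - v2)  # Equation 3, area in pixels
    if s_img == 0:
        raise ValueError("detection frame has zero area")
    return math.sqrt(s_world * fx * fy / s_img)


def back_project(u, v, z, fx, fy, cx, cy):
    """Invert Equations 1 and 2 for a known depth Z."""
    return (z * (u - cx) / fx, z * (v - cy) / fy, z)


# Assumed example: a 50 px by 170 px box and an adult of about 0.5 m by 1.7 m.
z = depth_from_area(600, 200, 650, 370, fx=1000.0, fy=1000.0, s_world=0.5 * 1.7)
pa1 = back_project(600, 200, z, 1000.0, 1000.0, 640.0, 360.0)
```

With these assumed numbers, Simg is 8500 square pixels and the recovered depth is about 10 m.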

なお、上記では、検知枠の面積を用いて奥行きＺの値を算出する例について説明したが、検知枠の高さ及び幅の一方を用いて奥行きＺの値を算出することも可能である。例えば、車両１０であれば正面から対象物を見るか横から見るかにより、対象物の幅は異なるが、高さはおおよそ同じである。そこで、以下では、高さを用いて奥行きＺの値を算出する場合について説明する。検知枠の実世界上での高さＨｗｏｒｌｄは、検知枠の画像データ上での高さを高さＨｉｍｇとすると、以下の式５で算出される。 Although the example of calculating the depth Z value using the area of the detection frame has been described above, it is also possible to calculate the depth Z value using either the height or the width of the detection frame. For example, in the case of the vehicle 10, the width of the object differs depending on whether the object is viewed from the front or from the side, but the height is approximately the same. Therefore, in the following, a case where the value of the depth Z is calculated using the height will be described. The height Hworld of the detection frame in the real world is calculated by the following equation 5 assuming that the height of the detection frame on the image data is Himg.

Hworld = |Y1 − Y2| = |Z(v1 − cy) / fy − Z(v2 − cy) / fy| = (Z × Himg) / fy (Equation 5)

ここで、検知枠の画像データ上での高さＨｉｍｇは、以下の式６で算出される。 Here, the height Himg of the detection frame on the image data is calculated by the following formula 6.

Himg = |v1 − v2| (Equation 6)

式５及び式６に示すように、検知枠の実世界上での高さＨｗｏｒｌｄがわかると、奥行きＺの値を算出可能である。 As shown in Equations 5 and 6, if the height Hworld of the detection frame in the real world is known, the value of the depth Z can be calculated.
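Equations 5 and 6 can be used in both directions, as the following sketch shows: from a known real-world height to a depth, and from a known depth to a real-world height. The function names and sample values are illustrative assumptions.

```python
def image_height(v1: float, v2: float) -> float:
    """Equation 6: height of the detection frame on the image, in pixels."""
    return abs(v1 - v2)


def depth_from_height(v1: float, v2: float, fy: float, h_world: float) -> float:
    """Equation 5 solved for Z: Z = Hworld * fy / Himg."""
    h_img = image_height(v1, v2)
    if h_img == 0:
        raise ValueError("detection frame has zero height")
    return h_world * fy / h_img


def height_from_depth(v1: float, v2: float, fy: float, z: float) -> float:
    """Equation 5 as written: Hworld = Z * Himg / fy."""
    return z * image_height(v1, v2) / fy
```

For example, a frame 170 pixels tall with fy = 1000 and an assumed height of 1.7 m gives a depth of about 10 m, and feeding that depth back in returns 1.7 m.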

図２を再び参照して、算出部５２は、第１変換部５１が座標変換した座標、つまり実世界における座標に基づいて、歩行者Ｕの大きさを算出する。算出部５２は、例えば、歩行者Ｕの高さを算出する。なお、歩行者Ｕの大きさは、歩行者Ｕの幅であってもよいし、面積であってもよい。 With reference to FIG. 2 again, the calculation unit 52 calculates the size of the pedestrian U based on the coordinates converted by the first conversion unit 51, that is, the coordinates in the real world. The calculation unit 52 calculates, for example, the height of the pedestrian U. The size of the pedestrian U may be the width of the pedestrian U or the area.

The determination unit 53 determines whether or not the real-world size of the pedestrian U is within a predetermined range corresponding to that pedestrian U. In the present embodiment, the determination unit 53 determines the predetermined range based on time-series data of the real-world size of the pedestrian U, which is based on image data acquired from the monocular camera 20 in the past. That is, the predetermined range is determined for each pedestrian U (for each object). It can also be said that the determination unit 53 determines, based on the size of the pedestrian U, whether or not an accurate position of the pedestrian U has been acquired. The time-series data includes the size of the pedestrian U in each of a plurality of frames preceding the current frame, calculated from the image data acquired in those frames. The time-series data (for example, see FIG. 6 described later) is stored in the storage unit 60. FIG. 5 is a diagram for explaining the predetermined range according to the present embodiment. In FIG. 5, the vertical axis indicates size and the horizontal axis indicates time.

図５に示すように、判定部５３は、過去の数フレームでの歩行者Ｕの大きさに基づいて、所定範囲を決定する。判定部５３は、例えば、歩行者Ｕの大きさの平均値及び標準偏差に基づいて、所定範囲を決定する。判定部５３は、例えば、過去の数フレームでの歩行者Ｕの大きさの平均値をＳａｖｇ、標準偏差をσとすると、以下の式７及び式８に基づいて所定範囲を決定する。 As shown in FIG. 5, the determination unit 53 determines a predetermined range based on the size of the pedestrian U in the past several frames. The determination unit 53 determines a predetermined range based on, for example, the average value and standard deviation of the sizes of the pedestrian U. For example, assuming that the average value of the sizes of the pedestrians U in the past several frames is Savg and the standard deviation is σ, the determination unit 53 determines a predetermined range based on the following equations 7 and 8.

Upper limit = Savg + k × σ (Equation 7)
Lower limit = Savg − k × σ (Equation 8)

この場合、判定部５３は、式８で算出される下限値以上、かつ、式７で算出される上限値以下の範囲を所定範囲に決定する。なお、ｋは、予め設定される定数である。 In this case, the determination unit 53 determines a range equal to or greater than the lower limit value calculated by the equation 8 and less than or equal to the upper limit value calculated by the equation 7 as a predetermined range. Note that k is a preset constant.
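A minimal sketch of Equations 7 and 8 follows. The disclosure does not state whether σ is a population or a sample standard deviation, nor the value of k; the population form (`statistics.pstdev`) and k = 2 are assumptions, and the sample heights are invented for illustration.

```python
from statistics import fmean, pstdev


def size_range(past_sizes, k=2.0):
    """Return (lower limit, upper limit) per Equations 8 and 7."""
    savg = fmean(past_sizes)    # Savg: mean size over past frames
    sigma = pstdev(past_sizes)  # sigma: population standard deviation (assumed)
    return savg - k * sigma, savg + k * sigma


def is_within_range(size, past_sizes, k=2.0):
    """True when lower limit <= size <= upper limit (both bounds inclusive)."""
    lower, upper = size_range(past_sizes, k)
    return lower <= size <= upper


past = [1.68, 1.70, 1.72, 1.70]  # assumed heights in meters
lower, upper = size_range(past)
```

Because the range is derived from each object's own history, a larger k tolerates more frame-to-frame measurement noise before a frame is treated as out of range.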

図２を再び参照して、判定部５３は、現フレームの歩行者Ｕの大きさが所定範囲内である場合、現フレームにおいて歩行者Ｕが道路Ｌと接触していると判定する。判定部５３は、歩行者Ｕが道路Ｌに接触しているという状態と、位置推定装置３０における座標変換の前提とが一致していると判定するとも言える。また、判定部５３は、現フレームの歩行者Ｕの大きさが所定範囲外である場合、現フレームにおいて歩行者Ｕが道路Ｌと接触していないと判定する。現フレームの歩行者Ｕの大きさが所定範囲外である場合、歩行者Ｕが道路Ｌに接触していないという状態と、位置推定装置３０における座標変換の前提とが一致していないので、当該現フレームにおける第１変換部５１による座標変換後の座標、つまり実世界での座標は、正確な値ではない場合がある。判定部５３は、判定結果を第２変換部５４に出力する。 With reference to FIG. 2 again, when the size of the pedestrian U in the current frame is within the predetermined range, the determination unit 53 determines that the pedestrian U is in contact with the road L in the current frame. It can be said that the determination unit 53 determines that the state in which the pedestrian U is in contact with the road L matches the premise of the coordinate conversion in the position estimation device 30. Further, when the size of the pedestrian U in the current frame is out of the predetermined range, the determination unit 53 determines that the pedestrian U is not in contact with the road L in the current frame. When the size of the pedestrian U in the current frame is out of the predetermined range, the state in which the pedestrian U is not in contact with the road L and the premise of the coordinate conversion in the position estimation device 30 do not match. The coordinates after coordinate conversion by the first conversion unit 51 in the current frame, that is, the coordinates in the real world may not be accurate values. The determination unit 53 outputs the determination result to the second conversion unit 54.

When the determination unit 53 determines that the size of the pedestrian U in the current frame is not within the predetermined range, that is, when the state in which the pedestrian U is not in contact with the road L does not match the premise of the coordinate conversion in the position estimation device 30, the second conversion unit 54 reconverts the coordinates of the pedestrian U in the image data into coordinates of the pedestrian U in the real world based on a value within the predetermined range. Specifically, the second conversion unit 54 reconverts the coordinates of the detection frame from the detection unit 40 into coordinates of the detection frame in the real world based on a representative value, treating the real-world size of the pedestrian U as equal to the representative value. In the present embodiment, the representative value is, for example, a value between the lower limit calculated by Equation 8 and the upper limit calculated by Equation 7, such as the mean, median, or mode of the real-space sizes in the past several frames. That is, the representative value is determined based on time-series data of the real-world size of the object, which is based on image data acquired in the past. The real-world coordinates obtained by the coordinate conversion by the second conversion unit 54 are an example of the third coordinates.

The second conversion unit 54 may also convert the coordinates of the detection frame in the real world into coordinates in a bird's-eye view of the vehicle 10. In other words, the second conversion unit 54 converts the viewpoint from that of the monocular camera 20 to a bird's-eye viewpoint. The second conversion unit 54 may calculate two-dimensional coordinates as the bird's-eye-view coordinates.

記憶部６０は、位置推定装置３０の各処理部で行われる処理のための各種情報を記憶する記憶装置である。記憶部６０は、例えば、過去のフレームにおける歩行者Ｕの大きさ情報、各式１〜式８に示す情報及び各種係数などを記憶する。記憶部６０は、例えば、半導体メモリなどにより実現される。 The storage unit 60 is a storage device that stores various information for processing performed by each processing unit of the position estimation device 30. The storage unit 60 stores, for example, the size information of the pedestrian U in the past frame, the information shown in each equation 1 to 8, various coefficients, and the like. The storage unit 60 is realized by, for example, a semiconductor memory or the like.

Here, the estimation result of the size of the pedestrian U in past frames, which is stored in the storage unit 60, will be described with reference to FIG. 6. FIG. 6 is a diagram showing the estimation result of the size of the object in past frames according to the present embodiment.

As shown in FIG. 6, the estimation result is a table in which times are associated with sizes of the pedestrian U. The time interval is not particularly limited. The size of the pedestrian U is a value that includes measurement error and the like, and therefore varies from frame to frame. In FIG. 6, the table contains the size of the pedestrian U in the four frames (for example, 8:00:06 to 8:00:09) immediately preceding the current time (for example, 8:00:10).

The control unit 50 stores the real-world size of the pedestrian U calculated by the calculation unit 52 in the storage unit 60 as the table shown in FIG. 6. For example, when the size of the pedestrian U does not change over time, the control unit 50 stores that size in the storage unit 60 as the size of the pedestrian U when in contact with the road L. For example, when the size of the pedestrian U takes similar values over several frames, the control unit 50 stores that size in the storage unit 60 as the size of the pedestrian U. Here, similar values may be values within a range set based on errors that can arise from, for example, the performance (for example, resolution) of the monocular camera 20 or the processing by the control unit 50, or may be values within a preset range.
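One possible reading of "similar values" is sketched below: the sizes over several frames are treated as stable when their spread stays within a tolerance. The spread criterion, the function name `is_stable`, and the 5 cm tolerance are assumptions; the description only requires a range set from expected errors or a preset range.

```python
def is_stable(recent_sizes, tolerance):
    """Return True when the sizes over several frames differ by at most `tolerance`.

    `tolerance` stands in for the error-based or preset range mentioned in the
    description; its value here is an assumption.
    """
    if len(recent_sizes) < 2:
        return False  # a single frame cannot show that the size is unchanged
    return max(recent_sizes) - min(recent_sizes) <= tolerance


# Assumed example: heights in meters with a 5 cm tolerance.
walking = [1.70, 1.71, 1.69]
jumping = [1.70, 1.71, 2.40]
```

Under this reading, only the `walking` sequence would be stored as the size of the pedestrian in contact with the road.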

The position information output from the position estimation device 30 described above will be described with reference to FIG. 7. FIG. 7 is a diagram showing an example of the position information output by the position estimation device 30 according to the present embodiment. The position information includes the result of estimating the position of the pedestrian U relative to the vehicle 10 by the position estimation device 30. Specifically, the position information includes the result of the coordinate conversion by the first conversion unit 51 or the second conversion unit 54. Although FIG. 7 shows an example in which the position information is presented as a bird's-eye view, the position information is not limited to a bird's-eye view. In FIG. 7, the monocular camera 20 is assumed to be arranged at the front end of the bonnet of the vehicle 10.

図７に示すように、位置情報は、単眼カメラ２０に対する歩行者Ｕの位置を含む。単眼カメラ２０が車両１０に搭載されている場合、位置情報は、当該車両１０に対する歩行者Ｕの位置を含む。図７では、位置推定装置３０により、車両１０の前方７ｍ及び左側８ｍの位置に歩行者Ｕがいると推定された例を示している。 As shown in FIG. 7, the position information includes the position of the pedestrian U with respect to the monocular camera 20. When the monocular camera 20 is mounted on the vehicle 10, the position information includes the position of the pedestrian U with respect to the vehicle 10. FIG. 7 shows an example in which the position estimation device 30 estimates that the pedestrian U is located 7 m in front of the vehicle 10 and 8 m on the left side of the vehicle 10.

なお、位置情報は、歩行者Ｕの位置の時系列データに基づいて算出される当該歩行者Ｕの速度をさらに含んでいてもよい。 The position information may further include the speed of the pedestrian U calculated based on the time series data of the position of the pedestrian U.
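The disclosure does not specify how the speed is derived from the time series. One simple possibility, shown here as an assumption, is a finite difference over the two most recent bird's-eye positions; the tuple layout `(time, lateral, forward)` and the function name are hypothetical.

```python
import math


def speed_from_track(track):
    """Estimate speed from the last two samples of a position time series.

    Each sample is (time in s, lateral position in m, forward position in m).
    """
    if len(track) < 2:
        raise ValueError("at least two samples are needed")
    (t0, x0, z0), (t1, x1, z1) = track[-2], track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return math.hypot(x1 - x0, z1 - z0) / dt


# Assumed example: the pedestrian moves 3 m laterally and 4 m toward the camera in 1 s.
speed = speed_from_track([(0.0, -8.0, 7.0), (1.0, -5.0, 3.0)])
```

A practical system would likely smooth over more than two frames, since per-frame position estimates are noisy.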

[1-2. Operation of position estimation device]
Next, the operation of the position estimation device 30 according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart showing the operation of the position estimation device 30 according to the present embodiment.

図８に示すように、位置推定装置３０の検知部４０は、単眼カメラ２０で撮像された画像データを取得する（Ｓ１０１）。検知部４０は、例えば、単眼カメラ２０から対象物を含む画像データを取得する。検知部４０は、画像データを取得すると、当該画像データに対象物（本実施の形態では、歩行者Ｕ）が含まれるかを検知する処理を実行する。具体的には、検知部４０は、学習済みモデルに画像データを入力し、当該学習済みモデルの出力である検知枠の座標情報及び信頼度を取得する。そして、検知部４０は、検知枠の座標情報及び信頼度を含む検知枠リストを制御部５０に出力する。 As shown in FIG. 8, the detection unit 40 of the position estimation device 30 acquires the image data captured by the monocular camera 20 (S101). The detection unit 40 acquires image data including an object from, for example, the monocular camera 20. When the detection unit 40 acquires the image data, the detection unit 40 executes a process of detecting whether the image data includes an object (pedestrian U in the present embodiment). Specifically, the detection unit 40 inputs image data to the trained model and acquires the coordinate information and reliability of the detection frame which is the output of the trained model. Then, the detection unit 40 outputs a detection frame list including the coordinate information of the detection frame and the reliability to the control unit 50.

次に、制御部５０は、検知部４０から検知枠リストを取得すると、検知部４０が対象物を検知したか否かを判定する（Ｓ１０２）。制御部５０は、検知枠リストに含まれる検知枠の数が１以上である場合、ステップＳ１０２でＹｅｓと判定し、検知枠の数が０である場合、ステップＳ１０２でＮｏと判定する。 Next, when the control unit 50 acquires the detection frame list from the detection unit 40, the control unit 50 determines whether or not the detection unit 40 has detected the object (S102). The control unit 50 determines Yes in step S102 when the number of detection frames included in the detection frame list is 1 or more, and determines No in step S102 when the number of detection frames is 0.

次に、第１変換部５１は、検知枠リストに含まれる検知枠の数が１以上である場合（Ｓ１０２でＹｅｓ）、検知枠それぞれの画像データ上の座標を実世界での座標に座標変換する（Ｓ１０３）。第１変換部５１は、検知対象の対象物が道路Ｌに接触しているとして、座標変換を行う。第１変換部５１は、座標変換した検知枠の座標（対象物の座標）を算出部５２に出力する。第１変換部５１は、検知枠リストに含まれる検知枠の信頼度が所定以上である検知枠のみについて座標変換を行ってもよいし、全ての検知枠について座標変換を行ってもよい。 Next, when the number of detection frames included in the detection frame list is 1 or more (Yes in S102), the first conversion unit 51 converts the coordinates on the image data of each detection frame into the coordinates in the real world. (S103). The first conversion unit 51 performs coordinate conversion on the assumption that the object to be detected is in contact with the road L. The first conversion unit 51 outputs the coordinates (coordinates of the object) of the detection frame whose coordinates have been converted to the calculation unit 52. The first conversion unit 51 may perform coordinate conversion only on the detection frames whose reliability of the detection frames included in the detection frame list is equal to or higher than a predetermined value, or may perform coordinate conversion on all the detection frames.
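The optional filtering by reliability before step S103 might look like the sketch below. The `Detection` fields and the threshold value are assumptions; the description only says that frames with a reliability at or above a predetermined value may be selected, or that all frames may be converted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """One entry of the detection frame list (field names are illustrative)."""
    u1: float
    v1: float
    u2: float
    v2: float
    reliability: float


def select_detections(frames, min_reliability: Optional[float] = None):
    """Return the frames to convert: all of them, or only the reliable ones."""
    if min_reliability is None:
        return list(frames)
    return [f for f in frames if f.reliability >= min_reliability]


frames = [Detection(600, 200, 650, 370, 0.9), Detection(10, 10, 20, 15, 0.3)]
```

Calling `select_detections(frames, 0.5)` keeps only the first frame, while `select_detections(frames)` keeps both.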

次に、算出部５２は、第１変換部５１が座標変換した検知枠の座標に基づいて、当該対象物の大きさを算出する（Ｓ１０４）。算出部５２は、例えば、検知枠の対角の位置の座標に基づいて、当該対象物の実世界での大きさを算出する。算出部５２は、対象物の実世界での大きさを判定部５３に出力する。算出部５２は、図３の例では、座標ｖ１及びｖ２の差分に対応する実世界での大きさを、対象物の大きさ（高さ）であるとして算出してもよい。 Next, the calculation unit 52 calculates the size of the object based on the coordinates of the detection frame whose coordinates have been converted by the first conversion unit 51 (S104). The calculation unit 52 calculates the size of the object in the real world, for example, based on the coordinates of the diagonal position of the detection frame. The calculation unit 52 outputs the size of the object in the real world to the determination unit 53. In the example of FIG. 3, the calculation unit 52 may calculate the size in the real world corresponding to the difference between the coordinates v1 and v2 as the size (height) of the object.

次に、判定部５３は、算出された大きさが所定範囲内であるか否かを判定する（Ｓ１０５）。判定部５３は、当該大きさが対象物に応じた所定範囲内であるか否かを判定する。判定部５３は、過去のフレームの対象物の大きさの時系列データに基づいて、所定範囲を決定し、決定した所定範囲に基づいて、ステップＳ１０５の判定を行う。判定部５３は、例えば、式７及び式８に示す式に基づいて、所定範囲を決定する。判定部５３は、図５の例では、現フレームでの大きさが所定範囲内に含まれるので、大きさが所定範囲内であると判定する。判定部５３は、判定結果を第２変換部５４に出力する。 Next, the determination unit 53 determines whether or not the calculated size is within a predetermined range (S105). The determination unit 53 determines whether or not the size is within a predetermined range according to the object. The determination unit 53 determines a predetermined range based on the time series data of the size of the object in the past frame, and determines in step S105 based on the determined predetermined range. The determination unit 53 determines a predetermined range based on, for example, the equations shown in the equations 7 and 8. In the example of FIG. 5, the determination unit 53 determines that the size is within the predetermined range because the size of the current frame is included in the predetermined range. The determination unit 53 outputs the determination result to the second conversion unit 54.

When the second conversion unit 54 acquires, from the determination unit 53, a determination result indicating that the size is within the predetermined range (Yes in S105), it updates the size of the object (S106). For example, the second conversion unit 54 updates the size of the object based on the sizes of the object in the past several frames and the size of the object in the current frame calculated by the calculation unit 52, that is, the size calculated from the coordinates converted by the first conversion unit 51. The second conversion unit 54 may update the size of the object by, for example, adding the size of the object in the current frame to the sizes of the object in the past several frames. Using the size in the current frame, the second conversion unit 54 updates the estimation result from which the determination unit 53 determines the predetermined range in the next frame. The predetermined range determined by the determination unit 53 may therefore change dynamically. The second conversion unit 54 may also update the size of the object by, for example, calculating the mean of the sizes of the object in the past several frames and in the current frame.
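The update in step S106 can be sketched as a fixed-length history, under the assumption that only the most recent frames are kept (four, matching the example of FIG. 6). The window length and function names are assumptions.

```python
from collections import deque
from statistics import fmean


def make_history(sizes, max_frames=4):
    """Keep only the most recent `max_frames` sizes (window length is assumed)."""
    return deque(sizes, maxlen=max_frames)


def update_size(history, current_size):
    """Append the in-range size of the current frame and return the new mean."""
    history.append(current_size)  # the oldest entry drops out automatically
    return fmean(history)


history = make_history([1.68, 1.70, 1.72, 1.70])
mean_size = update_size(history, 1.70)
```

Because the window slides, the predetermined range computed in the next frame follows gradual changes in the measured size.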

Next, in the case of Yes in step S105, the second conversion unit 54 outputs position information that includes, as the position of the object, the real-world coordinates obtained by the coordinate conversion by the first conversion unit 51 (S107). That is, the second conversion unit 54 outputs the second coordinates as the position of the object. The second conversion unit 54 thus also functions as an output unit that outputs position information.

When the second conversion unit 54 acquires, from the determination unit 53, a determination result indicating that the size is outside the predetermined range (No in S105), it performs coordinate conversion based on the sizes of the object in past frames (S108). Without using the size of the object in the current frame, the second conversion unit 54 uses the sizes of the object in past frames (for example, the estimation result shown in FIG. 6) to reconvert the coordinates of each detection frame in the detection frame list from the detection unit 40. The second conversion unit 54 performs the coordinate conversion so that the real-world size of the object equals a representative value (for example, the mean or the mode) based on the sizes of the object in past frames. For example, the second conversion unit 54 substitutes the representative value for the height Hworld in Equation 5 and performs the coordinate conversion. In this way, when the size is not within the predetermined range, the second conversion unit 54 reconverts the coordinates of the detection frame from the detection unit 40 into real-world coordinates based on a representative value within the predetermined range.

As a result, even when the object is not in contact with the road L, the position estimation device 30 can accurately obtain an estimate of the real-world size of the object, and can therefore keep the object from being estimated to be farther away than it actually is.

次に、第２変換部５４は、当該第２変換部５４が座標変換した対象物の実世界での座標を当該対象物の位置として含む位置情報を出力する（Ｓ１０７）。言い換えると、第２変換部５４は、ステップＳ１０５でＮｏの場合、第１変換部５１が変換した座標を含まない位置情報を出力する。第２変換部５４は、第３座標を対象物の位置として出力する。 Next, the second conversion unit 54 outputs the position information including the coordinates of the object whose coordinates have been converted by the second conversion unit 54 in the real world as the position of the object (S107). In other words, if No in step S105, the second conversion unit 54 outputs the position information that does not include the coordinates converted by the first conversion unit 51. The second conversion unit 54 outputs the third coordinate as the position of the object.

また、制御部５０は、検知枠リストに含まれる検知枠の数が０である場合（Ｓ１０２でＮｏ）、対象物の位置を推定する処理を終了する。 Further, when the number of detection frames included in the detection frame list is 0 (No in S102), the control unit 50 ends the process of estimating the position of the object.

なお、位置推定装置３０が図８に示す処理を実行するタイミングは、特に限定されないが、例えば、車両１０の走行中、繰り返し実行される。 The timing at which the position estimation device 30 executes the process shown in FIG. 8 is not particularly limited, but is repeatedly executed, for example, while the vehicle 10 is traveling.
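Steps S104 to S108 for a single detection frame can be summarized in the following sketch. It assumes that the first conversion (S103) has already produced a depth `z_first` under the road-contact premise, that size means height, that σ is a population standard deviation, that the representative value is the mean, and that k = 2. All names and numbers are illustrative and not taken from the disclosure.

```python
from statistics import fmean, pstdev


def estimate_depth(z_first, v1, v2, fy, history, k=2.0):
    """Return (depth, reconverted) for one detection frame.

    z_first : depth from the first conversion (road-contact premise)
    history : list of real-world heights from past frames, updated in place
    """
    h_img = abs(v1 - v2)                    # Equation 6
    height = z_first * h_img / fy           # S104, Equation 5
    savg, sigma = fmean(history), pstdev(history)
    lower, upper = savg - k * sigma, savg + k * sigma  # Equations 8 and 7
    if lower <= height <= upper:            # S105: Yes
        history.append(height)              # S106: update the size data
        return z_first, False               # S107 outputs the second coordinates
    # S105: No. S108 substitutes the representative value into Equation 5.
    return savg * fy / h_img, True          # S107 outputs the third coordinates


history = [1.68, 1.70, 1.72, 1.70]
normal = estimate_depth(10.0, 200.0, 370.0, 1000.0, history)
jump = estimate_depth(20.0, 100.0, 270.0, 1000.0, history)
```

In the second call the first conversion places the pedestrian at 20 m, which implies a height of 3.4 m. That is outside the range, so the depth is recomputed from the mean height and comes back to about 10 m.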

As described above, the position estimation device 30 according to the present embodiment determines, based on the sizes of the object in the past several frames, whether or not the size of the object in the current frame is abnormal, that is, whether or not the size has changed abruptly. When the size of the object in the current frame is normal, the position estimation device 30 generates position information based on the result of the coordinate conversion by the first conversion unit 51. When the size of the object in the current frame is abnormal, the pedestrian U may be making an irregular movement such as jumping, or the vehicle 10 may be behaving irregularly, for example by riding over an object, so the result of the coordinate conversion by the first conversion unit 51 may be incorrect. In that case, the position estimation device 30 performs the coordinate conversion again based on the detection frame list from the detection unit 40. At this time, the second conversion unit 54 performs the coordinate conversion using a value based on the sizes of the pedestrian U in the past several frames (for example, the mean size) as the size of the pedestrian U in the current frame.

As a result, even when the pedestrian U is making an irregular movement such as jumping, or the vehicle 10 is behaving irregularly, for example by riding over an object, the position estimation device 30 performs the coordinate conversion again using a value based on the sizes of the pedestrian U in the past several frames, and can therefore estimate the position of the object accurately. In other words, even in such cases, the position estimation device 30 can reduce errors in estimating the position of the pedestrian U.

(Embodiment 2)
[2-1. Configuration of the position estimation device]
First, the configuration of the position estimation device according to the present embodiment will be described with reference to FIG. 9. FIG. 9 is a table, stored in the storage unit according to the present embodiment, that includes a standard value of the size for each object. The position estimation device according to the present embodiment differs from the position estimation device 30 according to Embodiment 1 mainly in the information stored in the storage unit and in the determination process of the determination unit. That is, its functional configuration may be the same as that of the position estimation device 30 according to Embodiment 1. In the following, the functional configuration is therefore assumed to be the same as in Embodiment 1, the same reference numerals are used, and their description is omitted. Also, in the following, the size is described as a height, but it is not limited to this. The size may be a width or an area.

As shown in FIG. 9, the storage unit 60 according to the present embodiment stores a table in which a class, a predetermined range, and a standard value (for example, of height) are associated with one another. The number of classes is not particularly limited.

The predetermined range is a range of size (height) that the determination unit 53 uses for its determination. That is, the predetermined range is a range that makes it possible to determine whether the object is in contact with the road L, and it is set for each class.

The standard value indicates the standard size (height) for the class. It is a numerical value within the predetermined range and is set in advance for each class. The standard value is, for example, the mean, median, or mode of the predetermined range, but is not limited to these. The standard value is an example of a representative value within the predetermined range. The representative value can also be described as a standard value set in advance according to the object.

The determination unit 53 uses the classes and predetermined ranges in the above table to determine whether the size of the object is included in the predetermined range corresponding to that object. When the object is the pedestrian U, the determination unit 53 determines whether the size of the pedestrian U based on the image data is included in 1.4 to 2.0 m, which is the predetermined range corresponding to the pedestrian U.

When the size of the object is outside the predetermined range corresponding to that object, the second conversion unit 54 redoes the coordinate conversion of the coordinates of the object based on the class and standard value in the above table. In other words, when the size of the object is outside the predetermined range, the second conversion unit 54 redoes the coordinate conversion based on a fixed value corresponding to the class.
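A table of this kind, together with the range check and the choice of height, might look like the sketch below. Only the pedestrian range of 1.4 to 2.0 m comes from the text. The pedestrian standard value, the entire "car" row, the inclusive bounds, and the names `SizeEntry`, `SIZE_TABLE`, `is_within_range`, and `select_height` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeEntry:
    min_height_m: float       # lower bound of the predetermined range
    max_height_m: float       # upper bound of the predetermined range
    standard_height_m: float  # standard value, a number inside the range


# Only the pedestrian range (1.4 to 2.0 m) is taken from the text.
# The pedestrian standard value and the whole "car" row are assumed.
SIZE_TABLE = {
    "pedestrian": SizeEntry(1.4, 2.0, 1.7),
    "car": SizeEntry(1.2, 2.2, 1.5),
}


def is_within_range(cls, height_m):
    """Check a computed height against the range for its class.

    Treating both bounds as inclusive is an assumption.
    """
    entry = SIZE_TABLE[cls]
    return entry.min_height_m <= height_m <= entry.max_height_m


def select_height(cls, height_m):
    """Keep an in-range height; otherwise fall back to the class standard value."""
    if is_within_range(cls, height_m):
        return height_m
    return SIZE_TABLE[cls].standard_height_m
```

For example, `select_height("pedestrian", 2.3)` falls back to the assumed standard value of 1.7 m, while `select_height("pedestrian", 1.8)` keeps the computed height.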

[2-2. Operation of the position estimation device]
Next, the operation of the position estimation device 30 according to the present embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the operation of the position estimation device 30 according to the present embodiment. Operations that are the same as or similar to those of the position estimation device 30 according to Embodiment 1 are given the same reference numerals, and their description is omitted. The operation in the present embodiment differs in that step S208 is executed in place of step S108 shown in FIG. 8, so the description focuses on that point.

As shown in FIG. 10, the determination unit 53 determines whether the size is within the predetermined range (S105). The determination unit 53 decides the predetermined range based on the table stored in the storage unit 60 (see, for example, FIG. 9) and performs the determination in step S105 based on that range. For example, the determination unit 53 acquires the class of a detection frame detected by the detection unit 40 from the detection frame list, and acquires the predetermined range corresponding to that class from the table shown in FIG. 9. When the detection frame list contains a plurality of detection frames, a predetermined range is acquired for each detection frame. Acquiring the predetermined range from the table is an example of determining the predetermined range.

The determination unit 53 performs the determination in step S105 using the predetermined range determined based on FIG. 9. When the detection frame list contains a plurality of detection frames, the determination unit 53 performs the determination in step S105 for each detection frame using the predetermined range corresponding to that detection frame.

When the second conversion unit 54 acquires from the determination unit 53 a determination result indicating that the size is outside the predetermined range (No in S105), it redoes the coordinate conversion based on the representative value corresponding to the object (S208). Specifically, the second conversion unit 54 uses the representative value of the size corresponding to the object (for example, the standard value shown in FIG. 9) to convert the coordinates of each detection frame in the detection frame list from the detection unit 40 into real-world coordinates again. The second conversion unit 54 performs the coordinate conversion so that the position at which the real-world size of the object equals the representative value becomes the real-world position of the object. For example, the second conversion unit 54 performs the coordinate conversion by substituting the representative value into the height Hworld shown in Equation 5.
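The idea of fixing Hworld to the representative value and solving for position can be sketched as below. Equation 5 is not reproduced in this part of the text, so the relation used here is a generic pinhole-camera assumption (distance equals focal length times real height divided by image height), not the equation itself. The names `DetectionFrame`, `reconvert`, `focal_px`, and `cx_px` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionFrame:
    cls: str            # class label from the detector, e.g. "pedestrian"
    u_center_px: float  # horizontal centre of the frame in the image
    height_px: float    # height of the frame in the image (Himg)


def reconvert(frame, representative_m, focal_px, cx_px):
    """Return (lateral X, forward Z) in metres.

    The real-world height of the object (Hworld) is taken to be
    representative_m, and an ideal pinhole camera with focal length
    focal_px and principal point column cx_px is assumed.
    """
    if frame.height_px <= 0:
        raise ValueError("frame height must be positive")
    z = focal_px * representative_m / frame.height_px
    x = (frame.u_center_px - cx_px) * z / focal_px
    return x, z
```

With an assumed focal length of 1000 px, a frame 170 px tall, and a representative height of 1.7 m, the forward distance works out to about 10 m. Because this calculation does not use the bottom edge of the detection frame, it does not depend on the object touching the road.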

In this way, when the size calculated by the calculation unit 52 is not included in the predetermined range, the second conversion unit 54 acquires, as the representative value, the standard value of the size corresponding to the class to which the object belongs, and based on the acquired representative value converts the coordinates of the detection frame from the detection unit 40 into real-world coordinates again.

As a result, even when, for example, the object is not in contact with the road L, the position estimation device 30 can obtain an estimate of the real-world size of the object based on the representative value, and can therefore keep the object from being estimated to be farther away than it is.
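A small numerical illustration of why an object that has left the road can appear farther away is given below. It assumes a flat road, an ideal pinhole camera, and a first conversion that places the bottom edge of the detection frame on the road surface. None of these details, nor the numbers (focal length, camera height, a 1.7 m pedestrian at 10 m jumping 0.3 m), are taken from the text.

```python
F_PX = 1000.0  # focal length in pixels (assumed)
CAM_H = 1.2    # camera height above the road in metres (assumed)


def ground_plane_distance(bottom_offset_px):
    """Distance when the bottom edge of the frame is assumed to touch the road.

    bottom_offset_px is the number of image rows between the horizon
    and the bottom edge of the detection frame.
    """
    return F_PX * CAM_H / bottom_offset_px


def height_at(distance_m, h_img_px):
    """Real-world height implied by an image height at a given distance."""
    return distance_m * h_img_px / F_PX


# Scene: a 1.7 m pedestrian at 10 m whose feet are 0.3 m above the road.
true_z, true_h, jump = 10.0, 1.7, 0.3
h_img = F_PX * true_h / true_z                  # about 170 px
bottom_offset = F_PX * (CAM_H - jump) / true_z  # about 90 px instead of 120 px

naive_z = ground_plane_distance(bottom_offset)  # about 13.3 m, too far
naive_h = height_at(naive_z, h_img)             # about 2.27 m, above 2.0 m

# Re-conversion with a representative height of 1.7 m.
fixed_z = F_PX * 1.7 / h_img                    # about 10 m
```

Under these assumptions the ground-contact estimate overshoots the distance by roughly a third, and the implied height leaves the 1.4 to 2.0 m pedestrian range, which is the condition that triggers the re-conversion.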

As described above, the position estimation device 30 according to the present embodiment determines, based on the size range (predetermined range) set for each class of object, whether the size of the object in the current frame is abnormal, that is, whether the size has changed abruptly. When the size of the object in the current frame is normal, the position estimation device 30 generates position information based on the result of the coordinate conversion by the first conversion unit 51. When the size of the object in the current frame is abnormal, the pedestrian U may be making an irregular movement such as jumping, or the vehicle 10 may be behaving irregularly, for example by riding over an object, so the result of the coordinate conversion by the first conversion unit 51 may be incorrect. The position estimation device 30 therefore performs the coordinate conversion again based on the detection frame list from the detection unit 40. In doing so, the second conversion unit 54 performs the coordinate conversion using the standard value (an example of a representative value), which indicates the standard size for each class of object, as the size of the pedestrian U in the current frame.

As a result, even when the pedestrian U is making an irregular movement such as jumping, or the vehicle 10 is behaving irregularly, for example by riding over an object, the position estimation device 30 performs the coordinate conversion again using the standard size for the class of the object, and can therefore estimate the position of the object accurately. In other words, the position estimation device 30 can reduce erroneous estimation of the position of the pedestrian U even in such irregular situations.

(Other embodiments)
The position estimation method and the like according to one or more aspects have been described above based on the embodiments, but the present disclosure is not limited to these embodiments. Forms obtained by applying various modifications conceivable to those skilled in the art to the embodiments, and forms constructed by combining components of different embodiments, may also be included in the present disclosure as long as they do not depart from its spirit.

For example, the above embodiments describe an example in which the detection unit performs object detection using a trained model, but any known technique may be used to detect objects in the image data. The detection unit may, for example, detect the object by pattern matching.

The position estimation device according to the above embodiments may also be realized by a plurality of devices. In that case, the components of the position estimation device may be distributed among the plurality of devices in any manner. At least one of the components of the position estimation device may be realized by a server device. For example, at least one of the processing units, such as the first conversion unit, the calculation unit, the determination unit, and the second conversion unit, may be realized by a server device. When the position estimation device is realized by a plurality of devices including a server device, the communication method between those devices is not particularly limited and may be wireless or wired communication. Wireless and wired communication may also be combined between the devices.

The order in which the steps in the flowcharts are executed is an example given to describe the present disclosure concretely, and other orders may be used. Some of the steps may be executed simultaneously (in parallel) with other steps, and some of the steps need not be executed.

The division into functional blocks in the block diagram is also an example: a plurality of functional blocks may be realized as a single functional block, a single functional block may be divided into a plurality of blocks, or some functions may be moved to another functional block. The functions of a plurality of functional blocks having similar functions may be processed by a single piece of hardware or software in parallel or by time division.

Some or all of the components of the position estimation device in the above embodiments may be configured as a single system LSI (Large Scale Integration: large-scale integrated circuit).

A system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of processing units on a single chip; specifically, it is a computer system that includes a microprocessor, a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. A computer program is stored in the ROM. The system LSI achieves its functions through the microprocessor operating in accordance with the computer program.

One aspect of the present disclosure may be a computer program that causes a computer to execute each characteristic step included in the position estimation method shown in FIG. 8, FIG. 10, or the like. For example, the program may be a program to be executed by a computer. One aspect of the present disclosure may also be a non-transitory computer-readable recording medium on which such a program is recorded. For example, such a program may be recorded on a recording medium and distributed or circulated. For example, installing the distributed program on a device having another processor and causing that processor to execute the program makes it possible for that device to perform each of the processes described above.

The present disclosure is useful for a position estimation device that estimates the position of an object using image data captured by a monocular camera.

10 Vehicle
20 Monocular camera
30 Position estimation device
40 Detection unit
50 Control unit
51 First conversion unit
52 Calculation unit
53 Determination unit
54 Second conversion unit
60 Storage unit
D Image data
F Detection frame
Himg, Hworld Height
L Road
M, U Pedestrian (object)
P1, P2 Position
Pa, Pb, Pb1, Pb2 Coordinates
Simg, Sworld Area

Claims

A position estimation method comprising:
acquiring image data including an object from a monocular camera;
converting a first coordinate of the object in the image data into a second coordinate in the real world;
calculating a size of the object based on the second coordinate;
determining whether the size is included in a predetermined range corresponding to the object; and
when the size is not included in the predetermined range, converting the first coordinate again into a third coordinate in the real world based on a representative value within the predetermined range.

The position estimation method according to claim 1, wherein the representative value is a standard value preset according to the object.

The position estimation method according to claim 2, wherein the standard value is set for each class of the object, and when the size is not included in the predetermined range, the standard value corresponding to the class to which the object belongs is acquired as the representative value, and the first coordinate is converted again into the third coordinate based on the acquired representative value.

The position estimation method according to claim 1, wherein the representative value is determined based on time-series data of the size of the object based on image data acquired in the past.

The position estimation method according to any one of claims 1 to 4, wherein, when the size is included in the predetermined range, the size of the object is updated based on the size in the current frame and the size of the object in a past frame.

A position estimation device comprising:
an acquisition unit that acquires image data including an object from a monocular camera;
a first conversion unit that converts a first coordinate of the object in the image data into a second coordinate in the real world;
a calculation unit that calculates a size of the object based on the second coordinate;
a determination unit that determines whether the size is included in a predetermined range corresponding to the object; and
a second conversion unit that, when the size is not included in the predetermined range, converts the first coordinate again into a third coordinate based on a representative value within the predetermined range.

A program for causing a computer to execute the position estimation method according to any one of claims 1 to 5.