JP7117878B2

JP7117878B2 - processor and program

Info

Publication number: JP7117878B2
Application number: JP2018066467A
Authority: JP
Inventors: 佑記名和; 道昌井東; 圭吾多田; 忠関原; 純一気屋村; 将城榊原; 安利深谷
Original assignee: NEC Solutions Innovators Ltd; Tokai Rika Co Ltd
Current assignee: NEC Solutions Innovators Ltd; Tokai Rika Co Ltd
Priority date: 2018-03-30
Filing date: 2018-03-30
Publication date: 2022-08-15
Anticipated expiration: 2038-03-30
Also published as: JP2019179288A

Description

The present invention relates to a processing device and a program, and more particularly to a processing device and a program that perform detection processing based on a captured image.

Conventionally, there is a technique that captures an image in which an occupant appears, the image having coordinate information on a two-dimensional plane, and processes the captured image to identify feature points (for example, joint points) of the occupant (see, for example, Patent Document 1).

The processing device of Patent Document 1 includes: image acquisition means for repeatedly acquiring images that include a plurality of operation targets and an operator positioned where the operation targets can be operated; human body feature point identification means for identifying predetermined human body feature points of the operator in each image repeatedly acquired by the image acquisition means; and operation estimation means for estimating the operation that the operator is about to perform, based on the human body feature points identified for each image. The operation estimation means compares transition estimation models, each of which models, for one operation, the posture trajectory that an operator is estimated to follow when operating an operation target (estimated posture trajectory), against the operator's posture trajectory obtained from the human body feature points identified for each image (actual posture trajectory). For any estimated posture trajectory whose degree of approximation to the actual posture trajectory satisfies a predetermined threshold, the device estimates that the operator is about to perform the operation corresponding to that estimated posture trajectory.

Patent Document 1: Japanese Patent Application Laid-Open No. 2008-140268

However, because the technique of Patent Document 1 identifies the occupant's feature points from an image that has coordinate information only on a two-dimensional plane, there is a demand for higher accuracy in identifying those feature points.

Accordingly, an object of the present invention is to provide a processing device and a program that improve the accuracy of identifying feature points in a technique for identifying feature points of a human body.

[1] To achieve the above object, there is provided a processing device including: an acquisition unit that acquires a distance image of an imaging region including an occupant of a vehicle, the distance image having distance information to the imaging target assigned to its pixels; and a control unit that converts the distance image into a detection image having a specific coordinate system in accordance with the correspondence between the coordinate system of the vehicle interior space and the coordinate system of the distance image, and executes feature point detection processing that detects at least one feature point of the occupant in the detection image.
[2] The processing device according to [1] above may be one in which the control unit, as the feature point detection processing, identifies one defined feature point and, using that feature point as a reference, sequentially matches predefined feature point search models to detect the remaining feature points.
[3] The processing device according to [1] or [2] above may be one in which the control unit executes, as part of the feature point detection processing, threshold processing that considers a set of voxels including the voxels surrounding the feature point and judges the likelihood of the feature point according to whether the sum of the voxel values in that set is less than a threshold.
[4] The processing device according to [3] above may be one in which the control unit takes, as the sets of voxels whose voxel-value sum is compared with the threshold, voxel groups located at predetermined intervals between a starting point, which is one defined feature point, and a target feature point, which is a different feature point.
[5] The processing device according to [4] above may be one in which the control unit regards the target feature point as likely if the sum of the voxel values in each of all the voxel groups from the starting point to the target feature point is equal to or greater than the threshold.
[6] The processing device according to [4] or [5] above may be one in which the control unit regards the target feature point as likely even if, among the voxel groups from the starting point to the target feature point, the sum of the voxel values is less than the threshold in a number of voxel groups corresponding to a defined distance.
[7] The processing device according to any one of [1] to [6] above may be one in which the control unit, when a feature point is not detected, performs complementation processing using the detection results up to the previous frame.
[8] The processing device according to any one of [1] to [7] above may be one in which the one defined feature point is the head.
[9] To achieve the above object, there is provided a program for causing a computer to execute: an acquisition step of acquiring a distance image of an imaging region including an occupant of a vehicle, the distance image having distance information to the imaging target assigned to its pixels; and a control step of converting the distance image into a detection image having a specific coordinate system in accordance with the correspondence between the coordinate system of the vehicle interior space and the coordinate system of the distance image, and executing feature point detection processing that detects at least one feature point of the occupant in the detection image.
[10] The program according to [9] above may be one in which the control step, as the feature point detection processing, identifies one defined feature point and, using that feature point as a reference, sequentially matches predefined feature point search models to detect the remaining feature points.
[11] The program according to [9] or [10] above may be one in which the control step executes, as part of the feature point detection processing, threshold processing that considers a set of voxels including the voxels surrounding the feature point and judges the likelihood of the feature point according to whether the sum of the voxel values in that set is less than a threshold.
[12] The program according to [11] above may be one in which the control step takes, as the sets of voxels whose voxel-value sum is compared with the threshold, voxel groups located at predetermined intervals between a starting point, which is one defined feature point, and a target feature point, which is a different feature point.
[13] The program according to [12] above may be one in which the control step regards the target feature point as likely if the sum of the voxel values in each of all the voxel groups from the starting point to the target feature point is equal to or greater than the threshold.
[14] The program according to [12] or [13] above may be one in which the control step regards the target feature point as likely even if, among the voxel groups from the starting point to the target feature point, the sum of the voxel values is less than the threshold in a number of voxel groups corresponding to a defined distance.
[15] The program according to any one of [9] to [14] above may be one in which the control step, when a feature point is not detected, performs complementation processing using the detection results up to the previous frame.
[16] The program according to any one of [9] to [15] above may be one in which the one defined feature point is the head.

According to the processing device and the program of the present invention, the accuracy of identifying feature points can be improved in a technique for identifying feature points of a human body.

FIG. 1(a) is a diagram of a skeleton model that defines the size of each bone of the human body and the movable range of each joint; FIG. 1(b) is a table showing an example of those definitions; and FIG. 1(c) is a diagram showing the forearm as an example of the movable range (the angle specified in the skeleton model definition) of each part (shoulder, elbow, wrist, hand).
FIG. 2 is a coordinate diagram showing, in three dimensions, the relationship between the coordinate systems when the processing device according to the embodiment of the present invention is mounted on a vehicle.
FIG. 3 is a diagram showing an example of a three-dimensional pixel group.
FIG. 4(a) is a diagram showing an example of voxel creation that takes the size of the vehicle interior space into account, and FIG. 4(b) is a table showing an example of voxel creation parameter settings.
FIG. 5(a) is a diagram showing the search range for voxels that fall within the length and movable range defined by the skeleton model, and FIG. 5(b) is a table showing an example of search range narrowing parameters.
FIG. 6 is a diagram showing an example of a 3×3×3 block of voxels that includes one surrounding voxel in each direction.
FIG. 7 is a diagram showing an example in which 3×3×3 voxel blocks are arranged at intervals of X [mm] from the starting point P.
FIG. 8 is a table showing an example of the parameter settings used in detecting each part.
FIG. 9 is a flowchart showing the operation of the processing device according to the embodiment of the present invention.
FIG. 10 is a diagram in which the detected shoulder, elbow, wrist, and hand positions are displayed on an image.

(Embodiment of the present invention)
The processing device 1 according to the embodiment of the present invention includes: a TOF camera 10 serving as an acquisition unit that acquires a distance image of an imaging region including an occupant 5 of a vehicle 8, the distance image having distance information to the imaging target assigned to its pixels; and a control unit 20 that converts the distance image into a detection image having a specific coordinate system in accordance with the correspondence between the coordinate system of the interior space of the vehicle 8 and the coordinate system of the distance image, and executes feature point detection processing that detects at least one feature point of the occupant 5 in the detection image.

The processing device 1 according to the present embodiment uses the TOF camera 10 to perform feature point detection processing that detects parts of a vehicle occupant, such as the shoulders, elbows, wrists, and hands, as feature points. It creates a pixel group (three-dimensional pixel group) by converting each pixel of the distance image obtained from the TOF camera 10 into three-dimensional space and then, starting from the head, detects parts such as the shoulders, elbows, wrists, and hands as feature points based on a skeleton model (a definition of human bone sizes and movable ranges).

The pixel group is the set of points corresponding to the pixels captured by the TOF camera 10. It is a set of points in three-dimensional space whose positions are expressed in, for example, the coordinate system of the interior space of the vehicle 8 or the coordinate system of the TOF camera 10.

(Skeleton model)
As shown in FIG. 1(a), the skeleton model defines the size of each bone of the human body and the movable range of each joint; an example is shown in FIG. 1(b). These definitions can be determined from sources such as the human body dimension database published by the National Institute of Advanced Industrial Science and Technology and the human characteristics database published by the National Institute of Technology and Evaluation. The movable range of each part (shoulder, elbow, wrist, hand), that is, the angle specified in the skeleton model definition, is defined as shown in FIG. 1(c), which takes the forearm as an example.

(TOF camera 10)
Any device capable of three-dimensional recognition of the imaging target can be used as the acquisition unit; in the present embodiment, a TOF (Time Of Flight) camera 10 is used. The TOF camera 10 detects, for each pixel, the time it takes for light from its light source to reach the measured object and return, and can therefore capture a three-dimensional image that includes position information corresponding to distance in the depth direction. After emitting infrared light or the like, the TOF camera 10 receives the light reflected back from an object, measures the time from emission to reception, and detects the distance to the imaged object for each pixel.
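The time-to-distance relationship described above can be sketched as follows. This is a minimal illustration assuming a direct measurement of the round-trip time; the source does not state how the TOF camera 10 measures time internally, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_time_s):
    """Return the distance to the object for a measured round-trip time.

    The light travels to the object and back, so the one-way distance is
    half of the total path length.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

For scale, a round trip of 10 ns corresponds to a distance of roughly 1.5 m, which is on the order of the distances inside a vehicle cabin.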

The TOF camera 10, serving as the acquisition unit, acquires a distance image of an imaging region including the occupant 5 of the vehicle 8, in which distance information to the imaging target is assigned to each pixel. The distance image can be acquired as one of a series of frames captured at predetermined time intervals. The acquired distance image serves as the input (three-dimensional pixel group) to the processing below. The distance image is converted into the camera coordinate system and the vehicle coordinate system, which makes it possible to extract the three-dimensional pixel group contained in the detection area.

As shown in FIG. 2, for example, the TOF camera 10 is mounted near the rearview mirror and images an imaging region that includes the occupant 5 of the vehicle 8. An image captured by the TOF camera 10 contains coordinates (u, v) and, as depth information at those coordinates, a pixel value d(u, v). The acquired pixel value d(u, v) (u = 0, 1, ..., U−1; v = 0, 1, ..., V−1) represents the distance to the imaged object (here, an object such as the driver or in-vehicle equipment). In the embodiment, U denotes the width of the captured image in pixels and V denotes its height in pixels. In other words, the distance image is generated by assigning distance information in three-dimensional space to each pixel of the image captured by the TOF camera 10.

(Camera coordinate system obtained from the distance image)
The coordinate values u, v in the distance image and the pixel value d(u, v) at coordinates (u, v) are converted to a point (x, y, z) in three-dimensional space using the following formula, which yields a three-dimensional pixel group in the camera coordinate system (x, y, z) shown in FIG. 2.

Here, (c_x, c_y) are the image center coordinates and f is the lens focal length.
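The conversion formula itself is not reproduced in this text. As a hedged illustration only, the sketch below uses the standard pinhole back-projection, assuming that d(u, v) is the depth along the optical axis and that f is expressed in pixels. Neither assumption is stated in the source, and the function names and the convention that 0 marks an invalid pixel are hypothetical.

```python
def pixel_to_camera(u, v, d, cx, cy, f):
    """Back-project pixel (u, v) with depth d into camera coordinates (x, y, z).

    Assumes a pinhole model, d measured along the optical axis, and the
    focal length f given in pixels (all assumptions, not from the source).
    """
    x = (u - cx) * d / f
    y = (v - cy) * d / f
    z = d
    return (x, y, z)


def depth_image_to_points(depth, cx, cy, f):
    """Convert a depth image (rows indexed as depth[v][u]) into 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:  # assumed convention: 0 means no valid distance
                points.append(pixel_to_camera(u, v, d, cx, cy, f))
    return points
```

If the camera instead reports radial distance along each ray, the depth would first have to be converted to a z value before applying these expressions.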

(Vehicle coordinate system)
The camera coordinate system (x, y, z) can be converted into a vehicle coordinate system (w, l, h) whose origin is on the driver being detected (for example, at the position of the hip bone). The conversion is a combination of ordinary coordinate rotation, translation, and scaling. In the converted coordinate system, the w axis points to the driver's right, the l axis points forward, and the h axis points upward, with the origin at the driver's hip bone position 203. FIG. 3 shows an example of a three-dimensional pixel group.
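A minimal sketch of such a combined transform is shown below. The order of operations (rotate, then scale, then translate), the function name, and all calibration values are assumptions for illustration; the source only states that rotation, translation, and scaling are combined.

```python
def camera_to_vehicle(point, rotation, translation, scale=1.0):
    """Map a camera-frame point (x, y, z) to vehicle coordinates (w, l, h).

    rotation is a 3x3 row-major matrix, translation is (tw, tl, th), and
    scale is a unit conversion factor. All are hypothetical calibration
    values, applied here as: scale * (rotation @ point) + translation.
    """
    rotated = [sum(rotation[i][j] * point[j] for j in range(3)) for i in range(3)]
    return tuple(scale * rotated[i] + translation[i] for i in range(3))


# Example rotation that sends camera x to w, camera z to l, and camera -y to h.
EXAMPLE_ROTATION = [
    [1, 0, 0],
    [0, 0, 1],
    [0, -1, 0],
]
```

In practice the rotation and translation would come from the mounting position of the camera relative to the seat, which the source does not specify.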

(Voxel creation)
As shown in FIG. 4(a), the control unit 20 takes the size of the vehicle interior space into account and creates voxels of the size given in the example voxel creation parameter settings of FIG. 4(b). Voxels are a way of representing regions discretely by dividing three-dimensional space into a grid of fixed-size cells. Each voxel has a voxel value, which is the density information of the three-dimensional pixel group (here, the number of three-dimensional pixels contained in the voxel × 3).
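The voxel creation step can be sketched as below. The voxel value follows the source (number of three-dimensional pixels in the voxel × 3); the sparse dictionary layout, the function name, and the voxel size passed in are assumptions, since the parameter values of FIG. 4(b) are not reproduced in this text.

```python
import math
from collections import Counter


def build_voxels(points, voxel_size_mm, weight=3):
    """Return {voxel index: voxel value} for points given in vehicle coordinates.

    Each point (w, l, h) falls into the voxel whose index is the floor of
    each coordinate divided by the voxel size. The voxel value is the point
    count multiplied by `weight` (3 in the source).
    """
    counts = Counter()
    for w, l, h in points:
        index = (
            math.floor(w / voxel_size_mm),
            math.floor(l / voxel_size_mm),
            math.floor(h / voxel_size_mm),
        )
        counts[index] += 1
    return {index: n * weight for index, n in counts.items()}
```

A sparse dictionary is used here only because most of the cabin volume is empty; a dense array covering the detection area would work equally well.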

(Control unit 20)
The control unit 20 includes, for example, a microcomputer that performs coordinate conversion, top-of-head identification processing, head identification processing, and the like. As shown in FIG. 2, the control unit 20 is connected to the TOF camera 10. The control unit 20 includes a CPU (Central Processing Unit) 21 that performs computation and processing on acquired data according to a stored program, and semiconductor memories such as a RAM (Random Access Memory) 22 and a ROM (Read Only Memory) 23.

(Detection and identification of the head)
The head is the one defined feature point. In the present embodiment, it serves as the reference from which predefined feature point search models are matched sequentially to detect the remaining feature points (left and right shoulders, elbows, wrists, and hands). Any method may be used to detect and identify the head, including known methods. Alternatively, instead of being detected and identified by the control unit 20, three-dimensional data (w, l, h) for the head may be input to the control unit 20.

(Narrowing the search range)
As shown in FIG. 5(a), the control unit 20 searches for and extracts the voxels that fall within the length and movable range defined by the skeleton model. The movable range is defined as a cone, specified by the generatrix 250 shown in FIG. 5(a), whose reference direction runs from the previous-part starting point to the current-part starting point P. The search is performed based on search range narrowing parameters such as those shown in FIG. 5(b).
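One way to test whether a candidate position lies inside such a cone is sketched below. The use of a minimum and maximum length together with a half-angle, as well as the function and parameter names, are assumptions; the actual narrowing parameters of FIG. 5(b) are not reproduced in this text.

```python
import math


def in_search_cone(prev_start, start, candidate, min_len, max_len, max_angle_deg):
    """Return True if `candidate` lies in the cone that opens from `start`.

    The cone axis points from `prev_start` to `start`. The candidate must be
    between min_len and max_len away from `start`, and the angle between the
    axis and the vector start -> candidate must not exceed max_angle_deg.
    """
    axis = [s - p for s, p in zip(start, prev_start)]
    vec = [c - s for c, s in zip(candidate, start)]
    axis_norm = math.sqrt(sum(a * a for a in axis))
    dist = math.sqrt(sum(v * v for v in vec))
    if axis_norm == 0 or dist == 0:
        return False  # direction undefined
    if not (min_len <= dist <= max_len):
        return False
    cos_angle = sum(a * v for a, v in zip(axis, vec)) / (axis_norm * dist)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg
```

Applying this test to the center of every voxel leaves only the voxels that a bone of plausible length and orientation could reach.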

(Extraction of part candidates as feature points)
(1) Threshold processing
For each voxel remaining after the search range has been narrowed, the control unit 20 considers a set of voxels that includes the surrounding voxels. Here, as shown in FIG. 6, the set is the 3×3×3 block of voxels that includes one surrounding voxel in each direction. If the sum of the voxel values in this block is less than a threshold, the voxel is excluded from the candidates. In other words, the threshold processing considers a set of voxels including the voxels surrounding a feature point and judges the likelihood of the feature point according to whether the sum of the voxel values in that set is less than the threshold.
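As a rough illustration of the threshold processing, the sketch below sums the voxel values in the 3×3×3 block around a voxel and rejects the voxel when the sum falls below a threshold. The dictionary-based voxel grid and the function names are assumptions made for illustration; the source does not specify a data structure.

```python
from itertools import product


def neighborhood_sum(voxels, center, radius=1):
    """Sum voxel values over the (2*radius+1)^3 block centered on `center`.

    `voxels` maps integer index triples to voxel values; missing entries
    count as 0.
    """
    ci, cj, ck = center
    offsets = range(-radius, radius + 1)
    return sum(
        voxels.get((ci + di, cj + dj, ck + dk), 0)
        for di, dj, dk in product(offsets, repeat=3)
    )


def passes_threshold(voxels, center, threshold):
    """Keep the voxel as a candidate only if its block sum reaches the threshold."""
    return neighborhood_sum(voxels, center) >= threshold
```

Summing over a block rather than checking a single voxel makes the test less sensitive to an individual voxel that happens to contain few points.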

(2) Continuity check
For each voxel extracted by the threshold processing in (1), the control unit 20 considers, as shown in FIG. 7, the same region as in (1) (a 3×3×3 block of voxels) at intervals of X [mm] between that voxel and the starting point P. If the sum of the voxel values is equal to or greater than the threshold at every position, the voxel becomes a part candidate. However, to account for worn items such as wristwatches, for which the TOF camera may be unable to acquire information, an allowable gap count is defined so that points between the voxel and the starting point P that do not satisfy the voxel value threshold can be tolerated. For example, when the allowable gap count is 1, continuity is judged to exist as long as no more than one point fails the threshold.
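The continuity check with an allowable gap count can be sketched as follows. Sampling both end points, spacing the samples evenly at no more than the step length, and all function and parameter names are assumptions; the source specifies only the X [mm] interval, the 3×3×3 block, the threshold, and the allowable gap count.

```python
import math
from itertools import product


def block_sum(voxels, center):
    """Sum of voxel values in the 3x3x3 block around `center` (missing = 0)."""
    ci, cj, ck = center
    return sum(
        voxels.get((ci + di, cj + dj, ck + dk), 0)
        for di, dj, dk in product((-1, 0, 1), repeat=3)
    )


def is_continuous(voxels, start_mm, target_mm, voxel_size_mm, step_mm,
                  threshold, allowed_gaps=0):
    """Check density along the segment from the starting point to the target.

    Sample positions are spaced evenly, at most step_mm apart. A sample whose
    3x3x3 block sum is below the threshold counts as a gap; the segment is
    continuous if the number of gaps does not exceed allowed_gaps.
    """
    length = math.dist(start_mm, target_mm)
    steps = max(1, math.ceil(length / step_mm))
    gaps = 0
    for n in range(steps + 1):
        t = n / steps
        point = [s + t * (g - s) for s, g in zip(start_mm, target_mm)]
        index = tuple(math.floor(c / voxel_size_mm) for c in point)
        if block_sum(voxels, index) < threshold:
            gaps += 1
            if gaps > allowed_gaps:
                return False
    return True
```

With an allowable gap count of 1, a single low-density sample, such as one caused by a wristwatch, does not break the chain between the starting point and the candidate.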

The control unit 20 can take, as the sets of voxels whose voxel-value sum is compared with the threshold, voxel groups located at predetermined intervals between the starting point, which is one defined feature point, and the target feature point, which is a different feature point. This allows the target feature point to be regarded as likely when its likelihood is high.

Furthermore, the control unit 20 can regard the target feature point as likely if the sum of the voxel values in each of all the voxel groups from the starting point to the target feature point is equal to or greater than the threshold. As a result, the target feature point can be regarded as likely even if part of the data is missing because of noise or the like.

FIG. 8 shows an example of the parameter settings used in detecting each part; the detection of each part described below is executed based on these parameters. The (1) threshold processing and (2) continuity check described above are used as common part-detection logic for every part. The unit cube size is the size of the voxel cube used for density calculation. The density threshold is the threshold applied to density when rejecting end-point candidates and when confirming continuity. The continuity step length is the distance between center voxels as the unit cube is shifted during the continuity check. The allowable gap count is the number of steps with density at or below the threshold that are tolerated during the continuity check.

(Shoulder detection)
The control unit 20 calculates the neck position from the detected or identified head position. Here, the neck position is the point offset straight down from the coordinates of the head position; the offset is, for example, 100 mm. With the head position as the previous-part starting point and the neck position as the starting point P, shoulder candidates are extracted using the common part-detection logic. However, the continuity check is not performed, in view of the possibility that the neck position lies in a voxel with a low voxel value. Shoulder candidates to the left of the neck position (−w axis side) are left shoulder candidates and those to the right (+w axis side) are right shoulder candidates, and the centroid of each set of candidates is taken as the left shoulder position and the right shoulder position, respectively.
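The neck offset and the left/right split can be sketched as below. The 100 mm offset and the −w/+w convention come from the source; treating "straight down" as the −h direction, ignoring candidates exactly at the neck's w coordinate, and the function names are assumptions.

```python
def neck_from_head(head, offset_mm=100.0):
    """Return the neck position: the head position shifted down along h."""
    w, l, h = head
    return (w, l, h - offset_mm)


def centroid(points):
    """Arithmetic mean of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def split_shoulders(neck, candidates):
    """Return (left shoulder, right shoulder) as centroids of the candidates.

    Candidates with w below the neck's w are left shoulder candidates and
    those with w above it are right shoulder candidates. A side with no
    candidates yields None.
    """
    left = [p for p in candidates if p[0] < neck[0]]
    right = [p for p in candidates if p[0] > neck[0]]
    return (centroid(left) if left else None,
            centroid(right) if right else None)
```

Returning None for an empty side gives the caller a simple way to decide that a shoulder was not detected in the current frame.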

(Elbow detection)
With the neck position as the previous-part starting point and the shoulder position as the starting point, the control unit 20 extracts elbow candidates using the common part-detection logic, including the continuity check. If the continuity check fails and no elbow candidate points remain, the shoulder is considered to be floating in the air (located where voxel values are low), so its position is corrected and the search is repeated. The correction is made, for example, by selecting the nearest voxel whose voxel value is equal to or greater than the threshold. The centroid of the extracted elbow candidates is taken as the elbow position. The position found from the left shoulder is the left elbow, and the position found from the right shoulder is the right elbow.
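The position correction mentioned above, moving a starting point to the nearest sufficiently dense voxel, can be sketched as follows. Measuring distance between voxel indices with the Euclidean metric and the function name are assumptions; the source gives this correction only as an example.

```python
import math


def nearest_dense_voxel(voxels, index, threshold):
    """Return the index of the closest voxel whose value reaches the threshold.

    `voxels` maps integer index triples to voxel values. Returns None when
    no voxel satisfies the threshold.
    """
    dense = [idx for idx, value in voxels.items() if value >= threshold]
    if not dense:
        return None
    return min(dense, key=lambda idx: math.dist(idx, index))
```

After the correction, the search for the next part would be repeated from the returned voxel instead of the original starting point.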

(Wrist detection)
With the shoulder position as the previous-part starting point and the elbow position as the starting point, the control unit 20 extracts wrist candidates using the common part-detection logic. The control unit 20 performs the continuity check and, if it fails, corrects the position and repeats the search. The centroid of the extracted wrist candidates is taken as the wrist position. The position found from the left elbow is the left wrist, and the position found from the right elbow is the right wrist.

(Hand detection)
With the elbow position as the previous-part starting point and the wrist position as the starting point, the control unit 20 extracts hand candidates using the common part-detection logic. The control unit 20 performs the continuity check and, if it fails, corrects the position and repeats the search. The centroid of the extracted hand candidates is taken as the hand position. The position found from the left wrist is the left hand, and the position found from the right wrist is the right hand.

(Handling undetected parts)
When a part is not detected as a feature point (for example, when a shoulder is hidden or when the wrist data is interrupted by a wristwatch), the control unit 20 performs complementation processing using the detection results up to the previous frame. Possible approaches include reusing the detection result from one frame earlier as it is, or applying a tracking technique such as the KCF method.
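The simpler of the two approaches, reusing the result from one frame earlier, can be sketched as below. Representing each frame's result as a dictionary that maps part names to positions, with None for an undetected part, is an assumption for illustration; the part names are hypothetical.

```python
def complement_missing(current, previous):
    """Fill parts that are undetected in `current` from the previous frame.

    Both arguments map part names to (w, l, h) positions, with None meaning
    "not detected". Parts missing from both frames stay None.
    """
    filled = {}
    for part in set(current) | set(previous):
        value = current.get(part)
        filled[part] = value if value is not None else previous.get(part)
    return filled
```

A tracking-based approach would replace the plain copy with a position predicted from several earlier frames, which this sketch does not attempt.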

（処理装置１の動作）
図９で示す本発明の実施の形態に係る処理装置１の動作を示すフローチャートに基づいて、説明する。制御部２０は、フローチャートに従って以下の演算、処理を実行する。 (Operation of processing device 1)
The operation of the processing device 1 according to the embodiment of the present invention is described with reference to the flowchart shown in FIG. 9. The control unit 20 executes the following calculations and processing in accordance with the flowchart.

（前処理、Ｓｔｅｐ１）
制御部２０には、ＴＯＦカメラ１０からのカメラ画像が入力される。このカメラ画像は、連続するフレーム画像の１フレームとして入力される。制御部２０は、特徴点検出処理の前処理として、カメラ画像を車両座標系へ変換して、図３に示すような３次元画素群を抽出する。また、ボクセル作成パラメータ設定例に基づいてボクセルを作成する。また、特徴点探索の基準となる頭部の検出、特定を行なう。 (Preprocessing, Step 1)
A camera image from the TOF camera 10 is input to the control unit 20. This camera image is input as one frame of a sequence of consecutive frame images. As preprocessing for the feature point detection processing, the control unit 20 converts the camera image into the vehicle coordinate system and extracts a three-dimensional pixel group as shown in FIG. 3. It also creates voxels based on the example voxel creation parameter settings, and detects and identifies the head, which serves as the reference for the feature point search.
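The voxel creation step can be pictured with a minimal Python sketch. It assumes, as an illustration only, that the voxel value is the number of three-dimensional points falling inside each cubic voxel (a density measure); the function name `make_voxels` and the voxel size are hypothetical and do not come from the parameter settings of the embodiment.

```python
from collections import Counter
from math import floor


def make_voxels(points, voxel_size):
    """Count 3D points per cubic voxel; the count serves as the voxel value."""
    voxels = Counter()
    for w, l, h in points:
        # Integer voxel index along each axis of the vehicle coordinate system.
        key = (floor(w / voxel_size), floor(l / voxel_size), floor(h / voxel_size))
        voxels[key] += 1
    return voxels


# Hypothetical points in the vehicle coordinate system (w, l, h), in metres.
points = [(0.01, 0.02, 0.03), (0.04, 0.01, 0.02), (0.12, 0.02, 0.03)]
voxels = make_voxels(points, voxel_size=0.05)
```

Storing only occupied voxels in a dictionary keeps the structure small when most of the cabin volume is empty.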

（肩の検出、Ｓｔｅｐ２）
制御部２０は、図１０に示すように、基準となる規定された１つの特徴点である、頭部位置３００から、所定の距離だけオフセットした点を首位置３０５として算出する。起点を首位置３０５として、部位検出共通ロジックを用いて肩候補を抽出する。図１０に示すように、首位置より左側（－ｗ軸側）の肩候補を左肩候補、右側（＋ｗ軸側）の肩候補を右肩候補とし、各候補の重心位置をそれぞれ左肩位置３１１、右肩位置３１０とする。 (Shoulder detection, Step 2)
As shown in FIG. 10, the control unit 20 calculates the neck position 305 as a point offset by a predetermined distance from the head position 300, which is the single predefined reference feature point. With the neck position 305 as the starting point, shoulder candidates are extracted using the common part detection logic. As shown in FIG. 10, shoulder candidates to the left of the neck position (−w axis side) are left shoulder candidates and those to the right (+w axis side) are right shoulder candidates, and the centroids of the two candidate sets are taken as the left shoulder position 311 and the right shoulder position 310, respectively.
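The left/right split and centroid calculation in Step 2 might look like the following Python sketch. The direction of the neck offset is not stated in the text; applying it downward along the h axis is an assumption made here, and the helper names and sample coordinates are hypothetical.

```python
def centroid(points):
    """Arithmetic mean of a non-empty list of (w, l, h) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def neck_from_head(head, offset):
    # Assumption: the neck lies directly below the head along the h axis.
    w, l, h = head
    return (w, l, h - offset)


def split_shoulders(candidates, neck):
    """Split candidates by the sign of w relative to the neck and average each side."""
    left = [p for p in candidates if p[0] < neck[0]]   # -w side
    right = [p for p in candidates if p[0] > neck[0]]  # +w side
    return (centroid(left) if left else None,
            centroid(right) if right else None)


head = (0.0, 0.0, 1.25)
neck = neck_from_head(head, offset=0.25)
candidates = [(-0.25, 0.0, 1.0), (-0.5, 0.0, 1.0), (0.25, 0.0, 1.0), (0.5, 0.0, 1.0)]
left_shoulder, right_shoulder = split_shoulders(candidates, neck)
```

A side with no candidates yields `None`, which corresponds to the "not detected" branch handled in Steps 3 and 4.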

（Ｓｔｅｐ３）
制御部２０は、肩の検出に成功したかどうかを判断する。制御部２０は、上記説明した、（１）閾値処理、及び（２）連続性チェックにより、肩の検出に成功したかどうかを判断することができる。ただし、肩の検出に限って、首位置がボクセル値の低いボクセルに位置する可能性を考慮し、連続性チェックはしないものとする。肩の検出に成功した場合は、Ｓｔｅｐ５へ進み（Ｓｔｅｐ３：Ｙｅｓ）、肩の検出に成功しない場合は、Ｓｔｅｐ４へ進む（Ｓｔｅｐ３：Ｎｏ）。 (Step 3)
The control unit 20 determines whether the shoulder has been successfully detected, using (1) the threshold processing and (2) the continuity check described above. For shoulder detection only, however, the continuity check is skipped, because the neck position may lie in a voxel with a low voxel value. If the shoulder is detected, the process proceeds to Step 5 (Step 3: Yes); otherwise, it proceeds to Step 4 (Step 3: No).

（補完処理、Ｓｔｅｐ４）
制御部２０は、肩の部位が未検出の場合（肩が隠れた時、等）、前フレームまでの検出結果を用いて補完処理を行う。例えば、１フレーム前の検出結果をそのまま再利用する方法や、ＫＣＦ法等のトラッキング技術を活用する方法などがある。 (Complementary processing, Step 4)
When the shoulder is not detected (for example, when the shoulder is occluded), the control unit 20 performs complementary processing using the detection results up to the previous frame. Possible methods include reusing the detection result from one frame earlier as is, or using a tracking technique such as the KCF method.

（肘の検出、Ｓｔｅｐ５）
制御部２０は、前部位起点を首位置３０５、起点を肩位置３１０、３１１として、部位検出共通ロジックを用いて肘候補を抽出する。連続性チェックがＮＧとなり、肘候補点が０の場合、肩が宙に浮いている（ボクセル値の低い位置にある）と考えられ、位置を補正して再探索する。補正方法は例えば、最も近い閾値以上のボクセル値を持つボクセルを選択することで行う。抽出した肘候補の重心を肘位置とする。図１０に示すように、左肩を起点とした位置を左肘３２１、右肩を起点とした位置を右肘３２０とする。 (Elbow detection, Step 5)
The control unit 20 extracts elbow candidates using the common part detection logic, with the neck position 305 as the starting point of the preceding part and the shoulder positions 310 and 311 as the starting points. If the continuity check fails and the number of elbow candidate points is zero, the shoulder is considered to be floating in the air (located at a position with a low voxel value), so its position is corrected and the search is repeated. One correction method is, for example, to select the nearest voxel whose voxel value is equal to or greater than the threshold. The centroid of the extracted elbow candidates is taken as the elbow position. As shown in FIG. 10, the position found from the left shoulder is the left elbow 321, and the position found from the right shoulder is the right elbow 320.
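The example correction method (moving a "floating" starting point to the nearest voxel whose value reaches the threshold) can be sketched in Python as follows. The sparse dictionary of voxel values, the use of voxel centres, and the name `snap_to_occupied` are assumptions for illustration.

```python
from math import dist


def snap_to_occupied(position, voxels, threshold, voxel_size):
    """Return the centre of the nearest voxel with value >= threshold, or None."""
    best, best_d = None, float("inf")
    for (i, j, k), value in voxels.items():
        if value < threshold:
            continue  # too sparse to be part of the body
        center = ((i + 0.5) * voxel_size,
                  (j + 0.5) * voxel_size,
                  (k + 0.5) * voxel_size)
        d = dist(position, center)
        if d < best_d:
            best, best_d = center, d
    return best


# Hypothetical voxel values keyed by integer voxel index.
voxels = {(0, 0, 0): 1, (2, 0, 0): 5, (6, 0, 0): 9}
corrected = snap_to_occupied((0.25, 0.25, 0.25), voxels,
                             threshold=3, voxel_size=0.5)
```

Here the starting point sits in a voxel with value 1, below the threshold of 3, so it is moved to the centre of the closest qualifying voxel before the elbow search is repeated.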

（Ｓｔｅｐ６）
制御部２０は、肘の検出に成功したかどうかを判断する。制御部２０は、上記説明した、（１）閾値処理、及び（２）連続性チェックにより、肘の検出に成功したかどうかを判断することができる。肘の検出に成功した場合は、Ｓｔｅｐ８へ進み（Ｓｔｅｐ６：Ｙｅｓ）、肘の検出に成功しない場合は、Ｓｔｅｐ７へ進む（Ｓｔｅｐ６：Ｎｏ）。 (Step 6)
The control unit 20 determines whether or not the elbow has been successfully detected. The control unit 20 can determine whether or not the elbow has been successfully detected by (1) threshold processing and (2) continuity check described above. If the elbow detection is successful, the process proceeds to Step 8 (Step 6: Yes), and if the elbow detection is not successful, the process proceeds to Step 7 (Step 6: No).

（補完処理、Ｓｔｅｐ７）
制御部２０は、肘の部位が未検出の場合（肘が隠れた時、等）、前フレームまでの検出結果を用いて補完処理を行う。例えば、１フレーム前の検出結果をそのまま再利用する方法や、ＫＣＦ法等のトラッキング技術を活用する方法などがある。 (Complementary processing, Step 7)
When the elbow is not detected (for example, when the elbow is occluded), the control unit 20 performs complementary processing using the detection results up to the previous frame. Possible methods include reusing the detection result from one frame earlier as is, or using a tracking technique such as the KCF method.

（手首の検出、Ｓｔｅｐ８）
制御部２０は、前部位起点を肩位置３１０、３１１、起点を肘位置３２０、３２１として、部位検出共通ロジックを用いて手首候補を抽出する。抽出した手首候補の重心を手首位置とする。図１０に示すように、左肘を起点とした位置を左手首３３１、右肘を起点とした位置を右手首３３０とする。 (Wrist detection, Step 8)
The control unit 20 extracts wrist candidates using the common part detection logic, with the shoulder positions 310 and 311 as the starting points of the preceding part and the elbow positions 320 and 321 as the starting points. The centroid of the extracted wrist candidates is taken as the wrist position. As shown in FIG. 10, the position found from the left elbow is the left wrist 331, and the position found from the right elbow is the right wrist 330.

（Ｓｔｅｐ９）
制御部２０は、手首の検出に成功したかどうかを判断する。制御部２０は、上記説明した、（１）閾値処理、及び（２）連続性チェックにより、手首の検出に成功したかどうかを判断することができる。手首の検出に成功した場合は、Ｓｔｅｐ１１へ進み（Ｓｔｅｐ９：Ｙｅｓ）、手首の検出に成功しない場合は、Ｓｔｅｐ１０へ進む（Ｓｔｅｐ９：Ｎｏ）。 (Step 9)
The control unit 20 determines whether or not the wrist has been successfully detected. The control unit 20 can determine whether or not the wrist has been successfully detected by (1) threshold processing and (2) continuity check described above. If the wrist detection is successful, the process proceeds to Step 11 (Step 9: Yes), and if the wrist detection is not successful, the process proceeds to Step 10 (Step 9: No).

（補完処理、Ｓｔｅｐ１０）
制御部２０は、手首の部位が未検出の場合（手首が腕時計等で切れた時、等）、前フレームまでの検出結果を用いて補完処理を行う。例えば、１フレーム前の検出結果をそのまま再利用する方法や、ＫＣＦ法等のトラッキング技術を活用する方法などがある。 (Complementary processing, Step 10)
When the wrist is not detected (for example, when the wrist appears cut off in the data because of a wristwatch or the like), the control unit 20 performs complementary processing using the detection results up to the previous frame. Possible methods include reusing the detection result from one frame earlier as is, or using a tracking technique such as the KCF method.

（手の検出、Ｓｔｅｐ１１）
制御部２０は、前部位起点を肘位置３２０、３２１、起点を手首位置３３０、３３１として、部位検出共通ロジックを用いて手候補を抽出する。抽出した手候補の重心を手位置とする。図１０に示すように、左手首を起点とした位置を左手３４１、右手首を起点とした位置を右手３４０とする。 (Hand detection, Step 11)
The control unit 20 extracts hand candidates using the common part detection logic, with the elbow positions 320 and 321 as the starting points of the preceding part and the wrist positions 330 and 331 as the starting points. The centroid of the extracted hand candidates is taken as the hand position. As shown in FIG. 10, the position found from the left wrist is the left hand 341, and the position found from the right wrist is the right hand 340.

（Ｓｔｅｐ１２）
制御部２０は、手の検出に成功したかどうかを判断する。制御部２０は、上記説明した、（１）閾値処理、及び（２）連続性チェックにより、手の検出に成功したかどうかを判断することができる。手の検出に成功した場合は、Ｓｔｅｐ１４へ進み（Ｓｔｅｐ１２：Ｙｅｓ）、手の検出に成功しない場合は、Ｓｔｅｐ１３へ進む（Ｓｔｅｐ１２：Ｎｏ）。 (Step 12)
The control unit 20 determines whether or not the hand has been successfully detected. The control unit 20 can determine whether or not the hand has been successfully detected by (1) threshold processing and (2) continuity check described above. If the hand detection is successful, the process proceeds to Step 14 (Step 12: Yes), and if the hand detection is not successful, the process proceeds to Step 13 (Step 12: No).

（補完処理、Ｓｔｅｐ１３）
制御部２０は、手の部位が未検出の場合（手が隠れた時、等）、前フレームまでの検出結果を用いて補完処理を行う。例えば、１フレーム前の検出結果をそのまま再利用する方法や、ＫＣＦ法等のトラッキング技術を活用する方法などがある。 (Complementary processing, Step 13)
When the hand is not detected (for example, when the hand is occluded), the control unit 20 performs complementary processing using the detection results up to the previous frame. Possible methods include reusing the detection result from one frame earlier as is, or using a tracking technique such as the KCF method.

（Ｓｔｅｐ１４）
制御部２０は、部位検出結果として、図１０に示すように、左右の肩、肘、手首、手の位置を出力することができる。なお、それぞれの位置は、車両座標系（ｗ，ｌ，ｈ）で出力されるが、座標変換により、カメラ座標系（ｘ，ｙ，ｚ）等の他の座標系での値としても出力可能である。 (Step 14)
As the part detection result, the control unit 20 can output the positions of the left and right shoulders, elbows, wrists, and hands, as shown in FIG. 10. Each position is output in the vehicle coordinate system (w, l, h), but it can also be output as a value in another coordinate system, such as the camera coordinate system (x, y, z), by coordinate conversion.
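The coordinate conversion between the vehicle and camera coordinate systems can be sketched as a rigid transform. The following Python example assumes, purely for illustration, that the camera pose is known as a rotation matrix `R` and a translation `t` such that a vehicle point equals `R` times the camera point plus `t`; the calibration values shown are hypothetical.

```python
def camera_to_vehicle(p, R, t):
    """p_vehicle = R * p_camera + t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def vehicle_to_camera(p, R, t):
    """p_camera = R^T * (p_vehicle - t); R is orthonormal, so R^T inverts it."""
    d = [p[i] - t[i] for i in range(3)]
    return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))


# Hypothetical calibration: 90-degree rotation about the third axis plus an offset.
R = ((0, -1, 0),
     (1, 0, 0),
     (0, 0, 1))
t = (1, 2, 3)

wrist_vehicle = camera_to_vehicle((1, 0, 0), R, t)
wrist_camera = vehicle_to_camera(wrist_vehicle, R, t)
```

Because the rotation matrix is orthonormal, its transpose serves as the inverse, so no general matrix inversion is needed for the reverse conversion.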

上記の処理実行は、上記示した一連の動作として繰り返して実行することができる。 The above process execution can be repeatedly executed as a series of operations shown above.
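The chained structure of Steps 2 to 14 for one arm can be summarized in a short Python sketch. The detector is passed in as a function because the common part detection logic itself is not reproduced here; `detect_arm`, `stub_detector`, and the sample values are hypothetical. Each part is searched from the previously found part, and an undetected part falls back to the previous frame's result.

```python
CHAIN = ["shoulder", "elbow", "wrist", "hand"]


def detect_arm(detect_part, neck, previous):
    """Detect parts in order; each search starts from the part found before it."""
    results = {part: None for part in CHAIN}
    preceding, origin = None, neck  # the shoulder search uses only the neck
    for part in CHAIN:
        pos = detect_part(part, preceding, origin)
        if pos is None:
            pos = previous.get(part)  # complementary processing
        if pos is None:
            break  # no history either: the remaining parts stay undetected
        results[part] = pos
        preceding, origin = origin, pos
    return results


def stub_detector(part, preceding, origin):
    # Stand-in for the real detection logic: the wrist is "lost" in this frame.
    if part == "wrist":
        return None
    return (origin[0] + 0.25, origin[1], origin[2])


previous = {"wrist": (0.75, 0.0, 1.0)}
arm = detect_arm(stub_detector, (0.0, 0.0, 1.0), previous)
```

Calling `detect_arm` once per frame, and passing each frame's output as `previous` for the next, corresponds to executing the series of operations repeatedly.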

（プログラムとしての実施形態）
コンピュータに、処理装置１で示した、車両の乗員を含む撮像領域を撮像対象とした距離画像であって、前記撮像対象までの距離情報を画素に割り当てた距離画像を取得する取得ステップと、車両８の車室内空間における座標系と距離画像における座標系との対応関係に従って、特定の座標系を有した検出画像へと変換し、その検出画像において、乗員５の特徴点を少なくとも１つ検出する特徴点検出処理を実行する制御ステップとを、コンピュータに実行させるためのプログラムも、本発明の実施の形態の一つである。 (Embodiment as a program)
A program is also one embodiment of the present invention, the program causing a computer to execute, as described for the processing device 1: an acquisition step of acquiring a distance image in which an imaging area including an occupant of a vehicle is the imaging target and distance information to the imaging target is assigned to pixels; and a control step of converting the distance image into a detection image having a specific coordinate system, according to the correspondence between the coordinate system of the interior space of the vehicle 8 and the coordinate system of the distance image, and executing feature point detection processing that detects at least one feature point of the occupant 5 in the detection image.

処理装置１の動作で説明したＳｔｅｐ１が距離画像を取得する取得ステップの一例であり、Ｓｔｅｐ２からＳｔｅｐ１４が特定処理を実行する制御ステップの一例であり、図９で示したフローチャートを実行する処理制御を、コンピュータに実行させるためのプログラムの実施形態とすることができる。 Step 1 described in the operation of the processing device 1 is an example of the acquisition step for acquiring the distance image, and Steps 2 to 14 are an example of the control step for executing the specified processing. The processing control that executes the flowchart shown in FIG. 9 can thus be embodied as a program to be executed by a computer.

また、上記のようなプログラムを記録したコンピュータ読み取り可能な記録媒体も、本発明の実施の形態の一つである。 A computer-readable recording medium recording the above program is also one embodiment of the present invention.

（実施の形態の効果）
本発明の実施の形態によれば、以下のような効果を有する。
（１）本発明の実施の形態に係る処理装置は、車両８の乗員５を含む撮像領域を撮像対象とした距離画像であって、撮像対象までの距離情報を画素に割り当てた距離画像を取得する取得部としてのＴＯＦカメラ１０と、車両８の車室内空間における座標系と距離画像における座標系との対応関係に従って、特定の座標系を有した検出画像へと変換し、その検出画像において、乗員５の特徴点を少なくとも１つ検出する特徴点検出処理を実行する制御部２０と、を有して構成されている。これにより、人体の特徴点を特定する技術において、その特徴点の特定精度を向上させることができる。
（２）３次元情報を活用することで、外乱光等による濃淡変化の影響によらず人の骨格を精度良く検出することができる。
（３）他の物体で隠れにくい頭部位置を検出の起点としているため、胴体の隠れによらず検出することができる。
（４）スケルトンモデルを用いて部位の探索範囲を絞り込むことで、処理の高速化が可能となる。
（５）前処理でボクセルを作成することで、３次元画素群の密度情報を容易に取得することができ、処理の高速化が可能となる。
（６）３次元情報を用いて前部位との連続性を確認することで、部位の位置を精度良く検出できる。
（７）連続性チェックで説明したように、許容ギャップ数を設けることで、腕時計等の影響による３次元情報の途切れにも対応できる。
（８）補完処理により、各部位の隠れなどにより未検出となった場合でも検出結果を出力することができ、ロバスト性が向上する。 (Effect of Embodiment)
According to the embodiment of the present invention, the following effects are obtained.
(1) The processing device according to the embodiment of the present invention includes: the TOF camera 10 as an acquisition unit that acquires a distance image in which an imaging area including the occupant 5 of the vehicle 8 is the imaging target and distance information to the imaging target is assigned to pixels; and the control unit 20, which converts the distance image into a detection image having a specific coordinate system according to the correspondence between the coordinate system of the interior space of the vehicle 8 and the coordinate system of the distance image, and executes feature point detection processing that detects at least one feature point of the occupant 5 in the detection image. This improves the accuracy with which feature points of the human body are identified.
(2) By utilizing three-dimensional information, it is possible to accurately detect a human skeleton without being affected by changes in gradation due to ambient light or the like.
(3) Since the head position, which is unlikely to be occluded by other objects, is used as the starting point of detection, detection can be performed even when the torso is hidden.
(4) Processing speed can be increased by narrowing down the region search range using the skeleton model.
(5) By creating voxels in the preprocessing, it is possible to easily obtain density information of the three-dimensional pixel group, and to speed up the processing.
(6) By confirming continuity with the preceding part using the three-dimensional information, the position of each part can be detected with high accuracy.
(7) As described for the continuity check, setting an allowable number of gaps makes it possible to cope with interruptions in the three-dimensional information caused by a wristwatch or the like.
(8) Complementary processing makes it possible to output a detection result even when a part goes undetected, for example because it is occluded, which improves robustness.
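Effects (6) and (7) can be illustrated with a minimal Python sketch of a continuity check that tolerates a limited number of gaps. It samples points at regular intervals between the preceding part and a candidate, sums the voxel values around each sample, and counts a sample as a gap when the sum is below the threshold. The 26-neighbour window, the function names, and all numeric values are assumptions for illustration, not parameters from the embodiment.

```python
from itertools import product
from math import floor


def neighbourhood_sum(voxels, key):
    """Sum the values of a voxel and its 26 neighbours (missing voxels count as 0)."""
    return sum(voxels.get((key[0] + di, key[1] + dj, key[2] + dk), 0)
               for di, dj, dk in product((-1, 0, 1), repeat=3))


def continuity_ok(start, target, voxels, voxel_size, threshold, steps, max_gaps):
    """True if at most max_gaps samples between start and target are too sparse."""
    gaps = 0
    for s in range(1, steps + 1):
        t = s / steps
        point = tuple(a + (b - a) * t for a, b in zip(start, target))
        key = tuple(floor(c / voxel_size) for c in point)
        if neighbourhood_sum(voxels, key) < threshold:
            gaps += 1
    return gaps <= max_gaps


# Hypothetical forearm along the w axis with a hole at indices 2 to 4,
# such as the dropout a wristwatch might cause.
voxels = {(i, 0, 0): 4 for i in (0, 1, 5, 6, 7)}
start, target = (0.5, 0.5, 0.5), (7.5, 0.5, 0.5)
with_gap_allowed = continuity_ok(start, target, voxels, 1.0, 4, 7, max_gaps=1)
strict = continuity_ok(start, target, voxels, 1.0, 4, 7, max_gaps=0)
```

With one gap allowed the candidate is accepted despite the hole, whereas the strict setting rejects it; setting `max_gaps` to zero corresponds to requiring every sampled voxel group to reach the threshold.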

以上、本発明のいくつかの実施の形態を説明したが、これらの実施の形態は、一例に過ぎず、特許請求の範囲に係る発明を限定するものではない。また、これら新規な実施の形態は、その他の様々な形態で実施されることが可能であり、本発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更等を行うことができる。 Although several embodiments of the present invention have been described above, these embodiments are merely examples and do not limit the invention according to the claims. In addition, these novel embodiments can be implemented in various other forms, and various omissions, replacements, changes, etc. can be made without departing from the scope of the present invention.

例えば、カメラ設置位置、角度は上記説明で示した例に限らない。また、カメラはＴＯＦカメラに限らない。また、ステレオカメラ等、他の距離センサでもよい。また、スケルトンモデルの定義で示したパラメータはそれに限らない。また、補完処理はここで示した例に限らない。また、検出する部位は肩、肘、手首、手に限らない。例えば、肩、肘、手としてもよい。また、首位置の算出は上記した方法に限らない。例えば、頭部の傾きを考慮した算出方法としてもよい。 For example, the camera installation positions and angles are not limited to the examples shown in the above description. Also, the camera is not limited to a TOF camera. Other distance sensors such as a stereo camera may also be used. Also, the parameters shown in the definition of the skeleton model are not limited to that. Also, the complementary processing is not limited to the example shown here. Moreover, the parts to be detected are not limited to shoulders, elbows, wrists, and hands. For example, shoulders, elbows, and hands may be used. Further, the calculation of the neck position is not limited to the method described above. For example, a calculation method that considers the inclination of the head may be used.

また、これら実施の形態の中で説明した特徴の組合せの全てが発明の課題を解決するための手段に必須であるとは限らない。さらに、これら実施の形態は、発明の範囲及び要旨に含まれるとともに、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 Moreover, not all the combinations of features described in these embodiments are essential to the means for solving the problems of the invention. Furthermore, these embodiments are included in the scope and gist of the invention, and are included in the scope of the invention described in the claims and equivalents thereof.

１…処理装置、５…乗員、８…車両、１０…ＴＯＦカメラ、２０…制御部、２０３…腰骨位置、２５０…母線、３００…頭部位置、３０５…首位置、３１０、３１１…肩位置、３２０、３２１…肘位置、３３０、３３１…手首位置、３４０、３４１…手位置、Ｐ…起点
Reference Signs List: 1... processing device; 5... occupant; 8... vehicle; 10... TOF camera; 20... control unit; 203... hipbone position; 250... generatrix; 300... head position; 305... neck position; 310, 311... shoulder position; 320, 321... elbow position; 330, 331... wrist position; 340, 341... hand position; P... starting point

Claims

1. A processing device comprising:
an acquisition unit that acquires a distance image in which an imaging area including an occupant of a vehicle is the imaging target and distance information to the imaging target is assigned to each pixel corresponding to coordinates of a given two-dimensional coordinate system; and
a control unit that converts the coordinates of the pixels in the two-dimensional coordinate system of the distance image acquired by the acquisition unit, together with the distance information at those coordinates, into points in a three-dimensional space to create a three-dimensional pixel group, transforms a first coordinate system of the three-dimensional pixel group into a second coordinate system having the occupant as its origin, and executes feature point detection processing that detects feature points of the shoulders, elbows, wrists, and hands, with the predefined head of the occupant as a starting point, based on the three-dimensional pixel group in the transformed second coordinate system and a skeleton model that defines the size of each bone of the human body and the movable range of each joint,
wherein, when the feature point detection processing of the control unit detects the feature points of the shoulder, elbow, wrist, and hand by sequentially matching predetermined feature point search models starting from the head of the occupant,
the control unit considers a set of voxels including voxels located around a feature point that has not yet been determined, and performs threshold processing that judges the likelihood of that feature point based on whether the sum of the voxel values in the set of voxels is less than a threshold, and
the set of voxels used to judge whether the sum of the voxel values is less than the threshold is the voxel groups existing at predetermined intervals from the starting point, which is the feature point of the occupant's head, to a target feature point, which is a feature point different from that feature point.

2. The processing device according to claim 1, wherein the control unit determines that the target feature point is probable if the sum of the voxel values in each of all the voxel groups from the starting point to the target feature point is equal to or greater than the threshold.

3. A program for causing a computer to execute:
an acquisition step of acquiring a distance image in which an imaging area including an occupant of a vehicle is the imaging target and distance information to the imaging target is assigned to each pixel corresponding to coordinates of a given two-dimensional coordinate system; and
a control step of converting the coordinates of the pixels in the two-dimensional coordinate system of the distance image acquired in the acquisition step, together with the distance information at those coordinates, into points in a three-dimensional space to create a three-dimensional pixel group, transforming a first coordinate system of the three-dimensional pixel group into a second coordinate system having the occupant as its origin, and executing feature point detection processing that detects feature points of the shoulders, elbows, wrists, and hands, with the predefined head of the occupant as a starting point, based on the three-dimensional pixel group in the transformed second coordinate system and a skeleton model that defines the size of each bone of the human body and the movable range of each joint,
wherein, when the feature point detection processing of the control step detects the feature points of the shoulder, elbow, wrist, and hand by sequentially matching predetermined feature point search models with the occupant's head as the starting point,
a set of voxels including voxels located around a feature point that has not yet been determined is considered, and threshold processing is performed that judges the likelihood of that feature point based on whether the sum of the voxel values in the set of voxels is less than a threshold, and
the set of voxels used to judge whether the sum of the voxel values is less than the threshold is the voxel groups existing at predetermined intervals from the starting point, which is the occupant's head, to a target feature point, which is a feature point different from the feature point of the occupant's head.

4. The program according to claim 3, wherein the control step determines that the target feature point is probable if the sum of the voxel values in each of all the voxel groups from the starting point to the target feature point is equal to or greater than the threshold.