JP7015152B2 - Processing device, method and program relating to keypoint data

Info

Publication number: JP7015152B2
Application number: JP2017225709A
Authority: JP
Inventors: 建鋒徐; 和之田坂; 広昌柳原
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-11-24
Filing date: 2017-11-24
Publication date: 2022-02-15
Anticipated expiration: 2037-11-24
Also published as: JP2019096113A

Description

The present invention relates to a processing device, method and program for keypoint data that can obtain information related to a predetermined posture in three-dimensional space even when keypoint data such as skeletal joint data is given as two-dimensional data.

In recent years, with the spread of motion capture sensors, in particular the inexpensive Kinect (registered trademark) sensor, it has become easy to acquire data such as 3D skeletal joint data (data giving skeletal joint positions in a predetermined 3D space; skeleton data), a schematic example of which is shown in FIG. 1. Prompted in part by this ease of acquiring various kinds of 3D data, recognition techniques using deep learning have been proposed that target gestures, facial expressions and the like represented by such 3D data, as disclosed for example in Non-Patent Document 2.

Meanwhile, techniques for estimating whole-body or facial skeletal joint data from ordinary images alone have also matured in recent years. These techniques have the advantage that no special sensor, Kinect included, is needed and an ordinary camera such as a familiar webcam suffices. At present, however, it is difficult to estimate skeletal joint data and the like as 3D data from an image, and estimation is limited to skeletal joint data as 2D data in image coordinates.

As one such 2D estimation technique, Non-Patent Document 1 estimates the 2D skeletal joint data of multiple people simultaneously from video and at the same time computes a confidence between 0 and 1 for each joint.

Non-Patent Document 1: Cao, Zhe, et al. "Realtime multi-person 2d pose estimation using part affinity fields." CVPR 2017 (2017).
Non-Patent Document 2: Ke, Qiuhong, et al. "A New Representation of Skeleton Sequences for 3D Action Recognition." CVPR 2017 (2017).
Non-Patent Document 3: Wu, Ren, et al. "Deep image: Scaling up image recognition." arXiv preprint arXiv:1501.02876 7.8 (2015).
Non-Patent Document 4: Wang, Limin, et al. "Temporal segment networks: Towards good practices for deep action recognition." European Conference on Computer Vision. Springer International Publishing, 2016.
Non-Patent Document 5: Shen, Jie, et al. "The first facial landmark tracking in-the-wild challenge: Benchmark and results." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2015.
Non-Patent Document 6: Simon, Tomas, et al. "Hand Keypoint Detection in Single Images using Multiview Bootstrapping." arXiv preprint arXiv:1704.07809 (2017).

However, consider performing recognition processing or the like on 2D skeletal joint data such as that of Non-Patent Document 1, which is obtained from ordinary images alone and from which 3D information therefore cannot be obtained in principle. In that case it would be desirable, as preprocessing or the like, to correct the data to a predetermined posture such as a frontal view, that is, to process it. None of the prior art, however, has treated the realization of such processing as a problem to be solved, and so such processing could not be achieved with the prior art.

That is, such processing is highly challenging because of two fundamental constraints, the lack of depth or 3D information and the lack of semantic information, and could not be achieved with the prior art. Such processing is moreover expected to contribute to, for example, augmentation of training data for recognition processing.

As art related to this problem, the following processing techniques designed with recognition processing in mind exist, but none of them can achieve the processing described above.

For example, Non-Patent Document 3, which targets general image recognition, lists the following as processing for augmenting the data used in deep learning: color casting, which biases the entire image toward a specific color such as red or blue; vignetting, which makes the periphery of the image darker than its center; and lens distortion, a transformation that expands a rectangular region at the center of the image into a barrel shape or, conversely, contracts it into a pincushion shape. None of these can achieve the processing of correcting to a predetermined posture such as the frontal view described above.

Non-Patent Document 4, which aims to extend deep learning that has succeeded on still images to action recognition in video, proposes dividing a video into multiple snippets along the time axis and training a deep network on those snippets, and lists corner cropping and scale jittering as processing for data augmentation during that training. Corner cropping crops quarter-size images obtained by dividing the image area into 2 × 2 = 4 equal parts, and also crops an image of the same quarter size from the center. Scale jittering randomly selects the height and width of the cropped image from the resolution candidates {256, 224, 192, 168} and then resizes the result to the predetermined size of 224 × 224. The data augmentation by corner cropping and scale jittering in Non-Patent Document 4 basically follows the approach for still images targeted by Non-Patent Document 3 and the like, and likewise cannot achieve the processing of correcting to a predetermined posture such as the frontal view described above.

In view of the above problems of the prior art, an object of the present invention is to provide a processing device, method and program for keypoint data that can obtain information related to a predetermined posture in three-dimensional space even when keypoint data such as skeletal joint data is given as two-dimensional data.

To achieve the above object, the present invention is a processing device for keypoint data, comprising: a candidate generation unit that obtains a plurality of processed data by applying each of a plurality of geometric transformations to two-dimensional keypoint data extracted from a whole body or a part thereof; and a search unit that searches for the most similar pair between a plurality of reference two-dimensional keypoint data, obtained by two-dimensionally mapping a plurality of three-dimensional keypoint data in a predetermined posture, and the plurality of processed data thus obtained, and that outputs the most similar processed data and/or information on the geometric transformation applied to that processed data. The present invention is also a method and a program corresponding to this processing device.

According to the present invention, the candidate generation unit applies geometric transformations to the two-dimensional keypoint data to obtain a plurality of processed data, and the search unit then searches for the processed data most similar to any of a plurality of reference two-dimensional keypoint data obtained by two-dimensionally mapping a plurality of three-dimensional keypoint data in a predetermined posture. Information related to a predetermined posture in three-dimensional space can thereby be obtained in the form of the most similar processed data and/or the information on the geometric transformation applied to that processed data.

FIG. 1 is a diagram showing a schematic example of 3D skeletal joint data.
FIG. 2 is a functional block diagram of a processing device according to one embodiment.
FIG. 3 schematically shows the data format of MSRAction3D as an example of a 3D skeleton data set.
FIG. 4 is a diagram showing example definitions of the predetermined polygon whose area is calculated to determine the frontal view when the data format of FIG. 3 is used.
FIG. 5 is a diagram showing schematic examples of facial skeletal joint data and hand skeletal joint data as examples of keypoint data, other than whole-body skeletal joint data, to which the present invention is applicable.

FIG. 2 is a functional block diagram of a processing device according to one embodiment. As illustrated, the processing device 10 includes a candidate generation unit 1, a reference target generation unit 2 and a search unit 3, whose functions are outlined below. In addition to the functional blocks, FIG. 2 contains illustrations, enclosed in parentheses, that depict schematic examples of the data processed by each unit, to aid understanding of the description.

The candidate generation unit 1 receives 2D keypoint data D0 as input, applies N different kinds of processing Pi (i = 1, 2, ..., N) to it to obtain a plurality of processed data Di (i = 1, 2, ..., N), and outputs each of the processed data Di to the search unit 3. The reference target generation unit 2 receives as input a plurality (M) of reference 3D keypoint data Rj (j = 1, 2, ..., M), prepared for example as a database built in advance, applies conversion processing and the like to them to generate normalized 2D keypoint data Qj (j = 1, 2, ..., M) in a predetermined posture, and outputs each generated Qj to the search unit 3.

The search unit 3 searches for the most similar pair between the plurality of 2D keypoint data Di obtained by processing in the candidate generation unit 1 and the plurality of reference 2D keypoint data Qj generated by the reference target generation unit 2. For the purposes of description, let the data Di and Qj judged most similar by this search be denoted D_{i_max} and Q_{j_max}, specified by i = i_max and j = j_max respectively. The search unit 3 then outputs the processed data D_{i_max} judged most similar as the processing result for the original data D0 input to the candidate generation unit 1. Instead of, or in addition to, the processing result data D_{i_max}, the search unit 3 may output the processing content P_{i_max} applied to the original data D0.

Let score(Di, Qj) denote the similarity score between data Di and data Qj that the search unit 3 computes for this search (a numerical score that takes a larger value the higher the similarity). The indices i = i_max and j = j_max of the most similar result can then be written as in (1) below.

(i_max, j_max) = argmax_{i, j} score(Di, Qj)   (1)
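The search described by (1) amounts to an exhaustive argmax over all candidate and reference pairs. The following Python sketch illustrates this under stated assumptions: this passage does not define score(Di, Qj), so the negative sum of squared joint distances used here is only a stand-in, and the names `score` and `search_most_similar` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pose = Sequence[Point]  # one 2D keypoint set, joints listed in a fixed order


def score(d: Pose, q: Pose) -> float:
    """Stand-in similarity (an assumption, not the patent's definition).

    Larger means more similar: the negative sum of squared joint distances.
    """
    return -sum((dx - qx) ** 2 + (dy - qy) ** 2
                for (dx, dy), (qx, qy) in zip(d, q))


def search_most_similar(candidates: List[Pose],
                        references: List[Pose]) -> Tuple[int, int]:
    """Return 0-based (i_max, j_max) maximizing score(D_i, Q_j) over all pairs.

    Raises ValueError if either list is empty (there is no pair to compare).
    """
    pairs = [(i, j)
             for i in range(len(candidates))
             for j in range(len(references))]
    return max(pairs, key=lambda ij: score(candidates[ij[0]], references[ij[1]]))
```

The returned index i_max identifies both the processed data to output and the processing Pi that produced it, matching the two output options described above.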

The processing of each functional unit in FIG. 2 is described in detail below in the order of the reference target generation unit 2, the candidate generation unit 1 and the search unit 3. The description takes as an example the case where the keypoint data processed by each unit as 2D or 3D data is skeletal joint data, but the keypoint data may be of a type other than skeletal joint data. Likewise, the illustrations of data Di, data Qj and so on in FIG. 2 are drawn for the case where the keypoint data is skeletal joint data; this is merely an example for clarity of description. Keypoint data of types other than skeletal joint data is described later.

<Reference target generation unit 2>
The reference target generation unit 2 generates 2D skeletal joint data in the frontal posture from each data Rj of an existing 3D skeletal joint data set, prepared in advance as a model of the whole-body skeleton in a predetermined format, and then normalizes the generated 2D skeletal joint data to obtain data Qj. As the 3D skeleton data set in a predetermined format, one in the MSRAction3D format, which is used as an evaluation data set in academic and technical fields such as action recognition, can be used, for example.

FIG. 3 schematically shows the data format of MSRAction3D as an example of such a 3D skeleton data set. (FIG. 1 above is also a schematic example of this data format.) As shown in FIG. 3, MSRAction3D data models the whole-body skeleton using 20 predefined joints and 19 bones (connections between joints). Giving the 3D coordinates of each of the 20 joints expresses the posture of a character such as a person, and giving such data over a time series expresses an action of the character such as a gesture. Because the character in the 3D skeleton data of FIG. 3 is depicted with its face toward the viewer, the joints on the right and left of the page correspond to the character's left and right joints, respectively.

As shown in FIG. 3, MSRAction3D data defines the joints and their connections as follows. The sitting-height (trunk) portion is defined as "head - neck - spine - center hip". The right arm is defined as "neck - right shoulder - right elbow - right wrist - right hand", and the left arm is defined in the same way for the left side. The right leg is defined as "center hip - right hip - right knee - right ankle - right foot", and the left leg is defined in the same way for the left side.
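The joint chains listed above can be written down as a small data structure, which also confirms the stated counts of 20 joints and 19 bones. This is an illustrative Python sketch: the identifier spellings are hypothetical, and no claim is made about the joint index order used in the actual MSRAction3D files.

```python
from typing import List, Tuple

# Trunk chain, as listed in the text.
TRUNK: List[str] = ["head", "neck", "spine", "center_hip"]


def arm(side: str) -> List[str]:
    """Arm chain for 'right' or 'left', starting at the shared neck joint."""
    return ["neck", f"{side}_shoulder", f"{side}_elbow",
            f"{side}_wrist", f"{side}_hand"]


def leg(side: str) -> List[str]:
    """Leg chain for 'right' or 'left', starting at the shared center hip."""
    return ["center_hip", f"{side}_hip", f"{side}_knee",
            f"{side}_ankle", f"{side}_foot"]


CHAINS: List[List[str]] = [TRUNK, arm("right"), arm("left"),
                           leg("right"), leg("left")]

# A bone is each consecutive pair of joints along a chain.
BONES: List[Tuple[str, str]] = [(a, b)
                                for chain in CHAINS
                                for a, b in zip(chain, chain[1:])]

# Joints are the distinct names, kept in first-seen order.
JOINTS: List[str] = list(dict.fromkeys(n for chain in CHAINS for n in chain))
```

A posture is then a mapping from each name in `JOINTS` to a 3D coordinate, and an action is a time series of such mappings.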

The 3D skeleton data set read by the reference target generation unit 2 is not limited to the MSRAction3D format; any predetermined format that similarly models and defines joints and their connections can be used.

Specifically, the reference target generation unit 2 can generate each 2D data Qj from each 3D data Rj by the following procedures 21, 22 and 23.

(Procedure 21)
As expressed by equations (2-1) and (2-2) below, the 3D data Rj is mapped to 2D data Bj(θ_x, θ_y, θ_z) by performing a perspective projection transformation (a transformation well known in the mathematics of 3D computer graphics (CG) and the like) under each of various preset parameter candidates θ_x, θ_y, θ_z.

Here, (a_x, a_y, a_z) is the 3D coordinate of one joint in the 3D skeletal joint data, (c_x, c_y, c_z) is the preset 3D camera position, and each parameter candidate (θ_x, θ_y, θ_z) is the orientation of the camera for the 2D projection (Tait-Bryan angles, a kind of Euler angles). The (b_x, b_y) obtained by equation (2-2) via the (d_x, d_y, d_z) calculated by equation (2-1) is the coordinate resulting from the 2D projection of the 3D skeleton coordinate (a_x, a_y, a_z).

That is, each point (a_x, a_y, a_z) ∈ Rj constituting the 3D data Rj can be converted by (2-1) and (2-2) above, under the parameters (θ_x, θ_y, θ_z), into a point (b_x, b_y) ∈ Bj(θ_x, θ_y, θ_z) of the 2D data.

The transformation by equations (2-1) and (2-2) fixes the camera position (c_x, c_y, c_z) and varies the camera angles (θ_x, θ_y, θ_z) as parameters, but the mapping is not limited to this; any parameter setting and transformation that expresses the 3D skeletal joint data Rj as 2D data seen from various viewpoints may be used. That is, it suffices to obtain 2D data mapped as the data on the image plane that would be obtained by capturing the 3D data Rj with a predetermined camera (a pinhole camera as a theoretical model, free of blur and the like) from the various viewpoints specified by the parameters.
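As a rough illustration of procedure 21, the Python sketch below projects one joint with an ideal pinhole camera. Equations (2-1) and (2-2) are not reproduced in this text, so the sketch assumes one common computer-graphics convention: the offset a - c is rotated about the z, then y, then x axis by the Tait-Bryan angles to give (d_x, d_y, d_z), and the result is divided by depth. The rotation order, the sign conventions and the `focal` parameter are all assumptions, and the function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def camera_transform(a: Vec3, c: Vec3, theta: Vec3) -> Vec3:
    """Map a world point a into camera coordinates d (assumed convention).

    c is the camera position and theta = (theta_x, theta_y, theta_z) the
    camera orientation. Rotations are applied about z, then y, then x.
    """
    x, y, z = a[0] - c[0], a[1] - c[1], a[2] - c[2]
    tx, ty, tz = theta
    # Rotation about the z axis.
    x, y = (math.cos(tz) * x + math.sin(tz) * y,
            -math.sin(tz) * x + math.cos(tz) * y)
    # Rotation about the y axis.
    x, z = (math.cos(ty) * x - math.sin(ty) * z,
            math.sin(ty) * x + math.cos(ty) * z)
    # Rotation about the x axis.
    y, z = (math.cos(tx) * y + math.sin(tx) * z,
            -math.sin(tx) * y + math.cos(tx) * z)
    return (x, y, z)


def project(a: Vec3, c: Vec3, theta: Vec3, focal: float = 1.0) -> Tuple[float, float]:
    """Pinhole projection of a onto the image plane, giving (b_x, b_y)."""
    d_x, d_y, d_z = camera_transform(a, c, theta)
    if d_z <= 0.0:
        raise ValueError("point is not in front of the camera")
    return (focal * d_x / d_z, focal * d_y / d_z)
```

Applying `project` to every joint of Rj for one parameter set gives one 2D data Bj(θ_x, θ_y, θ_z); repeating this over all parameter candidates gives the set searched in procedure 22.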

For example, the camera angles (θ_x, θ_y, θ_z) may be set so that the camera points from its position toward a predetermined point (for example the centroid) of the 3D data Rj, and the camera position (c_x, c_y, c_z), sampled uniformly on a predetermined sphere enclosing the 3D data Rj (for example a sphere of predetermined size centered on that centroid), may be used as the parameter. From the standpoint of carrying out procedure 22 below properly, however, it is desirable to set the parameters so that the distance from which the 3D skeletal joint data Rj is viewed does not vary greatly among viewpoints.
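One possible way to obtain camera positions spread evenly over a sphere around the centroid is a Fibonacci lattice, sketched below in Python. The text only asks for uniform sampling on a sphere; the choice of a Fibonacci lattice, which is approximately rather than exactly uniform, and the name `sample_camera_positions` are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sample_camera_positions(center: Vec3, radius: float, count: int) -> List[Vec3]:
    """Return `count` points spread approximately evenly on a sphere.

    Uses a Fibonacci (golden-angle) lattice: z is stepped evenly through
    (-1, 1) while the azimuth advances by the golden angle at each step.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    positions = []
    for k in range(count):
        z = 1.0 - 2.0 * (k + 0.5) / count   # height on the unit sphere
        r = math.sqrt(1.0 - z * z)          # radius of the circle at that height
        phi = golden_angle * k              # azimuth
        positions.append((center[0] + radius * r * math.cos(phi),
                          center[1] + radius * r * math.sin(phi),
                          center[2] + radius * z))
    return positions
```

Because every sampled position lies at the same distance `radius` from the centroid, the viewing distance stays constant across viewpoints, in line with the requirement noted for procedure 22.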

In the following description, when explaining positional relationships and the like in 2D data, the thing corresponding to the coordinates of that 2D data is referred to as an "image" or the like where appropriate, and image-related terms (number of pixels, aspect ratio and so on) are used where appropriate, to aid intuitive understanding.

(Procedure 22)
From among the 2D data Bj(θ_x, θ_y, θ_z) obtained for each parameter (θ_x, θ_y, θ_z) in procedure 21 above, the data Bj(θ_x, θ_y, θ_z)_[front] corresponding to the 3D data Rj viewed from the front, as an example of the predetermined posture, is determined.

Specifically, let area(Bj(θ_x, θ_y, θ_z)) be the area of the polygon formed by tracing predetermined joints of the 2D data Bj(θ_x, θ_y, θ_z) in a predetermined order. The parameters (θ_x, θ_y, θ_z) that maximize this area can then be determined as the parameters corresponding to the front. Expressed as an equation, this is (2-3) below.

(θ_x, θ_y, θ_z)_[front] = argmax_{(θ_x, θ_y, θ_z)} area(Bj(θ_x, θ_y, θ_z))   (2-3)

Here, the predetermined joints forming the polygon whose area area(Bj(θ_x, θ_y, θ_z)) is calculated, and the predetermined order in which they are traced, may be defined in advance in accordance with the predetermined kinds of joints defined in the 3D skeletal joint data read by the reference target generation unit 2. They should be defined so as to satisfy the following first and second properties.
(First property) Regardless of the character's pose type (the pose type corresponding to data Rj, which may include distinctions between characters themselves, such as adult versus child; the same applies hereinafter), the area of the polygon is largest when the view corresponds to the front. (The area need not be strictly maximal exactly at the exact front; it suffices that the area tends to grow as the view approaches a direction that a human would judge to be frontal.)
(Second property) Even between characters of different pose types, and even if their sizes differ, the relative shape of the polygon (its shape when viewed from the front in 3D space) does not differ greatly.

Specifically, for the MSRAction3D data format described with FIG. 3, a polygon such as [1] or [2] in FIG. 4 (drawn with thick lines) can be defined for area calculation. [1] in FIG. 4 is an example that defines, as the area calculation target, the pentagon whose vertices (joints) are arranged in the order "head → left shoulder → left hip → right hip → right shoulder → head". [2] in FIG. 4 is an example that defines, as the area calculation target, the quadrilateral whose vertices (joints) are arranged in the order "neck → left shoulder → center hip → right shoulder → neck".
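The frontal selection of procedure 22 can be sketched in Python as follows, using the quadrilateral of example [2]. The area is computed with the shoelace formula, which assumes the traced polygon does not intersect itself. The joint identifiers, the dictionary layout of the projections and the function names are hypothetical choices made for this illustration.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Angles = Tuple[float, float, float]

# Vertex order of example [2]: neck -> left shoulder -> center hip -> right shoulder.
TORSO_ORDER = ("neck", "left_shoulder", "center_hip", "right_shoulder")


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon by the shoelace formula (closing edge implied)."""
    n = len(vertices)
    twice_area = sum(
        vertices[k][0] * vertices[(k + 1) % n][1]
        - vertices[(k + 1) % n][0] * vertices[k][1]
        for k in range(n)
    )
    return abs(twice_area) / 2.0


def select_frontal(projections: Dict[Angles, Dict[str, Point]],
                   order: Sequence[str] = TORSO_ORDER) -> Angles:
    """Return the candidate angles whose projected torso polygon is largest.

    `projections` maps each candidate (theta_x, theta_y, theta_z) to the
    projected 2D joints of one pose, keyed by joint name.
    """
    return max(projections,
               key=lambda th: polygon_area([projections[th][name] for name in order]))
```

A torso seen from the side collapses toward a line and yields a small area, so the maximum tends to pick a view in which the torso plane faces the camera.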

In both examples [1] and [2] of FIG. 4, the polygon is defined from joints of the part roughly corresponding to the character's torso, and it satisfies the first and second properties described above by virtue of the following first and second characteristics.
(First characteristic) Even when the pose type differs, the torso shape itself does not change much (apart from the change in size when the posing character is different).
(Second characteristic) In addition, this unchanging torso shape is roughly planar: all torso-related joint coordinates in 3D space lie within a certain distance of a virtual plane passing through the torso (the plane corresponding to the frontal orientation) and are roughly coplanar.

If, on the other hand, the polygon were defined to include not torso-related joints but joints of the arms or legs, whose relative positions can change greatly with pose type, it would lack the first and second characteristics above and would therefore fail to satisfy the first and second properties described earlier.

(Procedure 23)
By normalizing the frontal data Bj(θ_x, θ_y, θ_z)_[front] obtained in procedure 22 above, data Qj is obtained as the result of 2D mapping and normalizing the original 3D data Rj in the predetermined posture. This normalization is intended to allow the search unit 3 to search for similar data properly; specifically, normalization of size (scale) and position may be performed as follows.

That is, as normalization of size, the frontal data is scaled by a factor of d_[norm]/dj both vertically and horizontally (that is, without changing the aspect ratio) so that the distance dj between two predetermined points among the skeletal joint position coordinates in the frontal data Bj(θ_x, θ_y, θ_z)_[front] becomes a predetermined fixed value d_[norm] for normalization (for example 50 pixels). The two predetermined points giving the distance dj should be chosen so that, in frontalized skeletal joint data, the length between them remains stable regardless of pose type. For the data format of FIG. 3, for example, the two points may be set to "head and neck", "neck and spine", or "spine and center hip". These examples all relate to the central part of the torso when frontalized in procedure 22, where the distance between the two points can be expected to be stable.

As normalization of position, a translation is applied so that a predetermined point among the skeletal joint position coordinates in the frontal data Bj(θ_x, θ_y, θ_z)_[front] becomes the origin. The point made the origin by this translation may be, for example, either of the two points used for the size normalization; for instance, when normalizing by the distance between "head and neck", the neck coordinate may be set to become the origin. Letting (x_[ref], y_[ref]) be the coordinates of the point made the origin, the position coordinates of each skeletal joint in the frontal data Bj(θ_x, θ_y, θ_z)_[front] are normalized from the pre-normalization (x_[before], y_[before]) as in the left-hand side of equation (2-4) below.

The size and position normalizations give the same result in either order, so they may be performed in either order. Also, because candidate generation unit 1, described next, generates candidates rotated by various angles, procedure 23 of reference target generation unit 2 performs the size and position normalization described above but does not need to normalize the coordinate axis directions.
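The size and position normalization described above can be sketched as follows. This is an illustrative sketch only, not the disclosed implementation: the function names, the list-of-tuples data layout, and the three-joint sample data are assumptions. It performs the normalization in both orders to show that the results agree.

```python
import math

def normalize_scale_then_shift(points, idx_a, idx_b, idx_origin, d_norm=50.0):
    """Scale uniformly by d_norm/dj, then translate joint idx_origin to (0, 0)."""
    (ax, ay), (bx, by) = points[idx_a], points[idx_b]
    dj = math.hypot(ax - bx, ay - by)  # distance between the two reference joints
    if dj == 0.0:
        raise ValueError("the two reference joints coincide")
    s = d_norm / dj
    scaled = [(s * x, s * y) for x, y in points]
    ox, oy = scaled[idx_origin]
    return [(x - ox, y - oy) for x, y in scaled]

def normalize_shift_then_scale(points, idx_a, idx_b, idx_origin, d_norm=50.0):
    """Translate joint idx_origin to (0, 0), then scale uniformly by d_norm/dj."""
    ox, oy = points[idx_origin]
    shifted = [(x - ox, y - oy) for x, y in points]
    (ax, ay), (bx, by) = shifted[idx_a], shifted[idx_b]
    dj = math.hypot(ax - bx, ay - by)
    if dj == 0.0:
        raise ValueError("the two reference joints coincide")
    s = d_norm / dj
    return [(s * x, s * y) for x, y in shifted]

# Hypothetical joints in image coordinates: index 0 = head, 1 = neck, 2 = spine.
joints = [(10.0, 10.0), (10.0, 20.0), (14.0, 30.0)]
a = normalize_scale_then_shift(joints, 0, 1, 1)  # head-neck distance, neck as origin
b = normalize_shift_then_scale(joints, 0, 1, 1)
```

In both orders the neck ends up at the origin and the head-to-neck distance becomes d_norm, which is why the order does not matter.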

<Candidate generation unit 1>
Candidate generation unit 1 receives two-dimensional skeletal joint data D0 and applies to it a plurality of predetermined processing operations Pi (i = 1, 2, ..., N), each a geometric transformation, to obtain a plurality of processed data Di (i = 1, 2, ..., N). The operations Pi should be set over a predetermined parameter range such that at least one of them can correct the two-dimensional data D0, at least approximately, to the data that would be observed in the predetermined posture, namely from the front. The parameter range can be set under largely the same policy as the parameters of the perspective projection transformation in procedure 21 of reference target generation unit 2, that is, so that the viewing angles cover all directions on a sphere. It is desirable to set the geometric transformations so that together they cover the differences in appearance caused by differences in viewing angle.

Concretely, each operation Pi can be set as an affine transformation and/or a homography transformation that includes the deformation or rotation needed for frontalization, with specific values given to the parameters that determine the transformation. As is well known, an affine transformation is a special case of a homography transformation. Candidate generation unit 1 can obtain the data Di, the result of processing D0 with Pi, by procedures 11 and 12 below.

(Procedure 11)
An affine transformation or homography transformation (the transformation specified by the predetermined parameters corresponding to operation Pi) is applied to the data D0 ∋ (x_[pre-transform], y_[pre-transform]) to obtain data Ei ∋ (x_[post-transform], y_[post-transform]). As is well known, using homogeneous coordinates, the two-dimensional affine transformation (matrix elements a_ij) and homography transformation (matrix elements h_ij) can be expressed as equations (3-1) and (3-2), respectively.
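The images for equations (3-1) and (3-2) are not reproduced in this text. For reference, the standard homogeneous-coordinate forms of a two-dimensional affine transformation and a homography are shown below, mapping a pre-transform point of D0 to a post-transform point of Ei; the exact layout of the original equations is assumed, and the scale factor λ is a symbol introduced here:

```latex
\[
\begin{pmatrix} x_{\mathrm{post}} \\ y_{\mathrm{post}} \\ 1 \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x_{\mathrm{pre}} \\ y_{\mathrm{pre}} \\ 1 \end{pmatrix}
\qquad \text{(affine)}
\]
\[
\lambda
\begin{pmatrix} x_{\mathrm{post}} \\ y_{\mathrm{post}} \\ 1 \end{pmatrix}
=
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x_{\mathrm{pre}} \\ y_{\mathrm{pre}} \\ 1 \end{pmatrix},
\quad
\lambda = h_{31} x_{\mathrm{pre}} + h_{32} y_{\mathrm{pre}} + h_{33} \neq 0
\qquad \text{(homography)}
\]
```

Setting h_31 = h_32 = 0 and h_33 = 1 reduces the homography to the affine case, which is the sense in which the affine transformation is a special case.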

(Procedure 12)
The data Ei obtained in procedure 11 is normalized to obtain the processed data Di. This normalization applies to Ei the same processing already described for procedure 23 of reference target generation unit 2, so the description is not repeated.
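Procedures 11 and 12 can be sketched together as follows. This is a minimal sketch under stated assumptions: the function names, the choice of in-plane rotations as the example transformations, and the joint indices used for normalization are all hypothetical; an actual parameter set would be chosen to cover the viewing angles as described above.

```python
import math

def apply_homography(points, h):
    """Procedure 11: apply a 3x3 matrix h to 2D points via homogeneous coordinates."""
    out = []
    for x, y in points:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point is mapped to infinity")
        out.append((u / w, v / w))  # divide by the homogeneous scale
    return out

def rotation(theta):
    """An affine matrix (last row 0 0 1) for an in-plane rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def normalize(points, idx_a=0, idx_b=1, idx_origin=1, d_norm=50.0):
    """Procedure 12: uniform scaling to d_norm, then translation to the origin."""
    (ax, ay), (bx, by) = points[idx_a], points[idx_b]
    s = d_norm / math.hypot(ax - bx, ay - by)
    ox, oy = points[idx_origin]
    return [(s * (x - ox), s * (y - oy)) for x, y in points]

def generate_candidates(d0, transforms):
    """Return one processed data Di per transformation Pi."""
    return [normalize(apply_homography(d0, h)) for h in transforms]

# Hypothetical D0 with three joints (head, neck, spine) and two operations Pi.
d0 = [(0.0, 0.0), (0.0, 10.0), (5.0, 20.0)]
transforms = [rotation(0.0), rotation(math.pi / 2)]
candidates = generate_candidates(d0, transforms)
```

Because an affine matrix is passed in the same 3x3 form, the single function apply_homography covers both transformation types.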

The two-dimensional skeletal joint data D0 read by candidate generation unit 1 can be prepared, for example, by extracting it with the method of Non-Patent Document 1 cited above. In Non-Patent Document 1, a part confidence map and a part affinity vector map for body parts are extracted from an image of the character, and D0 is computed by deep learning on both maps; a confidence (reliability) value for each joint constituting D0 is computed at the same time. To carry out the processing of search unit 3, described next, with high accuracy, it is preferable to keep this confidence associated with each joint of D0. Further, the two-dimensional data D0 read by candidate generation unit 1 is assumed to be defined for the same types of joints as those defined in the data format of the three-dimensional data Rj read by reference target generation unit 2. For example, if the three-dimensional data Rj is in the MSRAction3D data format described with FIG. 3, the two-dimensional data D0 is likewise prepared so as to give the two-dimensional coordinates of each joint defined in that format, as in FIG. 3.

<Search unit 3>
Search unit 3 searches for the most similar pair, Di_max and Qj_max, between the processed data Di (two-dimensional skeletal joint data) obtained from candidate generation unit 1 for each operation Pi (i = 1, 2, ..., N) and the data Qj obtained from reference target generation unit 2, where each Qj is one of the three-dimensional skeletal joint data Rj (j = 1, 2, ..., M) frontalized as the predetermined posture and expressed as two-dimensional skeletal joint data. It then outputs the processed data Di_max and/or its processing content Pi_max from the search result as the output of processing device 10. Concretely, search unit 3 can perform the search by procedures 31 and 32 below.

(Procedure 31)
For each processed data Di, the similarity score score(Di, Qj) with each data Qj (j = 1, 2, ..., M) is computed to find the most similar data Qj_max[i], and its similarity score score[i] = score(Di, Qj_max[i]) is recorded. The similarity score is obtained by computing the distance L(Di, Qj) between the two data as in equations (4), (5), and (6), and then applying a predetermined decreasing function f, score(Di, Qj) = f(L(Di, Qj)), so that a smaller distance gives a larger similarity score.

In equation (4) for the distance, x(Di)_k is the two-dimensional position coordinate of the k-th predetermined joint of data Di, and x(Qj)_k is the two-dimensional position coordinate of the k-th predetermined joint of data Qj. As stated above, Di and Qj (and the Rj from which Qj derives) use the same kind of data, so these k-th joints are of the same type. For example, with the MSRAction3D data format of FIG. 3, if the k = 1 joint of Di is the "head", the k = 1 joint of Qj is also the "head". K is the total number of joints defined in the data format of Di and Qj.

The weights w_k used to compute the distance L(Di, Qj) of equation (4) can be computed with equation (5) and/or equation (6). Here c_k is the confidence computed for the k-th joint of Di by a method such as that of Non-Patent Document 1 (a non-negative value that is higher the more probable it is that the point is the joint in question). TH1 and TH2 in equation (5) are thresholds on the confidence: the weight is set to 0 when the confidence is low, to 0.5 when it is intermediate, and to 1 when it is high.
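The images for equations (4) and (5) are not reproduced in this text. One form consistent with the symbol definitions above is given below as an assumption: the use of the Euclidean norm, the absence of any further normalizing factor in the sum, and the handling of values exactly at a threshold are not stated in the text and are chosen here for illustration. Equation (6) is not reconstructed because the text does not describe its form.

```latex
\[
L(D_i, Q_j) = \sum_{k=1}^{K} w_k \,\bigl\lVert x(D_i)_k - x(Q_j)_k \bigr\rVert
\]
\[
w_k =
\begin{cases}
0   & c_k < TH_1 \\
0.5 & TH_1 \le c_k < TH_2 \\
1   & TH_2 \le c_k
\end{cases}
\]
```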

When both equations (5) and (6) are used, the value w_k computed by equation (6) is read as the "normalized confidence c_k[norm]", and that normalized confidence is used in the threshold test of equation (5) so that the weight takes one of the values 0, 0.5, or 1.
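The three-level weighting can be sketched as follows. This is a sketch under assumptions: the threshold values 0.3 and 0.7, the handling of values exactly at a threshold, and in particular the normalization by the maximum confidence are hypothetical choices, since the form of equation (6) is not given in this text.

```python
def threshold_weight(c, th1=0.3, th2=0.7):
    """Map a confidence to 0, 0.5, or 1 (th1 and th2 are hypothetical values)."""
    if c < th1:
        return 0.0   # low confidence: the joint is ignored
    if c < th2:
        return 0.5   # intermediate confidence
    return 1.0       # high confidence

def weights_from_confidence(confidences, normalize=False, th1=0.3, th2=0.7):
    """Compute one weight per joint, optionally normalizing the confidences first.

    Normalizing by the maximum confidence is an assumption made for this sketch.
    """
    if not confidences:
        return []
    if normalize:
        c_max = max(confidences)
        confidences = [c / c_max if c_max > 0 else 0.0 for c in confidences]
    return [threshold_weight(c, th1, th2) for c in confidences]

plain = weights_from_confidence([0.1, 0.5, 0.9])
normalized = weights_from_confidence([1.0, 5.0, 10.0], normalize=True)
```

Normalizing first makes the thresholds independent of the absolute scale of the confidence values produced by the keypoint detector.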

If no confidence is associated with the joints of data Di (that is, confidence information is unavailable), all weights are set to an equal value, for example w_k = 1, and the distance is computed with equation (4).

As another embodiment of equations (4), (5), and (6), the confidence c_k may be computed as a non-negative value and used directly as the weight, w_k = c_k, in computing the distance from equation (4). Equation (5) is an example that divides the confidence into three levels by thresholds and assigns a weight according to the level; the confidence may likewise be divided into any number of levels, each with its own weight. As a further variation of equation (5), the weight may be computed as w_k = g(c_k) using a predetermined non-negative increasing function g.
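The weighted distance and the similarity score can be sketched as follows. The sketch assumes a weighted sum of Euclidean distances for L(Di, Qj) and the decreasing function f(L) = 1 / (1 + L); both are illustrative choices, since the text only requires some distance of the form of equation (4) and some decreasing function f. The function names are hypothetical.

```python
import math

def weighted_distance(di, qj, weights=None):
    """Distance between two keypoint sets whose k-th joints are of the same type.

    weights=None corresponds to the case without confidence (all w_k = 1).
    Passing the raw confidences as weights corresponds to w_k = c_k.
    """
    if len(di) != len(qj):
        raise ValueError("both data must define the same K joints")
    if weights is None:
        weights = [1.0] * len(di)
    return sum(
        w * math.hypot(px - qx, py - qy)
        for w, (px, py), (qx, qy) in zip(weights, di, qj)
    )

def score(di, qj, weights=None):
    """Similarity score: an assumed decreasing function of the distance."""
    return 1.0 / (1.0 + weighted_distance(di, qj, weights))

di = [(0.0, 0.0), (3.0, 4.0)]
qj = [(0.0, 0.0), (0.0, 0.0)]
uniform = weighted_distance(di, qj)               # all weights equal to 1
masked = weighted_distance(di, qj, [1.0, 0.0])    # second joint has weight 0
```

A joint with weight 0 does not contribute to the distance, so an unreliable detection cannot lower the score of an otherwise good match.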

(Procedure 32)
Among the best scores score[i] = score(Di, Qj_max[i]) obtained for each data Di in procedure 31, the data with the maximum score is determined to be the most similar result Di_max, and the corresponding information (the processing information Pi_max and/or the processed data Di_max) is output as the result.
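Procedures 31 and 32 amount to a nested maximum search, which can be sketched as follows. The unweighted distance, the score function 1 / (1 + L), the function names, and the sample data are assumptions made to keep the example self-contained.

```python
import math

def distance(di, qj):
    """Unweighted sum of Euclidean distances between corresponding joints."""
    return sum(math.hypot(px - qx, py - qy) for (px, py), (qx, qy) in zip(di, qj))

def score(di, qj):
    """Assumed decreasing function of the distance."""
    return 1.0 / (1.0 + distance(di, qj))

def search(candidates, references):
    """Return (i_max, j_max, best_score) over all pairs (Di, Qj)."""
    if not candidates or not references:
        raise ValueError("candidates and references must be non-empty")
    best = None
    for i, di in enumerate(candidates):
        # Procedure 31: best-matching reference Qj_max[i] for this candidate Di.
        scores = [score(di, qj) for qj in references]
        j_max = max(range(len(scores)), key=scores.__getitem__)
        # Procedure 32: keep the candidate whose best score is the largest.
        if best is None or scores[j_max] > best[2]:
            best = (i, j_max, scores[j_max])
    return best

# Hypothetical data: two candidates Di and two references Qj, two joints each.
candidates = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
references = [[(0.0, 0.0), (0.0, 1.5)], [(5.0, 5.0), (6.0, 6.0)]]
i_max, j_max, best_score = search(candidates, references)
```

The returned index i_max identifies both the processed data Di_max and the operation Pi_max that produced it, which are the two possible outputs named in procedure 32.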

As described above, the present invention processes the two-dimensional skeletal joint data D0 into various candidates Di in candidate generation unit 1, and search unit 3 compares them with the data Qj, the frontalized data that reference target generation unit 2 generates from each of a large number of three-dimensional data Rj. This resolves the dependence on shooting angle, including the case where D0 is extracted from an image by a method such as that of Non-Patent Document 1. Compared with image data, skeletal joint data carries higher-level information such as joint information, and this is reflected in the computation of equation (4), so the shooting angle can be corrected more accurately. Using the confidence of each joint position makes the correction more accurate still.

Supplementary notes on the present invention follow.

(1) The description above used whole-body skeletal joint data as the example of keypoint data, but the present invention is not limited to whole-body skeletal data and applies equally to keypoint data extracted from any whole body or a part of it. For example, it is applicable to facial skeletal joint data, shown schematically in [1A] of FIG. 5, and to hand skeletal joint data, shown schematically in [1B] of FIG. 6. A keypoint may be any point suited to modeling the movement or gestures of the character's whole body or a part of it by tracking that point; it may be an actual joint, a modeled joint, or some other point (it need not be a joint). For joints, elements corresponding to "bones" between joints may or may not be defined.

The facial skeletal joint data of [1A] in FIG. 5 models joints on the left and right eyebrows, the contours of the left and right eyes, the bridge and underside of the nose, the outer and inner contours of the lips, and the outline of the face from the left and right ears through the cheeks to the chin. In this case, for example, the triangle formed by the three joints shown in solid black in [2A], namely the outer corners of the left and right eyes and the tip of the nose, may be set as the polygon whose area area(Bj(θ_x, θ_y, θ_z)) is computed for frontalization in procedure 22 of reference target generation unit 2. The processing that obtains facial skeletal joint data from an image as two-dimensional data, for input to candidate generation unit 1, may be, for example, the processing disclosed in Non-Patent Document 5 cited above. Three-dimensional data defined for the same joints may be prepared and input to reference target generation unit 2.

The hand skeletal joint data of [1B] in FIG. 5 models the hand by defining bones running from the wrist joint through the third, second, and first joints of each of the first to fifth fingers to the fingertip joint. In this case, for example, the hexagon defined by the joints shown in solid black in [2B], namely the wrist joint and the third joint of each finger, may be set as the polygon whose area area(Bj(θ_x, θ_y, θ_z)) is computed for frontalization in procedure 22 of reference target generation unit 2. The processing that obtains hand skeletal joint data from an image as two-dimensional data, for input to candidate generation unit 1, may be, for example, the processing disclosed in Non-Patent Document 6 cited above. Three-dimensional data defined for the same joints may be prepared and input to reference target generation unit 2.
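The area of a polygon whose vertices are selected keypoints, such as the triangle or hexagon above, can be computed with the shoelace formula. The following is an illustrative sketch, not the computation specified in procedure 22: the function name and sample coordinates are hypothetical, and the formula assumes the vertex indices are listed in order around a non-self-intersecting polygon.

```python
def polygon_area(points, vertex_indices):
    """Area of the polygon whose vertices are points[i] for i in vertex_indices."""
    verts = [points[i] for i in vertex_indices]
    n = len(verts)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for k in range(n):
        x0, y0 = verts[k]
        x1, y1 = verts[(k + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

# Hypothetical projected keypoints: the same triangle seen from the front
# and seen from a direction that foreshortens it horizontally by half.
frontal = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
oblique = [(0.5 * x, y) for x, y in frontal]
area_frontal = polygon_area(frontal, [0, 1, 2])
area_oblique = polygon_area(oblique, [0, 1, 2])
```

The oblique view yields the smaller area, which illustrates why the projection that maximizes the polygon area is taken to correspond to the front.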

When facial skeletal joint data is used, the present invention can serve, for example, as preprocessing for facial expression recognition; when hand skeletal joint data is used, it can serve, for example, as preprocessing for recognizing hand operations or work tasks.

(2) The present invention has been described for the case where search unit 3 obtains a result corresponding to the frontal orientation as an example of the predetermined posture, but search unit 3 can also obtain a result corresponding to a predetermined posture tilted relative to the front. In that case, procedure 22 of reference target generation unit 2 first determines the frontal parameters (θ_x, θ_y, θ_z)_[front], which identifies the frontal direction of data Rj in space. The perspective projection transformation of equations (2-1) and (2-2) is then applied for a predetermined direction tilted from that frontal direction to prepare the data Bj(θ_x, θ_y, θ_z)_[tilted posture] of the tilted predetermined posture, which is then normalized in procedure 23.

(3) If a frontal direction is defined in advance for the three-dimensional data Rj, the parameter search of procedure 22 in reference target generation unit 2 may be omitted; in procedure 21, the perspective projection transformation is performed only for that defined frontal direction to prepare the two-dimensional data to be normalized in procedure 23.

(4) If the two-dimensionally mapped data Qj is prepared in advance, the processing in reference target generation unit 2 that obtains Qj from Rj may be omitted.

(5) Processing device 10 can be implemented as a computer of ordinary configuration. That is, processing device 10 can be configured as a general computer comprising a CPU (central processing unit), a main memory that provides a work area for the CPU, an auxiliary storage device such as a hard disk or SSD, an input interface such as a keyboard, mouse, or touch panel that receives input from the user, a communication interface for connecting to a network and communicating, a display, a camera, and a bus connecting these. The processing of each unit of processing device 10 shown in FIG. 2 can be realized by the CPU reading and executing a program for that processing, but any part of the processing may instead be realized in separate dedicated circuitry (including a GPU).

10 ... processing device, 1 ... candidate generation unit, 2 ... reference target generation unit, 3 ... search unit

Claims

A processing device for keypoint data, comprising:
a candidate generation unit that applies respective geometric transformations to two-dimensional keypoint data extracted from a whole body or a part thereof to obtain a plurality of processed data;
a search unit that searches for the most similar data between a plurality of reference two-dimensional keypoint data, obtained by two-dimensionally mapping a plurality of three-dimensional keypoint data in a predetermined posture, and the plurality of processed data, and outputs the most similar processed data and/or information on the geometric transformation applied to that processed data; and
a reference target generation unit that obtains the plurality of reference two-dimensional keypoint data by two-dimensionally mapping the plurality of three-dimensional keypoint data in the predetermined posture,
wherein the reference target generation unit applies, to each of the plurality of three-dimensional keypoint data, perspective projection transformations specified by respective ones of a plurality of parameters to obtain a plurality of candidates for the reference two-dimensional keypoint data, and obtains the reference two-dimensional keypoint data based on the candidate, among the plurality of candidates, for which the area of a polygon whose vertices are a plurality of predetermined keypoints is largest.

The processing device for keypoint data according to claim 1, wherein the reference target generation unit identifies the parameters of the perspective projection transformation for which the area of the polygon is largest as corresponding to the front, and determines the predetermined posture for the two-dimensional mapping with reference to the identified front.

The processing device for keypoint data according to claim 2, wherein the predetermined posture is the identified front or a posture tilted from the front.

The processing device for keypoint data according to any one of claims 1 to 3, wherein the candidate generation unit obtains the plurality of processed data by applying, as the geometric transformations, affine transformations or homography transformations each specified by predetermined parameters.

The processing device for keypoint data according to any one of claims 1 to 4, wherein the candidate generation unit, after applying the geometric transformation, further sets a predetermined point in the two-dimensional keypoint data as a coordinate reference point and rescales the data so that the length between two predetermined points in the two-dimensional keypoint data equals a reference value, thereby obtaining the plurality of processed data.

The processing device for keypoint data according to any one of claims 1 to 5, wherein each of the plurality of reference two-dimensional keypoint data is obtained with a predetermined point in the two-dimensional keypoint data set as a coordinate reference point and rescaled so that the length between two predetermined points in the two-dimensional keypoint data equals a reference value.

The processing device for keypoint data according to any one of claims 1 to 6, wherein a confidence (reliability) is associated with each point of the two-dimensional keypoint data from which the candidate generation unit obtains the plurality of processed data, and
the search unit takes the associated confidence into account when searching for the most similar data.

The processing device for keypoint data according to any one of claims 1 to 7, wherein the two-dimensional and three-dimensional keypoint data are each modeled as whole-body skeletal joint data, facial skeletal joint data capable of reflecting facial expressions, or hand skeletal joint data.

A processing method for keypoint data executed by a computer, comprising:
a candidate generation step of applying respective geometric transformations to two-dimensional keypoint data extracted from a whole body or a part thereof to obtain a plurality of processed data;
a search step of searching for the most similar data between a plurality of reference two-dimensional keypoint data, obtained by two-dimensionally mapping a plurality of three-dimensional keypoint data in a predetermined posture, and the plurality of processed data, and outputting the most similar processed data and/or information on the geometric transformation applied to that processed data; and
a reference target generation step of obtaining the plurality of reference two-dimensional keypoint data by two-dimensionally mapping the plurality of three-dimensional keypoint data in the predetermined posture,
wherein the reference target generation step applies, to each of the plurality of three-dimensional keypoint data, perspective projection transformations specified by respective ones of a plurality of parameters to obtain a plurality of candidates for the reference two-dimensional keypoint data, and obtains the reference two-dimensional keypoint data based on the candidate, among the plurality of candidates, for which the area of a polygon whose vertices are a plurality of predetermined keypoints is largest.

A program for keypoint data that causes a computer to function as the processing device for keypoint data according to any one of claims 1 to 8.