JP3855050B2

JP3855050B2 - Clothing state estimation method and program

Info

Publication number: JP3855050B2
Application number: JP2002296128A
Authority: JP
Inventors: 泰代喜多; 伸之喜多
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2002-04-26
Filing date: 2002-10-09
Publication date: 2006-12-06
Anticipated expiration: 2022-10-09
Also published as: JP2004005361A

Description

[0001]
[Technical field of the invention]
The present invention relates to a method and a program for calculating, from observed image information, the state of a flexible object that deforms greatly, such as clothing. This technology is indispensable when a robot handles soft objects; once realized, it will allow robots to operate in more general situations.
[0002]
[Prior art]
Linear objects have been studied, for example in visual recognition for rope manipulation. For objects that extend over an area, such as clothing, the degrees of freedom of deformation increase greatly and complex self-occlusion also occurs, so the problem is considerably harder. A method has been proposed in which all possible appearances are captured as images in advance and used as a model, but it requires a great deal of labor for each target garment.
[0003]
[Problems to be solved by the invention]
The object of the present invention is to solve these problems and to develop a technique that estimates the state of an object that deforms in a complex and large manner, such as clothing, using a simple object model and a simple image input device.
[0004]
[Means for Solving the Problems]
The clothing state estimation method and program of the present invention estimate the state of the target clothing from an input image. A model representing the rough basic shape of the clothing is created, and this model is used to predict all the shapes that can occur when the clothing is gripped at one point. In accordance with an analysis of these predicted shapes, feature quantities obtained from an input image of the target clothing held at one point are processed, and the state is estimated by calculating which predicted shape the observation is closest to.
[0005]
[Embodiments of the invention]
Hereinafter, the present invention will be described based on examples. FIG. 1 shows a flow chart for the case where the present invention is applied to estimating the shape of a sweatshirt (called a "trainer" in Japanese usage).
Step 1:
Create a model that represents the rough basic shape of the clothing.
[0006]
FIG. 2 shows an example for a sweatshirt. The garment is represented by a planar model that does not assume the front and back body panels separate, composed of nodes that can each have up to four connections. In the figure, N1 to N13 denote the nodes, and the left sleeve, the right sleeve, and the body are treated as different parts. The distances between nodes are calculated from rough dimensions such as body width, garment length, and sleeve length.
[0007]
Step 2:
The model is used to predict the shapes that can occur when the clothing is gripped at a single point.
Various prediction methods are possible depending on the required accuracy. A simple prediction of the shape when the clothing is held in the air at one point near its edge can be realized as follows.
Hereinafter, this prediction method is referred to as prediction method A. Assume that the position of a certain node has already been fixed. If the current node is the first node fixed by the grip, its direction is taken to be vertically upward, (0, 1).
[0008]
The distance between nodes is fixed, and the direction of each adjacent node that has not yet been fixed is determined by an angle θ chosen according to the state of the adjacent nodes, as shown in FIG. 3a. In (1) to (6) of FIG. 3a, a white circle is the current node; a double circle is the parent node, whose position had already been determined when the position of the current node was determined; and a black circle is a child node, whose position has not yet been determined. (2) to (6) of FIG. 3a show the possible states of the adjacent nodes, and the angles θ1, θ2, and θ3 in these drawings are varied according to the stiffness of the object.
[0009]
FIG. 3b shows an example determined by this rule when node 10 is gripped. The influence of gravity is then taken into account as follows. A joint where different parts are connected corresponds, in the model of FIG. 2, to one of the two shoulders. The lower part supported by such a connection is rotated about the joint so that its center of gravity comes directly below the joint. FIG. 3c shows the shape after this operation.
FIG. 4 shows the results when the gripped node is changed. States 1 to 13 are the predicted shapes corresponding to holding each node in FIG. 2.
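The gravity step described above (rotating a supported part about its joint until its center of gravity is directly below the joint) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name hang_below_joint is hypothetical, all nodes are assumed to have equal mass, and coordinates are 2-D with y pointing up, matching the (0, 1) "vertically upward" convention.

```python
import math

def hang_below_joint(joint, points):
    """Rotate `points` about `joint` so that their centroid lies directly below it.

    Assumptions (not from the source): equal node masses, 2-D coordinates
    with y pointing up.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    dx, dy = cx - joint[0], cy - joint[1]
    if dx == 0 and dy == 0:
        return list(points)  # centroid already at the joint; nothing to rotate
    # Angle that takes the joint->centroid direction onto straight down (0, -1).
    angle = math.atan2(-1.0, 0.0) - math.atan2(dy, dx)
    c, s = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y in points:
        rx, ry = x - joint[0], y - joint[1]
        rotated.append((joint[0] + c * rx - s * ry, joint[1] + s * rx + c * ry))
    return rotated

# A sleeve sticking out horizontally from a shoulder at the origin.
sleeve = hang_below_joint((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0)])
```

In this usage example the two sleeve nodes end up hanging straight down from the shoulder, with their distances from the joint unchanged.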
[0010]
Step 3:
Analyze the predicted shape according to the required task.
First, the predicted shapes are classified into the states that must be distinguished in order to perform the task. When the task “grip by both shoulders” is given, the predicted shapes are classified into the following three situations.
Situation A: Already gripped by one shoulder. States 1, 3. => Indicate the position of the remaining shoulder.
Situation B: Not gripped by a shoulder, but at least one shoulder is convex (easy to grip). States 2, 10, 11, 12, 13. => Indicate the position of the first shoulder.
Situation C: Otherwise (both shoulders are concave and hard to grip). States 4, 6, 7, 8, 9. => Indicate a position that leads to situation B.
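The classification above is essentially a lookup table from predicted state to situation and from situation to instruction. A minimal sketch, using only the state numbers listed in the text; the names SITUATION_OF_STATE, INSTRUCTION, and instruction_for are hypothetical:

```python
# State numbers per situation, as listed for prediction method A.
STATES_BY_SITUATION = {
    "A": (1, 3),
    "B": (2, 10, 11, 12, 13),
    "C": (4, 6, 7, 8, 9),
}

SITUATION_OF_STATE = {
    state: situation
    for situation, states in STATES_BY_SITUATION.items()
    for state in states
}

INSTRUCTION = {
    "A": "indicate the position of the remaining shoulder",
    "B": "indicate the position of the first shoulder",
    "C": "indicate a position that leads to situation B",
}

def instruction_for(state):
    """Return the instruction for a predicted state.

    State 5 is not listed in the text, so it raises KeyError here.
    """
    return INSTRUCTION[SITUATION_OF_STATE[state]]
```

Keeping the table separate from the instruction text makes it straightforward to swap in a different task with a different grouping of states.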
[0011]
In this example, the tip of a sleeve is always the lowest point in every predicted shape and is easy to identify; moreover, gripping this point leads to State 11 or 13, both of which belong to situation B. Therefore, in situation C the concrete instruction is => indicate the lowest point.
Features that can be extracted relatively easily and robustly from the detected clothing region are chosen, based on the predicted shapes, for state determination. In this example they are:
I) the position of the lowest point;
II) the center position of the body part.
[0012]
Step 4:
An image of the target clothing, held at one point on its edge, is input from the camera.
Step 5:
Extract the clothing region from the image.
Step 6:
The state is estimated by comparing the extracted region with each predicted shape model as follows.
First, candidates are selected in ascending order of the difference, between observation and model, in the distance from the gripping point to the lowest point. Within the extracted region, the part whose horizontal width is at or above a certain threshold is taken as the body region candidate. If the vertical position of the body center (node 5) of a predicted shape model does not fall within this part, that model is removed from the candidates.
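The comparison in step 6 can be sketched as a filter followed by a sort. The sketch below rests on several assumptions that are not in the source: the clothing region is a binary mask given as a list of rows, row width is approximated by the pixel count in that row, model rows are expressed in the same image rows as the mask, and the names torso_rows and rank_candidates are hypothetical.

```python
def torso_rows(mask, min_width):
    """Return (top, bottom) of the rows whose pixel count reaches min_width."""
    rows = [r for r, row in enumerate(mask) if sum(row) >= min_width]
    return (rows[0], rows[-1]) if rows else None

def rank_candidates(observed_drop, body_rows, models):
    """Rank predicted states for one observation.

    observed_drop: distance from the gripping point to the lowest point.
    body_rows:     (top, bottom) rows of the body region candidate.
    models:        state -> (predicted_drop, body_center_row).
    """
    top, bottom = body_rows
    kept = [
        (abs(observed_drop - drop), state)
        for state, (drop, center_row) in models.items()
        if top <= center_row <= bottom  # body center must fall in the body region
    ]
    return [state for _, state in sorted(kept)]

# Toy region: rows 2..5 are wide enough to count as the body.
widths = [1, 1, 3, 4, 4, 3, 1, 1]
mask = [[1 if c < w else 0 for c in range(4)] for w in widths]
body = torso_rows(mask, min_width=3)
ranking = rank_candidates(10.0, body, {10: (9.0, 4), 11: (12.0, 5), 2: (9.5, 8)})
```

In the usage example, state 2 is dropped because its body center lies outside the body region, and the remaining states are ordered by how closely their drop matches the observation.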
[0013]
Step 7:
The position of each part within the clothing region is obtained as follows, and the point to grip next is indicated.
When the obtained candidate state belongs to situation A or B, its shape model is superimposed on the image using the position of the gripping point as the reference. Any node that is not inside the clothing region is moved horizontally toward the center until it is inside the region. This process estimates which part of the image corresponds to each node, and the point just inside the region boundary that is closest to node 1 or 3, which correspond to the shoulders, is indicated as the next gripping position.
If the first candidate belongs to situation C, the lowest point of the clothing region is indicated as the next gripping position.
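The node adjustment in step 7 can be sketched as a one-dimensional walk along the image row. This is a sketch under assumptions: the region is a binary mask given as a list of rows, the node lies within the image bounds, and "the center" is supplied by the caller as a column index (the source does not say how the center is defined). The name pull_into_region is hypothetical.

```python
def pull_into_region(node, mask, center_col):
    """Move a (row, col) node horizontally toward center_col until it is inside the mask."""
    row, col = node
    step = 1 if col < center_col else -1
    # Stop as soon as the node is on a clothing pixel, or when the center is reached.
    while col != center_col and not mask[row][col]:
        col += step
    return (row, col)

region = [[0, 0, 1, 1, 1, 0, 0]]
left = pull_into_region((0, 0), region, center_col=3)
right = pull_into_region((0, 6), region, center_col=3)
inside = pull_into_region((0, 3), region, center_col=3)
```

Nodes that already lie inside the region are returned unchanged, so the function can be applied to every node of the superimposed model.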
[0014]
Step 8:
Whether the preceding state estimate is correct is verified by predicting the shape that results when the calculated indicated point is gripped and checking whether it matches the state actually observed after gripping.
In the two-point gripping state, the shape of the region has fewer uncertain elements than in the one-point gripping state, so the predicted shape and the observed region can be compared by relatively simple template matching. Basically, it suffices to align the gripping positions on the model with the actual gripping positions and examine the overlap.
By repeating “indicate the next gripping point based on state estimation” (steps 6 and 7), “grip”, and “verify” (step 8), the clothing can be brought to the “gripped by both shoulders” situation. In the second and subsequent state estimations, the fact that the previous prediction limits the possible states is used to make the processing faster and more robust.
[0015]
Next, more accurate prediction of clothing deformation, obtained by representing the target clothing with a more complex model as shown in FIG. 8, is described. Hereinafter, this prediction method is referred to as prediction method B.
For the target sweatshirt, a simple planar model that does not assume the front and back body panels separate is created, as shown in FIG. 8, from three rough sizes: body width, body length, and sleeve length. The model consists of 20 nodes representing representative points, and the nodes are connected by springs, shown as straight lines in the figure. There are three types of spring.
[0016]
A K_1-type spring connects each node to one of its 4-neighbor nodes. A K_2-type spring connects each node to one of its 8-neighbor nodes that is not a 4-neighbor. A K_3-type spring connects each node to the node one beyond its neighbor, and is introduced only within the same part (body, right sleeve, or left sleeve); examples are the node pairs N1-N3, N1-N7, and N3-N19. The K_1 and K_2 types express the stretchability of the garment, and the K_3 type expresses how easily it bends. Here, considering a quadrilateral element formed by four nodes, nodes adjacent along a side are called 4-neighbors and nodes adjacent along a diagonal are called 8-neighbors.
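The three spring types can be illustrated on a regular grid of nodes. The sketch below does not reproduce the 20-node layout of FIG. 8, which is not recoverable from the text; it assumes a single rectangular part so that the "same part only" restriction on K_3 springs is trivially met. The name build_springs is hypothetical, and the stiffness values are the ones reported for the experiment in paragraph [0019].

```python
# Spring constants reported for K_1, K_2, and K_3 in the experiment.
STIFFNESS = {"K1": 20000.0, "K2": 2000.0, "K3": 200.0}

# Offsets (d_row, d_col) that generate each undirected spring exactly once.
OFFSETS = (
    (0, 1, "K1"), (1, 0, "K1"),    # 4-neighbors: stretch along sides
    (1, 1, "K2"), (1, -1, "K2"),   # diagonal neighbors: shear
    (0, 2, "K3"), (2, 0, "K3"),    # one node skipped: bending
)

def build_springs(rows, cols):
    """Return a list of (node_a, node_b, kind) for a rows x cols grid of nodes."""
    springs = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc, kind in OFFSETS:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    springs.append((r * cols + c, r2 * cols + c2, kind))
    return springs

springs = build_springs(3, 3)
counts = {kind: sum(1 for s in springs if s[2] == kind) for kind in STIFFNESS}
```

On a 3 x 3 grid this yields 12 side springs, 8 diagonal springs, and 6 bending springs. For a garment with several parts, one would additionally skip any K_3 pair whose endpoints belong to different parts.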
[0017]
Using this model, the states that can exist when the sweatshirt is held at one point are predicted by simulation as follows. The reference coordinate system is a left-handed three-dimensional coordinate system whose Y axis coincides with the direction of gravity. First, the object model is spread out on a floor parallel to the XZ plane, as shown in FIG. 9a. Gravity acts on all nodes. One node, assumed to be the holding point, is lifted vertically (in the negative Y direction), and the resulting deformation is simulated (FIG. 9b).
[0018]
FIG. 9c shows the resulting state held in the air. With holding at one point, however, the rotation component about the Y axis relative to the holding point varies greatly depending on factors such as the orientation in which the garment was initially placed on the floor. In the actual holding that the inventors assume, the object is held not at a single point but by a gripper that clamps it between two flat, parallel plate surfaces of small area, so the orientation of the gripper surfaces is considered to determine this rotational uncertainty. Therefore, assuming that the image is always taken from the direction perpendicular to the gripper surface, the predicted appearance is calculated under the condition that the surface normal of the predicted model shape near the holding point coincides with the line of sight. FIG. 9d shows the result.
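One way to picture the view-alignment condition is as a rotation of the simulated shape about the vertical (Y) axis through the holding point. The sketch below is an assumption-laden illustration, not the source's procedure: the surface normal near the holding point is estimated from the holding node and two neighboring nodes, the line of sight is taken to be the +Z axis, only the horizontal direction of the normal is aligned (any tilt out of the horizontal is left as is), and the front/back sign of the normal is not resolved. The name align_to_view is hypothetical.

```python
import math

def align_to_view(points, hold, nb1, nb2):
    """Rotate 3-D points about the Y axis through points[hold] so that the
    horizontal projection of the local surface normal points along +Z."""
    h = points[hold]
    u = [points[nb1][i] - h[i] for i in range(3)]
    v = [points[nb2][i] - h[i] for i in range(3)]
    # Normal of the local surface patch from the cross product u x v.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    yaw = math.atan2(n[0], n[2])  # horizontal angle of the normal measured from +Z
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for x, y, z in points:
        dx, dz = x - h[0], z - h[2]
        out.append((h[0] + c * dx - s * dz, y, h[2] + s * dx + c * dz))
    return out

# A patch lying in the YZ plane (normal along X) is turned to face the camera.
patch = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
facing = align_to_view(patch, hold=0, nb1=1, nb2=2)
```

After the rotation, the patch in the usage example lies in the XY plane, so its normal is parallel to the assumed line of sight.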
[0019]
FIG. 10 shows the predicted appearances calculated for all states, State 1 to State 20, each held at one node. In this calculation, the spring constants of the clothing model were determined manually so that the resulting tautness of the garment was close to the actual state; specifically, the spring constants of K_1, K_2, and K_3 were set to 20000, 2000, and 200, respectively.
Under the classification described in paragraph [0010], the states obtained with prediction method B correspond as follows.
Situation A: States 1, 3
Situation B: States 2, 7, 9, 13, 15, 17, 19
Situation C: States 4, 5, 6, 8, 10, 11, 12, 14, 16, 18, 20
[0020]
If prediction method B allows the observed clothing region to be predicted in a state closer to reality, the degree of overlap between the predicted region and the observed region can be used directly to judge the plausibility of a prediction, and the state estimation process becomes the following procedure.
The following two features are extracted in advance from the predicted appearance of each state.
I) Position of the lowest point of the predicted appearance: L_m (L_x, L_y)
Two-dimensional coordinates relative to the position of the holding point.
II) Number of sleeves that are not directly held and hang down: N_m
The number of sleeves hanging near the lowest point. For States 2, 5, 8, and 11, N_m = 2.
[0021]
When an observation image is input, the position of the holding point in the image is found from the known three-dimensional position of the gripper of the holding manipulator. Using this holding point as a cue, the clothing region is extracted as the connected region directly below it whose region features are uniform. The lowest coordinate in this region is denoted L_o. The average horizontal width of the region in the neighborhood just above L_o is calculated; if this value is equal to or greater than the width of one sleeve, N_o = 2, and otherwise N_o = 1.
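The rule for N_o can be sketched on a binary mask. The following is a sketch under assumptions not stated in the source: the region is a list of rows of 0/1 values, width is approximated by the pixel count in each row, and the size of the "neighborhood just above L_o" is a free parameter called band. The name count_hanging_sleeves is hypothetical.

```python
def count_hanging_sleeves(mask, sleeve_width, band=3):
    """Estimate N_o from the mean row width just above the lowest region row."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    lowest = rows[-1]
    band_rows = [r for r in rows if lowest - band <= r < lowest]
    if not band_rows:
        band_rows = [lowest]  # region is a single row; fall back to that row
    mean_width = sum(sum(mask[r]) for r in band_rows) / len(band_rows)
    return 2 if mean_width >= sleeve_width else 1

def rows_with_widths(widths, cols=6):
    """Helper for the example: build a mask whose rows have the given widths."""
    return [[1 if c < w else 0 for c in range(cols)] for w in widths]

two = count_hanging_sleeves(rows_with_widths([6, 6, 4, 4, 2]), sleeve_width=3, band=2)
one = count_hanging_sleeves(rows_with_widths([6, 6, 2, 2, 1]), sleeve_width=3, band=2)
```

Counting pixels per row rather than measuring a single contiguous span means that two sleeves hanging side by side with a gap between them still add up to a wide band.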
[0022]
First, only the models whose number of hanging sleeves equals that of the observed state and whose lowest point is in a similar position are selected. Specifically, among the states m_i (m_i = 1 to 20), those satisfying the following conditions are selected.
N_{m_i} = N_o
|Ly_o - Ly_{m_i}| < C_1
|L_o - L_{m_i}| < C_2
C_1 and C_2 vary with how densely the nodes are set, that is, with the distance between nodes. In this experiment, C_1 and C_2 were set to 60% and 100%, respectively, of the longest inter-node distance.
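The three selection conditions translate directly into a filter. In this sketch, |L_o - L_{m_i}| is interpreted as the Euclidean distance between the two lowest points, which is an assumption; the name select_candidates and the example numbers are hypothetical.

```python
import math

def select_candidates(models, n_obs, l_obs, c1, c2):
    """Keep the states that satisfy all three conditions.

    models: state -> (N_m, (Lx_m, Ly_m)), positions relative to the holding point.
    l_obs:  (Lx_o, Ly_o) of the observed lowest point, same reference.
    """
    kept = []
    for state, (n_m, (lx, ly)) in models.items():
        if n_m != n_obs:
            continue  # number of hanging sleeves must match
        if abs(l_obs[1] - ly) >= c1:
            continue  # vertical position of the lowest point too different
        if math.hypot(l_obs[0] - lx, l_obs[1] - ly) >= c2:
            continue  # lowest point too far away overall
        kept.append(state)
    return kept

longest_edge = 10.0                      # longest inter-node distance (example value)
c1, c2 = 0.6 * longest_edge, 1.0 * longest_edge
models = {
    1: (1, (0.0, 50.0)),
    2: (2, (3.0, 52.0)),
    5: (2, (9.0, 48.0)),
    8: (2, (0.0, 60.0)),
    11: (2, (12.0, 46.0)),
}
candidates = select_candidates(models, n_obs=2, l_obs=(2.0, 50.0), c1=c1, c2=c2)
```

In the example, state 1 fails the sleeve count, state 8 fails the C_1 test, and state 11 passes C_1 but fails C_2, leaving states 2 and 5.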
[0023]
The appearance models of the states that satisfy these conditions are superimposed on the image, using the position of the holding point as the reference.
In practice, the shape of the body part varies greatly because of folds that are hard to predict, whereas a hanging sleeve, being narrow, deviates little from its predicted shape. Taking this into account, the appearance model is partially corrected as follows so that this part matches the actually observed position more closely. Here, a hanging sleeve means a sleeve that is not directly held, and its outer contour means the contour on the shoulder side (for example, N1-N13-N15), not the armpit side.
[0024]
1) As shown by the thick gray line in FIG. 11a, the outer contour of the hanging sleeve in the appearance model is moved vertically to the position that matches L_o of the observation image.
[0025]
2) For each of the two nodes on the outer contour other than the lowest point, the horizontally nearest observed edge (a boundary pixel between the background and the clothing region) is searched for. The signed horizontal distances from the original positions to those edge points, indicated by the arrows in FIG. 11a, are denoted da and db. If the displacements of the two points are within the expected error range and do not greatly change the slope of the line, the correction is adopted. Specifically, if all three of the following conditions are satisfied, all six nodes belonging to the sleeve are shifted by (da + db)/2, as shown in FIG. 11b.
|da - db| < C_3
|da| < C_2
|db| < C_2
Here, C_3 is determined according to how many degrees of difference in sleeve slope between the predicted and observed states are to be tolerated.
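The acceptance test and shift in step 2) can be sketched as below. The edge search that produces da and db is not shown, and the conversion from a tolerated angle to C_3 is left to the caller because the text does not give it; the name correct_sleeve and the example values are hypothetical.

```python
def correct_sleeve(sleeve_nodes, da, db, c2, c3):
    """Shift all sleeve nodes horizontally by (da + db) / 2 if the three
    conditions hold; otherwise return them unchanged.

    sleeve_nodes: list of (x, y) positions of the nodes belonging to the sleeve.
    Returns (nodes, accepted).
    """
    if abs(da - db) < c3 and abs(da) < c2 and abs(db) < c2:
        shift = (da + db) / 2
        return [(x + shift, y) for x, y in sleeve_nodes], True
    return list(sleeve_nodes), False

nodes = [(10.0, 5.0), (12.0, 9.0)]
moved, ok = correct_sleeve(nodes, da=4.0, db=2.0, c2=10.0, c3=3.0)
kept, rejected_ok = correct_sleeve(nodes, da=4.0, db=-2.0, c2=10.0, c3=3.0)
```

Returning an accepted flag lets the caller drop a candidate whose correction fails, which is how the first and second candidates are eliminated in the example of FIG. 13.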
[0026]
3) In this state, the region overlap ratio R is calculated. R is the sum of (area of overlap)/(area of observed region) and (area of overlap)/(area of predicted shape), and ranges from 0 to 2.0. If the R of the appearance model with the largest value exceeds the threshold C_4, that model is taken as the estimated state. If this condition is not satisfied, it is judged that there is no candidate.
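The overlap ratio R and the threshold test can be sketched with regions represented as sets of pixel coordinates. The representation and the names overlap_ratio and estimate_state are assumptions made for illustration.

```python
def overlap_ratio(observed, predicted):
    """R = |overlap| / |observed| + |overlap| / |predicted|, in the range 0 to 2."""
    if not observed or not predicted:
        return 0.0
    inter = len(observed & predicted)
    return inter / len(observed) + inter / len(predicted)

def estimate_state(observed, predictions, c4):
    """Return the state with the largest R if that R exceeds c4, else None.

    predictions: state -> set of (row, col) pixels of the corrected appearance model.
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda s: overlap_ratio(observed, predictions[s]))
    return best if overlap_ratio(observed, predictions[best]) > c4 else None

observed = {(0, c) for c in range(10)}
predictions = {
    7: {(0, c) for c in range(3, 13)},   # 7 of 10 pixels overlap -> R = 1.4
    9: {(0, c) for c in range(1, 11)},   # 9 of 10 pixels overlap -> R = 1.8
}
r7 = overlap_ratio(observed, predictions[7])
chosen = estimate_state(observed, predictions, c4=1.4)
```

With two regions of equal area, an R of 1.4 corresponds to 70% overlap on each side. Because the test is "exceeds", a model with R exactly equal to C_4 is not accepted.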
[0027]
If the selected state belongs to class A or B, the edge point of the observed clothing region closest to node 1 or 3, which correspond to the shoulders, in the final state after partial correction is taken as the next holding point. If a class C state is selected, the lowest point of the region is taken as the next holding point. If no state is selected, this is reported and the lowest point is likewise indicated. As noted earlier, with this instruction the next hold always results in State 15 or 19.
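The choice of the next holding point can be sketched as a small decision function. The sketch assumes image coordinates given as (row, col) with rows increasing downward, and assumes that the caller passes the corrected positions of the shoulder nodes that are still free; the name next_holding_point is hypothetical.

```python
import math

def next_holding_point(situation, region_pixels, edge_pixels, shoulder_nodes):
    """Pick the next point to hold.

    situation:      "A", "B", "C", or None when no state was selected.
    region_pixels:  (row, col) pixels of the observed clothing region.
    edge_pixels:    (row, col) boundary pixels of that region.
    shoulder_nodes: corrected (row, col) positions of the candidate shoulder nodes.
    """
    if situation in ("A", "B"):
        # Edge point closest to any of the shoulder nodes.
        return min(edge_pixels,
                   key=lambda e: min(math.dist(e, s) for s in shoulder_nodes))
    # Class C or undecided: take the lowest point of the region.
    return max(region_pixels, key=lambda p: p[0])

region = [(0, 0), (1, 0), (5, 2), (3, 4)]
edges = [(0, 0), (3, 4)]
grab_shoulder = next_holding_point("A", region, edges, shoulder_nodes=[(3, 5)])
grab_lowest = next_holding_point("C", region, edges, shoulder_nodes=[(3, 5)])
grab_undecided = next_holding_point(None, region, edges, shoulder_nodes=[(3, 5)])
```

Treating the undecided case the same as class C mirrors the text, which indicates the lowest point whenever no reliable shoulder position is available.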
[0028]
[Example]
First, the results of an example using prediction method A are shown in FIGS. 5, 6, and 7. The method was run on 12 images taken with the sweatshirt gripped at the positions corresponding to the 12 nodes other than node 5; in 6 of these images the correct state was selected as the first candidate. FIG. 5 shows an example in which State 10 was correctly selected.
FIG. 5a is the input image; FIG. 5b shows the selected predicted shape (State 10) superimposed on the binarized original image; and FIG. 5c shows the predicted shape deformed by the processing of step 7, superimposed on the original image. The cross in FIG. 5c indicates the automatically calculated position to grip next.
[0029]
FIG. 6 shows examples of the indicated points (crosses in the figure) calculated in situations A, B, and C. From left to right, (a) and (b) are an example in which State 1 was selected, (c) and (d) State 2, and (e) and (f) State 9; (a), (c), and (e) are the input images, and (b), (d), and (f) show the final results. For State 9, as described above for situation C, the lowest point is selected as the next indicated point.
[0030]
FIG. 7 shows an example of the verification process. FIG. 7a shows the situation after the indicated point calculated in FIG. 5c was actually gripped. The shape predicted for this grip is superimposed, and apart from slight deviations the two agree well.
In contrast, FIG. 7c shows the case where the indicated point of FIG. 7b, obtained from the state selected as the second candidate, was gripped. The predicted shape is again superimposed, but the region of the sleeve hanging at the bottom deviates greatly from the predicted shape, which reveals that the initial estimate was wrong.
[0031]
Using prediction method B raises the rate of correct estimation. An example of the results of processing the same kind of 12 images as before is shown in FIG. 12. In this example, the thresholds C_1 and C_2 were determined automatically from the body length, body width, and sleeve length given in advance; C_3 was set so that a deviation of up to 15 degrees between the predicted and observed sleeve regions is tolerated; and C_4 was set to 1.4 so that an overlap of about 70% is accepted as correct.
[0032]
FIG. 12 shows an example of the result. The position of the holding point in the image was given manually, and the largest bright connected region below that point, obtained with a fixed threshold, was automatically extracted as the observed clothing region. FIGS. 12a and 12b show the original image and the extracted clothing region, respectively. FIGS. 12c, 12d, and 12e show the candidate states with the first, second, and third largest region overlap ratios before correction of the sleeve part. FIG. 12f shows the appearance model of the first candidate after partial correction. In this state R was 1.64, the largest among all appearance models, so it was adopted as the estimation result, and the position marked with a cross in the figure was calculated as the point to hold next.
[0033]
FIG. 13 shows another example. FIG. 13a is the original image, and FIGS. 13b, 13c, and 13d show the candidate states with the first, second, and third largest region overlap ratios before correction of the sleeve part. In this example, the first and second candidates were rejected because the partial correction did not succeed, and the corrected third candidate, shown in FIG. 13e, had the largest overlap ratio and was selected.
Overall, the correct state was selected and the correct indicated point was given in 9 of the 12 cases. FIG. 14 shows successful examples for situations A, B, and C. In one of the remaining three cases, no state was selected, giving the result “state undetermined”. The correct state had been selected as the first candidate, but because of an unexpected fold in a sleeve no final decision was reached. In such a case the state is judged to be uncertain, the lowest point is indicated as the next holding point, and a class B state is aimed for.
[0034]
Only in the remaining 2 of the 12 cases was a wrong state estimated. One such case is shown in FIG. 15: the correct state is State 4, but State 7 was selected. The main cause is a large fold in the body part, which the current simulation does not take into account.
A similar experiment was also performed using 16 images taken while holding the midpoints between positions corresponding to nodes, that is, in the states that differ most from the predicted states. Here too, the correct state was selected and the correct indicated point was given in 12 cases. In one case the result was “state uncertain”, and in the remaining three cases a wrong state was estimated. Thus, although the object is represented by quite coarse representative points, no large drop in performance was observed even when the actual holding point was far from the node positions used to calculate the predicted appearance.
[0035]
A trial of the verification process is shown in FIG. 16. FIG. 16a shows the simulation of holding the indicated point shown in FIG. 12f. FIG. 16b shows the simulation result superimposed on the input image taken with that point actually held; the predicted shape overlaps the observation well. In contrast, FIG. 16d superimposes the observed image and the predicted image for the case where a wrong state was estimated, as shown in FIG. 16c, and the indicated point marked by the cross was given. The proportion of non-overlapping area is very large, so the estimation error can be recognized. In addition, unlike the simple prediction method described earlier, prediction method B also allows states partway through the deformation, such as the one shown in the center of FIG. 16a, to be used for verification.
[0036]
[Effect of the invention]
Conventionally, it has not been possible to estimate in a simple way the state of a flexible object that deforms greatly, such as clothing. The present method makes this possible: given only the type of the target clothing (sweatshirt, etc.) and rough dimensions such as body width, garment length, and sleeve length, it roughly calculates the states that can occur and uses them for state estimation. This technology contributes greatly to enabling robots to handle flexible objects such as clothing autonomously.
[Brief description of the drawings]
FIG. 1 is a flow chart when the state of a trainer is estimated.
FIG. 2 is a diagram showing an example of a simple clothing model.
FIG. 3 is a diagram showing a planar model composed of nodes that can be connected to a maximum of four without assuming that the front body and the back body are separated from each other.
FIG. 4 is a diagram showing a predicted shape when the trainer is gripped at one point close to the edge.
FIG. 5 is a diagram showing Example 1.
FIG. 6 is a diagram showing an instruction example for “gripping both shoulders” after state estimation.
FIG. 7 is a diagram showing a verification processing embodiment.
FIG. 8 is a diagram showing a target garment represented by a more complicated model.
FIG. 9 is a diagram showing a target model.
FIG. 10 is a diagram illustrating a result of calculating a predicted appearance shape held at each node point.
FIG. 11 is a diagram for explaining partial correction of the appearance model.
FIG. 12 is a diagram showing an example of a result.
FIG. 13 is a diagram showing another example of the result.
FIG. 14 is a diagram showing each successful example of situation A, situation B, and situation C.
FIG. 15 is a diagram illustrating an example in which an incorrect state is estimated.
FIG. 16 is a diagram illustrating a trial example of a verification process.

Claims

A clothing state estimation method for estimating the state of target clothing from an input image, the method comprising:
creating a planar model that represents the rough basic shape of the clothing and consists of a plurality of connectable nodes;
using this model, predicting the shape that can occur when the clothing is held at one node of the plurality of nodes, the prediction being made for all of the plurality of nodes by changing the gripping node; and
calculating, for each predicted shape, a feature amount indicating the degree of overlap when the observation area of the target clothing, obtained from an input image capturing the target clothing held at one point, is superimposed on the predicted shape with the gripping point as the reference, and estimating the state of the target clothing by calculating which predicted shape has the largest feature amount.

The clothing state estimation method according to claim 1, wherein the shape prediction is performed based on a given clothing type and a rough size.

The clothing state estimation method according to claim 1, wherein the analysis of the predicted shape classifies the predicted shape according to a required task, and calculates a correspondence rule after the state estimation.

The clothing state estimation method according to claim 1, wherein the shape is predicted by simulating the grasping and lifting of one point of a spring-based elastic model of the target placed on the floor, calculating the resulting deformation, and then calculating the predicted appearance shape under the condition that the normal direction of the surface near the holding point of the predicted model shape coincides with the line of sight.

The clothing state estimation method according to claim 4, wherein the state is estimated by superimposing the model of the predicted appearance shape, partially deforming it based on the input image to determine how closely it can approach the observed state, and taking the one that comes closest to the input image as the correct state.

A clothing state estimation program for estimating the state of target clothing from an input image, the program causing a computer to execute:
a procedure for creating a planar model that represents the rough basic shape of the clothing and consists of a plurality of connectable nodes;
a procedure for predicting, using this model, the shape that can occur when the clothing is held at one node of the plurality of nodes, the prediction being made for all of the plurality of nodes by changing the gripping node;
a procedure for processing a feature amount indicating the degree of overlap when the observation area of the target clothing, obtained from an input image capturing the target clothing held at one point, is superimposed on the predicted shape with the gripping point as the reference; and
a procedure for estimating the state of the target clothing by calculating which predicted shape has the largest feature amount.
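
The following sketch is an informal illustration of the selection step recited in the claims above; it is not part of the claims. It assumes that each predicted shape and the observed region are given as sets of (x, y) pixel coordinates together with a gripping point. The overlap measure used here, intersection over union, is an assumption made for illustration, since the claims only require a feature amount indicating the degree of overlap. The node labels, function names, and toy shapes are likewise hypothetical.

```python
def align_to_grip(pixels, grip):
    """Translate a pixel set so that its gripping point becomes the origin."""
    gx, gy = grip
    return {(x - gx, y - gy) for x, y in pixels}


def overlap_feature(observed, predicted):
    """Assumed overlap measure: intersection over union of two aligned pixel sets."""
    union = observed | predicted
    return len(observed & predicted) / len(union) if union else 0.0


def estimate_state(observed, observed_grip, predictions):
    """Return (best_node, best_score) over predictions {node: (pixels, grip)}."""
    obs = align_to_grip(observed, observed_grip)
    scores = {
        node: overlap_feature(obs, align_to_grip(pixels, grip))
        for node, (pixels, grip) in predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def block(x0, y0, width, height):
    """Helper for the toy example: pixel set of an axis-aligned rectangle."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


# Hypothetical predicted shapes for two gripping nodes, each with its gripping point.
predictions = {
    "N1": (block(0, 0, 4, 10), (2, 0)),   # tall, narrow silhouette
    "N5": (block(0, 0, 10, 4), (5, 0)),   # short, wide silhouette
}

# Observed region: a tall, narrow silhouette located elsewhere in the image.
observed = block(50, 30, 4, 10)
observed_grip = (52, 30)

best_node, best_score = estimate_state(observed, observed_grip, predictions)
print(best_node, best_score)
```

Aligning both regions at the gripping point before comparing them removes the dependence on where the held clothing appears in the image, which is why the gripping point serves as the reference for the superimposition.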