JP2017204085A - Image recognition system

Info

Publication number: JP2017204085A
Application number: JP2016094791A
Authority: JP
Inventors: 和都村瀬; Kazuto Murase
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2016-05-10
Filing date: 2016-05-10
Publication date: 2017-11-16
Anticipated expiration: 2036-05-10
Also published as: JP6759692B2

Abstract

PROBLEM TO BE SOLVED: To provide a technology for improving the success rate of labeling.

SOLUTION: An image recognition system 1 performs semantic segmentation on the external environment of a robot capable of executing a task (work) consisting of a plurality of phases (work processes), the semantic segmentation being necessary for the robot to execute the task. The image recognition system 1 comprises: a phase detection part 4 (work process switching detection means) for detecting a switch of the phase being executed by the robot; a label selection part 5 (recognition candidate acquisition means) for acquiring, when the phase detection part 4 detects a switch of the phase, a plurality of labels (recognition candidates) corresponding to the phase after the switch; and an image recognition part 7 (image recognition means) for executing the semantic segmentation on an image obtained by imaging the external environment of the robot, using an identification model (classifier) corresponding to the plurality of recognition candidates.

SELECTED DRAWING: Figure 2

Description

The present invention relates to an image recognition system.

Non-Patent Document 1 discloses a semantic labeling technique that builds a hierarchical label tree, with abstract labels in the upper layers and specific labels in the lower layers, and assigns the lowest-layer label whose likelihood is at or above a certain value.
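The idea in the sentence above can be illustrated with a toy sketch. This is not the algorithm of Non-Patent Document 1; the tree contents, the likelihood values, the threshold, and the greedy descent rule are all assumptions made only for illustration.

```python
# Hypothetical label tree: abstract labels are parents, specific labels are children.
LABEL_TREE = {
    "furniture": ["refrigerator", "shelf"],
    "refrigerator": ["refrigerator door", "refrigerator handle"],
}


def assign_label(root, likelihood, threshold):
    """Descend from `root` and return the deepest label whose likelihood
    is at or above `threshold`, or None if even the root falls below it."""
    if likelihood.get(root, 0.0) < threshold:
        return None
    current = root
    while True:
        # Keep only children that are still confident enough.
        candidates = [
            child
            for child in LABEL_TREE.get(current, [])
            if likelihood.get(child, 0.0) >= threshold
        ]
        if not candidates:
            return current
        # Assumed tie-breaking rule: follow the most likely child.
        current = max(candidates, key=lambda child: likelihood[child])
```

With a high threshold the result stays abstract (for example "furniture"); lowering the threshold lets the descent reach more specific labels.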

Chenxia Wu, Ian Lenz and Ashutosh Saxena, "Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception", page 1, [online], June 3, 2014, [retrieved April 26, 2014], Internet (URL: http://www.cs.cornell.edu/~asaxena/papers/wulenzsaxena_hierarchicallabel-rgbd_rss2014.pdf)

However, when the number of label trees, or of their branches and leaves, is large, labeling is more likely to fail.

An object of the present invention is to provide a technique for improving the success rate of labeling.

According to an aspect of the present invention, there is provided an image recognition system that performs semantic segmentation on the external environment of a robot capable of executing work consisting of a plurality of work processes, the semantic segmentation being necessary for the robot to execute the work, the image recognition system comprising: work process switching detection means for detecting a switch of the work process being executed by the robot; recognition candidate acquisition means for acquiring, when the work process switching detection means detects a switch of the work process, at least one recognition candidate corresponding to either the work process after the switch or the profile of the robot after the switch; and image recognition means for executing semantic segmentation on an image obtained by imaging the external environment of the robot, using a classifier corresponding to the at least one recognition candidate.

According to the present invention, different recognition candidates are used for each work process, so the success rate of labeling improves.

FIG. 1 is a functional block diagram of the image recognition system.
FIG. 2 is a flowchart of the operation of the image recognition system.
FIG. 3 is a plan view of the service environment of the robot.

(First embodiment)
Hereinafter, the first embodiment will be described with reference to FIGS. 1 and 2.

FIG. 1 shows a functional block diagram of the image recognition system 1. The image recognition system 1 is a system that performs semantic segmentation (image recognition, labeling) on the external environment of a robot capable of executing a task (work) consisting of a plurality of phases (work processes), the semantic segmentation being necessary for the robot to execute the task.

The image recognition system 1 includes a label set DB 2, a label determination profile DB 3, a phase detection unit 4 (work process switching detection means), a label selection unit 5 (recognition candidate acquisition means), an identification model switching unit 6, and an image recognition unit 7 (image recognition means).

Hereinafter, for convenience of explanation, the robot's task is assumed to include a movement phase, a door opening phase, and an object gripping phase. The movement phase is a phase in which the robot moves to the front of a refrigerator installed in the robot's service environment. The door opening phase is a phase in which the robot grips the handle of the refrigerator door and opens the door. The object gripping phase is a phase in which the robot grips an object stored in the refrigerator.

The label set DB 2 stores, for each phase, an associated label set (recognition candidate set) consisting of a plurality of labels (recognition candidates). For example, the label set {floor, refrigerator, shelf, obstacle} is associated with the movement phase. The label set {door, handle, floor, seal} is associated with the door opening phase. The label set {table, wall, can, plastic bottle, other object} is associated with the object gripping phase.
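As a minimal sketch, the association held by the label set DB 2 could be modelled as an in-memory mapping. The label sets below are the ones given in the text; the `Phase` enum, the `LABEL_SET_DB` name, and the `get_label_set` helper are hypothetical names introduced only for this illustration.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical identifiers for the three phases named in the text."""
    MOVE = "movement"
    OPEN_DOOR = "door opening"
    GRASP_OBJECT = "object gripping"


# Each phase is associated with its label set (recognition candidate set).
LABEL_SET_DB = {
    Phase.MOVE: frozenset({"floor", "refrigerator", "shelf", "obstacle"}),
    Phase.OPEN_DOOR: frozenset({"door", "handle", "floor", "seal"}),
    Phase.GRASP_OBJECT: frozenset(
        {"table", "wall", "can", "plastic bottle", "other object"}
    ),
}


def get_label_set(phase):
    """Look up the label set associated with the given phase."""
    return LABEL_SET_DB[phase]
```

Using `frozenset` makes each label set hashable, so it could also serve as a lookup key when choosing an identification model.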

The phase detection unit 4 detects a switch of the phase being executed by the robot. Specifically, the phase detection unit 4 detects that the phase being executed by the robot has switched from the movement phase to the door opening phase, or from the door opening phase to the object gripping phase. The phase detection unit 4 detects a phase switch by, for example, receiving from the robot's control unit a report that the currently executing phase has completed.

When the phase detection unit 4 detects a phase switch, the label selection unit 5 acquires the label set corresponding to the phase after the switch by referring to the label set DB 2.

The identification model switching unit 6 acquires an identification model (classifier) corresponding to the label set acquired by the label selection unit 5. Here, an identification model is a classifier that estimates and outputs labels based on input information such as RGB and depth. The identification model switching unit 6 may select the identification model from a plurality of identification models generated in advance, may generate an identification model each time, or may generate a new identification model by applying transfer learning to the identification model that was in use immediately before the phase switch. Here, transfer learning means modifying the number of outputs of the output layer, or training so that the ground truth of the training data matches the label set.
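The first two acquisition strategies described above (select a pre-generated model, otherwise generate one) can be sketched as follows. This is only an illustration under assumptions: `IdentificationModel` is a placeholder that records its label set rather than a real classifier, `ModelSwitcher` and its `build` callback are hypothetical names, and caching a newly built model for later reuse is a design choice not stated in the source. Transfer learning is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class IdentificationModel:
    """Placeholder for a classifier that would map RGB/depth input to labels."""
    labels: frozenset


@dataclass
class ModelSwitcher:
    # Hypothetical factory that generates a model for a given label set.
    build: Callable[[frozenset], IdentificationModel]
    # Models generated in advance, keyed by their label set.
    pregenerated: dict = field(default_factory=dict)

    def acquire(self, label_set):
        """Return the model for `label_set`, generating it only if missing."""
        key = frozenset(label_set)
        if key not in self.pregenerated:
            self.pregenerated[key] = self.build(key)
        return self.pregenerated[key]
```

For example, `ModelSwitcher(build=IdentificationModel).acquire({"door", "handle", "floor", "seal"})` returns the same model object on every later call with that label set.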

Specifically, when the phase being executed by the robot switches from the movement phase to the door opening phase, the identification model switching unit 6 acquires the identification model corresponding to the label set {door, handle, floor, seal}. Similarly, when the phase being executed by the robot switches from the door opening phase to the object gripping phase, the identification model switching unit 6 acquires the identification model corresponding to the label set {table, wall, can, plastic bottle, other object}.

The image recognition unit 7 then executes semantic segmentation on an image obtained by imaging the external environment of the robot, using the identification model acquired by the identification model switching unit 6.

Next, the operation flow of the image recognition system 1 will be described with reference to FIG. 2.

First, when the robot starts the task, the phase detection unit 4 determines the type of phase the robot is currently executing (S100). Next, the phase detection unit 4 determines whether the phase determined this time differs from the phase determined last time (S110). If the phase detection unit 4 determines that the two phases are the same (S110: NO), the image recognition unit 7 executes semantic segmentation on an image obtained by imaging the external environment of the robot, using the identification model currently in use as it is (S140); post-processing is then executed (S150), and the process returns to S100.

In contrast, if the phase detection unit 4 determines in S110 that the phase determined this time differs from the phase determined last time (S110: YES), the label selection unit 5 acquires the label set corresponding to the phase determined this time by referring to the label set DB 2 (S120).

Next, the identification model switching unit 6 acquires the identification model corresponding to the label set acquired by the label selection unit 5 (S130).

The image recognition unit 7 then executes semantic segmentation on an image obtained by imaging the external environment of the robot, using the identification model acquired by the identification model switching unit 6 (S140).
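The flow S100 to S150 can be sketched as a simple loop. This is a minimal illustration under several assumptions: phases arrive as a sequence of already-determined values, the first iteration is treated as a phase switch because no previous phase exists, segmentation is replaced by recording which model would be used, and post-processing is omitted. The function and parameter names are hypothetical.

```python
from typing import Callable, Iterable


def run_recognition_loop(
    phases: Iterable[str],
    label_set_db: dict,
    acquire_model: Callable[[frozenset], object],
) -> list:
    """Return (phase, model) pairs, one per segmentation pass."""
    passes = []
    previous_phase = None
    model = None
    for phase in phases:                      # S100: determine the current phase
        if phase != previous_phase:           # S110: did the phase change?
            label_set = label_set_db[phase]   # S120: acquire the label set
            model = acquire_model(label_set)  # S130: acquire the matching model
            previous_phase = phase
        # S140: segmentation would run here with `model`; we only record it.
        passes.append((phase, model))
        # S150: post-processing omitted in this sketch.
    return passes
```

The model is re-acquired only when S110 detects a change, so consecutive passes within one phase reuse the same identification model.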

The first embodiment has been described above. The first embodiment has the following features.

The image recognition system 1 is a system that performs semantic segmentation on the external environment of a robot capable of executing a task (work) consisting of a plurality of phases (work processes), the semantic segmentation being necessary for the robot to execute the task. The image recognition system 1 includes: a phase detection unit 4 (work process switching detection means) that detects a switch of the phase being executed by the robot; a label selection unit 5 (recognition candidate acquisition means) that acquires, when the phase detection unit 4 detects a phase switch, a plurality of labels (recognition candidates) corresponding to the phase after the switch; and an image recognition unit 7 (image recognition means) that executes semantic segmentation on an image obtained by imaging the external environment of the robot, using an identification model (classifier) corresponding to the plurality of recognition candidates. With this configuration, different recognition candidates are used for each work process, so the success rate of labeling improves.

In the above description, a label set consists of a plurality of labels. However, a label set need only consist of at least one label.

The phase detection unit 4 may also determine the current phase based on the robot's position within the service environment shown in FIG. 3. For example, when the robot is not in front of the refrigerator, the phase detection unit 4 can determine that the phase being executed by the robot is the movement phase.
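A position-based check of this kind might look like the sketch below. The coordinates of the spot in front of the refrigerator, the tolerance, and the function name are all hypothetical; the source states only that a robot that is not in front of the refrigerator can be judged to be in the movement phase, so the sketch returns `None` when position alone does not settle the phase.

```python
import math

# Hypothetical map coordinates (metres) of the spot in front of the refrigerator.
FRIDGE_FRONT_XY = (4.0, 1.0)
# Hypothetical radius within which the robot counts as "in front of" it.
TOLERANCE_M = 0.3


def phase_from_position(robot_xy):
    """Return "movement" if the robot is not in front of the refrigerator,
    otherwise None (position alone does not identify the phase)."""
    distance = math.dist(robot_xy, FRIDGE_FRONT_XY)
    if distance > TOLERANCE_M:
        return "movement"
    return None
```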

(Second embodiment)
Next, a second embodiment will be described. The description below focuses on the differences between this embodiment and the first embodiment, and overlapping description is omitted.

In the first embodiment, when the phase detection unit 4 detects a phase switch, the label selection unit 5 acquires the label set corresponding to the phase after the switch.

In this embodiment, however, the label selection unit 5 may instead acquire the label set corresponding to the profile of the robot after the switch. In other words, the label selection unit 5 may flexibly acquire a label set of recognition candidates from information that can be obtained by the robot's sensors and the like. Here, a profile means a state or attribute of the robot other than the phase. The label determination profile DB 3 stores robot profiles in association with label sets.

For example, when the robot is controlling the motion of its hand, the hand motion constitutes the profile, and the label selection unit 5 acquires the label set corresponding to the hand motion. The label set corresponding to the hand motion is, for example, {table, wall, can, plastic bottle, other object}. Similarly, when the robot is controlling the motion of its carriage, the carriage motion constitutes the profile, and the label selection unit 5 acquires the label set corresponding to the carriage motion. The label set corresponding to the carriage motion is, for example, {floor, refrigerator, shelf, obstacle}.

In addition, when the field of view of the camera mounted on the robot is wide, the label selection unit 5 acquires a coarse-grained, that is, more abstract, label set; when the field of view is narrow, it acquires a fine-grained, that is, more specific, label set. If the abstract label set contains {furniture}, the specific label set would contain {refrigerator}. Consider, in the service environment shown in FIG. 3, the case where the robot looks toward the refrigerator from the left end of the room and the case where it looks toward the refrigerator from the center of the room. In the former case, the robot is far from the refrigerator and its viewpoint overlooks the entire room, so the label selection unit 5 acquires an abstract label set such as {floor, structure, furniture, ceiling}. This is because the success rate of labeling drops significantly if the label selection unit 5 acquires a label set such as {refrigerator, washing machine, microwave oven} while the viewpoint overlooks the entire room. In the latter case, the robot is close to the refrigerator, and the label selection unit 5 acquires a specific label set such as {refrigerator, washing machine, microwave oven}.
Note that the label selection unit 5 can judge whether the field of view of the camera mounted on the robot is wide or narrow by, for example, monitoring the average of the output values of the depth sensor. That is, when the average of the depth sensor output values is relatively large, the robot is overlooking the entire room, and when it is relatively small, the robot is looking at only part of the room.
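The depth-based judgement could be sketched as a threshold on the mean depth. The two label sets are taken from the text; the threshold value, the use of metres, and the function name are assumptions, since the source says only that the average is "relatively" large or small.

```python
from statistics import fmean

# Label sets taken from the description above.
ABSTRACT_LABELS = frozenset({"floor", "structure", "furniture", "ceiling"})
SPECIFIC_LABELS = frozenset({"refrigerator", "washing machine", "microwave oven"})

# Hypothetical boundary (metres) between a "wide" and a "narrow" view.
WIDE_VIEW_THRESHOLD_M = 2.5


def select_labels_by_depth(depth_values_m):
    """Choose the abstract label set when the mean depth suggests the robot
    overlooks the whole room, and the specific label set otherwise."""
    mean_depth = fmean(depth_values_m)
    if mean_depth >= WIDE_VIEW_THRESHOLD_M:
        return ABSTRACT_LABELS
    return SPECIFIC_LABELS
```

A fixed threshold is the simplest reading of "relatively large"; a real system might instead compare against a running average or calibrate the boundary per environment.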

Other profiles may simply be variables or flag information used for controlling the robot. That is, the label selection unit 5 may acquire a label set corresponding to a variable used for controlling the robot, or a label set corresponding to flag information used for controlling the robot.

1 Image recognition system
2 Label set DB
3 Label determination profile DB
4 Phase detection unit
5 Label selection unit
6 Identification model switching unit
7 Image recognition unit

Claims

An image recognition system that performs semantic segmentation on the external environment of a robot capable of executing work consisting of a plurality of work processes, the semantic segmentation being necessary for the robot to execute the work, the image recognition system comprising:
work process switching detection means for detecting a switch of the work process being executed by the robot;
recognition candidate acquisition means for acquiring, when the work process switching detection means detects a switch of the work process, at least one recognition candidate corresponding to either the work process after the switch or the profile of the robot after the switch; and
image recognition means for executing semantic segmentation on an image obtained by imaging the external environment of the robot, using a classifier corresponding to the at least one recognition candidate.