JP6930389B2

JP6930389B2 - Image collectors, programs, and methods

Info

Publication number: JP6930389B2
Application number: JP2017220734A
Authority: JP
Inventors: 峻司細野; 周平田良島; 隆行黒住; 哲也杵渕
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-11-16
Filing date: 2017-11-16
Publication date: 2021-09-01
Anticipated expiration: 2037-11-16
Also published as: JP2019091339A

Description

The present invention relates to an image collection device, program, and method, and more particularly to an image collection device, program, and method for efficiently collecting training images used for object identification.

Object identification technology identifies the type of an object appearing in an image by estimating whether that object corresponds to an object contained in images prepared in advance for identification, that is, "training images", or corresponds to none of the objects contained in the training images. It is an important elemental technology for realizing information search and information management for objects with images as the interface, and it is used in a wide range of industrial fields.

For example, if a product can be identified from an image of a food product package that a user has photographed with a smartphone or the like, and detailed product information such as allergy information can be obtained from a database on the Internet, then even a user whose native language is not the language of the description printed on the package can easily obtain detailed product information written in the user's native language from the database.

For this reason, object identification techniques have been proposed that learn training images in advance and identify an object using its local features (see, for example, Patent Document 1), as well as techniques that accurately identify objects, even similar ones, using keypoints that indicate the differences between objects (called "discriminative keypoints") (see, for example, Non-Patent Document 1).

[Patent Document 1]: Japanese Unexamined Patent Publication No. 2015-201123

[Non-Patent Document 1]: 渡邉之人, 入江豪, 黒住隆行, 杵渕哲也 (Yukito Watanabe, Go Irie, Takayuki Kurozumi, Tetsuya Kinebuchi), "Specific Object Recognition from Similar Object Groups Based on Selection of Discriminative Keypoints", The Journal of The Institute of Image Information and Television Engineers, Vol. 71, No. 2, pp. J93-J100, 2017

On the other hand, to identify objects accurately with the object identification techniques of Patent Document 1 and Non-Patent Document 1, training images captured under conditions close to those of the object to be identified must be prepared in advance. For example, to cope with changes in how an object appears in a captured image as the shooting angle and shooting distance change, multiple training images of the object photographed from various directions and distances must be prepared in advance, as shown in FIG. 11.

In addition, as shown in FIG. 12 for example, each training image must be associated in advance with information, such as a product name or a common noun, that uniquely identifies the type of the object it contains (hereinafter, "label information"). If incorrect label information is associated with a training image, the object identification accuracy decreases.

Furthermore, when a training image contains, together with the object represented by its associated label information, an object different from that object, it is unclear as it stands which object is the one represented by the label information, so the object identification accuracy decreases. In this case, preliminary work may therefore be required, such as obtaining from the training image the position information of the object represented by the label information (shown by the dotted frame in the example of FIG. 13).

Because preparing training images is thus very laborious, the cost of introducing and maintaining object identification technology increases.

To solve these problems, methods for efficiently collecting training images have been proposed. For example, Non-Patent Document 2 proposes a method that collects training images corresponding to label information by searching the Internet for the name of an object (which corresponds to the label information).

[Non-Patent Document 2]: Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu. "Unsupervised Joint Object Discovery and Segmentation in Internet Images" in CVPR, 2013

However, with the method of Non-Patent Document 2, it is difficult to obtain images of objects that rarely accumulate in storage devices on the Internet, such as objects that are handled only in stores and are not distributed over the Internet, or industrial parts used in specialized fields. The number of training images therefore tends to be insufficient, and the identification accuracy for such objects may decrease.

Non-Patent Document 3 proposes a method that collects training images as follows: object identification is performed using training images of a specific object, and when the object under estimation is estimated to be the object represented by those training images, the image containing the object under estimation is newly added to the training images.

[Non-Patent Document 3]: 出口大輔, 道満恵介, 井手一郎, 村瀬洋 (Daisuke Deguchi, Keisuke Doman, Ichiro Ide, Hiroshi Murase), "Improving the Accuracy of a Sign Detector Using Automatic Collection of Sign Images Based on Retrospective Tracking", IEICE Transactions (Japanese Edition), Vol. J95-D, No. 1, pp. 76-84, 2012

However, the method of Non-Patent Document 3 presupposes that no errors occur in identifying objects with the training images. In practice, objects are sometimes misidentified, and when that happens under the method of Non-Patent Document 3, a training image of a different type of object is added to the training images of the specific object. When object identification is subsequently performed with those training images, further training images of objects of a different type tend to be added in a chain reaction, so the object identification accuracy decreases when such training images are used.

The present invention has been made to solve the above problems, and its object is to provide an image collection device, program, and method that can efficiently collect a larger number of training images with which the label information for identifying an object is correctly associated.

To achieve the above object, an image collection device according to a first invention comprises: a label identification unit that acquires label information of an object from a label identification device, which, when brought close to the object, acquires the label information attached to the object and indicating the type of the object, and that identifies the type of the object; an object region estimation unit that acquires, from each captured image taken by at least one imaging device that captures the acquisition of the label information of the object with the label identification device, a reference frame image at the moment when the label information used by the label identification unit to identify the type of the object was acquired, and that estimates, from the acquired reference frame image, an object region containing the object; a training image registration unit that associates with one another the type of the object identified by the label identification unit, the reference frame image used by the object region estimation unit to estimate the object region of the object, and the object region of the object in the reference frame image estimated by the object region estimation unit, and registers them in a storage device as a training image; and a label identification device estimation unit that estimates, from each reference frame image of the captured images taken by the imaging device, a device region containing the label identification device. The object region estimation unit estimates the object region containing the object from the reference frame image on the basis of the position of the device region of the label identification device estimated by the label identification device estimation unit.

The image collection device according to the first invention further comprises an object tracking unit that acquires from each captured image, in chronological order, adjacent frame images, which are images captured in at least one of the ranges before and after the reference frame image acquired by the object region estimation unit, tracks the object contained in the reference frame image through the acquired adjacent frame images, and estimates, from each adjacent frame image that contains the object, an object region containing the object. The training image registration unit associates with one another the type of the object identified by the label identification unit, the adjacent frame image used by the object tracking unit to estimate the object region of the object, and the object region of the object in the adjacent frame image estimated by the object tracking unit, and registers them in the storage device as a training image.

In the image collection device according to the first invention, the label identification unit acquires the label information of the object from a label identification device that reads the label information of the object using light or radio waves.

A program according to a second invention causes a computer to function as each unit of the image collection device according to any one of claims 1 to 3.

An image collection method according to a third invention includes the steps of: acquiring label information of an object from a label identification device, which, when brought close to the object, acquires the label information attached to the object and indicating the type of the object, and identifying the type of the object; acquiring, from each captured image taken by at least one imaging device that captures the acquisition of the label information of the object with the label identification device, a reference frame image at the moment when the label information used to identify the type of the object was acquired, and estimating, from the acquired reference frame image, an object region containing the object; associating with one another the identified type of the object, the reference frame image used to estimate the object region of the object, and the object region of the object in the reference frame image, and registering them in a storage device as a training image; and estimating, from each reference frame image of the captured images taken by the imaging device, a device region containing the label identification device. In the step of estimating the object region containing the object from the reference frame image, the object region is estimated on the basis of the position of the estimated device region of the label identification device.

The image collection method according to the third invention further includes the steps of: acquiring from each captured image, in chronological order, adjacent frame images, which are images captured in at least one of the ranges before and after the reference frame image, tracking the object contained in the reference frame image through the acquired adjacent frame images, and estimating, from each adjacent frame image that contains the object, an object region containing the object; and associating with one another the identified type of the object, the adjacent frame image used to estimate the object region of the object, and the object region of the object in the adjacent frame image, and registering them in the storage device as a training image.

The image collection device, program, and method of the present invention have the effect that a larger number of training images with which the label information for identifying an object is correctly associated can be collected efficiently.

FIG. 1 is a diagram showing a configuration example of the image collection system.
FIG. 2 is a diagram showing a configuration example of the image collection device according to the first embodiment.
FIG. 3 is a flowchart showing an example of the flow of the image collection routine according to the first embodiment.
FIG. 4 is a diagram showing an example of the process of estimating an object region from the position of a hand region.
FIG. 5 is a diagram showing a configuration example of the image collection device according to the second embodiment.
FIG. 6 is a flowchart showing an example of the flow of the image collection routine according to the second embodiment.
FIG. 7 is a diagram showing an example of the process of collecting training images from adjacent frame images.
FIG. 8 is a diagram showing a configuration example of the image collection device according to the third embodiment.
FIG. 9 is a flowchart showing an example of the flow of the image collection routine according to the third embodiment.
FIG. 10 is a diagram showing an example of the process of estimating an object region from a device region.
FIG. 11 is a diagram showing an example of training images.
FIG. 12 is a diagram showing an example of training images associated with label information.
FIG. 13 is a diagram showing an example of a training image associated with label information and object position information.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Components and processes having the same function are given the same reference signs throughout the drawings, and duplicate descriptions are omitted.

<First Embodiment>
<Configuration of the image collection system>
First, the configuration of the image collection system 1 according to the present invention will be described with reference to FIG. 1.

The image collection system 1 includes an image collection device 2, a label identification device 4, and an imaging device 6, and collects training images of an object 8.

Label information indicating the type of the object is attached to the object 8 in advance. When the user, for example, presses an operation button provided on the label identification device 4 or brings the label identification device 4 to within a predetermined distance of the object 8, the label identification device 4 acquires the label information attached to the object 8.

Specifically, when a barcode representing the label information is attached to the object 8, a barcode reader, which reads the label information of the object 8 represented by the barcode using the reflection of the light it emits, is used as the label identification device 4. The barcode representing the label information may be either a one-dimensional barcode or a two-dimensional barcode.

When an electronic tag storing the label information is attached to the object 8, an RFID (Radio Frequency IDentification) reader, which communicates with the electronic tag using radio waves and reads the label information of the object 8 stored in the tag, is used as the label identification device 4. An electronic tag is sometimes called an RF tag, an IC tag, a contactless tag, or a wireless tag, but it is referred to here as an "electronic tag".

The label identification device 4 is connected to the image collection device 2 and transmits the acquired label information to the image collection device 2.

The imaging device 6 is connected to the image collection device 2 and captures the scene in which the user acquires the label information of the object 8 using the label identification device 4. There is no restriction on the number of imaging devices 6 used in the image collection system 1, and at least one imaging device 6 suffices. However, it is preferable to use a plurality of imaging devices 6 so that the scene in which the user acquires the label information of the object 8 can be captured from a plurality of directions. In the example of FIG. 1, three imaging devices 6 are used, each capturing the scene from a different direction. The imaging devices 6 may be installed so that each is at the same distance from the object 8, or so that the distances differ from one another.

There is no restriction on the type of the imaging device 6 as long as it can capture images in chronological order; for example, a camera built into a smartphone or a surveillance camera such as a web camera can be used. The images captured by the imaging device 6 may be black-and-white or color, but because a color image also captures the color information of the object 8, it is preferable to use an imaging device 6 capable of capturing color images. Likewise, there is no particular restriction on the resolution of the imaging device 6 as long as the object 8, and the scene in which the user acquires its label information using the label identification device 4, can be recognized. Because captured images become sharper as the resolution increases, however, it is preferable to use a high-resolution imaging device 6.

<Configuration of the image collection device according to the first embodiment>
Next, the configuration of the image collection device 2 will be described. The image collection device 2 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, a ROM storing various data and an image collection program for executing the image collection routine described later, and a non-volatile memory storing data such as thresholds. Functionally, as shown in FIG. 2, the image collection device 2 includes an input unit 10 and a calculation unit 20.

The input unit 10 receives the label information acquired by the label identification device 4 and the captured images taken by each imaging device 6, and passes them to the calculation unit 20.

The calculation unit 20 includes a label identification unit 22, an object region estimation unit 24, a training image registration unit 26, and a training image storage unit 28.

On receiving the label information acquired by the label identification device 4, the label identification unit 22 analyzes it and identifies the type of the object 8. The label identification unit 22 then notifies the training image registration unit 26 of the identified type of the object 8. It also sends a processing start command to the object region estimation unit 24.

On receiving the processing start command from the label identification unit 22, the object region estimation unit 24 acquires, from each captured image of the imaging devices 6 that captured the scene in which the user acquired the label information of the object 8 using the label identification device 4, the image at the moment when the label information used by the label identification unit 22 to identify the type of the object 8 was acquired (hereinafter, "reference frame image 30A"). The object region estimation unit 24 then estimates, from each acquired reference frame image 30A, an object region 32 containing the object 8 whose type was identified by the label identification unit 22.

The object region estimation unit 24 notifies the training image registration unit 26 of each reference frame image 30A used to estimate the object region 32 and of the position information of the object region 32 of the object 8 estimated from each reference frame image 30A.

On receiving the reference frame images 30A and the position information of the object regions 32 from the object region estimation unit 24, and the type of the object 8 from the label identification unit 22, the training image registration unit 26 associates, for each reference frame image 30A, the type of the object 8 identified by the label identification unit 22, the reference frame image 30A used by the object region estimation unit 24 to estimate the object region 32, and the object region 32 of the object 8 in that reference frame image 30A. The training image registration unit 26 then registers the training images by storing each reference frame image 30A, with its associated type of the object 8 and object region 32, in the training image storage unit 28 as a training image.
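The association performed by the training image registration unit 26 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `TrainingImage` and `register_training_images`, the use of a frame identifier in place of image data, and the `(x, y, width, height)` form of the object region are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed region format: (x, y, width, height) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TrainingImage:
    """One registered training image (hypothetical record layout)."""
    object_type: str     # type identified from the label information
    frame_id: str        # stand-in for the reference frame image itself
    object_region: BBox  # object region estimated in that frame


def register_training_images(
    store: List[TrainingImage],
    object_type: str,
    frames_with_regions: Dict[str, BBox],
) -> int:
    """Associate type, frame, and region per reference frame and store them.

    Returns the number of training images registered by this call.
    """
    for frame_id, region in frames_with_regions.items():
        store.append(TrainingImage(object_type, frame_id, region))
    return len(frames_with_regions)
```

With three imaging devices, one label read would yield up to three such records, one per reference frame image, all sharing the same object type.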

The training image storage unit 28 is an example of a storage unit that retains stored information even when its power supply is cut off; for example, a semiconductor memory or a hard disk is used.

<Operation of the image collection device according to the first embodiment>
Next, the operation of the image collection device 2 according to the present embodiment will be described. When the input unit 10 receives the label information of the object 8, the image collection device 2 executes the image collection routine shown in FIG. 3. The input unit 10 constantly receives the captured images taken by each imaging device 6 and stores them in the RAM separately for each imaging device 6.
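The per-device storage of captured images in the RAM can be sketched as follows. This is only an illustration under assumptions: the class name `FrameBuffer`, the camera identifiers, and the bounded buffer length are hypothetical, since the description does not state how many frames are retained.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List, Tuple

Frame = Tuple[float, Any]  # (timestamp in seconds, image data)


class FrameBuffer:
    """Keeps the most recent frames separately for each imaging device."""

    def __init__(self, max_frames_per_camera: int = 300) -> None:
        # Assumption: old frames are discarded once the buffer is full.
        self._frames: Dict[str, Deque[Frame]] = defaultdict(
            lambda: deque(maxlen=max_frames_per_camera)
        )

    def push(self, camera_id: str, timestamp: float, image: Any) -> None:
        """Store one newly received frame for the given imaging device."""
        self._frames[camera_id].append((timestamp, image))

    def frames(self, camera_id: str) -> List[Frame]:
        """Return the buffered frames of one imaging device, oldest first."""
        return list(self._frames.get(camera_id, ()))
```

Keeping a separate buffer per imaging device matches the later steps, which read the captured images back from the RAM device by device.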

First, in step S10, the CPU acquires the label information from the label identification device 4 received by the input unit 10 and stores it in the RAM.

In step S20, the CPU analyzes the label information stored in the RAM in step S10 and identifies the type of the object 8 indicated by the label information. Specifically, the CPU identifies the type of the object 8 by referring to a type table that associates label information with types of the object 8. The type table may, for example, be stored in advance in the non-volatile memory of the image collection device 2 and referred to there, or the type of the object 8 indicated by the label information may be identified by referring to a database that stores the type table and is connected to the Internet.
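The type table lookup of step S20 can be sketched as follows. The function name, the example barcode values, and the `remote_lookup` callable standing in for the Internet-connected database are hypothetical. Trying the local table first and then falling back to the remote database is an assumption of this sketch; the description presents the two sources as alternatives.

```python
from typing import Callable, Dict, Optional


def identify_object_type(
    label_info: str,
    local_table: Dict[str, str],
    remote_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Map label information (e.g. a barcode value) to an object type.

    Returns None when neither source knows the label information.
    """
    if label_info in local_table:
        return local_table[label_info]
    if remote_lookup is not None:
        # Assumed fallback to a database reachable over the network.
        return remote_lookup(label_info)
    return None


# Usage with a small in-memory type table (values are made up).
type_table = {"4901234567894": "green tea 500 ml"}
print(identify_object_type("4901234567894", type_table))
```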

In step S30, the CPU acquires from the RAM the captured images of each imaging device 6 received by the input unit 10.

Because each captured image consists of a plurality of frame images in chronological order, in step S40 the CPU acquires a reference frame image 30A from the captured image of each imaging device 6 acquired in step S30. Specifically, the CPU extracts from the captured image, as the reference frame image 30A, the image that was captured when the label information was acquired in step S10. When time information is attached to each frame image in the captured image, the CPU may refer to that time information and acquire, as the reference frame image 30A, the frame image whose time information corresponds to the time at which the label information was acquired in step S10.
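Selecting the reference frame image from time information can be sketched as follows. The description only says that the frame's time information corresponds to the acquisition time of the label information; choosing the frame with the nearest timestamp is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Sequence


def select_reference_frame(frame_times: Sequence[float], label_time: float) -> int:
    """Return the index of the frame whose timestamp is closest to label_time.

    frame_times holds the time information of one imaging device's frames,
    in seconds. On a tie, the earlier frame is chosen.
    """
    if not frame_times:
        raise ValueError("no frames available for this imaging device")
    return min(
        range(len(frame_times)),
        key=lambda i: abs(frame_times[i] - label_time),
    )
```

The function would be called once per imaging device, so that each device contributes its own reference frame image for the same label read.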

ステップＳ５０において、ＣＰＵは、ステップＳ４０で取得したそれぞれの基準フレーム画像３０Ａに対して、ステップＳ２０で種別が特定された物体８が含まれる物体領域３２を推定する。この際、物体領域３２の推定手法には、公知の手法を用いることができる。 In step S50, the CPU estimates an object area 32 including the object 8 whose type is specified in step S20 for each reference frame image 30A acquired in step S40. At this time, a known method can be used as the method for estimating the object region 32.

例えば各々の撮影装置６の設置場所及び撮影方向が固定され、撮影装置６の撮影範囲が変わらない場合には、物体８が写っていない背景画像だけを撮影装置６毎に予め撮影して、画像収集装置２の不揮発性メモリ等に予め記憶しておき、撮影装置６毎に基準フレーム画像３０Ａと背景画像の差分を算出することで、基準フレーム画像３０Ａから物体領域３２を推定するようにしてもよい。 For example, when the installation location and shooting direction of each photographing device 6 are fixed and its shooting range does not change, a background image in which the object 8 does not appear may be captured in advance for each photographing device 6 and stored in, for example, the non-volatile memory of the image collecting device 2. The object area 32 may then be estimated from the reference frame image 30A by calculating, for each photographing device 6, the difference between the reference frame image 30A and the background image.
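The background-difference estimate described above can be sketched as follows. This is a minimal illustration, assuming grayscale images held as nested lists of integers and a fixed difference threshold; the function name, the threshold value, and the bounding-box output format are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image: rows of pixel intensities


def estimate_region_by_background_diff(
    frame: Image, background: Image, threshold: int = 30
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of pixels differing from the background.

    Returns None when no pixel differs by more than the threshold.
    """
    xs, ys = [], []
    for y, (row_f, row_b) in enumerate(zip(frame, background)):
        for x, (pf, pb) in enumerate(zip(row_f, row_b)):
            if abs(pf - pb) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

A practical system would likely add noise filtering before taking the bounding box; that step is omitted here.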

また、例えばスーパーやコンビニエンスストアのレジにおける商品決済時のように、ユーザが物体８（この場合、商品）を手に持った状態でラベル特定装置４を物体８に近づけ、物体８の価格及び品名を含んだラベル情報を取得する場合には、まず基準フレーム画像３０Ａから物体８をつかむユーザの手領域を抽出し、抽出した手領域の位置から物体８が含まれる物体領域３２を推定するようにしてもよい（例えば非特許文献４参照）。 Further, when the user holds the object 8 (in this case, a product) in the hand and brings the label specifying device 4 close to the object 8 to acquire label information including the price and product name of the object 8, as in product checkout at the cash register of a supermarket or convenience store, the hand region of the user gripping the object 8 may first be extracted from the reference frame image 30A, and the object area 32 including the object 8 may then be estimated from the position of the extracted hand region (see, for example, Non-Patent Document 4).

［非特許文献４］：Xiaolong Zhu, Wei Liu, Xuhui Jia ,Kwan-Yee K.Wong. "A Two-stage Detector for Hand Detection in Ego-centric Videos" in WACV, 2016 [Non-Patent Document 4]: Xiaolong Zhu, Wei Liu, Xuhui Jia, Kwan-Yee K.Wong. "A Two-stage Detector for Hand Detection in Ego-centric Videos" in WACV, 2016

図４は、基準フレーム画像３０Ａからユーザの手領域３１を抽出し、抽出した手領域３１の位置から物体領域３２を推定する過程の一例を示す図である。 FIG. 4 is a diagram showing an example of a process of extracting a user's hand region 31 from the reference frame image 30A and estimating an object region 32 from the position of the extracted hand region 31.

画像収集装置２は、各々の撮影装置６で撮影された基準フレーム画像３０Ａ毎に手領域３１を抽出する。 The image collecting device 2 extracts the hand region 31 for each reference frame image 30A captured by each imaging device 6.

ここで、ｉ（この場合、ｉ＝１〜３）番目の撮影装置６で撮影された基準フレーム画像３０Ａから抽出された手領域３１の中心座標を $\hat{c}_i$ とする。また、物体８を手に持った場合の物体８の中心座標（すなわち、物体領域３２の中心座標）は、手領域３１の中心座標 $\hat{c}_i$ からずれた位置となるため、予め様々な物体８を持った手が含まれる基準フレーム画像３０Ａを用いて、物体領域３２の中心座標 $\hat{x}'_i$ と手領域３１の中心座標 $\hat{c}_i$ のずれ量の平均値を実験的に求めた値をオフセット量 $\hat{b}_i$ として記憶しておけば、各基準フレーム画像３０Ａにおける物体領域３２の中心座標 $\hat{x}'_i$ は（１）式で表される。 Here, let $\hat{c}_i$ be the center coordinates of the hand region 31 extracted from the reference frame image 30A captured by the i-th (in this case, i = 1 to 3) photographing device 6. When the object 8 is held in the hand, the center coordinates of the object 8 (that is, the center coordinates of the object area 32) are displaced from the center coordinates $\hat{c}_i$ of the hand region 31. Therefore, using reference frame images 30A that contain hands holding various objects 8, the average displacement between the center coordinates $\hat{x}'_i$ of the object area 32 and the center coordinates $\hat{c}_i$ of the hand region 31 is determined experimentally in advance and stored as an offset $\hat{b}_i$. The center coordinates $\hat{x}'_i$ of the object area 32 in each reference frame image 30A are then given by equation (1).

（数１） (Equation 1)
$$\hat{x}'_i = \hat{c}_i + \hat{b}_i \qquad (1)$$

また、予め様々な物体８を持った手が含まれる基準フレーム画像３０Ａを用いて、実験的に求めた物体領域３２の幅の平均値を $w_i$、物体領域３２の高さの平均値を $h_i$ として記憶しておけば、図４に示すように、各基準フレーム画像３０Ａにおける物体領域３２の範囲は、座標 $\hat{x}'_i$ を中心とした幅 $w_i$、高さ $h_i$ の矩形によって表される。 Further, using reference frame images 30A that contain hands holding various objects 8, the experimentally determined average width of the object area 32 is stored in advance as $w_i$ and the average height as $h_i$. Then, as shown in FIG. 4, the extent of the object area 32 in each reference frame image 30A is represented by a rectangle of width $w_i$ and height $h_i$ centered on the coordinates $\hat{x}'_i$.
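The offset and rectangle computation for one photographing device can be sketched as follows. This is a minimal illustration of equation (1), assuming 2D pixel coordinates as `(x, y)` tuples; the function names `mean_offset` and `object_region` and the `(left, top, right, bottom)` output format are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_offset(samples: List[Tuple[Point, Point]]) -> Point:
    """Average displacement b_i of the object center from the hand center.

    samples: (object_center, hand_center) pairs measured in advance.
    """
    n = len(samples)
    bx = sum(obj[0] - hand[0] for obj, hand in samples) / n
    by = sum(obj[1] - hand[1] for obj, hand in samples) / n
    return (bx, by)


def object_region(
    hand_center: Point, offset: Point, width: float, height: float
) -> Tuple[float, float, float, float]:
    """Apply x' = c + b, then return (left, top, right, bottom) centered on x'."""
    cx = hand_center[0] + offset[0]
    cy = hand_center[1] + offset[1]
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```

For example, with an offset of (20, -15), a hand center of (200, 200), a width of 40, and a height of 60, the estimated center is (220, 185) and the rectangle spans (200, 155) to (240, 215).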

なお、基準フレーム画像３０Ａからは左右の手に対応する２つの手領域３１が抽出されることになる。しかし、ユーザは予め定めた一方の手で物体８をつかむとの条件を定めておけば、ＣＰＵは物体領域３２の中心座標 $\hat{x}'_i$ を推定する際、何れの手領域３１の中心座標 $\hat{c}_i$ を用いればよいかを判断することができる。 Note that two hand regions 31, corresponding to the left and right hands, are extracted from the reference frame image 30A. However, if a condition is set in advance that the user grips the object 8 with a predetermined one of the hands, the CPU can determine which hand region's center coordinates $\hat{c}_i$ to use when estimating the center coordinates $\hat{x}'_i$ of the object area 32.

また、物体領域３２の形状は矩形に限られず、例えば円、楕円、多角形等の他の形状を用いて物体領域３２を表すようにしてもよい。 Further, the shape of the object region 32 is not limited to a rectangle, and other shapes such as a circle, an ellipse, and a polygon may be used to represent the object region 32.

ステップＳ６０において、ＣＰＵは、ステップＳ５０で物体領域３２の推定に用いられた基準フレーム画像３０Ａに対して、ステップＳ２０で特定された物体領域３２に含まれる物体８の種別と、基準フレーム画像３０Ａにおける物体領域３２の位置情報、すなわち、物体領域３２の中心座標 $\hat{x}'_i$、幅 $w_i$、及び高さ $h_i$ とを、基準フレーム画像３０Ａ毎に対応付ける。そして、ＣＰＵは、物体８の種別及び物体領域３２の位置情報が対応付けられた基準フレーム画像３０Ａ、すなわち、訓練画像の各々を、例えば画像収集装置２に実装される不揮発性メモリに記憶する。 In step S60, for each reference frame image 30A used for estimating the object area 32 in step S50, the CPU associates with that reference frame image 30A the type, identified in step S20, of the object 8 included in the object area 32, and the position information of the object area 32 in the reference frame image 30A, namely the center coordinates $\hat{x}'_i$, width $w_i$, and height $h_i$ of the object area 32. The CPU then stores each reference frame image 30A with which the type of the object 8 and the position information of the object area 32 are associated, that is, each training image, in, for example, the non-volatile memory mounted on the image collecting device 2.
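The association made in step S60 can be sketched as one record per reference frame image. This is a minimal illustration; the `TrainingImage` fields, the `build_training_images` helper, and the use of a camera ID and frame index to stand for the reference frame image are assumptions made for the example, not a storage format given in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrainingImage:
    camera_id: int               # which photographing device produced the frame
    frame_index: int             # position of the reference frame in the video
    object_type: str             # type identified from the label information
    center: Tuple[float, float]  # center coordinates of the object area
    width: float
    height: float


def build_training_images(
    object_type: str,
    regions: Dict[int, Tuple[int, Tuple[float, float], float, float]],
) -> List[TrainingImage]:
    """regions: {camera_id: (frame_index, center, width, height)}."""
    return [
        TrainingImage(cam, idx, object_type, center, w, h)
        for cam, (idx, center, w, h) in sorted(regions.items())
    ]
```

Every record shares the same object type because a single label reading applies to all photographing devices at that moment.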

なお、訓練画像の記憶先に制約はなく、例えばインターネットに接続されたデータベースに訓練画像を記憶するようにしてもよい。 There are no restrictions on the storage destination of the training image, and the training image may be stored in a database connected to the Internet, for example.

以上により、図３に示した画像収集ルーチンを終了する。 As a result, the image collection routine shown in FIG. 3 is completed.

このように本実施形態に係る画像収集装置２によれば、ユーザがラベル特定装置４で物体８のラベル情報を取得する瞬間の基準フレーム画像３０Ａに対して、物体８のラベル情報から得られた物体８の種別と、基準フレーム画像３０Ａから推定された物体領域３２を対応付けて訓練画像とする。 As described above, the image collecting device 2 according to the present embodiment associates, with the reference frame image 30A captured at the moment the user acquires the label information of the object 8 with the label specifying device 4, the type of the object 8 obtained from the label information and the object area 32 estimated from the reference frame image 30A, and uses the result as a training image.

バーコードリーダーやＲＦＩＤリーダーのように、光や電波を利用してラベル情報を読み取るラベル特定装置４は、例えば物体８の決済手段及び検品手段として用いられることからもわかるように、他の手法に比べてラベル情報の読み取りミスが低い。したがって、ラベル特定装置４から取得したラベル情報を用いて物体８の種別を特定する場合、例えば物体を撮影した撮影画像に対して画像認識を行うことで物体８の種別を特定する場合と比較して、物体８の種別に対する誤認識率が低くなる。 A label specifying device 4 that reads label information using light or radio waves, such as a barcode reader or an RFID reader, makes fewer reading errors than other methods, as is evident from its use as, for example, a means of payment and inspection for the object 8. Therefore, when the type of the object 8 is identified using the label information acquired from the label specifying device 4, the misrecognition rate for the type of the object 8 is lower than when the type is identified by, for example, performing image recognition on a captured image of the object.

すなわち、画像収集装置２では、画像認識を用いて物体８の種別を特定する場合と比較して、訓練画像に正しい物体８の種別を対応付けることができる。 That is, in the image collecting device 2, the correct type of the object 8 can be associated with the training image as compared with the case where the type of the object 8 is specified by using image recognition.

また、画像収集装置２は、ユーザがラベル特定装置４で物体８のラベル情報を取得する瞬間の画像である基準フレーム画像３０Ａを訓練画像として用いるため、ラベル情報を用いて特定した種別の物体８が、基準フレーム画像３０Ａに含まれる物体８であることが保証される。 Further, since the image collecting device 2 uses as the training image the reference frame image 30A, which is the image at the moment the user acquires the label information of the object 8 with the label specifying device 4, the object 8 of the type identified from the label information is guaranteed to be the object 8 included in the reference frame image 30A.

更に、ユーザがラベル特定装置４を用いて物体８のラベル情報を取得する状況を複数の撮影装置６で撮影することで、１度のラベル情報の取得作業によって、様々な方向及び距離で撮影した複数の訓練画像が得られるため、より多くの訓練画像を効率的に収集することができる。 Further, by photographing the situation in which the user acquires the label information of the object 8 using the label specifying device 4 with a plurality of photographing devices 6, the photographing is performed in various directions and distances by one task of acquiring the label information. Since a plurality of training images can be obtained, more training images can be efficiently collected.

＜第２実施形態＞ <Second Embodiment>
第１実施形態に係る画像収集装置２は、ユーザがラベル特定装置４で物体８のラベル情報を取得する瞬間の画像である基準フレーム画像３０Ａを訓練画像として用いた。しかしながら、撮影装置６で撮影した撮影画像のうち、物体８が含まれている画像は基準フレーム画像３０Ａに限られない。 The image collecting device 2 according to the first embodiment uses the reference frame image 30A, which is an image at the moment when the user acquires the label information of the object 8 with the label specifying device 4, as a training image. However, among the captured images captured by the photographing device 6, the image including the object 8 is not limited to the reference frame image 30A.

したがって、第２実施形態では、基準フレーム画像３０Ａを基準として、基準フレーム画像３０Ａより前及び後の少なくとも一方の範囲で撮影された画像である隣接フレーム画像のうち、物体８が含まれる隣接フレーム画像も訓練画像として収集する画像収集装置２Ａについて説明する。 Therefore, in the second embodiment, among the adjacent frame images which are images taken in at least one range before and after the reference frame image 30A with reference to the reference frame image 30A, the adjacent frame image including the object 8 is included. The image collecting device 2A for collecting as a training image will be described.

なお、画像収集装置２Ａを含む画像収集システム１の構成は、図１と同じ構成が用いられる。 The configuration of the image acquisition system 1 including the image acquisition device 2A is the same as that shown in FIG.

＜第２実施形態に係る画像収集装置の構成＞ <Structure of the image collecting device according to the second embodiment>
図５は、画像収集装置２Ａの構成例を示す図である。 FIG. 5 is a diagram showing a configuration example of the image collecting device 2A.

図５に示す画像収集装置２Ａの構成が、図２に示した画像収集装置２の構成と異なる点は、物体追跡部２５が追加され、訓練画像登録部２６が訓練画像登録部２６Ａに置き換えられた点である。 The configuration of the image collecting device 2A shown in FIG. 5 differs from that of the image collecting device 2 shown in FIG. 2 in that an object tracking unit 25 is added and the training image registration unit 26 is replaced with a training image registration unit 26A.

物体追跡部２５は、物体領域推定部２４から基準フレーム画像３０Ａ及び物体領域３２の位置情報を受け付けると、基準フレーム画像３０Ａに隣接するフレーム画像から時系列に沿ってフレーム画像を順次取得し、時系列に沿った各フレーム画像、すなわち、隣接フレーム画像の各々に対して、物体領域推定部２４から受け付けた物体領域３２に含まれる物体８と同じ物体８が含まれているか否かを判定する物体追跡を行う。なお、物体追跡部２５は、物体追跡によって物体８が含まれると判定した隣接フレーム画像に対して、物体領域３２の推定を行う。 When the object tracking unit 25 receives the position information of the reference frame image 30A and the object area 32 from the object area estimation unit 24, the object tracking unit 25 sequentially acquires the frame images from the frame images adjacent to the reference frame image 30A in chronological order. An object for determining whether or not the same object 8 as the object 8 included in the object area 32 received from the object area estimation unit 24 is included in each frame image along the series, that is, each of the adjacent frame images. Perform tracking. The object tracking unit 25 estimates the object region 32 with respect to the adjacent frame image determined to include the object 8 by the object tracking.

そして、物体追跡部２５は、物体領域推定部２４から受け付けた物体領域３２に含まれる物体８と同じ物体８が含まれている隣接フレーム画像に、当該隣接フレーム画像から推定した物体８の物体領域３２の位置情報を付加して訓練画像登録部２６Ａに通知する。また、物体追跡部２５は、物体領域推定部２４から受け付けた基準フレーム画像３０Ａ及び基準フレーム画像３０Ａにおける物体領域３２の位置情報を訓練画像登録部２６Ａに通知する。 Then, for each adjacent frame image that contains the same object 8 as the object 8 included in the object area 32 received from the object area estimation unit 24, the object tracking unit 25 attaches the position information of the object area 32 of the object 8 estimated from that adjacent frame image and notifies the training image registration unit 26A. The object tracking unit 25 also notifies the training image registration unit 26A of the reference frame image 30A received from the object area estimation unit 24 and the position information of the object area 32 in the reference frame image 30A.

なお、物体追跡部２５は、撮影装置６が複数存在するために基準フレーム画像３０Ａを複数受け付けた場合、各基準フレーム画像３０Ａに対応するそれぞれの撮影装置６で撮影された撮影画像から隣接フレーム画像を取得して物体追跡を行う。 When the object tracking unit 25 receives a plurality of reference frame images 30A because there are a plurality of photographing devices 6, it acquires adjacent frame images from the captured image of the photographing device 6 corresponding to each reference frame image 30A and performs object tracking.

訓練画像登録部２６Ａは訓練画像登録部２６と同様に、物体追跡部２５から基準フレーム画像３０Ａ及び基準フレーム画像３０Ａにおける物体領域３２の位置情報を受け付けると共に、ラベル特定部２２から物体８の種別を受け付けると、ラベル特定部２２で特定された物体８の種別、物体領域推定部２４で物体領域３２の推定に用いられた基準フレーム画像３０Ａ、及び基準フレーム画像３０Ａにおける物体８の物体領域３２を基準フレーム画像３０Ａ毎に対応付ける。 Like the training image registration unit 26, upon receiving the reference frame image 30A and the position information of the object area 32 in the reference frame image 30A from the object tracking unit 25, and the type of the object 8 from the label specifying unit 22, the training image registration unit 26A associates, for each reference frame image 30A, the type of the object 8 identified by the label specifying unit 22, the reference frame image 30A used by the object area estimation unit 24 for estimating the object area 32, and the object area 32 of the object 8 in the reference frame image 30A.

また、訓練画像登録部２６Ａは、物体追跡部２５から隣接フレーム画像及び隣接フレーム画像における物体領域３２の位置情報を受け付けると共に、ラベル特定部２２から物体８の種別を受け付けると、ラベル特定部２２で特定された物体８の種別、物体追跡部２５で物体８が含まれると判定された隣接フレーム画像、及び物体８が含まれると判定された隣接フレーム画像における物体８の物体領域３２を隣接フレーム画像毎に対応付ける。そして、訓練画像登録部２６Ａは、物体８の種別及び物体領域３２が対応付けられた各々の基準フレーム画像３０Ａ及び各々の隣接フレーム画像を、それぞれ訓練画像として訓練画像記憶部２８に記憶する。 Further, upon receiving an adjacent frame image and the position information of the object area 32 in the adjacent frame image from the object tracking unit 25, and the type of the object 8 from the label specifying unit 22, the training image registration unit 26A associates, for each adjacent frame image, the type of the object 8 identified by the label specifying unit 22, the adjacent frame image determined by the object tracking unit 25 to include the object 8, and the object area 32 of the object 8 in that adjacent frame image. The training image registration unit 26A then stores each reference frame image 30A and each adjacent frame image with which the type of the object 8 and the object area 32 are associated in the training image storage unit 28 as training images.

＜第２実施形態に係る画像収集装置の作用＞ <Operation of the image collecting device according to the second embodiment>
次に、本実施形態に係る画像収集装置２Ａの作用について説明する。入力部１０において、物体８のラベル情報を受け付けると、画像収集装置２Ａは、図６に示す画像収集ルーチンを実行する。なお、入力部１０は、各撮影装置６で撮影された撮影画像を常時受け付け、撮影装置６毎に撮影画像をＲＡＭに記憶するものとする。したがって、ＲＡＭには物体８のラベル情報を受け付ける前の撮影画像、及び物体８のラベル情報を受け付けた後の撮影画像が記憶されている。 Next, the operation of the image collecting device 2A according to the present embodiment will be described. When the input unit 10 receives the label information of the object 8, the image collecting device 2A executes the image collecting routine shown in FIG. 6. The input unit 10 constantly receives the captured images from each photographing device 6 and stores them in the RAM for each photographing device 6. Therefore, the RAM stores the captured images from before the label information of the object 8 was received as well as those from after it was received.

図６に示す画像収集ルーチンが図３に示した画像収集装置２の画像収集ルーチンと異なる点は、ステップＳ５２〜Ｓ５８が新たに追加され、ステップＳ６０がステップＳ６０Ａに置き換えられた点である。 The difference between the image collection routine shown in FIG. 6 and the image collection routine of the image collection device 2 shown in FIG. 3 is that steps S52 to S58 are newly added and step S60 is replaced with step S60A.

ステップＳ５０で、基準フレーム画像３０Ａに対してステップＳ２０で種別が特定された物体８が含まれる物体領域３２が推定されるとステップＳ５２が実行される。 In step S50, when the object area 32 including the object 8 whose type is specified in step S20 is estimated with respect to the reference frame image 30A, step S52 is executed.

ステップＳ５２において、ＣＰＵは、物体領域３２が推定された基準フレーム画像３０Ａのうち、何れか１つの基準フレーム画像３０Ａを選択し、選択した基準フレーム画像３０Ａが含まれる撮影画像から、時系列に従って基準フレーム画像３０Ａに隣接する隣接フレーム画像を１枚取得する。この際、ＣＰＵは、基準フレーム画像３０Ａが撮影された時刻より前に撮影された過去方向の時系列に沿って隣接フレーム画像を取得してもよいし、基準フレーム画像３０Ａが撮影された時刻より後に撮影された未来方向の時系列に沿って隣接フレーム画像を取得してもよい。ここでは一例として、過去方向の時系列に沿って隣接フレーム画像を取得するものとして説明を行う。 In step S52, the CPU selects any one of the reference frame images 30A from which the object region 32 is estimated, and refers to the captured images including the selected reference frame image 30A in chronological order. Acquire one adjacent frame image adjacent to the frame image 30A. At this time, the CPU may acquire the adjacent frame image along the time series in the past direction taken before the time when the reference frame image 30A was taken, or from the time when the reference frame image 30A was taken. Adjacent frame images may be acquired along the time series in the future direction taken later. Here, as an example, it will be described assuming that adjacent frame images are acquired along a time series in the past direction.

ステップＳ５４において、ＣＰＵは、ステップＳ５２で取得した隣接フレーム画像に、ステップＳ５０で推定した基準フレーム画像３０Ａの物体領域３２に含まれる物体８と同じ物体８が含まれるか否かを判定する物体追跡を行う。 In step S54, the CPU determines whether or not the adjacent frame image acquired in step S52 includes the same object 8 as the object 8 included in the object area 32 of the reference frame image 30A estimated in step S50. I do.

隣接フレーム画像における物体追跡には、例えば非特許文献５に示すような公知の手法を用いることができる。 For tracking an object in an adjacent frame image, for example, a known method as shown in Non-Patent Document 5 can be used.

［非特許文献５］：Michael Isard, Andrew Blake. "CONDENSATION-Conditional Density Propagation for Visual Tracking" International Journal of Computer Vision, Vol29, No.1, pp.5-28, 1998 [Non-Patent Document 5]: Michael Isard, Andrew Blake. "CONDENSATION-Conditional Density Propagation for Visual Tracking" International Journal of Computer Vision, Vol29, No.1, pp.5-28, 1998

非特許文献５には、尤度を持つ多数の仮説群により離散的な確率密度として追跡対象（この場合、物体８）を表現し、状態遷移モデルを用いて物体８の動きの変動や観測のノイズに対して頑健な追跡を実現する手法が開示されている。したがって、隣接フレーム画像において、物体８が存在する確率である尤度が予め定めた閾値以上となった場合には、隣接フレーム画像に物体８が含まれると判定され、尤度が予め定めた閾値未満となった場合には、隣接フレーム画像には物体８が含まれないと判定される。 Non-Patent Document 5 discloses a method that represents the tracked target (in this case, the object 8) as a discrete probability density using a large set of hypotheses with likelihoods, and uses a state transition model to achieve tracking that is robust to variations in the motion of the object 8 and to observation noise. Accordingly, when the likelihood, that is, the probability that the object 8 is present in the adjacent frame image, is equal to or greater than a predetermined threshold, the adjacent frame image is determined to include the object 8; when the likelihood is less than the predetermined threshold, the adjacent frame image is determined not to include the object 8.
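The threshold decision described above can be sketched as follows. This is not an implementation of the method of Non-Patent Document 5; it is a minimal illustration that assumes the tracker has already produced a set of weighted hypotheses for one adjacent frame image, and that the "likelihood that the object is present" is taken to be the largest hypothesis likelihood. That choice, the function name `track_decision`, and the weighted-mean position estimate are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def track_decision(
    hypotheses: List[Tuple[Point, float]], threshold: float
) -> Tuple[bool, Optional[Point]]:
    """Decide whether the object is present and estimate its center.

    hypotheses: ((x, y), likelihood) pairs for one adjacent frame image.
    Returns (present, estimated_center); the center is None when absent.
    """
    if not hypotheses:
        return (False, None)
    total = sum(w for _, w in hypotheses)
    best = max(w for _, w in hypotheses)
    if total == 0 or best < threshold:
        return (False, None)
    # Likelihood-weighted mean of the hypothesized positions.
    x = sum(p[0] * w for p, w in hypotheses) / total
    y = sum(p[1] * w for p, w in hypotheses) / total
    return (True, (x, y))
```

Other summaries of the hypothesis set, such as the mean likelihood, could serve as the presence score equally well.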

ＣＰＵは、隣接フレーム画像に物体８が含まれる場合、ステップＳ５０と同じ手法を用いて、隣接フレーム画像に対して物体８が含まれる物体領域３２を推定する。 When the object 8 is included in the adjacent frame image, the CPU estimates the object area 32 including the object 8 with respect to the adjacent frame image by using the same method as in step S50.

ステップＳ５６において、ＣＰＵは、ステップＳ５２で取得した隣接フレーム画像に物体領域３２が含まれるか否かを判定する。隣接フレーム画像に物体領域３２が含まれる場合には、当該隣接フレーム画像に隣接する過去方向の時系列に沿った隣接フレーム画像にも物体８が含まれる可能性があるため、ステップＳ５２に移行する。 In step S56, the CPU determines whether or not the adjacent frame image acquired in step S52 includes the object area 32. If it does, the adjacent frame image that precedes it in the past direction of the time series may also include the object 8, so the process returns to step S52.

そして、ステップＳ５２において、直前に取得した隣接フレーム画像が含まれる撮影画像から、直前に取得した隣接フレーム画像と隣接する過去方向の時系列に沿った隣接フレーム画像を１枚取得する。以降、ステップＳ５６の判定処理において、ステップＳ５２で取得した隣接フレーム画像に物体領域３２が含まれていないと判定されるまで、ステップＳ５２〜Ｓ５６を繰り返し実行することで、基準フレーム画像３０Ａから過去方向に遡った範囲の隣接フレーム画像のうち、物体８が含まれている隣接フレーム画像と当該隣接フレーム画像における物体領域３２を取得することができる。 Then, in step S52, one adjacent frame image along the time series in the past direction adjacent to the adjacent frame image acquired immediately before is acquired from the captured image including the adjacent frame image acquired immediately before. After that, in the determination process of step S56, steps S52 to S56 are repeatedly executed until it is determined that the adjacent frame image acquired in step S52 does not include the object region 32, so that the reference frame image 30A is in the past direction. Among the adjacent frame images in the range traced back to, the adjacent frame image including the object 8 and the object area 32 in the adjacent frame image can be acquired.

一方、ステップＳ５６の判定処理が否定判定の場合、すなわち、ステップＳ５２で取得した隣接フレーム画像に物体領域３２が含まれていないと判定された場合には、ステップＳ５８に移行する。 On the other hand, if the determination process in step S56 is a negative determination, that is, if it is determined that the adjacent frame image acquired in step S52 does not include the object region 32, the process proceeds to step S58.

ステップＳ５８において、ＣＰＵは、物体領域３２が推定された基準フレーム画像３０Ａのうち、未選択の基準フレーム画像３０Ａが存在するか否かを判定する。未選択の基準フレーム画像３０Ａが存在する場合、ステップＳ５２に移行する。 In step S58, the CPU determines whether or not there is an unselected reference frame image 30A among the reference frame images 30A in which the object region 32 is estimated. If there is an unselected reference frame image 30A, the process proceeds to step S52.

そして、ステップＳ５２において、未選択の基準フレーム画像３０Ａの中から基準フレーム画像を１枚選択し、選択した基準フレーム画像３０Ａが含まれる撮影画像から、時系列に従って基準フレーム画像３０Ａに隣接する隣接フレーム画像を１枚取得する。以降、ステップＳ５８の判定処理で未選択の基準フレーム画像３０Ａが存在しないと判定されるまでステップＳ５２〜Ｓ５８を繰り返し実行することで、複数の撮影装置６で撮影された各々の撮影画像から、物体８が含まれている隣接フレーム画像と当該隣接フレーム画像における物体領域３２を取得することができる。 Then, in step S52, one reference frame image is selected from the unselected reference frame images 30A, and adjacent frames adjacent to the reference frame image 30A are selected in chronological order from the captured images including the selected reference frame image 30A. Get one image. After that, by repeatedly executing steps S52 to S58 until it is determined that the unselected reference frame image 30A does not exist in the determination process of step S58, an object is obtained from each of the captured images captured by the plurality of imaging devices 6. It is possible to acquire the adjacent frame image including 8 and the object area 32 in the adjacent frame image.
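The loop over steps S52 to S58 can be sketched as follows. This is a minimal illustration, assuming each captured image is a list of frames and that a caller-supplied predicate `contains_object` stands in for the tracking and region estimation of step S54; the function names and the dictionary layout keyed by camera ID are hypothetical.

```python
from typing import Any, Callable, Dict, List


def collect_adjacent(
    frames: List[Any],
    ref_index: int,
    contains_object: Callable[[Any], bool],
    step: int = -1,
) -> List[int]:
    """Walk away from the reference frame until the object is lost (S52 to S56).

    step=-1 walks in the past direction, step=+1 in the future direction.
    Returns the indices of adjacent frames that contain the object.
    """
    collected = []
    k = ref_index + step
    while 0 <= k < len(frames) and contains_object(frames[k]):
        collected.append(k)
        k += step
    return collected


def collect_all_cameras(
    videos: Dict[int, List[Any]],
    ref_indices: Dict[int, int],
    contains_object: Callable[[Any], bool],
    step: int = -1,
) -> Dict[int, List[int]]:
    """Repeat the walk for every reference frame image (the S58 loop)."""
    return {
        cam: collect_adjacent(videos[cam], ref_indices[cam], contains_object, step)
        for cam in videos
    }
```

The walk stops at the first frame without the object, which mirrors the negative determination in step S56.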

ステップＳ６０Ａにおいて、ＣＰＵは、ステップＳ５０で物体領域３２の推定に用いられた基準フレーム画像３０Ａに対して、ステップＳ２０で特定された物体領域３２に含まれる物体８の種別と、基準フレーム画像３０Ａにおける物体領域３２の位置情報とを、基準フレーム画像３０Ａ毎に対応付ける。また、ＣＰＵは、物体領域３２が含まれる隣接フレーム画像の各々に対して、ステップＳ２０で特定された物体領域３２に含まれる物体８の種別と、各隣接フレーム画像における物体領域３２の位置情報とを、隣接フレーム画像毎に対応付ける。 In step S60A, the CPU determines the type of the object 8 included in the object area 32 specified in step S20 and the reference frame image 30A with respect to the reference frame image 30A used for estimating the object area 32 in step S50. The position information of the object area 32 is associated with each reference frame image 30A. Further, the CPU determines the type of the object 8 included in the object area 32 specified in step S20 and the position information of the object area 32 in each adjacent frame image for each of the adjacent frame images including the object area 32. Is associated with each adjacent frame image.

そして、ＣＰＵは、物体８の種別及び物体領域３２の位置情報が対応付けられた基準フレーム画像３０Ａ及び隣接フレーム画像の各々を訓練画像として、例えば画像収集装置２Ａに実装される不揮発性メモリに記憶する。 Then, the CPU stores each of the reference frame images 30A and adjacent frame images with which the type of the object 8 and the position information of the object area 32 are associated as training images in, for example, a non-volatile memory mounted on the image collecting device 2A.

以上により、図６に示した画像収集ルーチンを終了する。 As a result, the image collection routine shown in FIG. 6 is completed.

図６に示した画像収集ルーチンでは、基準フレーム画像３０Ａから過去方向に遡った範囲の隣接フレーム画像を用いて訓練画像を収集するか、基準フレーム画像３０Ａから未来方向に向かった範囲の隣接フレーム画像を用いて訓練画像を収集するかの何れかの例を示したが、基準フレーム画像３０Ａから過去方向に遡った範囲の隣接フレーム画像を用いて訓練画像を収集した後、更に、基準フレーム画像３０Ａから未来方向に向かった範囲の隣接フレーム画像を用いて訓練画像を収集するようにしてもよい。具体的には、図６のステップＳ５８の処理が終了した後、それまで隣接フレーム画像を取得した時系列の方向とは逆の方向に隣接フレーム画像を取得するようにして、ステップＳ５２〜Ｓ５８の処理を再度実行すればよい。 The image collection routine shown in FIG. 6 illustrates collecting training images using either the adjacent frame images in the range going back in the past direction from the reference frame image 30A or the adjacent frame images in the range going forward in the future direction from the reference frame image 30A. However, after training images are collected using the adjacent frame images in the past direction, training images may additionally be collected using the adjacent frame images in the future direction. Specifically, after the processing of step S58 in FIG. 6 is completed, steps S52 to S58 may be executed again with adjacent frame images acquired in the direction opposite to the time-series direction used until then.

この場合、基準フレーム画像３０Ａを基準として未来方向又は過去方向の何れか１方向に沿った隣接フレーム画像から訓練画像を収集する場合と比較して、１つの撮影画像からより多くの訓練画像を効率よく収集することができる。 In this case, more training images can be collected efficiently from one captured image than when training images are collected from adjacent frame images along only one of the future direction and the past direction with respect to the reference frame image 30A.

図７は、基準フレーム画像３０Ａを基準として未来方向及び過去方向の両方向に沿った隣接フレーム画像から訓練画像を収集する例を示す図である。図７では、基準フレーム画像３０Ａから過去方向の時系列に沿った隣接フレーム画像を「隣接フレーム画像３０Ｂ」で表し、基準フレーム画像３０Ａから未来方向の時系列に沿った隣接フレーム画像を「隣接フレーム画像３０Ｃ」で表している。また、隣接フレーム画像３０Ｂ、３０Ｃに続く符合「−ｎ（ｎ＝１〜３）」は、基準フレーム画像３０Ａからの時系列順を示しており、“ｎ”の値が大きくなるに従って、基準フレーム画像３０Ａからの時間間隔が大きいことを示している。 FIG. 7 is a diagram showing an example of collecting training images from adjacent frame images along both the future direction and the past direction with reference to the reference frame image 30A. In FIG. 7, the adjacent frame image along the time series in the past direction from the reference frame image 30A is represented by the “adjacent frame image 30B”, and the adjacent frame image along the time series in the future direction from the reference frame image 30A is represented by the “adjacent frame”. It is represented by "Image 30C". Further, the sign "-n (n = 1 to 3)" following the adjacent frame images 30B and 30C indicates the time series order from the reference frame image 30A, and the reference frame increases as the value of "n" increases. It shows that the time interval from the image 30A is large.

すなわち、隣接フレーム画像３０Ｂ−３→隣接フレーム画像３０Ｂ−２→隣接フレーム画像３０Ｂ−１→基準フレーム画像３０Ａ→隣接フレーム画像３０Ｃ−１→隣接フレーム画像３０Ｃ−２→隣接フレーム画像３０Ｃ−３の順に各画像が撮影されている。 That is, the images were captured in the following order: adjacent frame image 30B-3 → adjacent frame image 30B-2 → adjacent frame image 30B-1 → reference frame image 30A → adjacent frame image 30C-1 → adjacent frame image 30C-2 → adjacent frame image 30C-3.

図７の例では、隣接フレーム画像３０Ｂ−３及び隣接フレーム画像３０Ｃ−３に物体領域３２が存在しないため、隣接フレーム画像３０Ｂ−２、隣接フレーム画像３０Ｂ−１、基準フレーム画像３０Ａ、隣接フレーム画像３０Ｃ−１、及び隣接フレーム画像３０Ｃ−２が訓練画像として収集されることになる。 In the example of FIG. 7, since the object region 32 does not exist in the adjacent frame image 30B-3 and the adjacent frame image 30C-3, the adjacent frame image 30B-2, the adjacent frame image 30B-1, the reference frame image 30A, and the adjacent frame image 30C-1 and the adjacent frame image 30C-2 will be collected as training images.
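The FIG. 7 example can be reproduced with a short sketch. This is a minimal illustration, assuming a per-frame flag that records whether the object area 32 was found; the function name `collect_both_directions` and the list layout are hypothetical.

```python
from typing import List


def collect_both_directions(
    labels: List[str], present: List[bool], ref_index: int
) -> List[str]:
    """Collect the reference frame and adjacent frames in both directions.

    labels and present are in chronological order; collection in each
    direction stops at the first frame where the object area is absent.
    """
    picked = [ref_index]
    for step in (-1, +1):  # past direction first, then future direction
        k = ref_index + step
        while 0 <= k < len(labels) and present[k]:
            picked.append(k)
            k += step
    return [labels[k] for k in sorted(picked)]


labels = ["30B-3", "30B-2", "30B-1", "30A", "30C-1", "30C-2", "30C-3"]
present = [False, True, True, True, True, True, False]
training = collect_both_directions(labels, present, ref_index=3)
# training == ["30B-2", "30B-1", "30A", "30C-1", "30C-2"]
```

The result matches the five images that the FIG. 7 example lists as training images.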

このように本実施形態に係る画像収集装置２Ａによれば、基準フレーム画像３０Ａだけでなく、基準フレーム画像３０Ａの未来方向及び過去方向の少なくとも一方の範囲で撮影された、物体８を含む隣接フレーム画像も訓練画像として収集することができる。したがって、基準フレーム画像３０Ａだけを訓練画像として収集する場合と比較して、１つの撮影画像からより多くの訓練画像を効率よく収集することができる。 As described above, according to the image collecting device 2A according to the present embodiment, not only the reference frame image 30A but also the adjacent frame including the object 8 taken in at least one of the future direction and the past direction of the reference frame image 30A. Images can also be collected as training images. Therefore, as compared with the case where only the reference frame image 30A is collected as the training image, more training images can be efficiently collected from one captured image.

＜第３実施形態＞ <Third Embodiment>
第１実施形態に係る画像収集装置２は、基準フレーム画像３０Ａからユーザの手領域３１を抽出し、抽出した手領域３１の位置から物体領域３２を推定した。しかしながら、基準フレーム画像３０Ａにおける物体領域３２は他の方法を用いても推定することができる。 The image collecting device 2 according to the first embodiment extracts the user's hand region 31 from the reference frame image 30A, and estimates the object region 32 from the position of the extracted hand region 31. However, the object region 32 in the reference frame image 30A can also be estimated using other methods.

したがって、第３実施形態では、基準フレーム画像３０Ａに含まれるラベル特定装置４の位置から物体領域３２を推定する画像収集装置２Ｂについて説明する。 Therefore, in the third embodiment, the image collecting device 2B that estimates the object region 32 from the position of the label specifying device 4 included in the reference frame image 30A will be described.

なお、画像収集装置２Ｂを含む画像収集システム１の構成は、図１と同じ構成が用いられる。 The configuration of the image acquisition system 1 including the image acquisition device 2B is the same as that shown in FIG.

＜第３実施形態に係る画像収集装置の構成＞ <Structure of the image collecting device according to the third embodiment>
図８は、画像収集装置２Ｂの構成例を示す図である。 FIG. 8 is a diagram showing a configuration example of the image collecting device 2B.

図８に示す画像収集装置２Ｂの構成が、図２に示した画像収集装置２の構成と異なる点は、ラベル特定装置推定部２３が追加され、物体領域推定部２４が物体領域推定部２４Ｂに置き換えられた点である。 The configuration of the image collecting device 2B shown in FIG. 8 differs from that of the image collecting device 2 shown in FIG. 2 in that a label specifying device estimation unit 23 is added and the object area estimation unit 24 is replaced with an object area estimation unit 24B.

ラベル特定部２２がラベル特定装置推定部２３に対して処理開始命令を通知すると、ラベル特定装置推定部２３は、撮影装置６で撮影された撮影画像の各々から、基準フレーム画像３０Ａを取得する。そして、ラベル特定装置推定部２３は、取得した各々の基準フレーム画像３０Ａから、ラベル特定装置４が含まれる装置領域３４を推定する。 When the label specifying unit 22 notifies the label specifying device estimation unit 23 of the processing start command, the label specifying device estimation unit 23 acquires the reference frame image 30A from each of the captured images captured by the photographing device 6. Then, the label specifying device estimation unit 23 estimates the device area 34 including the label specifying device 4 from each of the acquired reference frame images 30A.

そして、ラベル特定装置推定部２３は、推定した基準フレーム画像３０Ａ毎の装置領域３４の位置情報を、物体領域推定部２４Ｂに通知する。 Then, the label specifying device estimation unit 23 notifies the object area estimation unit 24B of the position information of the device area 34 for each estimated reference frame image 30A.

物体領域推定部２４Ｂは、ラベル特定装置推定部２３から基準フレーム画像３０Ａ毎の装置領域３４の位置情報を受け付けると、装置領域３４の位置情報から物体８が含まれる物体領域３２を基準フレーム画像３０Ａ毎に推定する。 When the object area estimation unit 24B receives the position information of the device area 34 for each reference frame image 30A from the label specifying device estimation unit 23, the object area 32 including the object 8 is selected from the position information of the device area 34 as the reference frame image 30A. Estimate for each.

The object area estimation unit 24B then notifies the training image registration unit 26 of each reference frame image 30A used for estimating the object area 32 and of the position information of the object area 32 of the object 8 estimated from each reference frame image 30A.

<Operation of the image collecting device according to the third embodiment>
Next, the operation of the image collecting device 2B according to the present embodiment will be described. When the input unit 10 receives the label information of the object 8, the image collecting device 2B executes the image collection routine shown in FIG. 9.

The image collection routine shown in FIG. 9 differs from the image collection routine of the image collecting device 2 shown in FIG. 3 in that step S42 is newly added and step S50 is replaced with step S50A.

When a reference frame image 30A has been acquired from each captured image in step S40, step S42 is executed.

In step S42, the CPU estimates, from each of the reference frame images 30A acquired in step S40, the device area 34 that includes the label specifying device 4. A known method can be used to estimate the device area 34.

For example, training images of the label specifying device 4, photographed from various directions and distances, may be prepared in advance, and the device area 34 including the label specifying device 4 may be estimated from each reference frame image 30A using the object identification technique shown in Patent Document 1.
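The idea of locating the device area 34 from images of the label specifying device 4 prepared in advance can be illustrated as follows. This is only a minimal sketch: it substitutes a brute-force sum-of-squared-differences template match for the object identification technique of Patent Document 1, which is not described here, and the names `ssd`, `estimate_device_area`, `frame`, and `templates` are hypothetical. Images are small grayscale grids given as lists of lists.

```python
def ssd(image, template, x, y):
    """Sum of squared differences between `template` and the image patch at (x, y)."""
    return sum(
        (image[y + j][x + i] - template[j][i]) ** 2
        for j in range(len(template))
        for i in range(len(template[0]))
    )


def estimate_device_area(image, templates):
    """Return (x, y, w, h) of the best-matching patch over all templates.

    `templates` stands in for training images of the device prepared in
    advance at several orientations and distances (hypothetical data).
    Returns None when no template fits inside the image.
    """
    best = None  # (score, x, y, w, h)
    for t in templates:
        th, tw = len(t), len(t[0])
        for y in range(len(image) - th + 1):
            for x in range(len(image[0]) - tw + 1):
                score = ssd(image, t, x, y)
                if best is None or score < best[0]:
                    best = (score, x, y, tw, th)
    return None if best is None else best[1:]


# Toy reference frame: the "device" occupies the 2x2 patch at x=2, y=1.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
templates = [[[9, 9], [9, 9]], [[9, 8], [7, 9]]]
device_area = estimate_device_area(frame, templates)
```

Raw squared differences favor smaller templates when template sizes differ, so a practical matcher would normalize the score; the sketch keeps the unnormalized form for brevity.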

In step S50A, the CPU estimates the object area 32 that includes the object 8 whose type was specified in step S20, based on the position information of the device area 34 in each reference frame image 30A estimated in step S42. A known estimation method can be used to estimate the object area 32 from the device area 34.

For example, when acquiring the label information of the object 8, the user brings the label specifying device 4 close to the object 8, so the object 8 is usually located in the vicinity of the label specifying device 4.

Therefore, the relative position of the object 8 with respect to the label specifying device 4 when the user brings the label specifying device 4 close to the object 8 to acquire the label information, and the average size of the object 8 (that is, the size of the object area 32), are obtained statistically in advance by experiment for the reference frame image 30A of each photographing device 6. Then, as shown in FIG. 10, the device area 34 is moved based on the relative position of the object 8 obtained in advance and is enlarged or reduced to the size of the object area 32 obtained in advance, whereby the object area 32 can be estimated for the reference frame image 30A of each photographing device 6.
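The move-and-resize estimation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: regions are represented as center-and-size boxes, the statistics are plain arithmetic means, and the names `Box`, `fit_prior`, `estimate_object_box`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Axis-aligned region given by its center and size, in pixels."""
    cx: float
    cy: float
    w: float
    h: float


def fit_prior(samples):
    """Average the object's offset from the device and the object's size.

    `samples` is a list of (device_box, object_box) pairs measured in
    advance for one photographing device (hypothetical experiment data).
    """
    dx = mean(obj.cx - dev.cx for dev, obj in samples)
    dy = mean(obj.cy - dev.cy for dev, obj in samples)
    w = mean(obj.w for _, obj in samples)
    h = mean(obj.h for _, obj in samples)
    return dx, dy, w, h


def estimate_object_box(device_box, prior):
    """Move the device area by the mean offset, then resize it to the mean object size."""
    dx, dy, w, h = prior
    return Box(device_box.cx + dx, device_box.cy + dy, w, h)


# Two hypothetical measurements for one camera: (device area, object area).
samples = [
    (Box(100, 100, 40, 20), Box(160, 110, 80, 60)),
    (Box(200, 150, 40, 20), Box(250, 170, 100, 80)),
]
prior = fit_prior(samples)
object_box = estimate_object_box(Box(300, 200, 40, 20), prior)
```

Because the statistics are obtained per photographing device, one such `prior` would be fitted and stored for each camera, and the prior matching the camera that produced the reference frame image would be applied.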

Note that the object area 32 may also be estimated by moving, enlarging, or reducing the device area 34 using the method described in Non-Patent Document 6.

[Non-Patent Document 6]: Carsten Rother, Vladimir Kolmogorov, Andrew Blake. "GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts" in SIGGRAPH, 2004

After that, in step S60 already described, for each reference frame image 30A used for estimating the object area 32 in step S50A, the type of the object 8 specified in step S20 and the position information of the object area 32 in that reference frame image 30A are associated with the reference frame image 30A to form a training image.
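The flow from step S42 through step S50A to step S60 can be sketched as follows. This is a minimal sketch, assuming that the two estimators are supplied by the caller and that a plain list stands in for the training image storage unit 28; `TrainingImage`, `register_training_images`, and the sample values are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingImage:
    object_type: str    # type specified from the label information (step S20)
    frame_id: str       # identifies the reference frame image 30A
    object_area: tuple  # (x, y, w, h) of the object area 32 in that frame


def register_training_images(object_type, frames, estimate_device, estimate_object, store):
    """Run S42 -> S50A -> S60 for each reference frame image.

    `frames` maps a frame id to image data; `estimate_device` and
    `estimate_object` are caller-supplied estimators (hypothetical hooks).
    Frames in which no device area is found are skipped, which is an
    assumption of this sketch rather than something the description states.
    """
    for frame_id, image in frames.items():
        device_area = estimate_device(image)          # step S42
        if device_area is None:
            continue
        object_area = estimate_object(device_area)    # step S50A
        store.append(TrainingImage(object_type, frame_id, object_area))  # step S60
    return store


# Toy run with two cameras; the device is visible only in the first frame.
store = register_training_images(
    "product-A",
    {"cam1": "image-1", "cam2": "image-2"},
    estimate_device=lambda img: (10, 20, 4, 2) if img == "image-1" else None,
    estimate_object=lambda d: (d[0] + 5, d[1] + 1, 8, 6),
    store=[],
)
```

Keeping the estimators as parameters mirrors the description's statement that known methods can be used for both the device area and the object area.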

As described above, the image collecting device 2B according to the present embodiment estimates, from the reference frame image 30A, the device area 34 that includes the label specifying device 4 instead of the user's hand area 31, and estimates the object area 32 that includes the object 8 based on the estimated device area 34.

Since the shape of a hand varies widely whereas the shape of the label specifying device 4 does not change, the device area 34 may be estimated more accurately than the hand area 31. Accordingly, the estimation accuracy of the object area 32 can be improved as the estimation accuracy of the device area 34 improves.

Although an example in which the object area 32 is estimated after estimating the device area 34 from the reference frame image 30A has been described here, it goes without saying that the same approach can also be applied when estimating the object area 32 from an adjacent frame image.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, although each of the embodiments described above shows an example in which the image collection routine is realized by software, the routine may instead be realized using hardware such as an ASIC (Application Specific Integrated Circuit).

Further, although each of the embodiments described above assumes that the image collection program is installed in the ROM, the present invention is not limited to this. The image collection program according to the present invention can also be provided in a form recorded on a computer-readable storage medium. For example, the image collection program according to the present invention may be provided in a form recorded on an optical disc such as a CD (Compact Disc)-ROM or a DVD (Digital Versatile Disc)-ROM. The image collection program according to the present invention may also be provided in a form recorded in a semiconductor memory such as a USB (Universal Serial Bus) memory or a flash memory. Furthermore, when the image collecting device is connected to a communication line such as the Internet, the image collection program according to the present invention may be acquired from a terminal device, such as a server, connected to the communication line.

1 ... Image collection system
2 (2A, 2B) ... Image collecting device
4 ... Label specifying device
6 ... Photographing device
8 ... Object
10 ... Input unit
20 ... Calculation unit
22 ... Label specifying unit
23 ... Label specifying device estimation unit
24 (24B) ... Object area estimation unit
25 ... Object tracking unit
26 (26A) ... Training image registration unit
28 ... Training image storage unit
30A ... Reference frame image
30B (30C) ... Adjacent frame image
31 ... Hand area
32 ... Object area
34 ... Device area

Claims

An image collecting device comprising:
a label specifying unit that acquires label information of an object from a label specifying device, which acquires label information attached to the object and indicating the type of the object by being brought close to the object, and that specifies the type of the object;
an object area estimation unit that acquires, from each of captured images taken by at least one photographing device that photographs how the label information of the object is acquired using the label specifying device, a reference frame image at the moment when the label information of the object used by the label specifying unit to specify the type of the object was acquired, and that estimates an object area including the object from the acquired reference frame image;
a training image registration unit that associates with one another the type of the object specified by the label specifying unit, the reference frame image used by the object area estimation unit to estimate the object area of the object, and the object area of the object in the reference frame image estimated by the object area estimation unit, and that registers them in a storage device as a training image; and
a label specifying device estimation unit that estimates a device area including the label specifying device from each of the reference frame images of the captured images taken by the photographing device,
wherein the object area estimation unit estimates the object area including the object from the reference frame image based on the position of the device area of the label specifying device estimated by the label specifying device estimation unit.

The image collecting device according to claim 1, further comprising an object tracking unit that acquires, from each of the captured images in chronological order, adjacent frame images, which are images captured within a range of at least one of before and after the reference frame image acquired by the object area estimation unit, that tracks the object included in the reference frame image through each of the acquired adjacent frame images, and that estimates an object area including the object from each adjacent frame image that includes the object,
wherein the training image registration unit associates with one another the type of the object specified by the label specifying unit, the adjacent frame image used by the object tracking unit to estimate the object area of the object, and the object area of the object in the adjacent frame image estimated by the object tracking unit, and registers them in the storage device as a training image.

The image collecting device according to claim 1 or 2, wherein the label specifying unit acquires the label information of the object from the label specifying device, which reads the label information of the object using light or radio waves.

A program for causing a computer to function as each part of the image collecting device according to any one of claims 1 to 3.

An image collecting method comprising:
a step of acquiring label information of an object from a label specifying device, which acquires label information attached to the object and indicating the type of the object by being brought close to the object, and specifying the type of the object;
a step of acquiring, from each of captured images taken by at least one photographing device that photographs how the label information of the object is acquired using the label specifying device, a reference frame image at the moment when the label information of the object used to specify the type of the object was acquired, and estimating an object area including the object from the acquired reference frame image;
a step of associating with one another the specified type of the object, the reference frame image used for estimating the object area of the object, and the object area of the object in the reference frame image, and registering them in a storage device as a training image; and
a step of estimating a device area including the label specifying device from each of the reference frame images of the captured images taken by the photographing device,
wherein, in the step of estimating the object area including the object from the reference frame image, the object area including the object is estimated from the reference frame image based on the estimated position of the device area of the label specifying device.

The image collecting method according to claim 5, further comprising:
a step of acquiring, from each of the captured images in chronological order, adjacent frame images, which are images captured within a range of at least one of before and after the reference frame image, tracking the object included in the reference frame image through each of the acquired adjacent frame images, and estimating an object area including the object from each adjacent frame image that includes the object; and
a step of associating with one another the specified type of the object, the adjacent frame image used for estimating the object area of the object, and the object area of the object in the adjacent frame image, and registering them in the storage device as a training image.