JP2013012105A

JP2013012105A - Object recognition support device, program, and method

Info

Publication number: JP2013012105A
Application number: JP2011145151A
Authority: JP
Inventors: Shugo Nakamura; 秋吾中村; Masaki Ishihara; 正樹石原; Takayuki Baba; 孝之馬場; Masahiko Sugimura; 昌彦杉村; Susumu Endo; 進遠藤; Yusuke Uehara; 祐介上原; Daiki Masumoto; 大器増本; Shigemi Osada; 茂美長田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-06-30
Filing date: 2011-06-30
Publication date: 2013-01-17
Anticipated expiration: 2031-06-30
Also published as: JP5664478B2

Abstract

PROBLEM TO BE SOLVED: To provide an object recognition support device for determining an image amount adequate for object recognition. SOLUTION: An object recognition support device 1 includes: a first feature extraction unit 11 for extracting a feature of an acquired image; a first feature recording unit 12 for recording the feature; a dispersion calculation unit 13 for calculating dispersion of a feature group recorded in the first feature recording unit 12; a dispersion recording unit 14 for recording the dispersion; and a convergence determination unit 15 for determining whether the dispersion has converged on the basis of the progress of the dispersion recorded in the dispersion recording unit 14 and, when the dispersion has converged, issuing a notification that image acquisition is complete. The object recognition support device 1 supports an object recognition device 3 in performing object recognition with an adequate number of images.

Description

The present invention relates to a technology that supports object recognition, and more specifically to a technology that makes it possible to acquire a number of images appropriate for processing when acquiring images for object recognition.

In recent years, object recognition technology, which estimates the name of an object contained in an image, has been expected to find use in a variety of industries.

One example field of use is health guidance services. A health guidance service provides users with information on health guidance in order to prevent lifestyle-related diseases caused by unbalanced eating habits. At present, such services provide this information by having experts such as registered dietitians check the content of a user's meals, based on meal images taken and sent by the user, and write comments.

However, because the service relies on an expert viewing the images and checking the meal content, the cost of providing the service is high and the number of users who can use it is limited.

Automating the service is indispensable for keeping costs low so that more users can use health guidance services.

For this reason, as in other service fields that use object recognition, the health guidance service described above is expected to reduce the workload of human experts and make service provision more efficient by using object recognition technology to recognize meal names automatically from meal images and to analyze eating habits automatically on the basis of those meal names.

Object recognition processing has a feature extraction stage, which extracts feature information from an image, and an object recognition stage, which estimates the name of an object contained in the image on the basis of the extracted feature information. When a single still image is input to an object recognition device, the name of the object contained in the image is output.

FIG. 12 shows an example image of an object (ramen) photographed from one direction, and FIG. 13 shows an example image of the object (ramen) of FIG. 12 photographed from another direction.

When the image shown in FIG. 12 is input to the object recognition device, the device recognizes the object in the image as “ramen” on the basis of the features extracted from the image, and outputs that object name.

However, when the image of FIG. 13 is input, the object recognition device may erroneously estimate the object to be something else, “fried rice”, and fail to recognize it correctly as “ramen”.

Thus, when object recognition depends on information obtained from only one image, the features needed for proper recognition may not be extracted under some shooting conditions of the input image, and the object may be estimated incorrectly. One way to solve this problem is to photograph the same object several times while changing conditions such as the shooting direction, and to use the estimation results for the resulting group of images in an integrated manner.

An object recognition device that uses a group of images has a recognition integration stage in addition to the feature extraction stage and the object recognition stage. When such a device receives a plurality of images of the same object taken while changing conditions such as the shooting direction, it extracts features from each image in the feature extraction stage and estimates an object name for each image from those features in the object recognition stage. Then, in the recognition integration stage, the device evaluates the recognition results (object names) for the individual images together and determines a single object name. In the recognition integration stage, for example, the one object name most likely to be correct is chosen from the object names output in the object recognition stage by majority vote or a similar method.
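The recognition integration stage described above can be illustrated with a minimal Python sketch of majority voting over per-image estimates. The function name integrate_by_majority and the sample estimates are hypothetical and chosen only for illustration; the text names majority vote merely as one example of an integration method.

```python
from collections import Counter


def integrate_by_majority(names):
    """Return the object name estimated most often across the images.

    `names` holds one estimated object name per image. Ties are resolved
    in favor of the name that appeared first (an assumption of this sketch).
    """
    if not names:
        raise ValueError("at least one per-image estimate is required")
    # most_common(1) returns a one-element list [(name, count)].
    return Counter(names).most_common(1)[0][0]


# Hypothetical per-image results: one image is misrecognized as fried rice.
per_image_estimates = ["ramen", "fried rice", "ramen"]
integrated_name = integrate_by_majority(per_image_estimates)
```

In this sketch a single misrecognized image is outvoted by the other images, which is the robustness the multi-image approach is intended to provide.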

An object recognition device that uses a plurality of images therefore performs recognition by combining images taken under various shooting conditions, and can recognize the object name correctly even if the set includes images that are easily misrecognized.

Incidentally, it is known that, in object recognition processing, the feature points to be extracted are narrowed down at the feature extraction stage so as not to increase the amount of processing.

JP 2010-218051 A
US Pat. No. 6,711,293
International Publication No. 2007/128452

Csurka, G., Bray, C., Dance, C. and Fan, L., “Visual Categorization with Bags of Keypoints”, In Proc. of ECCV Workshop on Statistical Learning in Computer Vision, (Czech Republic), 2004, pp. 59-74
Taichi Joutou and Keiji Yanai, “A Food Image Recognition System with Multiple Kernel Learning”, Proceedings of the 16th IEEE International Conference on Image Processing, (Egypt), November 7-10, 2009, pp. 285-288
David G. Lowe, “Object Recognition from Local Scale-Invariant Features”, International Conference on Computer Vision, (Greece), September 1999, pp. 1150-1157
Herbert Bay, Andreas Ess, Tinne Tuytelaars and Luc Van Gool, “SURF: Speeded Up Robust Features”, Computer Vision and Image Understanding (CVIU), (Netherlands), 2008, Vol. 110, No. 3, pp. 346-359

The conventional object recognition device described above, which uses a plurality of input images, can recognize objects robustly against shooting conditions such as the shooting direction. However, it has not been made clear how many images are necessary and sufficient for recognition. Object recognition processing therefore uses every input image.

Furthermore, when the input images used for object recognition are taken by the user, the decision of when to stop shooting is left to the user. In actual use, the user therefore inputs an image of the target to the object recognition device and, if the recognition result is wrong, shoots again and adds the new image as a further input image, repeating this procedure.

If too few images are used for object recognition, misrecognition may occur; if too many are used, the computational load of the processing increases and the processing becomes inefficient. It is therefore desirable that a necessary and sufficient set of images be used for object recognition.

However, a user who shoots or inputs images cannot tell how many images are enough for object recognition, and feels the stress of not knowing when to stop. Moreover, the user ends up continuing input operations that serve no purpose.

For the object recognition device as well, performing object recognition on an excessive number of images increases the computational load and is wasteful and inefficient.

As described above, the conventional method has the problem that it is unclear how many images should be acquired for the object recognition device.

The present invention has been made in view of the above problem, and its object is to provide a device that supports object recognition processing by automatically determining whether a number of images appropriate for object recognition has been acquired and by notifying completion of acquisition once the number of images is sufficient.

Another object of the present invention is to provide a program for causing a computer to execute the processing realized by the object recognition support device. A further object of the present invention is to provide a processing method in which a computer executes the processing steps realized by the object recognition support device.

To support object recognition processing, an object recognition support device disclosed as one aspect of the present invention includes: 1) a first feature extraction unit that extracts a feature of an image acquired by image input means; 2) a first feature recording unit that records the extracted feature; 3) a variance calculation unit that calculates the variance of the group of features recorded in the first feature recording unit; 4) a variance recording unit that records the calculated variance; and 5) a convergence determination unit that determines, on the basis of the transition of the variance recorded in the variance recording unit, whether the variance has converged and, if it has, notifies completion of image acquisition.

According to the disclosed object recognition support device, completion of acquisition is notified when an amount of images appropriate for object recognition has been acquired. This prevents the user from performing unnecessary image input operations and prevents the object recognition device from processing superfluous images.

FIG. 1 is a diagram showing a configuration example of a health guidance service system in one embodiment.
FIG. 2 is a diagram showing a configuration example of an object recognition support device in one embodiment.
FIG. 3 is a diagram showing a configuration example of an image use determination device in one embodiment.
FIG. 4 is a diagram showing a configuration example of an object recognition device in one embodiment.
FIG. 5 is a diagram showing an example of the processing flow of the image use determination device.
FIG. 6 is a diagram showing an example of use images adopted by the image use determination device.
FIG. 7 is a diagram showing an example of the variance of a group of image feature vectors.
FIG. 8 is a diagram showing an example of the processing flow of the object recognition support device 1.
FIG. 9 is a diagram showing an example of calculating the variance of the feature vector group in the processing of step S23.
FIG. 10 is a diagram showing the flow of processing of the health guidance service system in one embodiment.
FIG. 11 is a diagram showing an example of estimating an object name by object recognition.
FIG. 12 is a diagram showing an example image of an object photographed from one direction.
FIG. 13 is a diagram showing an example image of the object of FIG. 12 photographed from another direction.

An object recognition support device disclosed as one aspect of the present invention is described below.

As one embodiment of the disclosed object recognition support device, an example is described that implements a health guidance service system, which is one kind of service providing system that has an object recognition support device and an object recognition device.

FIG. 1 is a diagram showing a configuration example of the health guidance service system in one embodiment.

The health guidance service system includes an object recognition support device 1, an image use determination device 2, an object recognition device 3, a service providing device 4, and a user device 5. In the present embodiment, the object recognition support device 1 and the image use determination device 2 are provided in the user device 5.

The object recognition support device 1 notifies completion of acquisition when an amount (number) of images appropriate for object recognition has been acquired. The object recognition support device 1 extracts features from the acquired group of images and determines, from the transition of the variation in the features of the acquired images, whether the variation has converged. If the variation has converged, the device judges that an appropriate number of images has been acquired and issues an acquisition-completion notification indicating that image acquisition is complete.

The image use determination device 2 determines whether an image acquired by the image input unit 51 of the user device 5 is to be used for object recognition. On acquiring an image, the image use determination device 2 extracts a feature from it and, if the extracted feature differs from the feature of a previously acquired image by at least a certain degree, adopts the newly acquired image as a use image.

The object recognition device 3 recognizes the object contained in the images on the basis of the features extracted from a plurality of images, and estimates the corresponding object name.

The service providing device 4 is a server device that provides predetermined service information to the user on the basis of the estimated object name. The service providing device 4 provides the user with health guidance information that makes use of the object name estimated by image recognition.

The service providing device 4 includes a calorie database 41 and a calorie calculation unit 42.

The calorie database 41 stores meal names in association with their calories.

The calorie calculation unit 42 refers to the calorie database 41, obtains the calories corresponding to the object name, and transmits them to the information output unit 52 of the user device 5.

The user device 5 is a client device used by the user. The user device 5 includes an image input unit 51 and an information output unit 52.

The image input unit 51 acquires moving images or still images in which the target to be recognized has been captured. In the present embodiment, the image input unit 51 is implemented by a camera that captures moving images or still images. In the following description, the image data acquired by the image input unit 51 is referred to as an input image.

The information output unit 52 receives the information (calories) transmitted from the service providing device 4 and outputs it to a display or the like.

FIG. 2 is a diagram showing a configuration example of the object recognition support device 1 in one embodiment.

The object recognition support device 1 determines whether a number of images appropriate for object recognition has been collected. To provide this determination function, the object recognition support device 1 includes a first feature extraction unit 11, a first feature recording unit 12, a variance calculation unit 13, a variance recording unit 14, and a convergence determination unit 15.

The first feature extraction unit 11 extracts a feature vector as the feature of an image that has been adopted for use in object recognition (hereinafter referred to as a use image).

The first feature recording unit 12 records the feature vector extracted by the first feature extraction unit 11.

The variance calculation unit 13 calculates the variance for each dimension over all feature vectors, namely the feature vector of the newly acquired use image extracted by the first feature extraction unit 11 and the feature vectors of the use images extracted in earlier processing and recorded in the first feature recording unit 12.

The variance recording unit 14 records the variances calculated by the variance calculation unit 13 in processing order.

The convergence determination unit 15 obtains the transition of the variance from the previously calculated variances recorded in the variance recording unit 14 and the variance calculated this time, and notifies completion of acquisition when the variance has converged. The completion of acquisition is notified to the image input unit 51, the image use determination device 2, the object recognition device 3, and so on.
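The cooperation of units 12 to 15 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the passage above does not specify the convergence criterion, so the sketch assumes that the variance is treated as converged when the largest per-dimension change between the two most recent variance records falls below a tolerance epsilon. The names AcquisitionMonitor, per_dimension_variance, has_converged and epsilon are hypothetical, and the population variance is likewise an assumed choice.

```python
def per_dimension_variance(vectors):
    """Population variance of each dimension over equal-length feature vectors."""
    count = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[i] for v in vectors) / count for i in range(dims)]
    return [
        sum((v[i] - means[i]) ** 2 for v in vectors) / count
        for i in range(dims)
    ]


def has_converged(history, epsilon):
    """Assumed criterion: the two latest variance records differ by < epsilon."""
    if len(history) < 2:
        return False  # no transition can be observed from a single record
    previous, current = history[-2], history[-1]
    return max(abs(c - p) for c, p in zip(current, previous)) < epsilon


class AcquisitionMonitor:
    """Plays the roles of units 12 to 15 for one recognition target."""

    def __init__(self, epsilon=1e-3):
        self.features = []   # role of the first feature recording unit 12
        self.variances = []  # role of the variance recording unit 14
        self.epsilon = epsilon

    def add(self, feature_vector):
        """Record a use-image feature; return True when acquisition may stop."""
        self.features.append(list(feature_vector))
        # Role of the variance calculation unit 13: recompute over all vectors.
        self.variances.append(per_dimension_variance(self.features))
        # Role of the convergence determination unit 15.
        return has_converged(self.variances, self.epsilon)
```

A caller would invoke add() once per use image and stop requesting further images the first time it returns True.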

FIG. 3 is a diagram showing a configuration example of the image use determination device 2 in one embodiment.

When input images are shot at spaced intervals, for example, it may be acceptable to use every shot image (input image) for object recognition. When a moving image is shot, however, using every captured image (frame) for object recognition is redundant and increases the computational load of object recognition. The input images therefore need to be thinned out. On the other hand, thinning out the input images reduces the information supplied to object recognition and may lower the recognition rate. It is therefore necessary to preserve the variation in the features obtained from the input images.

The image use determination device 2 determines the use images by omitting only redundant input images that have similar features, while preserving the variation in the features obtained from the input images. To provide this determination function, the image use determination device 2 includes a second feature extraction unit 21, a feature writing unit 22, a second feature recording unit 23, a distance calculation unit 24, and a use determination unit 25.

The second feature extraction unit 21 extracts a feature from the input image.

The feature writing unit 22 records, in the second feature recording unit 23, the feature of an input image that the use determination unit 25 has adopted as a use image.

The second feature recording unit 23 holds the features extracted from use images.

The distance calculation unit 24 calculates the difference between the feature of the input image extracted by the second feature extraction unit 21 and the feature of the use image recorded in the second feature recording unit 23.

The use determination unit 25 selects the input image as a use image when the feature of the input image differs from the feature of the use image recorded in the second feature recording unit 23 by a preset threshold or more.

In the present embodiment, the image use determination device 2 calculates a feature vector as the feature of an image, and uses the distance between feature vectors as the difference between features.

When the image input unit 51 acquires a moving image, the image use determination device 2 makes the image use determination for each frame extracted at a preset interval; when still images are acquired, it makes the determination for each item of still image data.
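As a minimal illustration of extracting frames at a preset interval before the use determination, the following Python sketch treats a moving image as a list of frames. The function name sample_frames and the integer stand-ins for frames are hypothetical; how frames are decoded from a real video is outside the scope of this sketch.

```python
def sample_frames(frames, interval):
    """Return every `interval`-th frame, starting with the first frame."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return frames[::interval]


# Integers stand in for decoded frames of a moving image.
candidate_frames = sample_frames(list(range(10)), 3)
```

Each sampled frame would then be passed individually to the image use determination, in the same way as a single still image.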

FIG. 4 is a diagram showing a configuration example of the object recognition device 3 in one embodiment.

The object recognition device 3 includes a feature extraction unit 31, an object recognition unit 32, and a recognition integration unit 33.

The feature extraction unit 31 extracts a feature from each of a plurality of use images 1, 2, ..., n. The object recognition unit 32 refers to an object recognition storage unit (not shown), which stores the image features of each object in association with its object name, and estimates candidate object names for each use image on the basis of the extracted feature. The recognition integration unit 33 integrates the estimation results of the object recognition unit 32 and determines the object name of the object contained in the images.

In the present embodiment, the object recognition device 3 executes known object recognition processing that uses a plurality of images, and a detailed description of its operation is omitted.

First, the image use determination device 2 is described in more detail.

FIG. 5 is a diagram showing an example of the processing flow of the image use determination device 2.

In the second feature recording unit 23 of the image use determination device 2, the feature writing unit 22 has recorded the feature vector v′ of an input image that was adopted as a use image in past determination processing.

When the image input unit 51 inputs an image in which the object recognition target has been captured, the second feature extraction unit 21 of the image use determination device 2 acquires the input image from the image input unit 51 (step S11).

The second feature extraction unit 21 then extracts the feature of the input image. Specifically, the second feature extraction unit 21 converts the input image into an N-dimensional feature vector v (step S12).

The distance calculation unit 24 retrieves the feature vector v′ of the use image from the second feature recording unit 23. The distance calculation unit 24 calculates the distance d between the feature vector v of the input image and the feature vector v′ of the use image (step S13).

The use determination unit 25 compares the calculated distance d with a threshold θ. If the distance d is larger than the threshold θ, that is, if “distance d > threshold θ” holds (Y in step S14), the use determination unit 25 selects and outputs the input image as a use image (step S15). The feature writing unit 22 then stores the feature vector v of the input image selected as a use image in the second feature recording unit 23 as the feature vector v′, and the processing ends (step S16).

On the other hand, if the distance d is not larger than the threshold θ, that is, if “distance d > threshold θ” does not hold (N in step S14), the use determination unit 25 discards the input image, and the processing returns to step S11.
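Steps S13 to S16 can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of the text: the distance is taken to be the squared Euclidean distance, the comparison is made only against the most recently adopted feature vector v′, and the names ImageUseJudge, judge and squared_distance are hypothetical. Feature extraction (steps S11 and S12) is assumed to have already produced the vector v.

```python
def squared_distance(v, w):
    """Assumed distance measure: squared Euclidean distance between vectors."""
    return sum((a - b) ** 2 for a, b in zip(v, w))


class ImageUseJudge:
    """Sketch of the use determination of steps S13 to S16."""

    def __init__(self, theta, distance=squared_distance):
        self.theta = theta        # threshold θ
        self.distance = distance  # distance measure used in step S13
        self.stored = None        # v′ held by the second feature recording unit 23

    def judge(self, v):
        """Return True if the image with feature vector v is adopted."""
        # With no stored v′ there is nothing to compare against, so the
        # first image is always adopted.
        if self.stored is None or self.distance(v, self.stored) > self.theta:
            self.stored = list(v)  # step S16: v becomes the new v′
            return True            # step S15: adopted as a use image
        return False               # N in step S14: the image is discarded


judge = ImageUseJudge(theta=0.01)
decisions = [judge.judge(v) for v in ([0.5, 0.5], [0.5, 0.5], [0.9, 0.1])]
```

In this run the second vector is identical to the first and is discarded, while the third differs enough to be adopted.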

In the processing of step S13, the distance calculation unit 24 calculates the distance d between the feature vectors v and v′, for example, according to equation (1).

In equation (1), N is the number of dimensions of the feature vector, v_i is the value of the i-th dimension of the feature vector v, and v′_i is likewise the value of the i-th dimension of the feature vector v′.

Alternatively, the distance calculation unit 24 may calculate a distance d_n for each dimension of the feature vector according to equation (2).

In equation (2), d_n is the distance for the n-th dimension of the feature vector.

When the distance calculation unit 24 calculates the distances d_n using equation (2), the use determination unit 25, in the processing of step S14, evaluates the condition “distance d_n > threshold θ” for each dimension n and, if the condition holds for even one dimension, selects and outputs the input image as a use image.
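The per-dimension variant of step S14 can be sketched in Python as follows. The form of equation (2) is not reproduced in this text, so the sketch assumes that the distance for each dimension is the absolute difference of the corresponding components; the function names are hypothetical.

```python
def per_dimension_distances(v, w):
    """Assumed per-dimension distance: absolute difference of components."""
    return [abs(a - b) for a, b in zip(v, w)]


def differs_in_any_dimension(v, w, theta):
    """True if the distance exceeds theta in at least one dimension."""
    return any(d > theta for d in per_dimension_distances(v, w))


# Only the second dimension differs by more than the threshold.
adopt = differs_in_any_dimension([0.5, 0.5], [0.5, 0.7], theta=0.1)
```

Compared with a single aggregate distance, this condition adopts an image as soon as any one dimension changes markedly, even if all other dimensions are unchanged.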

第２特徴抽出部２１が抽出する画像の特徴は，物体認識支援装置１および物体認識装置３と同様のものにすることが望ましい。物体認識においては，一般的に，ＳＩＦＴ特徴量やＳＵＲＦ特徴量などの局所特徴量にもとづくｂａｇ−ｏｆ−ｗｏｒｄｓ表現がよく用いられる。本実施形態において，物体認識支援装置１，画像使用判定装置２および物体認識装置３は，ＳＩＦＴ特徴量を使用するものとする。 The image features extracted by the second feature extraction unit 21 are preferably the same as those used by the object recognition support device 1 and the object recognition device 3. In object recognition, a bag-of-words representation based on local features such as SIFT or SURF features is commonly used. In the present embodiment, the object recognition support device 1, the image use determination device 2, and the object recognition device 3 are assumed to use SIFT features.

したがって，閾値θの適切な値は，使用する特徴量によって異なるが，例えばＳＩＦＴによる各次元の総和が１になるように正規化された１０００次元のｂａｇ−ｏｆ−ｗｏｒｄｓ表現を特徴量として用いる場合には，前記の式（１）の閾値θは，およそ１０^−４〜１０^−３程度に設定され，前記式（２）の閾値θは，およそ１０^−２〜１０^−１程度に設定される。 Therefore, the appropriate value of the threshold θ depends on the feature used. For example, when a 1000-dimensional SIFT-based bag-of-words representation, normalized so that the sum over all dimensions is 1, is used as the feature, the threshold θ for equation (1) is set to approximately 10^−4 to 10^−3, and the threshold θ for equation (2) is set to approximately 10^−2 to 10^−1.

図６は，画像使用判定装置２によって採用された使用画像の例を示す図である。 FIG. 6 is a diagram illustrating an example of a use image adopted by the image use determination device 2.

図６は，動画像である入力画像から使用画像に採用された複数の画像フレーム（１枚目，２枚目，３枚目，…）を表している。 FIG. 6 shows a plurality of image frames (first image, second image, third image,...) Adopted from the input image, which is a moving image, to the use image.

動画像データの先頭フレーム（１枚目）は，比較する使用画像の特徴が保持されていないため，必ず使用画像に採用される。しかし，動画像から数フレームごとに画像を撮りだしても冗長な場合がある。画像使用判定装置２によれば，動画像の１枚目のフレーム以降の画像については，取り出したフレームの画像の特徴が使用画像に採用された画像の特徴と比較され，閾値以上に相違する特徴を持つフレームのみが判定条件を満たすとして，使用画像に選択される。したがって，入力画像が冗長であっても，特徴が異なる画像が使用画像として選択され，特徴のバリエーションを維持しつつ冗長な画像を間引くことができる。 The first frame of the moving image data is always adopted as a use image, because no use-image feature is yet held for comparison. However, images taken from a moving image every few frames may still be redundant. With the image use determination device 2, for each frame after the first, the feature of the extracted frame is compared with the feature of the image already adopted as a use image, and only frames whose features differ by at least the threshold are regarded as satisfying the determination condition and are selected as use images. Therefore, even if the input images are redundant, images with differing features are selected as use images, and redundant images can be thinned out while the variation of features is maintained.

次に，物体認識支援装置１をより詳細に説明する。 Next, the object recognition support apparatus 1 will be described in more detail.

物体認識支援装置１は，物体認識のために妥当な枚数の画像が取得できたかどうかを判定するため，過去に取得された入力画像の特徴ベクトル群についての分散を算出する。 The object recognition support apparatus 1 calculates a variance for a feature vector group of input images acquired in the past in order to determine whether or not an appropriate number of images for object recognition has been acquired.

なお，本実施形態においては，画像使用判定装置２によって，入力画像から物体認識に用いる使用画像が選択されるため，物体認識支援装置１は，使用画像の特徴ベクトル群について分散を算出する。画像使用判定装置２を設けない構成をとる実施形態の場合には，画像入力部５１が取得した入力画像の特徴ベクトル群について分散を算出する。 In the present embodiment, the image use determination device 2 selects a use image to be used for object recognition from the input image. Therefore, the object recognition support device 1 calculates a variance for a feature vector group of the use image. In the embodiment in which the image use determination device 2 is not provided, the variance is calculated for the feature vector group of the input image acquired by the image input unit 51.

図７は，画像の特徴ベクトル群についての分散の例を示す図である。 FIG. 7 is a diagram showing an example of dispersion for a feature vector group of an image.

図７に示すグラフは，使用画像の特徴ベクトル群についての分散の例であり，物体認識支援装置１が取得した使用画像の枚数ｔに対する分散Ｓの推移を表現したものである。 The graph shown in FIG. 7 is an example of the variance for the feature vector group of the used image, and expresses the transition of the variance S with respect to the number t of used images acquired by the object recognition support apparatus 1.

撮影方向などの条件を変化させながら撮影された使用画像（入力画像）を取得していくと，取得した画像の特徴から算出される分散の変動は，始めは大きいが，様々な撮影条件の画像が蓄積されていくにつれて，徐々に変動幅が小さくなる。 As use images (input images) taken while changing conditions such as the shooting direction are acquired, the fluctuation of the variance calculated from the features of the acquired images is large at first, but the fluctuation range gradually decreases as images taken under various shooting conditions accumulate.

したがって，分散の変動が大きければ，今後さらに新しい条件で撮影された画像が取得される可能性が大きく，分散の変動が収束していれば，新しい条件で撮影された画像が入力される可能性が少ないと判断することができる。すなわち，分散の変動が収束していれば，物体認識に必要な異なる特徴を有する画像群がすでに取得されており，妥当な画像数の取得が完了したと判断することができる。 Therefore, if the variance variation is large, it is likely that images captured under new conditions will be acquired in the future. If the variance variation converges, images captured under the new conditions may be input. It can be judged that there are few. That is, if the variance variation has converged, it can be determined that an image group having different features necessary for object recognition has already been acquired, and acquisition of an appropriate number of images has been completed.

図８は，物体認識支援装置１の処理フロー例を示す図である。 FIG. 8 is a diagram illustrating a processing flow example of the object recognition support apparatus 1.

物体認識支援装置１の第１特徴抽出部１１は，画像使用判定装置２から使用画像を取得する（ステップＳ２１）。 The first feature extraction unit 11 of the object recognition support apparatus 1 acquires a use image from the image use determination apparatus 2 (step S21).

第１特徴抽出部１１は，取得した使用画像の特徴ベクトルを抽出し，第１特徴記録部１２に記録する（ステップＳ２２）。 The first feature extraction unit 11 extracts the feature vector of the acquired use image and records it in the first feature recording unit 12 (step S22).

分散算出部１３は，第１特徴記録部１２に保持されている過去に抽出された特徴ベクトルをもとに，全ての特徴ベクトルから各次元についての分散をそれぞれ算出し，算出した分散を分散記録部１４へ記録する（ステップＳ２３）。 Based on the previously extracted feature vectors held in the first feature recording unit 12, the variance calculation unit 13 calculates the variance for each dimension over all the feature vectors and records the calculated variances in the variance recording unit 14 (step S23).

収束判定部１５は，分散記録部１４に保持されている分散をもとに，過去に算出された分散の推移にもとづき，分散が収束したかどうかの判定を行う（ステップＳ２４）。分散の推移から，分散の変動が収束していると判定した場合に（ステップＳ２４の「収束」），収束判定部１５は，物体認識を行う上で妥当な枚数の画像が収集できたことを示す「取得完了」を通知して，終了する（ステップＳ２５）。分散が収束していないと判定した場合に（ステップＳ２４の「未収束」），収束判定部１５は，物体認識を行う上で妥当な枚数の画像がまだ収集できていないことを示す「取得未完了」を通知して，終了する（ステップＳ２６）。なお，収束判定部１５は，ステップＳ２６の処理を行わずに終了してもよい。 The convergence determination unit 15 determines whether the variance has converged, based on the transition of the previously calculated variances held in the variance recording unit 14 (step S24). When it determines from the transition that the fluctuation of the variance has converged (“converged” in step S24), the convergence determination unit 15 issues an “acquisition complete” notification, indicating that an appropriate number of images for object recognition has been collected, and ends (step S25). When it determines that the variance has not converged (“not converged” in step S24), the convergence determination unit 15 issues an “acquisition incomplete” notification, indicating that an appropriate number of images for object recognition has not yet been collected, and ends (step S26). The convergence determination unit 15 may also end without performing step S26.

図９は，前記のステップＳ２３の処理で実行される，特徴ベクトル群の分散の算出例を示す図である。 FIG. 9 is a diagram illustrating a calculation example of the variance of the feature vector group, which is executed in the process of step S23.

分散算出部１３は，それまでに取得され第１特徴記録部１２に保持されているｔ枚の使用画像（画像１，…，画像ｔ）から抽出された特徴ベクトル群（Ｘ_１，…，Ｘ_ｔ）から，ｎ番目の次元について要素をそれぞれ抽出し，要素ベクトルＮ_ｔｎを生成する。使用画像から抽出する際に，特徴ベクトルの各要素の取り得る範囲が０〜１に収まるように正規化しておいてもよい。 From the feature vector group (X_1, ..., X_t) extracted from the t use images (image 1, ..., image t) acquired so far and held in the first feature recording unit 12, the variance calculation unit 13 extracts the element of the n-th dimension from each vector and generates an element vector N_tn. When features are extracted from a use image, they may be normalized so that each element of the feature vector falls within the range 0 to 1.

そして，分散算出部１３は，以下の式（３）でｔ枚の画像の特徴ベクトル群におけるｎ番目の次元についての分散Ｓ_ｎｔを算出する。 Then, the variance calculation unit 13 calculates the variance S _nt for the nth dimension in the feature vector group of t images according to the following equation (3).

ただし，式（３）において，「Ｎ_ｔｎの上にバー」で表す変数は，要素ベクトルＮ_ｔｎの平均であり，以下の式（４）で算出する。 However, in Expression (3), the variable represented by “Bar over N _tn ” is the average of the element vectors N _tn and is calculated by the following Expression (4).

物体認識支援装置１の第１特徴抽出部１１が使用画像を取得して，その特徴ベクトルが第１特徴記録部１２に格納されるたび，分散算出部１３は，以上の処理によって，分散Ｓ_ｎｔを各次元についてそれぞれ算出する。 Each time the first feature extraction unit 11 of the object recognition support device 1 acquires a use image and its feature vector is stored in the first feature recording unit 12, the variance calculation unit 13 calculates the variance S_nt for each dimension by the above processing.
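The per-dimension variance calculation of step S23 can be illustrated with a minimal sketch. Equations (3) and (4) are not reproduced in this text, so the sketch assumes the population variance (division by t) and the arithmetic mean; the publication may use a different normalization. The function name is hypothetical.

```python
from typing import List, Sequence


def per_dimension_variance(features: Sequence[Sequence[float]]) -> List[float]:
    """Return the variance S_nt of each dimension n over t feature vectors."""
    t = len(features)
    if t == 0:
        raise ValueError("at least one feature vector is required")
    dims = len(features[0])
    variances: List[float] = []
    for n in range(dims):
        # Element vector N_tn: the n-th element of every feature vector.
        column = [x[n] for x in features]
        # Mean of the element vector (role of equation (4)).
        mean = sum(column) / t
        # Population variance, assumed here (role of equation (3)).
        variances.append(sum((e - mean) ** 2 for e in column) / t)
    return variances
```

Recomputing the variances after every newly stored feature vector and appending them to a per-dimension history gives the transition of S over the number of use images t that the convergence determination relies on.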

ステップＳ２４の処理において，収束判定部１５は，分散算出部１３によって算出された分散の推移にもとづき，以下のいずれかの条件を用いて，収束と判定する。 In the process of step S24, the convergence determination unit 15 determines convergence using one of the following conditions based on the transition of the variance calculated by the variance calculation unit 13.

条件１：分散の変動が十分に小さくなったこと Condition 1: The fluctuation of the variance has become sufficiently small.
条件２：分散の変動が減少し続けていること Condition 2: The fluctuation of the variance continues to decrease.

前記の条件１による収束の検出は，例えば，以下の式（５）に示す条件がＬ回連続で成立したかを確認することにより行われる。 Convergence under condition 1 is detected, for example, by checking whether the condition shown in the following equation (5) has been satisfied L times in succession.

Ｓ_ｎｔ−Ｓ_{ｎ（ｔ−１）}≦ε 式（５） S_{nt} − S_{n(t−1)} ≤ ε   Equation (5)

ただし，式（５）において，εは閾値である。特徴ベクトルにおいて各要素の取り得る範囲が０〜１に正規化されているならば，閾値εは，０．００１〜０．１程度に設定することが好ましい。 In equation (5), ε is a threshold. If the range of each element of the feature vector is normalized to 0 to 1, the threshold ε is preferably set to approximately 0.001 to 0.1.

前記の条件２による収束の検出は，例えば，以下の式（６）に示す条件がＬ回連続で成立したかを確認することにより行われる。 The detection of convergence under the above condition 2 is performed, for example, by confirming whether the condition shown in the following equation (6) is satisfied L times continuously.

条件１および条件２のいずれの場合も，Ｌは，収束の判定を確定するまでの期間を表す。Ｌが小さいと収束判定が早く行えるが誤判定のリスクが増える。Ｌが大きいと収束判定に時間がかかるが誤判定のリスクは減る。分散推移の傾向として，分散が大きく変化しているときは，収束するまでに時間がかかる場合が多い。そのため時間をかけて判定したほうがよい。反対に，分散が小さく変化しているときは，収束が早い場合が多い。そのため時間をかけずに判定してもよい。 For both condition 1 and condition 2, L represents the period until the convergence determination is finalized. A small L allows an early determination but increases the risk of an erroneous determination. A large L makes the determination take longer but reduces that risk. As a tendency of the variance transition, when the variance is changing greatly, convergence often takes time, so the determination should be made over a longer period. Conversely, when the variance is changing only slightly, convergence is often fast, so the determination may be made quickly.

したがって，Ｌは，以下の式（７）に示すように，分散の変化に比例した値を設定する。これにより，収束判定部１５は，効率的な収束判定を行う。 Therefore, L is set to a value proportional to the variance change as shown in the following equation (7). Thereby, the convergence determination part 15 performs efficient convergence determination.

前記の式（７）において，Ｋは判定速度調整パラメータである。Ｋは，通常，１００〜１０００程度の値を設定することが好ましい。 In the above equation (7), K is a determination speed adjustment parameter. In general, K is preferably set to a value of about 100 to 1000.

なお，条件成立が連続したときの初回時のみ，Ｌが算出される。 Note that L is calculated only at the first time when the conditions are continuously satisfied.

収束判定部１５は，以上の処理を要素ベクトルＮ１〜Ｎｎについて行う。そして，収束判定部１５は，各要素の分散について過半数以上が収束したと判定したときに，妥当な枚数の使用画像が収集できたものと判断する。 The convergence determination unit 15 performs the above processing on the element vectors N1 to Nn. When the convergence determination unit 15 determines that more than half of the variances of the elements have converged, the convergence determination unit 15 determines that an appropriate number of used images have been collected.
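The convergence determination under condition 1 can be sketched as below. Several points are assumptions because the corresponding formulas are not reproduced in this text: the change in variance is compared by its absolute value (reading "the fluctuation has become sufficiently small" literally), and the required run length L is taken as K times the absolute change, rounded up and at least 1, computed only when a run of satisfied conditions starts. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def dimension_converged(history: Sequence[float], eps: float, k: float) -> bool:
    """Check condition 1 for one dimension.

    history holds the variances S_n1, S_n2, ..., S_nt of that dimension.
    """
    run = 0                         # consecutive times the condition held
    required: Optional[int] = None  # L, fixed at the start of each run
    for prev, cur in zip(history, history[1:]):
        delta = abs(cur - prev)     # absolute change (assumed)
        if delta <= eps:
            if run == 0:
                # L proportional to the change in variance (assumed form).
                required = max(1, math.ceil(k * delta))
            run += 1
            if required is not None and run >= required:
                return True
        else:
            run = 0                 # the run is broken; start over
    return False


def acquisition_complete(histories: Sequence[Sequence[float]],
                         eps: float, k: float) -> bool:
    """Report completion when more than half of the dimensions have converged."""
    converged = sum(1 for h in histories if dimension_converged(h, eps, k))
    return converged > len(histories) / 2
```

With this assumed form of L, a dimension whose variance barely moves is accepted after a short run, while a dimension that has just dropped below ε must stay there for more steps before it counts as converged.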

前記ステップＳ２５の処理において，収束判定部１５は，妥当な枚数の使用画像が収集できたとの判定をもとに，画像使用判定装置２に対し取得完了を通知すると，画像使用判定装置２は，画像使用判定処理を終了し，使用画像の記録が終了する。 In the process of step S25, when the convergence determination unit 15, having determined that an appropriate number of use images has been collected, notifies the image use determination device 2 of acquisition completion, the image use determination device 2 ends the image use determination process, and the recording of use images ends.

また，収束判定部１５は，ユーザに取得完了を通知する。例えば，収束判定部１５は，情報出力部５２でアラームを鳴らし，ユーザに撮影フェーズが完了したことを通知する。アラームにより，ユーザは画像の撮影終了のタイミングを知ることができる。 The convergence determination unit 15 notifies the user of acquisition completion. For example, the convergence determination unit 15 sounds an alarm at the information output unit 52 and notifies the user that the imaging phase has been completed. The alarm allows the user to know the timing of the end of image capturing.

さらに，収束判定部１５は，物体認識装置３に対し取得完了を通知してもよい。物体認識装置３は，取得完了通知を得た段階で画像使用判定装置２から取得している使用画像により推定されている物体名称をサービス提供装置４へ出力する。 Furthermore, the convergence determination unit 15 may notify the object recognition device 3 of acquisition completion. The object recognition device 3 outputs the object name estimated from the use image acquired from the image use determination device 2 to the service providing device 4 when the acquisition completion notification is obtained.

次に，本実施形態における健康指導サービスシステムの処理の流れを説明する。 Next, the processing flow of the health guidance service system in this embodiment will be described.

図１に示す健康指導サービスシステムでは，健康指導サービスとして，ユーザが携帯端末などのカメラにより撮影した食事画像から，物体認識装置および物体認識支援装置を用いて食事名称を自動認識し，認識した食事の摂取カロリーを含む健康指導情報を出力するサービスを想定する。 In the health guidance service system shown in FIG. 1, as a health guidance service, a meal name is automatically recognized from a meal image taken by a user using a camera such as a mobile terminal by using an object recognition device and an object recognition support device. Assume a service that outputs health guidance information including calorie intake.

ユーザ装置５は，ユーザの携帯端末であり，画像入力部５１はカメラにより，情報出力部５２は，携帯端末のディスプレイに情報を表示する処理部により，それぞれ実施される。また，入力画像は，ユーザによって撮影された食事の画像であり，入力画像に含まれる物体は，ユーザによって撮影された食事を表し，その物体名称は，食事名称を表している。 The user device 5 is a user's portable terminal, the image input unit 51 is implemented by a camera, and the information output unit 52 is implemented by a processing unit that displays information on the display of the portable terminal. The input image is an image of a meal photographed by the user, an object included in the input image represents a meal photographed by the user, and the object name represents a meal name.

図１０は，健康指導サービスシステムの処理の流れを示す図である。 FIG. 10 is a diagram showing the flow of processing of the health guidance service system.

ユーザ装置５の画像入力部５１は，認識対象が撮影された入力画像を取得する（ステップＳ３１）。画像使用判定装置２は，入力画像に対して使用画像として採用するかの判定（画像使用判定）を行う（ステップＳ３２）。画像使用判定装置２は，入力画像を使用画像に採用すると判定した場合には（ステップＳ３３のＹ），処理をステップＳ３４へ進ませる。入力画像を使用画像に採用しないと判定した場合には（ステップＳ３３のＮ），処理をステップＳ３１へ戻す。 The image input unit 51 of the user device 5 acquires an input image obtained by photographing the recognition target (step S31). The image use determination device 2 determines whether to use the input image as a use image (image use determination) (step S32). When it is determined that the input image is adopted as the use image (Y in step S33), the image use determination device 2 advances the process to step S34. If it is determined that the input image is not adopted as the use image (N in step S33), the process returns to step S31.

物体認識支援装置１は，画像使用判定装置２から取得した使用画像について妥当な画像数を収集しているかの判定（画像数充足判定）を行う（ステップＳ３４）。物体認識支援装置１は，取得する使用画像群の特徴の変動の推移から，変動が収束していて使用画像群が既に妥当な枚数に達していると判定した場合に（ステップＳ３５のＹ），処理をステップＳ３６へ進ませる。変動が収束しておらず使用画像群が既に妥当な枚数に達していないと判定した場合に（ステップＳ３５のＮ），物体認識支援装置１は，処理をステップＳ３１へ戻す。 The object recognition support device 1 determines whether an appropriate number of the use images acquired from the image use determination device 2 has been collected (image number sufficiency determination) (step S34). When the object recognition support device 1 determines, from the transition of the fluctuation in the features of the acquired use image group, that the fluctuation has converged and the use image group has already reached an appropriate number (Y in step S35), it advances the process to step S36. When it determines that the fluctuation has not converged and the use image group has not yet reached an appropriate number (N in step S35), the object recognition support device 1 returns the process to step S31.

物体認識装置３は，物体認識支援装置１から取得完了の通知を受けるまで，画像使用判定装置２から使用画像を取得して物体認識を行い，使用画像の物体（食事）に対応する物体名称（食事名）を出力する（ステップＳ３６）。 Until it receives the acquisition completion notification from the object recognition support device 1, the object recognition device 3 acquires use images from the image use determination device 2 and performs object recognition, and it outputs the object name (meal name) corresponding to the object (meal) in the use images (step S36).

サービス提供装置４のカロリー計算部４２は，カロリーデータベース４１と照合することにより，推定された物体名称（食事名）の「ラーメン」に対応するカロリーを取得する。そして，カロリー計算部４２は，取得した摂取カロリーを含む情報をユーザ装置５の情報出力部５２へ送信し，ユーザ装置５上に表示させる（ステップＳ３７）。 The calorie calculation unit 42 of the service providing apparatus 4 collates with the calorie database 41 to obtain calories corresponding to “ramen” of the estimated object name (meal name). And the calorie calculation part 42 transmits the information containing the acquired calorie intake to the information output part 52 of the user apparatus 5 and displays it on the user apparatus 5 (step S37).

図１１は，物体認識による物体名称の推定例を示す図である。 FIG. 11 is a diagram illustrating an example of estimating an object name by object recognition.

ステップＳ３６の処理において，物体認識装置３の特徴抽出部３１は，図１１に示すように，同一の物体（食事）が異なる方向から撮影されている使用画像群（使用画像１〜４）を順次取得し，物体認識部３２は，取得した使用画像に対する物体認識により推定した物体名称候補を保持する。物体認識の終了後，認識統合部３３は，保持されている物体名称候補から多数決によって物体名称を決定する。図１１の場合には，物体名称候補「カツ丼」，「ラーメン」が保持され，多数決によって物体名称として「ラーメン」が決定される。 In the process of step S36, as shown in FIG. 11, the feature extraction unit 31 of the object recognition device 3 sequentially acquires a group of use images (use images 1 to 4) in which the same object (meal) is photographed from different directions, and the object recognition unit 32 holds the object name candidates estimated by object recognition on each acquired use image. After object recognition ends, the recognition integration unit 33 determines the object name by majority vote from the held object name candidates. In the case of FIG. 11, the object name candidates “katsudon” and “ramen” are held, and “ramen” is determined as the object name by majority vote.
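The majority vote performed by the recognition integration unit 33 can be sketched in a few lines. The function name is hypothetical, and the tie handling (the candidate seen first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Sequence


def integrate_by_majority(candidates: Sequence[str]) -> str:
    """Return the object name that occurs most often among the candidates."""
    if not candidates:
        raise ValueError("no object name candidates are held")
    # most_common(1) yields the most frequent name; on a tie, the name
    # that was encountered first is returned (assumed tie handling).
    return Counter(candidates).most_common(1)[0][0]
```

For instance, if the four use images were hypothetically recognized as "katsudon", "ramen", "ramen" and "ramen", the function returns "ramen", matching the outcome described for FIG. 11.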

以上の処理により，健康指導サービスシステムは，ユーザに対する健康指導サービスを実現することができる。 Through the above processing, the health guidance service system can realize a health guidance service for the user.

次に，健康指導サービスシステムの各装置の他の構成例について説明する。 Next, another configuration example of each device of the health guidance service system will be described.

図１に示す構成例では，物体認識支援装置１および画像使用判定装置２は，ユーザ装置５内に設けられるものとして説明している。しかし，物体認識支援装置１および画像使用判定装置２は，物体認識装置３の前処理機能として実施されればよく，ユーザ装置５の画像入力部５１と，物体認識装置３との間に構成されていてもよい。例えば，物体認識支援装置１および画像使用判定装置２は，ユーザ装置５の外部の装置として構成されてもよく，または，物体認識装置３と一体の装置に構成されたりしてもよい。 In the configuration example illustrated in FIG. 1, the object recognition support device 1 and the image use determination device 2 are described as being provided in the user device 5. However, the object recognition support device 1 and the image use determination device 2 may be implemented as a preprocessing function of the object recognition device 3 and are configured between the image input unit 51 of the user device 5 and the object recognition device 3. It may be. For example, the object recognition support device 1 and the image use determination device 2 may be configured as devices external to the user device 5 or may be configured as an integrated device with the object recognition device 3.

また，画像使用判定装置２は，物体認識支援装置１の一処理部として構成されるようにしてもよい。 Further, the image use determination device 2 may be configured as one processing unit of the object recognition support device 1.

健康指導サービスシステムの各装置は，ＣＰＵおよびメモリ等を有するハードウェアとソフトウェアプログラムとを備えるコンピュータ・システム，または専用ハードウェアによって実現される。すなわち，物体認識支援装置１，画像使用判定装置２，物体認識装置３，サービス提供装置４，ユーザ装置５は，それぞれ，演算装置（ＣＰＵ），一時記憶装置（ＤＲＡＭ，フラッシュメモリ等）および永続性記憶装置（ＨＤＤ，フラッシュメモリ等）を有し，外部とデータの入出力をするコンピュータによって実施することができる。 Each device of the health guidance service system is realized by a computer system including hardware having a CPU and a memory and a software program, or dedicated hardware. That is, the object recognition support device 1, the image use determination device 2, the object recognition device 3, the service providing device 4, and the user device 5 are respectively a computing device (CPU), a temporary storage device (DRAM, flash memory, etc.) and a persistence. It can be implemented by a computer having a storage device (HDD, flash memory, etc.) and inputting / outputting data from / to the outside.

また，物体認識支援装置１，画像使用判定装置２，それぞれ，コンピュータが実行可能なプログラムによっても実施することができる。例えば図１に示す構成の場合に，物体認識支援装置１および画像使用判定装置２それぞれが有すべき機能の処理内容を記述したプログラムが，ユーザ装置５に提供される。提供されたプログラムをコンピュータであるユーザ装置５が実行することによって，上記説明した物体認識支援装置１および画像使用判定装置２それぞれの処理機能がコンピュータ上で実現される。 Further, the object recognition support device 1 and the image use determination device 2 can each be implemented by a program executable by a computer. For example, in the case of the configuration shown in FIG. 1, a program describing the processing contents of functions that each of the object recognition support device 1 and the image use determination device 2 should have is provided to the user device 5. By executing the provided program by the user device 5 that is a computer, the processing functions of the object recognition support device 1 and the image use determination device 2 described above are realized on the computer.

さらに，物体認識支援装置１および画像使用判定装置２は，それぞれコンピュータとして実施される場合に，可搬型記録媒体から直接プログラムを読み取り，そのプログラムに従った処理を実行することもできる。また，前記プログラムは，コンピュータで読み取り可能な記録媒体に記録しておくことができる。 Furthermore, when the object recognition support apparatus 1 and the image use determination apparatus 2 are each implemented as a computer, the object recognition support apparatus 1 and the image use determination apparatus 2 can also read a program directly from a portable recording medium and execute processing according to the program. The program can be recorded on a computer-readable recording medium.

以上説明したように，本発明の一態様として開示する物体認識支援装置１および画像使用判定装置２によれば，次のような効果がある。 As described above, according to the object recognition support apparatus 1 and the image use determination apparatus 2 disclosed as one aspect of the present invention, the following effects can be obtained.

・物体認識支援装置１によれば，物体認識のための画像を過剰に取得（撮影または入力）するという無駄を防ぐことができる。 The object recognition support apparatus 1 can prevent wasteful acquisition (photographing or input) of images for object recognition.

・物体認識支援装置１によれば，物体認識に使用される画像を妥当な量に抑えて，計算負担の増加を防ぐことができる。 The object recognition support device 1 can keep the images used for object recognition to an appropriate amount and thereby prevent an increase in computational load.

・物体認識支援装置１によれば，認識対象の画像を撮影または入力するユーザに対し操作終了のタイミングを通知して，操作終了のタイミングに対するユーザの心理的ストレスを解消することができる。 According to the object recognition support apparatus 1, it is possible to notify a user who captures or inputs an image to be recognized of the operation end timing, and to eliminate the user's psychological stress with respect to the operation end timing.

・画像使用判定装置２によれば，物体認識に用いる画像の特徴のバリエーションを維持しつつ，冗長な画像を間引くことができる。 The image use determination device 2 can thin out redundant images while maintaining variations in image features used for object recognition.

以上のとおり，本発明の実施形態において，発明者によってなされた発明を健康指導サービス分野に適用した場合について説明した。しかし，本発明は，かかる分野の適用に限定されるものではなく，物体認識に対する支援技術を必要とする様々な分野への適用が可能であり，また，その記述の主旨の範囲において種々の変形が可能であることは当然である。 As described above, in the embodiment of the present invention, the case where the invention made by the inventor is applied to the health guidance service field has been described. However, the present invention is not limited to application in such a field, but can be applied to various fields that require assistive technology for object recognition, and various modifications are possible within the scope of the description. Of course it is possible.

DESCRIPTION OF SYMBOLS
１物体認識支援装置 1 Object recognition support device
１１第１特徴抽出部 11 First feature extraction unit
１２第１特徴記録部 12 First feature recording unit
１３分散算出部 13 Variance calculation unit
１４分散記録部 14 Variance recording unit
１５収束判定部 15 Convergence determination unit
２画像使用判定装置 2 Image use determination device
２１第２特徴抽出部 21 Second feature extraction unit
２２特徴書き込み部 22 Feature writing unit
２３第２特徴記録部 23 Second feature recording unit
２４距離計算部 24 Distance calculation unit
２５使用判定部 25 Use determination unit
３物体認識装置 3 Object recognition device
４サービス提供装置 4 Service providing device
５ユーザ装置 5 User device

Claims

An object recognition support device that supports object recognition processing, the device comprising:
a first feature extraction unit that extracts a feature of an image acquired by an image input means;
a first feature recording unit that records the extracted feature;
a variance calculation unit that calculates a variance of the feature group recorded in the first feature recording unit;
a variance recording unit that records the calculated variance; and
a convergence determination unit that determines, based on the transition of the variance recorded in the variance recording unit, whether the variance has converged and, when the variance has converged, notifies completion of image acquisition.

The object recognition support device according to claim 1, further comprising:
a second feature recording unit that records a feature of a use image used for object recognition;
a second feature extraction unit that extracts a feature of the image acquired by the image input means; and
a distance calculation unit that calculates a difference between the feature of the acquired image and the feature of the image recorded in the second feature recording unit,
wherein, when the calculated feature difference is greater than or equal to a certain value, the acquired image is selected as a use image to be acquired by the first feature extraction unit, and the feature of the use image is stored in the second feature recording unit.

An object recognition support program that, in order to support object recognition processing, causes a computer to execute:
a process of extracting a feature of an image acquired by an image input means;
a process of recording the extracted feature in a first feature recording unit;
a process of calculating a variance of the feature group recorded in the first feature recording unit;
a process of recording the calculated variance in a variance recording unit; and
a process of determining, based on the transition of the variance recorded in the variance recording unit, whether the variance has converged and, when the variance has converged, notifying completion of image acquisition.

An object recognition support method in which, in order to support object recognition processing, a computer:
extracts a feature of an image acquired by an image input means;
records the extracted feature in a first feature recording unit;
calculates a variance of the feature group recorded in the first feature recording unit;
records the calculated variance in a variance recording unit;
determines, based on the transition of the variance recorded in the variance recording unit, whether the variance has converged; and
notifies completion of image acquisition when the variance has converged.