JP6762544B2

JP6762544B2 - Image processing device, image processing method, and image processing program

Info

Publication number: JP6762544B2
Application number: JP2018556754A
Authority: JP
Inventors: 英起多田; 玲志相宅
Original assignee: NURVE, INC.
Current assignee: NURVE, INC.
Priority date: 2016-12-15
Filing date: 2017-12-14
Publication date: 2020-09-30
Anticipated expiration: 2037-12-14
Also published as: JPWO2018110680A1; WO2018110680A1

Description

The present invention relates to an image processing device, an image processing method, and an image processing program for recognizing the image of an object captured in an image.

In recent years, technology that lets users experience virtual reality (hereinafter also "VR") has been used in various fields such as games, entertainment, and vocational training. VR typically uses a glasses-type or goggle-type display device called a head-mounted display (hereinafter also "HMD"). By wearing the HMD on the head and viewing its built-in screen with both eyes, the user can view images with a sense of depth. The HMD incorporates a gyro sensor and an acceleration sensor, and the image displayed on the screen changes according to the movement of the user's head detected by these sensors. As a result, the user can feel as if they had stepped into the displayed image.

In this field of VR, user interfaces are being studied in which operations such as switching images and moving objects within an image are performed by the user's gestures. For example, Patent Document 1 discloses an image processing device that places a user interface in a virtual space and determines that the user interface has been operated when the operation unit the user uses to operate it is within the field of view of the imaging unit and the positional relationship between the operation unit and the user interface is a prescribed one.

Patent Document 2 discloses a technique for recognizing an object contained in a moving image by using color feature amounts. Specifically, the technique identifies a first reference color, which is the hue occupying the largest area among the hues in the frame image being processed, and a second reference color, which is found by creating an RGB histogram of that frame image and taking the peak that is at or above a predetermined threshold and farthest from the first reference color. It then identifies the closed regions in the frame image by edge detection. Among these, it recognizes as the object a closed region that contains the first and second reference colors identified in the image frame processed immediately before and that at least partially overlaps the area corresponding to the closed region recognized as the object in that preceding frame.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-48656
Patent Document 2: Japanese Patent No. 5887264

When operations are performed by gesture, the image of the object making the gesture must be recognized accurately within the image. However, when the gesture is made with the user's hand or fingers, the color of the hand or fingers to be recognized varies greatly between individuals, and even on the same person's hand the color differs from part to part. Setting an appropriate color feature amount is therefore very important for extracting the recognition target without excess or omission.

Regarding color feature amounts for object recognition, Patent Document 2 derives the color feature amounts (the first and second reference colors) used in processing the next image frame from the closed region determined to be the object to be recognized in the immediately preceding frame image (see paragraph 0046 of Patent Document 2). However, Patent Document 2 does not disclose in detail how appropriate color feature amounts can be obtained for the very first frame image to be processed.

The present invention has been made in view of the above, and its object is to provide an image processing device, an image processing method, and an image processing program capable of setting an appropriate color feature amount for accurately recognizing the image of a specific object captured in an image.

To solve the above problem, an image processing device according to one aspect of the present invention recognizes the image of a specific object from an image in which the object appears, and comprises: an external information acquisition unit that acquires image information about the space in which the image processing device is located; a specific point acquisition unit that extracts the region of the image of the object from the image, using the color feature amounts of a plurality of pixels at a predetermined position in the image based on the image information, and acquires the position of a specific point within that region; and a color feature amount setting unit that sets the color feature amount for recognizing the object based on the color feature amounts of a plurality of pixels located within a predetermined range including the specific point.

The image processing device may further comprise a display unit that displays the image on a screen and an image composition unit that composites a predetermined object onto the screen on which the image is displayed, and the predetermined position may be set inside the object.

In the image processing device, the specific point acquisition unit may include: an object region extraction unit that creates a histogram of the color feature amounts of the plurality of pixels at the predetermined position and extracts from the image, as a candidate region for the image of the object, the region of pixels whose color feature amounts fall within a predetermined range of the histogram; an object determination unit that extracts the contour of the candidate region and determines, based on the contour, whether the candidate region is the region of the image of the object; and a specific point determination unit that, when the candidate region is determined to be the region of the image of the object, outputs a specific point within the candidate region as the specific point within the region of the image of the object.

In the image processing device, the object determination unit may determine that the candidate region is the region of the image of the object when the area of the candidate region is equal to or greater than a predetermined value.

In the image processing device, when a plurality of candidate regions are extracted, the object determination unit may determine, for the candidate region with the largest area, whether that area is equal to or greater than the predetermined value.
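The area test described in the two paragraphs above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_candidate`, the representation of candidates as a plain list of pixel counts, and the threshold value are all assumptions made for the example.

```python
def select_candidate(candidate_areas, min_area):
    """Return the index of the largest candidate region if its area is at
    least min_area; return None when no candidate qualifies.

    candidate_areas: list of region areas in pixels (hypothetical input).
    min_area: the "predetermined value" (hypothetical threshold).
    """
    if not candidate_areas:
        return None
    # Only the candidate with the largest area is tested against the threshold.
    largest = max(range(len(candidate_areas)), key=lambda i: candidate_areas[i])
    return largest if candidate_areas[largest] >= min_area else None


if __name__ == "__main__":
    print(select_candidate([120, 900, 40], 500))  # index of the 900-pixel region
    print(select_candidate([120, 300], 500))      # no region is large enough
```

Testing only the largest region keeps small background patches of a similar color from being accepted as the object.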

In the image processing device, the object may be the user's hand, and the object determination unit may detect the convex and concave portions of the contour of the candidate region and determine that the candidate region is the region of the image of the object when four or more convex portions exist and the average distance from the center of gravity of the contour to each convex portion is at least a predetermined multiple of the average distance from the specific point to each concave portion.

In the image processing device, the color feature amount setting unit may calculate the mean, median, or mode of the color feature amounts of the plurality of pixels located within the predetermined range including the specific point, and set a predetermined range of values including that mean, median, or mode as the color feature amount for recognizing the object.

In the image processing device, the color feature amount may include at least the hue in the HSV color space.

An image processing method according to another aspect of the present invention is executed by an image processing device that recognizes the image of a specific object from an image in which the object appears, the image processing device comprising an external information acquisition unit that acquires image information about the space in which the image processing device is located. The method includes: step (a) of extracting the region of the image of the object from the image, using the color feature amounts of a plurality of pixels at a predetermined position in the image based on the image information, and acquiring the position of a specific point within that region; and step (b) of setting the color feature amount for recognizing the object based on the color feature amounts of a plurality of pixels located within a predetermined range including the specific point.

An image processing program according to still another aspect of the present invention is executed by an image processing device that recognizes the image of a specific object from an image in which the object appears, the image processing device comprising an external information acquisition unit that acquires image information about the space in which the image processing device is located. The program causes the device to execute: step (a) of extracting the region of the image of the object from the image, using the color feature amounts of a plurality of pixels at a predetermined position in the image based on the image information, and acquiring the position of a specific point within that region; and step (b) of setting the color feature amount for recognizing the object based on the color feature amounts of a plurality of pixels located within a predetermined range including the specific point.

According to the present invention, the region of the image of the object to be recognized is provisionally extracted from the image using the color feature amounts of a plurality of pixels at a predetermined position in the image, and the color feature amount for recognizing the object is then set based on the color feature amounts of a plurality of pixels located around a specific point within that region. This makes it possible to set an appropriate color feature amount for accurately recognizing the image of a specific object captured in an image.

FIG. 1 is a block diagram showing the schematic configuration of an image processing device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the image processing device worn by a user.
FIG. 3 is a schematic diagram illustrating a screen displayed on the display unit shown in FIG. 1.
FIG. 4 is a flowchart showing the operation of the image processing device shown in FIG. 1.
FIG. 5 is a schematic diagram illustrating a screen displayed on the display unit shown in FIG. 1.
FIG. 6 is a schematic diagram illustrating a screen displayed on the display unit shown in FIG. 1.
FIG. 7 is a flowchart showing the process of acquiring the position of the center of gravity of the hand image.
FIG. 8 is a schematic diagram for explaining the process of acquiring the position of the center of gravity of the hand image.
FIG. 9 is a flowchart showing the process of setting the range of the color feature amount based on the position of the center of gravity.
FIG. 10 is a schematic diagram showing the state in which the image of the hand to be recognized has been extracted.

Hereinafter, a display device according to an embodiment of the present invention will be described with reference to the drawings. The present invention is not limited by these embodiments. In the drawings, identical parts are denoted by identical reference numerals.

The following embodiment describes, as an example, the case where the present invention is applied to an image processing device capable of so-called stereoscopic viewing, which is provided with a screen for making the user perceive a three-dimensional space. However, the present invention is not limited to this and can be applied to various image processing devices that recognize the image of a specific object captured in an image and perform various kinds of processing.

FIG. 1 is a block diagram showing the schematic configuration of an image processing device according to an embodiment of the present invention. The image processing device 1 according to this embodiment makes the user perceive a three-dimensional space by having the user view a screen with both eyes. As shown in FIG. 1, it comprises a display unit 11, a storage unit 12, a calculation unit 13 that performs various calculations, and an external information acquisition unit 14 that acquires information about the outside of the image processing device 1 (hereinafter, "external information").

FIG. 2 is a schematic diagram showing the image processing device 1 worn by a user 2. A general-purpose device equipped with a display and a camera, such as a smartphone, a personal digital assistant (PDA), or a portable game device, can be used as the image processing device 1. When a smartphone, for example, is used as the image processing device 1, mounting it in a holder 3 with the display side facing the user lets the user view the display hands-free. Inside the holder 3, two lenses are mounted at positions corresponding to the user's left and right eyes. When stereoscopic content is displayed on the image processing device 1, the user views the display through these lenses and can perceive the displayed image as a three-dimensional space. In addition, a smartphone generally has a camera 4 on its back. The camera 4 can capture the space around the user 2, and the captured image can be shown on the display as is or composited with content prepared in advance (video or still images).

However, the appearance of the image processing device 1 is not limited to that shown in FIG. 2. For example, a dedicated stereoscopic image processing device in which the image processing device and the holder are integrated may be used. Such a dedicated device is also called a head-mounted display. Furthermore, the image processing device 1 is not limited to small portable devices such as smartphones; for example, a desktop personal computer with a webcam connected, or a notebook personal computer or tablet terminal with a built-in camera, may be used.

Referring again to FIG. 1, the display unit 11 is a display that includes a drive unit and a display panel formed of, for example, liquid crystal or organic EL (electroluminescence). FIG. 3 is a schematic diagram showing an example of the screen displayed on the display unit 11. To make the user perceive a still image or video three-dimensionally, the display panel of the display unit 11 is divided into two regions as shown in FIG. 3, and two images with parallax between them are displayed in these regions 11a and 11b. The user 2 can perceive a three-dimensional space by viewing the two images displayed in the regions 11a and 11b with the left and right eyes, respectively.

The storage unit 12 is a computer-readable storage medium such as a semiconductor memory, for example a ROM or RAM. The storage unit 12 has a program storage unit 121, which stores the operating system program and driver programs as well as application programs that execute various functions and the parameters used while these programs run, and a color feature amount storage unit 122, which stores the color feature amounts used in the image processing for recognizing the image of a specific object captured in an image. The storage unit 12 may also store the image data and audio data of the various content displayed on the display unit 11, the image data of images acquired by the camera 4, and the like.

The calculation unit 13 is configured using, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). By reading the various programs stored in the program storage unit 121, it controls each unit of the image processing device 1 in an integrated manner and executes the various calculations needed to display images. The detailed configuration of the calculation unit 13 is described later.

The external information acquisition unit 14 acquires image information (image data) about the real space around the image processing device 1. Its configuration is not particularly limited as long as it can detect the position and movement of surrounding objects; for example, an optical camera, an infrared camera, or an ultrasonic transmitter and receiver can be used as the external information acquisition unit 14. When a smartphone such as that shown in FIG. 2 is used as the image processing device 1, the camera 4 built into the smartphone serves as the external information acquisition unit 14.

Next, the detailed configuration of the calculation unit 13 will be described. By reading the image processing program stored in the program storage unit 121, the calculation unit 13 generates an image based on the image information acquired by the external information acquisition unit 14 and executes image processing for recognizing the image of a specific object captured in that image.

As shown in FIG. 1, the calculation unit 13 has a center of gravity position acquisition unit 131 and a color feature amount setting unit 132, which set the color feature amount used to recognize the image of a specific object captured in an image (hereinafter also the "color feature amount for object recognition"); an object recognition unit 133, which recognizes the image of the specific object in the image using the set color feature amount; and an image composition unit 134, which composites the images displayed on the display unit 11. In other words, the center of gravity position acquisition unit 131 and the color feature amount setting unit 132 calibrate the color feature amount used by the object recognition unit 133.

The center of gravity position acquisition unit 131 is a specific point acquisition unit that extracts the region of the image of a specific object from an image and acquires a specific point within that region. In this embodiment, the position of the center of gravity of the extracted region is acquired as the specific point. Specifically, the center of gravity position acquisition unit 131 has an object region extraction unit 131a, an object determination unit 131b, and a center of gravity determination unit 131c. The object region extraction unit 131a extracts a candidate region for the image of the specific object from the image based on the color feature amounts of a plurality of pixels at a predetermined position in the image. The object determination unit 131b determines whether the candidate region extracted by the object region extraction unit 131a is the region of the image of the specific object. The center of gravity determination unit 131c is a specific point determination unit that outputs, as the specific point, the position of the center of gravity of the candidate region determined to be the region of the image of the specific object.
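As a rough illustration of the center of gravity output by unit 131c, the sketch below averages the coordinates of the pixels belonging to a candidate region. The binary-mask representation (a 2D list of 0 and 1) and the function name `region_centroid` are assumptions for this example; the source does not specify how the region is stored.

```python
def region_centroid(mask):
    """Return the center of gravity (x, y) of the nonzero pixels in a 2D
    binary mask, or None if the mask contains no region pixels."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


if __name__ == "__main__":
    # A 2x2 block of region pixels in the upper-right corner of a 3x3 mask.
    mask = [
        [0, 1, 1],
        [0, 1, 1],
        [0, 0, 0],
    ]
    print(region_centroid(mask))
```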

The color feature amount setting unit 132 sets the color feature amount for object recognition based on the color feature amounts of the specific point (center of gravity position) and a plurality of pixels in its vicinity. Specifically, the color feature amount setting unit 132 calculates the average of the color feature amounts of a plurality of pixels located within a predetermined range including the specific point (center of gravity position), and sets a predetermined range of values including that average as the color feature amount for object recognition.
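The setting step can be sketched for a single channel as below. The square window, its `half_size`, the symmetric `margin`, and the function name `set_recognition_range` are assumptions chosen for illustration; the source only states that an average over a predetermined range around the specific point is taken and that a predetermined range containing that average is set. The sketch also ignores the wrap-around of hue at 360 degrees.

```python
from statistics import mean


def set_recognition_range(channel, cx, cy, half_size, margin):
    """Average one color-feature channel over a square window centered on
    (cx, cy) and return (low, high) = (average - margin, average + margin).

    channel: 2D list of values for one feature (for example hue in degrees).
    half_size, margin: hypothetical parameters for the window and the range.
    """
    height, width = len(channel), len(channel[0])
    # Clip the window to the image so points near the border still work.
    values = [
        channel[y][x]
        for y in range(max(0, cy - half_size), min(height, cy + half_size + 1))
        for x in range(max(0, cx - half_size), min(width, cx + half_size + 1))
    ]
    center = mean(values)
    return (center - margin, center + margin)


if __name__ == "__main__":
    hue = [
        [10, 20, 30],
        [10, 20, 30],
        [10, 20, 30],
    ]
    print(set_recognition_range(hue, cx=1, cy=1, half_size=1, margin=5))
```

Replacing `mean` with `statistics.median` or `statistics.mode` would give the median and mode variants mentioned in the summary of the invention.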

The object recognition unit 133 recognizes the image of the specific object captured in the image by extracting from the image the pixels that have the color feature amount for object recognition set by the color feature amount setting unit 132.
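A minimal sketch of this extraction follows, assuming each pixel is already an (h, s, v) tuple and that the set color feature amount is stored as one (low, high) pair per channel. Both the data layout and the name `recognize` are assumptions for the example.

```python
def recognize(hsv_pixels, ranges):
    """Return a binary mask marking pixels whose every channel lies inside
    its (low, high) range.

    hsv_pixels: 2D list of (h, s, v) tuples.
    ranges: ((h_lo, h_hi), (s_lo, s_hi), (v_lo, v_hi)), hypothetical layout.
    """
    return [
        [
            int(all(lo <= c <= hi for c, (lo, hi) in zip(pixel, ranges)))
            for pixel in row
        ]
        for row in hsv_pixels
    ]


if __name__ == "__main__":
    pixels = [
        [(20, 50, 80), (200, 50, 80)],
        [(25, 10, 80), (22, 55, 75)],
    ]
    ranges = ((10, 30), (40, 60), (70, 90))
    for row in recognize(pixels, ranges):
        print(row)
```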

The image composition unit 134 performs image processing such as compositing onto the real-space image the object used when setting the color feature amount for object recognition, and compositing the image of the object recognized by the object recognition unit 133 onto an image of content prepared in advance.

Next, the operation of the image processing device 1 will be described.
In this embodiment, the object to be recognized is not particularly limited; various objects such as the user's hand or fingers or a stylus pen can be the recognition target. However, in order to set the color feature amount for object recognition, the general shape features of the object to be recognized are stored in the storage unit 12 in advance. For example, a human hand normally has convex portions corresponding to the fingertips and concave portions corresponding to the gaps between fingers, and the distance from a point in the palm (for example, the center of gravity) to each convex portion is longer than the distance from the same point to each concave portion. The general shape of a human hand can therefore be characterized as a contour that has at least four convex portions and in which the average distance from the center of gravity to each convex portion is a predetermined multiple of the average distance from the center of gravity to each concave portion. This feature is stored in the storage unit 12. In the following, the user's hand is described as the recognition target.
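The hand-shape feature above can be expressed as a simple test, sketched here under stated assumptions: the convex and concave contour points are taken as already detected, the function name `looks_like_hand` and the `ratio` parameter are hypothetical, and the `>=` comparison follows the "or more" wording used in the summary of the invention.

```python
from math import dist
from statistics import mean


def looks_like_hand(center, convex_points, concave_points, ratio):
    """Return True when the contour has at least four convex points and the
    mean center-to-convex distance is at least `ratio` times the mean
    center-to-concave distance.

    center: (x, y) of the center of gravity.
    convex_points / concave_points: lists of (x, y) contour points.
    ratio: the "predetermined multiple" (hypothetical value).
    """
    if len(convex_points) < 4 or not concave_points:
        return False
    convex_mean = mean([dist(center, p) for p in convex_points])
    concave_mean = mean([dist(center, p) for p in concave_points])
    return convex_mean >= ratio * concave_mean


if __name__ == "__main__":
    tips = [(10, 0), (0, 10), (-10, 0), (0, -10)]   # fingertip-like points
    gaps = [(5, 0), (0, 5), (-5, 0)]                # points between fingers
    print(looks_like_hand((0, 0), tips, gaps, ratio=1.5))
```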

The type of color feature amount for object recognition is not particularly limited either; various values can be used, such as the pixel value (RGB value), color ratio, color difference, and the hue, brightness (value), and saturation in the HSV color space. The following describes the case where the values obtained by converting the pixel value (RGB value) into hue, saturation, and brightness in the HSV color space are used as the color feature amounts.

Using RGB values requires three values to express a color tone, whereas using the hue in the HSV color space has the advantage that the tone can be expressed with a single value. Using hue also suppresses the influence of variations in image brightness and of differences in how light falls on the object to be recognized.
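The brightness robustness of hue can be checked with a small example using Python's standard `colorsys` module. The two RGB triples are made-up sample values: the second is the first with every channel halved, as if the same surface were lit more dimly. The helper name `hue_degrees` is an assumption for this sketch.

```python
import colorsys


def hue_degrees(r, g, b):
    """Convert 8-bit RGB to the HSV hue, expressed in degrees."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


if __name__ == "__main__":
    bright = hue_degrees(200, 150, 100)  # sample color under bright light
    dim = hue_degrees(100, 75, 50)       # same color with all channels halved
    # Both hues come out at about 30 degrees; only V differs between the two.
    print(round(bright, 3), round(dim, 3))
```

Uniform scaling of R, G, and B changes the V component but leaves the hue unchanged, which is why a single hue range can tolerate moderate lighting changes.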

FIG. 4 is a flowchart showing the operation of the image processing device 1. FIGS. 5 and 6 are schematic diagrams illustrating screens displayed on the display unit 11, showing the state in which the image of the external space acquired by the external information acquisition unit 14 is displayed in the two regions 11a and 11b of the screen.

First, in step S10, the image composition unit 134 draws a hand-shaped frame m1 on the screen (see FIG. 5). Specifically, it composites an object in the shape of a hand onto the image of the external space displayed in each of the regions 11a and 11b of the screen. At this time, the image composition unit 134 may also display on the screen (each of the regions 11a and 11b) a message m2 instructing the user to align their hand with the frame m1. By adjusting the position of their hand relative to the external information acquisition unit 14 (for example, the camera 4 shown in FIG. 2), the user roughly aligns the hand image m4 shown on the screen (each of the regions 11a and 11b) with the frame m1, as shown in FIG. 6.

In the following step S12, the center of gravity position acquisition unit 131 acquires the position of the center of gravity of the hand image m4 shown on the screen (each of the regions 11a and 11b). FIG. 7 is a flowchart showing the process of acquiring the position of the center of gravity of the hand image. FIG. 8 is a schematic diagram for explaining that process.

First, in step S120, the center-of-gravity position acquisition unit 131 acquires the image of the real space captured by the external information acquisition unit 14 and temporarily stores it in the storage unit 12.

In the subsequent step S121, the object area extraction unit 131a acquires the pixel values (RGB values) of the pixels located within area m3, an area at predetermined coordinates set in advance in the image. Because area m3 is set so as to fit inside the hand-shaped frame m1, once the user aligns the hand image m4 with the frame m1, at least the inside of area m3 will in most cases be occupied by the user's hand image m4.

In the subsequent step S122, the object area extraction unit 131a converts the pixel value (RGB value) of each of the pixels located within area m3 into color feature amounts in the HSV color space (that is, hue, saturation, and lightness).
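A minimal sketch of such a conversion is given below. It assumes 8-bit RGB input and expresses hue in degrees and saturation and lightness in percent, matching the units used for the ranges in step S124; the function name `rgb_to_hsv_features` is hypothetical.

```python
import colorsys

def rgb_to_hsv_features(r, g, b):
    """Convert 8-bit RGB values to (hue in degrees, saturation in %, lightness in %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s * 100.0, v * 100.0

# Convert every pixel of a (hypothetical) sampling area.
area_rgb = [(255, 0, 0), (128, 128, 128), (200, 150, 120)]
area_hsv = [rgb_to_hsv_features(*px) for px in area_rgb]
```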

In the subsequent step S123, the object area extraction unit 131a creates histograms of the color feature amounts of the pixels located within area m3. Specifically, a histogram is created for each of the hue, saturation, and lightness obtained by the conversion in step S122.

In the subsequent step S124, the object area extraction unit 131a sets, based on the histograms created in step S123, the range of color feature amounts used to extract the hand image m4. Specifically, in the histogram of each color feature amount (hue, saturation, lightness), a predetermined range including the most frequent color feature amount is set as the range for extracting the hand image m4. For example, the range of hue, saturation, and lightness covered by the classes ranked from first to a predetermined rank in frequency is set. Alternatively, a range such as the mode of the hue ± Δ°, or the mode of the saturation or lightness ± Δ%, may be set (Δ is an arbitrary value).
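The "mode ± Δ" variant of steps S123 and S124 can be sketched as follows for a single color feature. The bin width, the value of Δ, and the function name `mode_range` are assumptions chosen for illustration; the sketch also ignores the fact that hue wraps around at 360°.

```python
from collections import Counter

def mode_range(values, bin_width, delta):
    """Histogram the values and return (low, high) = centre of the fullest bin +/- delta."""
    bins = Counter(int(v // bin_width) for v in values)
    mode_bin = max(bins, key=bins.get)       # class with the highest frequency
    center = (mode_bin + 0.5) * bin_width    # representative value of that class
    return center - delta, center + delta

# Hypothetical hue samples (degrees) from the sampling area: mostly around 10-15.
hues = [10, 12, 14, 200, 11]
low, high = mode_range(hues, bin_width=10, delta=15)
```

The same function would be called once each for hue, saturation, and lightness, giving three ranges.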

In the subsequent step S125, the object area extraction unit 131a extracts, from the entire image, the pixels having the color feature amounts set in step S124. Specifically, binarization is performed using both ends of the color feature amount ranges determined in step S124 as thresholds. As a result, regions whose color feature amounts approximate the typical color feature amounts of the pixels in area m3 shown in FIG. 6 are extracted from the image. Such processing is also called histogram backprojection.
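As a rough sketch of this binarization, the function below marks a pixel with 1 when all three of its color feature amounts fall inside their ranges and with 0 otherwise. The image representation (a nested list of HSV tuples) and the name `binarize` are assumptions made for illustration.

```python
def binarize(hsv_image, ranges):
    """Return a 0/1 mask; a pixel is 1 when every channel lies within its (low, high) range.

    hsv_image: 2D list of (hue, saturation, lightness) tuples.
    ranges:    three (low, high) pairs, one per channel.
    """
    return [
        [1 if all(lo <= c <= hi for c, (lo, hi) in zip(px, ranges)) else 0
         for px in row]
        for row in hsv_image
    ]

image = [[(20, 50, 80), (200, 50, 80)],
         [(25, 10, 80), (22, 55, 75)]]
mask = binarize(image, ((10, 30), (40, 60), (70, 90)))
```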

In the subsequent step S126, the object determination unit 131b extracts contours by performing contour tracing on the regions extracted in step S125. Specifically, treating the pixels that have a value in the binary image (the pixels extracted in step S125) as black pixels and all other pixels as white pixels, a raster scan first searches for a point where the pixel changes from white to black. When a black pixel is found, its surroundings are searched clockwise for another black pixel, starting from the direction of the search (the approach direction). The process then moves to the black pixel that was found and again searches clockwise around it, starting from the approach direction. This is repeated, and the process ends when the black pixel found is the starting point (the black pixel found first) and the next point to move to has already been visited.
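One way to read this procedure is as Moore-neighbor contour tracing. The sketch below is an interpretation under that assumption, not the patent's implementation: it traces only the first region found by the raster scan, uses 8-connectivity, and stops when it is about to repeat its first move from the starting pixel. The names `DIRS` and `trace_contour` are hypothetical.

```python
# The 8 neighbours as (dy, dx), clockwise on screen (y grows downward), starting at west.
DIRS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_contour(mask):
    """Return the boundary pixels (y, x) of the first foreground region in a 0/1 mask."""
    h, w = len(mask), len(mask[0])

    def is_fg(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x] == 1

    # Raster scan: the first foreground pixel has a background pixel to its west.
    start = next(((y, x) for y in range(h) for x in range(w) if mask[y][x] == 1), None)
    if start is None:
        return []
    contour, cur, d, first_next = [], start, 0, None  # d: direction of the approach pixel
    while True:
        nxt = None
        for i in range(1, 9):                # search clockwise from the approach direction
            k = (d + i) % 8
            cand = (cur[0] + DIRS[k][0], cur[1] + DIRS[k][1])
            if is_fg(*cand):
                nxt = cand
                break
        if nxt is None:                      # isolated single pixel
            return [start]
        if cur == start:
            if first_next is None:
                first_next = nxt
            elif nxt == first_next:          # about to repeat the first move: done
                return contour
        contour.append(cur)
        cur = nxt
        # New approach direction: the background pixel examined just before the hit.
        d = (k - 2) % 8 if k % 2 == 0 else (k - 3) % 8

square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
outline = trace_contour(square)
```

For the 3×3 block above, the trace visits the eight border pixels clockwise and never enters the interior pixel at (2, 2).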

As shown in FIG. 8, the processing so far extracts a region approximating the hand image (see contour m5), but regions whose color is close to that of the hand image (see contours m6 and m7) may be extracted along with it. Hereinafter, a region enclosed by an extracted contour is referred to as a candidate region.

In step S127, when a plurality of candidate regions have been extracted, the object determination unit 131b selects the candidate region with the largest area. This is because the color feature amounts used in step S125 are set based on the color feature amounts of the hand image m4 placed over area m3 (see FIG. 6), so the contour m5 of the region contained in the hand image m4 is normally larger than the contours m6 and m7 of regions contained in images of other objects, as shown in FIG. 8.

In the subsequent step S128, the object determination unit 131b determines whether the area of the selected candidate region is equal to or larger than a predetermined value. Because the hand image m4 is positioned to match the hand-shaped frame m1, the candidate region selected in step S127 should have at least a certain size if it is a region contained in the hand image m4. Checking the area in this way prevents the color feature amount for object recognition from being set based on another person's hand, even if an image of another person's hand with a similar color happens to appear in the image. The predetermined value used in this determination is set as appropriate based on the size of the hand-shaped frame m1.
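Steps S127 and S128 can be sketched as below, assuming each candidate contour is available as a list of (x, y) vertices and that its area is measured with the shoelace formula. The patent does not say how the area is computed, and the names `polygon_area` and `select_candidate` are hypothetical.

```python
def polygon_area(points):
    """Area enclosed by a closed polygon given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def select_candidate(contours, min_area):
    """Pick the largest candidate; return None if there is none or it is too small."""
    if not contours:
        return None
    best = max(contours, key=polygon_area)
    return best if polygon_area(best) >= min_area else None

big = [(0, 0), (10, 0), (10, 10), (0, 10)]    # area 100
small = [(0, 0), (2, 0), (2, 2), (0, 2)]      # area 4
chosen = select_candidate([small, big], min_area=50)
```

Returning `None` corresponds to the "No" branch of step S128, after which a new image is acquired.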

When the area of the selected candidate region is equal to or larger than the predetermined value (step S128: Yes), the object determination unit 131b determines the concavities and convexities of the contour of the candidate region (step S129). Specifically, corner points are detected on the contour of the candidate region, and each corner point is classified as a concave portion or a convex portion. Any of various known methods can be used to detect corner points. As one example, take an arbitrary point P_i on the contour and the two points P_{i-k} and P_{i+k} located k pixels before and after P_i along the contour; when the angle formed by the line segments joining P_i to each of these two points is smaller than 180° or larger than 180°, the point P_i is detected as a corner point. At a detected corner point, if the angle formed by the line segment P_{i-k}P_i and the line segment P_iP_{i+k} is acute, the corner point is determined to be a convex portion. If the angle exceeds 180°, the corner point is determined to be a concave portion. Strictly speaking, a corner point whose angle is greater than 90° and less than 180° (that is, an obtuse angle) is also convex, but because the purpose of step S129 is to detect the regions corresponding to the user's fingers, corner points with obtuse angles are not treated as convex portions. In the contour m5 shown in FIG. 8, the corner points P1, P2, P3, P4, and P5 are determined to be convex portions, and the corner points Q1, Q2, Q3, and Q4 are determined to be concave portions.
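The angle test described above can be sketched as follows. The sketch assumes the contour is a closed list of (x, y) points ordered counterclockwise in a y-up coordinate system, so that the interior angle at P_i can be measured from the segment toward P_{i+k} round to the segment toward P_{i-k}. The function names and the label strings are hypothetical.

```python
import math

def interior_angle(prev_pt, pt, next_pt):
    """Interior angle in degrees (0-360) at pt for a counterclockwise contour."""
    ax, ay = prev_pt[0] - pt[0], prev_pt[1] - pt[1]   # vector P_i -> P_{i-k}
    bx, by = next_pt[0] - pt[0], next_pt[1] - pt[1]   # vector P_i -> P_{i+k}
    cross = bx * ay - by * ax
    dot = bx * ax + by * ay
    return math.degrees(math.atan2(cross, dot)) % 360.0

def classify_corners(contour, k=1):
    """Label each contour point 'convex' (acute), 'concave' (> 180 degrees), or 'none'."""
    n = len(contour)
    labels = []
    for i, pt in enumerate(contour):
        angle = interior_angle(contour[(i - k) % n], pt, contour[(i + k) % n])
        if angle < 90.0:
            labels.append("convex")
        elif angle > 180.0:
            labels.append("concave")
        else:
            labels.append("none")    # obtuse or straight: not treated as a finger
    return labels

# A square with a notch cut into its top edge, listed counterclockwise.
shape = [(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)]
labels = classify_corners(shape)
```

In this toy shape the two upper tips are acute and the notch at (1, 1) has an interior angle of 270°, so they come out as convex, concave, and convex respectively.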

In the subsequent step S130, the object determination unit 131b determines whether the contour under examination has four or more convex portions. The threshold is four rather than five because, when fingers are close together for example, it may not be possible to detect all five fingers as convex portions.

When four or more convex portions are present (step S130: Yes), the object determination unit 131b calculates the center-of-gravity position (center-of-gravity point G) of the candidate region (step S131).
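The patent does not specify how the center of gravity is computed. One common choice, shown here as an assumption, is the mean of the coordinates of all pixels in the region; the name `centroid` is hypothetical.

```python
def centroid(mask):
    """Mean (x, y) of all foreground pixels in a 0/1 mask, or None if the mask is empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    n = len(pts)
    return sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n

region = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
g = centroid(region)
```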

In the subsequent step S132, the object determination unit 131b calculates the distance from the center-of-gravity position to each concave portion and each convex portion. For example, in FIG. 8, the distances between the center-of-gravity point G and each of the corner points P1, P2, P3, P4, P5, Q1, Q2, Q3, and Q4 are calculated.

In the subsequent step S133, the object determination unit 131b determines whether the average of the distances from the center-of-gravity position to the convex portions (the average distance) is at least a predetermined multiple of the average of the distances from the center-of-gravity position to the concave portions. In a human hand, the distance from the center-of-gravity position, which lies roughly at the center of the palm, to a fingertip is longer than the distance from the center-of-gravity position to the edge of the palm; although there are individual differences, the former is generally at least 1.2 times the latter. Therefore, as shown in FIG. 8, the average distance from the center-of-gravity point G to the convex corner points P1, P2, P3, P4, and P5 and the average distance from the center-of-gravity point G to the concave corner points Q1, Q2, Q3, and Q4 are calculated, and whether the shape of the extracted candidate region approximates the shape of a hand is judged from the ratio between the two.
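The checks in steps S130 to S133 can be combined into a short sketch. The default multiple of 1.2 is taken from the typical ratio mentioned above and is only an assumption about what the "predetermined multiple" might be; the function name `looks_like_hand` and the sample points are hypothetical.

```python
import math

def looks_like_hand(center, convex_pts, concave_pts, min_convex=4, ratio=1.2):
    """True if there are enough convex corners and they lie sufficiently farther from
    the center of gravity, on average, than the concave corners do."""
    if len(convex_pts) < min_convex or not concave_pts:
        return False

    def mean_distance(points):
        return sum(math.dist(center, p) for p in points) / len(points)

    return mean_distance(convex_pts) >= ratio * mean_distance(concave_pts)

g = (0.0, 0.0)
fingertips = [(10, 0), (0, 10), (-10, 0), (0, -10), (6, 8)]   # all at distance 10
valleys = [(5, 0), (0, 5), (-5, 0), (3, 4)]                   # all at distance 5
is_hand = looks_like_hand(g, fingertips, valleys)
```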

When the average distance from the center-of-gravity position to the convex portions is at least the predetermined multiple of the average distance from the center-of-gravity position to the concave portions (step S133: Yes), the object determination unit 131b judges that the selected candidate region is the region of the hand image. In this case, the center-of-gravity determination unit 131c determines the center-of-gravity position of the candidate region as the center-of-gravity position of the hand image and outputs it (step S134). The process then returns to the main routine.

On the other hand, when the area of the candidate region selected in step S127 is less than the predetermined value (step S128: No), when the contour of the selected candidate region has three or fewer convex portions (step S130: No), or when the average distance from the center-of-gravity position to the convex portions is less than the predetermined multiple of the average distance from the center-of-gravity position to the concave portions (step S133: No), the process returns to step S120.

Referring again to FIG. 4, in step S14 following step S12, the color feature amount setting unit 132 sets the color feature amount for object recognition based on the center-of-gravity position acquired in step S12. FIG. 9 is a flowchart showing the process of setting the color feature amount for object recognition based on the center-of-gravity position.

First, in step S140, the color feature amount setting unit 132 acquires, from the image acquired in step S120 (see FIG. 7), the pixel values (RGB values) of a plurality of pixels at and near the center-of-gravity position acquired in step S12. The range of pixels whose values are acquired is set in advance, for example, to within several pixels to several tens of pixels of the center-of-gravity position.

In the subsequent step S141, the color feature amount setting unit 132 converts the pixel value (RGB value) of each of the pixels acquired in step S140 into color feature amounts in the HSV color space (hue, saturation, and lightness).

In the subsequent step S142, the color feature amount setting unit 132 calculates the average of the color feature amounts over the pixels acquired in step S140. Specifically, an average is calculated for each of hue, saturation, and lightness.

In the subsequent step S143, the color feature amount setting unit 132 sets the color feature amount for object recognition based on the averages calculated in step S142. Specifically, for each of hue, saturation, and lightness, a predetermined range including the average is set as the color feature amount for object recognition. For example, a range such as the average hue ± Δ°, or the average saturation or lightness ± Δ%, is set (Δ is an arbitrary value). The process then returns to the main routine.
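Steps S142 and S143 amount to a per-channel average followed by a symmetric margin, which can be sketched as below. The Δ values, the sample pixels, and the name `recognition_ranges` are assumptions for illustration, and the plain arithmetic mean used here does not account for hue wrapping around at 360°.

```python
def recognition_ranges(hsv_pixels, deltas):
    """Return one (low, high) range per channel: the channel average +/- its delta.

    hsv_pixels: list of (hue, saturation, lightness) tuples sampled around the
                center-of-gravity position.
    deltas:     margin for each of the three channels.
    """
    n = len(hsv_pixels)
    means = [sum(px[i] for px in hsv_pixels) / n for i in range(3)]
    return [(m - d, m + d) for m, d in zip(means, deltas)]

samples = [(20, 40, 60), (30, 50, 80)]
ranges = recognition_ranges(samples, deltas=(10, 15, 20))
```

The resulting ranges play the same role in step S16 that the provisional ranges play in step S125, but they are derived from pixels that have been confirmed to belong to a hand-shaped region.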

In step S16 of FIG. 4, the object recognition unit 133 extracts pixels from the image acquired by the external information acquisition unit 14, using the color feature amount for object recognition set in step S14. The region of pixels extracted in this way is recognized as the hand image. FIG. 10 is a schematic view showing a state in which the hand image m8 to be recognized has been extracted.

As described above, according to the present embodiment, a candidate region for the hand image is extracted using color feature amounts set provisionally from the pixel values of the pixels in area m3, which is set in advance in the image. When this candidate region is judged to have the characteristics of the shape of a hand, the color feature amount for object recognition is set from the pixel values of the pixels at and near the center-of-gravity position of the candidate region. Using the color feature amount for object recognition set in this way makes it possible to recognize the hand image in the image accurately. In particular, even when the recognition target is the user's hand, whose color feature amounts vary greatly between individuals and between locations as in the present embodiment, a color feature amount for object recognition suited to the user's hand can be set accurately.

In the above embodiment, the center-of-gravity position of the region of the image of the recognition target object extracted from the image is acquired as the specific point, but the specific point is not limited to this. For example, the specific point may be the point where the horizontal center line and the vertical center line of the region of the image of the recognition target object intersect, a corner point, or the intersection of diagonals connecting corner points. The specific point may be set as appropriate, according to the object to be recognized, at a part where a typical color of that object appears.

In the above embodiment, the hue, saturation, and lightness in the HSV color space are used as the color feature amounts, but other values may be used. For example, when the RGB values of pixels are used, the process of acquiring the center-of-gravity position of the hand image (see FIG. 7) omits step S122, creates a histogram for each of the R, G, and B values (step S123), sets a range for each of the R, G, and B values based on these histograms (step S124), and extracts the pixels whose R, G, and B values fall within the set ranges (step S125). Likewise, the process of setting the color feature amount for object recognition based on the center-of-gravity position (see FIG. 9) omits step S141, calculates an average for each of the R, G, and B values (step S142), and sets ranges of the R, G, and B values based on these averages (step S143). Alternatively, only the hue in the HSV color space may be used as the color feature amount, or a combination such as hue and saturation, or hue and lightness, may be used.

In the above embodiment, the average of the color feature amounts of the pixels at and near the specific point of the image of the recognition target object extracted from the image is calculated (step S142), and the color feature amount for object recognition is set based on this average (step S143). However, the setting is not limited to the average; the color feature amount for object recognition may be set based on another statistic, such as the median or the mode, of the color feature amounts of the pixels at and near the specific point.

The present invention is not limited to the above embodiment and modifications, and various inventions can be formed by appropriately combining a plurality of the components disclosed in the above embodiment and modifications. For example, an invention may be formed by excluding some components from all of the components shown in the above embodiment and modifications, or by appropriately combining the components shown in the above embodiment and modifications.

1 Image processing device
2 User
3 Holder
4 Camera
11 Display unit
11a, 11b Area
12 Storage unit
13 Calculation unit
14 External information acquisition unit
121 Program storage unit
122 Color feature amount storage unit
131 Center-of-gravity position acquisition unit
131a Object area extraction unit
131b Object determination unit
131c Center-of-gravity determination unit
132 Color feature amount setting unit
133 Object recognition unit
134 Image synthesizing unit

Claims

An image processing device that recognizes an image of a specific object from an image in which the object appears, the image processing device comprising:
an external information acquisition unit that acquires image information about the space in which the image processing device exists;
a specific point acquisition unit that, using the color feature amounts of a plurality of pixels at predetermined positions in an image based on the image information, extracts a region of an image of the object from the image and acquires the position of a specific point in the region of the image;
a color feature amount setting unit that sets a color feature amount for recognizing the object based on the color feature amounts of a plurality of pixels located within a predetermined range including the specific point;
a display unit that displays the image on a screen; and
an image compositing unit that composites and displays a predetermined object on the screen on which the image is displayed,
wherein the predetermined positions are set inside the predetermined object.

The image processing device according to claim 1, wherein the specific point acquisition unit comprises:
an object area extraction unit that creates a histogram of the color feature amounts of the plurality of pixels at the predetermined positions and extracts, from the image, a region of pixels having color feature amounts included in a predetermined range of the histogram as a candidate region for the image of the object;
an object determination unit that extracts the contour of the candidate region and determines, based on the contour, whether the candidate region is the region of the image of the object; and
a center of gravity determination unit that, when the candidate region is determined to be the region of the image of the object, outputs a specific point in the candidate region as the specific point in the region of the image of the object.

The image processing device according to claim 3, wherein the object determination unit determines that the candidate region is the region of the image of the object when the area of the candidate region is equal to or larger than a predetermined value.

The image processing device according to claim 4, wherein, when a plurality of the candidate regions are extracted, the object determination unit determines whether the area of the candidate region having the largest area is equal to or larger than the predetermined value.

The image processing device according to any one of claims 3 to 5, wherein the object is the user's hand, and
the object determination unit detects concavities and convexities in the contour of the candidate region and determines that the candidate region is the region of the image of the object when there are four or more convex portions and the average of the distances from the center of gravity position of the contour to each convex portion is at least a predetermined multiple of the average of the distances from the specific point to each concave portion.

The image processing device according to any one of claims 1 and 3 to 6, wherein the color feature amount setting unit calculates an average value, a median value, or a mode value of the color feature amounts of the plurality of pixels located within the predetermined range including the specific point, and sets values in a predetermined range including the average value, the median value, or the mode value as the color feature amount for recognizing the object.

The image processing device according to any one of claims 1 and 3 to 7, wherein the color feature amount includes at least a hue in the HSV color space.

An image processing method executed by an image processing device that recognizes an image of a specific object from an image in which the object appears,
the image processing device including an external information acquisition unit that acquires image information about the space in which the image processing device exists, and a display unit that displays the image on a screen,
the method comprising:
a step (a) of extracting a region of an image of the object from the image, using the color feature amounts of a plurality of pixels at predetermined positions in an image based on the image information, and acquiring the position of a specific point in the region of the image;
a step (b) of setting a color feature amount for recognizing the object based on the color feature amounts of a plurality of pixels located within a predetermined range including the specific point; and
a step (c) of compositing and displaying a predetermined object on the screen on which the image is displayed,
wherein the predetermined positions are set inside the predetermined object.

An image processing program executed by an image processing device that recognizes an image of a specific object from an image in which the object appears,
the image processing device including an external information acquisition unit that acquires image information about the space in which the image processing device exists, and a display unit that displays the image on a screen,
the program causing the image processing device to execute:
a step (a) of extracting a region of an image of the object from the image, using the color feature amounts of a plurality of pixels at predetermined positions in an image based on the image information, and acquiring the position of a specific point in the region of the image;
a step (b) of setting a color feature amount for recognizing the object based on the color feature amounts of a plurality of pixels located within a predetermined range including the specific point; and
a step (c) of compositing and displaying a predetermined object on the screen on which the image is displayed,
wherein the predetermined positions are set inside the predetermined object.