JP7470069B2

JP7470069B2 - Pointing object detection device, pointing object detection method, and pointing object detection system

Info

Publication number: JP7470069B2
Application number: JP2021023229A
Authority: JP
Inventors: サンジェイクマルディヴェディ; 高行秋山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-02-17
Filing date: 2021-02-17
Publication date: 2024-04-17
Anticipated expiration: 2041-02-17
Also published as: JP2022125570A

Description

The present disclosure relates to a pointing object detection device, a pointing object detection method, and a pointing object detection system.

In recent years, research on recognizing various human movements has been conducted with the aim of putting man-machine interfaces into practical use. Recognizing human movements makes it possible to convey a user's intentions to a computer in a natural way.

Techniques for detecting pointing gestures have been attracting attention as an effective means of conveying a user's intentions to a computer. Pointing gestures are natural and intuitive for humans and require no special equipment to be worn, so they place no burden on the user and are low-cost.
Techniques for detecting pointing gestures can realize man-machine interfaces in a variety of settings, such as housework, welfare, and industry, and enable services that support people.

One method for detecting a pointing gesture is described in, for example, H. Asano, T. Nagayasu, T. Orimo, K. Terabayashi, M. Ohta and K. Umeda, "Recognition of finger-pointing direction using color clustering and image segmentation," The SICE Annual Conference 2013, Nagoya, Japan, 2013, pp. 2029-2034 (Non-Patent Document 1).
Non-Patent Document 1 proposes an accurate method for recognizing pointing directions using multiple cameras. In this method, the input image is segmented by color clustering before shape analysis. This removes some limitations of earlier methods, such as a narrow operating area and a predefined skin color. The document reports that pointing recognition in eight directions is achieved with an average recall of 90% and an average precision of 93%.

Non-Patent Document 1: H. Asano, T. Nagayasu, T. Orimo, K. Terabayashi, M. Ohta and K. Umeda, "Recognition of finger-pointing direction using color clustering and image segmentation," The SICE Annual Conference 2013, Nagoya, Japan, 2013, pp. 2029-2034.

In Non-Patent Document 1, narrow regions in an image of a human hand are extracted as fingers, and the pointing direction is then determined by applying the least squares method to the pixels belonging to those fingers.
However, because the method of Non-Patent Document 1 extracts fingers based on the relative width of each part of the hand, the accuracy of the pointing direction is limited in two cases: when the width of each part of the hand cannot be determined because of the angle of view, noise, object detection errors, or the like, and when the hand is partially hidden by a sleeve or the like so that the apparent width of a hand part is inaccurate.

Accordingly, the present disclosure aims to provide highly robust pointing object detection that determines the pointing direction based on the curvature of pixels in a hand image and their distance from the center of the hand, and detects the pointing object targeted by the pointing action based on the determined pointing direction. This enables highly accurate pointing object detection even when the hand pixels are not captured clearly because of the angle of view, noise, or the like.

To solve the above problem, one representative pointing object detection device of the present disclosure includes: an image input unit that acquires an input video including a user's hand and at least one pointing object candidate; an object detection unit that analyzes the input video and detects the hand and the pointing object candidate; a gesture determination unit that determines a gesture made by the hand; a pointing direction determination unit that, when the gesture made by the hand is determined to be a pointing action, identifies a centroid of the hand, identifies a region that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion as the fingertip of the pointing finger used in the pointing action, and determines the pointing direction of the pointing action based on the distribution of pixels belonging to the identified fingertip; and a pointing object identification unit that, based on the pointing direction, identifies the object pointed to by the pointing action in the input video from among the pointing object candidates.

According to the present disclosure, the pointing direction is determined based on the curvature of pixels in the hand image and their distance from the center of the hand, and the pointing object targeted by the pointing action is detected based on the determined pointing direction. This provides a highly robust pointing object detection means that achieves highly accurate detection even when the hand pixels are not captured clearly because of the angle of view, noise, or the like.
Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments for carrying out the invention.

FIG. 1 is a diagram showing an example of the hardware configuration of a computer system for implementing the embodiments of the present disclosure.
FIG. 2 is a diagram showing the configuration of a pointing object detection system according to the first embodiment of the present disclosure.
FIG. 3 is a flowchart showing the flow of a pointing object detection method according to the first embodiment of the present disclosure.
FIG. 4 is a flowchart showing the flow of a pointing direction determination process according to the first embodiment of the present disclosure.
FIG. 5 is a diagram showing an example of a first pointing direction determination means according to the first embodiment of the present disclosure.
FIG. 6 is a diagram showing an example of a second pointing direction determination means according to the first embodiment of the present disclosure.
FIG. 7 is a diagram showing an example of a third pointing direction determination means according to the first embodiment of the present disclosure.
FIG. 8 is a diagram showing an example of a fourth pointing direction determination means according to the first embodiment of the present disclosure.
FIG. 9 is a diagram showing an example of a fifth pointing direction determination means according to the first embodiment of the present disclosure.
FIG. 10 is a diagram showing the configuration of a pointing object detection system according to the second embodiment of the present disclosure.
FIG. 11 is a diagram showing an example of the configuration of an object management unit according to the second embodiment of the present disclosure.
FIG. 12 is a flowchart showing the flow of a user object sequence information acquisition process according to the second embodiment of the present disclosure.
FIG. 13 is a flowchart showing the flow of an object handling sequence verification process according to the second embodiment of the present disclosure.
FIG. 14 is a diagram showing an example of the configuration of an environment recognition unit according to the second embodiment of the present disclosure.
FIG. 15 is a diagram showing an example of the configuration of a user information acquisition unit according to the second embodiment of the present disclosure.
FIG. 16 is a diagram showing an example of the configuration of an update unit according to the second embodiment of the present disclosure.
FIG. 17 is a diagram showing the configuration of a pointing object detection system according to the third embodiment of the present disclosure.
FIG. 18 is a diagram showing the flow of a task distribution process performed by a task management unit according to the third embodiment of the present disclosure.

The pointing object detection means according to the embodiments of the present disclosure may be used to realize a man-machine interface and directly assist people in various settings, such as housework, welfare, and industry. For convenience, the following description focuses on an example in which the pointing object detection means is applied to industry.

In factories and similar sites, workers may disassemble or assemble machines or equipment while inspecting or repairing them. In doing so, a worker may forget to inspect a particular part or may disassemble or assemble the machine in the wrong order. By using the object detection means according to the embodiments of the present disclosure to detect the part the worker is pointing at, the worker's actions can be recorded and out-of-order actions can be detected, which improves the accuracy and efficiency of the work.

As an example, suppose a worker points at a specific part (for example, one of the parts that make up a machine or device) before handling it. An input video showing this pointing action is captured by an imaging device such as a camera. The captured input video is then processed by the functional units of the pointing object detection device described below: the pointing direction indicated by the fingertip is determined based on the curvature of pixels in the input video and their distance from the center of the hand, and the pointing object targeted by the pointing action is detected based on the determined pointing direction. User object sequence information indicating the order of the worker's actions is then recorded. Comparing this user object sequence information with specified object sequence information, which is prepared in advance and indicates the correct order of actions, makes it possible to detect abnormalities (such as errors) in the machine handling procedure.
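The comparison between the recorded user object sequence and the specified object sequence can be illustrated with a minimal sketch. The function name `find_sequence_anomalies`, the label strings, and the position-by-position comparison rule are assumptions made for illustration; the disclosure does not state how the two sequences are compared.

```python
from typing import List, Optional, Tuple

Anomaly = Tuple[int, Optional[str], Optional[str]]  # (step, expected, observed)


def find_sequence_anomalies(
    user_sequence: List[str], specified_sequence: List[str]
) -> List[Anomaly]:
    """Compare the two sequences step by step and collect every mismatch.

    A missing step is reported with None on the side that has no entry.
    This position-wise rule is an assumption, not the patented procedure.
    """
    anomalies: List[Anomaly] = []
    steps = max(len(user_sequence), len(specified_sequence))
    for i in range(steps):
        expected = specified_sequence[i] if i < len(specified_sequence) else None
        observed = user_sequence[i] if i < len(user_sequence) else None
        if expected != observed:
            anomalies.append((i, expected, observed))
    return anomalies


# Hypothetical part labels: the worker swapped two bolts and skipped the filter.
specified = ["cover", "bolt_a", "bolt_b", "filter"]
observed = ["cover", "bolt_b", "bolt_a"]
for step, expected, seen in find_sequence_anomalies(observed, specified):
    print(f"step {step}: expected {expected}, observed {seen}")
```

A position-wise comparison reports a skipped step as a mismatch at every later position; an alignment-based comparison (for example, an edit-distance alignment) might be preferable if only the skipped step should be flagged.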

In this way, according to the present disclosure, the pointing direction is determined based on the curvature of pixels in the hand image and their distance from the center of the hand, and the pointing object targeted by the pointing action is detected based on the determined pointing direction. This provides a highly robust pointing object detection means that can detect the pointing object with high accuracy even when the hand pixels are not captured clearly because of the angle of view, noise, or the like.

Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to these embodiments. In the drawings, identical parts are denoted by identical reference numerals.

First, a computer system 300 for implementing the embodiments of the present disclosure is described with reference to FIG. 1. The mechanisms and devices of the various embodiments disclosed herein may be applied to any suitable computing system. The main components of the computer system 300 include one or more processors 302, a memory 304, a terminal interface 312, a storage interface 314, an I/O (input/output) device interface 316, and a network interface 318. These components may be interconnected via a memory bus 306, an I/O bus 308, a bus interface unit 309, and an I/O bus interface unit 310.

The computer system 300 may include one or more general-purpose programmable central processing units (CPUs) 302A and 302B, collectively referred to as the processor 302. In some embodiments, the computer system 300 may include multiple processors, and in other embodiments, the computer system 300 may be a single-CPU system. Each processor 302 executes instructions stored in the memory 304 and may include an on-board cache.

In some embodiments, the memory 304 may include random access semiconductor memory, a storage device, or a storage medium (either volatile or non-volatile) for storing data and programs. The memory 304 may store all or part of the programs, modules, and data structures that implement the functions described herein. For example, the memory 304 may store a pointing object detection application 350. In some embodiments, the pointing object detection application 350 may include instructions or descriptions that execute the functions described below on the processor 302.

In some embodiments, the pointing object detection application 350 may be implemented in hardware via semiconductor devices, chips, logic gates, circuits, circuit cards, and/or other physical hardware devices, instead of or in addition to a processor-based system. In some embodiments, the pointing object detection application 350 may include data other than instructions or descriptions. In some embodiments, a camera, sensor, or other data input device (not shown) may be provided so as to communicate directly with the bus interface unit 309, the processor 302, or other hardware of the computer system 300.

The computer system 300 may include a bus interface unit 309 that handles communication among the processor 302, the memory 304, a display system 324, and the I/O bus interface unit 310. The I/O bus interface unit 310 may be coupled to the I/O bus 308 for transferring data to and from various I/O units. Via the I/O bus 308, the I/O bus interface unit 310 may communicate with multiple I/O interface units 312, 314, 316, and 318, also known as I/O processors (IOPs) or I/O adapters (IOAs).

The display system 324 may include a display controller, a display memory, or both. The display controller can provide video data, audio data, or both to a display device 326. The computer system 300 may also include devices such as one or more sensors configured to collect data and provide that data to the processor 302.

For example, the computer system 300 may include biometric sensors that collect heart rate data, stress level data, and the like; environmental sensors that collect humidity data, temperature data, pressure data, and the like; and motion sensors that collect acceleration data, movement data, and the like. Other types of sensors may also be used. The display system 324 may be connected to a display device 326 such as a standalone display screen, a television, a tablet, or a handheld device.

The I/O interface units provide the ability to communicate with various storage or I/O devices. For example, the terminal interface unit 312 allows the attachment of user I/O devices 320, which include user output devices such as a video display device, a speaker, or a television, and user input devices such as a keyboard, mouse, keypad, touchpad, trackball, button, light pen, or other pointing device. By operating a user input device through a user interface, a user may enter data and instructions into the user I/O devices 320 and the computer system 300 and receive output data from the computer system 300. The user interface may, for example, be displayed on a display device, played through a speaker, or printed by a printer via the user I/O devices 320.

The storage interface 314 allows the attachment of one or more disk drives or direct access storage devices 322 (typically magnetic disk drive storage devices, although they may be an array of disk drives or other storage devices configured to appear as a single disk drive). In some embodiments, the storage device 322 may be implemented as any secondary storage device. The contents of the memory 304 may be stored in the storage device 322 and read from it as needed. The I/O device interface 316 may provide an interface to other I/O devices such as printers and fax machines. The network interface 318 may provide a communication path so that the computer system 300 and other devices can communicate with each other. This communication path may be, for example, a network 330.

In some embodiments, the computer system 300 may be a device that has no direct user interface and receives requests from other computer systems (clients), such as a multi-user mainframe computer system, a single-user system, or a server computer. In other embodiments, the computer system 300 may be a desktop computer, portable computer, laptop, tablet computer, pocket computer, telephone, smartphone, or any other suitable electronic device.

Next, the configuration of the pointing object detection system according to the first embodiment of the present disclosure is described with reference to FIG. 2.

FIG. 2 is a diagram showing the configuration of a pointing object detection system 200 according to the first embodiment of the present disclosure. As shown in FIG. 2, the pointing object detection system 200 includes a pointing object detection device 210 and a user terminal 250. The pointing object detection device 210 and the user terminal 250 may be connected via a communication network 234 such as a LAN or the Internet.

The pointing object detection device 210 analyzes an input video showing a pointing action performed by a user 235, determines the pointing direction of that action, and detects the pointing object 240 targeted by the action based on the determined pointing direction. The pointing object detection device 210 may be implemented by any computing device, such as a desktop computer, laptop computer, tablet, or smartphone.
As shown in FIG. 2, the pointing object detection device 210 includes an object detection unit 212, an image processing unit 214, a gesture determination unit 218, a pointing direction determination unit 220, a pointing object identification unit 222, an image input unit 224, a processor 226, and a storage unit 228.

The image input unit 224 is a functional unit that acquires an input video showing the hand of the user 235 and at least one pointing object candidate. The image input unit 224 may be, for example, an imaging unit such as an RGB camera, or a functional unit that receives an image signal.
To improve the accuracy of pointing object detection, when the image input unit 224 is an imaging unit, it is desirable to place it where it can easily capture the hand of the user 235 and the objects the user 235 works on.
The following description assumes that the image input unit 224 is an imaging unit, but it may instead be a functional unit that receives an image signal.
Note that FIG. 2 shows, as an example, a configuration in which the image input unit 224 is included in the pointing object detection device 210. The present disclosure is not limited to this, and the image input unit 224 may be a separate device independent of the pointing object detection device 210. In that case, the image input unit 224 may transmit the acquired input video to the pointing object detection device 210 via the communication network 234.

The object detection unit 212 is a functional unit that analyzes the input video acquired by the image input unit 224 and detects the hand of the user 235 and the pointing object candidates. The object detection unit 212 may detect all objects appearing in the input video and generate a label indicating the class of each detected object. The means used by the object detection unit 212 is not particularly limited and may include any object detection technique. Examples include machine learning approaches such as the Viola-Jones object detection framework based on Haar-like features, the scale-invariant feature transform (SIFT), and HOG features, as well as deep learning approaches such as region proposal methods (R-CNN, Fast R-CNN, Faster R-CNN, Cascade R-CNN), Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), Single-Shot Refinement Neural Network for Object Detection (RefineDet), RetinaNet, and deformable convolutional networks.

The image processing unit 214 analyzes the input video in which the objects have been detected by the object detection unit 212 and extracts from it a hand image showing only the hand of the user 235. For example, the image processing unit 214 may crop the region showing the hand of the user 235 from the input video, may use skin-color-based image segmentation, or may use a deep learning approach such as SegNet, ICNet, or FPN.
As a result of processing by the image processing unit 214, a binary image is obtained in which pixels belonging to the hand of the user 235 have a brightness value of 1 or greater and pixels not belonging to the hand have a brightness value of 0 (see FIGS. 5 to 9).
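The skin-color option can be sketched in a few lines to show the kind of binary image that results. The RGB thresholds in `is_skin` are an illustrative rule of thumb chosen for this sketch, not values from the disclosure, and the function names are hypothetical; a real system would more likely use one of the segmentation networks named above.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0-255


def is_skin(pixel: Pixel) -> bool:
    """Illustrative RGB skin test; all thresholds here are assumptions."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(pixel) - min(pixel) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def to_binary_mask(image: List[List[Pixel]]) -> List[List[int]]:
    """Map pixels judged to be hand (skin) to 1 and all others to 0."""
    return [[1 if is_skin(p) else 0 for p in row] for row in image]


# Tiny 2x2 example: two skin-toned pixels, one dark pixel, one blue pixel.
image = [
    [(20, 20, 20), (210, 150, 120)],
    [(200, 140, 110), (30, 60, 200)],
]
mask = to_binary_mask(image)
print(mask)
```

Fixed color thresholds are sensitive to lighting and skin tone, which is one reason the learned segmentation approaches may be preferred in practice.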

The gesture determination unit 218 is a functional unit that determines the gesture made by the hand of the user 235 in the hand image extracted by the image processing unit 214. For example, the gesture determination unit 218 may identify key points of the hand (fingers, joints, wrist, back of the hand, and so on) and determine, based on the distribution of those key points, whether the gesture is a pointing action. Here, a pointing action means directing a finger toward a target. The pointing action may be performed with any finger, such as the index finger or the middle finger. The finger of the user 235 used for the pointing action is referred to here as the "pointing finger."
To determine the gesture made by the hand of the user 235, the gesture determination unit 218 may use any existing means, such as a 3D model-based algorithm, a skeleton-based algorithm, or an electromyography-based model.

The pointing direction determination unit 220 is a functional unit that determines the pointing direction of the pointing action. The "pointing direction" here is the direction indicated by the pointing finger of the user 235. More specifically, the pointing direction determination unit 220 may identify the centroid of the hand of the user 235, identify a region that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion as the fingertip of the pointing finger, and determine the pointing direction based on the distribution of pixels belonging to the identified fingertip.
In the present disclosure, "centroid" means the center of mass in the general sense. It may be the geometric center in the image, or the physical center of gravity may be used instead of the geometric center.
The process of determining the pointing direction using the pointing direction determination unit 220 is described in detail later.
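Pending that detailed description, a simplified sketch may help fix the idea of the centroid and the distance criterion. It works on a binary hand mask, takes the geometric center of the hand pixels as the centroid, treats the pixels farthest from the centroid as the fingertip, and uses the vector from the centroid to the mean fingertip position as the direction. The `ratio` threshold, the function names, and the direction rule are all assumptions, and the curvature criterion is deliberately omitted, so this is not the disclosed method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def hand_pixels(mask: List[List[int]]) -> List[Point]:
    """Collect coordinates of all non-zero (hand) pixels."""
    return [(float(x), float(y))
            for y, row in enumerate(mask)
            for x, v in enumerate(row) if v]


def hand_centroid(pixels: List[Point]) -> Point:
    """Geometric center of the hand pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def fingertip_pixels(pixels: List[Point], centroid: Point,
                     ratio: float = 0.8) -> List[Point]:
    """Assumed distance criterion: keep pixels whose distance from the
    centroid is at least `ratio` times the maximum distance."""
    cx, cy = centroid
    dists = [math.hypot(x - cx, y - cy) for x, y in pixels]
    d_max = max(dists)
    return [p for p, d in zip(pixels, dists) if d >= ratio * d_max]


def pointing_direction(centroid: Point, tip: List[Point]) -> Point:
    """Unit vector from the centroid to the mean fingertip position."""
    tx, ty = hand_centroid(tip)
    dx, dy = tx - centroid[0], ty - centroid[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("fingertip coincides with the centroid")
    return (dx / norm, dy / norm)


# A palm (3x3 block) with one finger extending to the right.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 0, 0],
]
pixels = hand_pixels(mask)
center = hand_centroid(pixels)
tip = fingertip_pixels(pixels, center)
print(center, tip, pointing_direction(center, tip))
```

A distance criterion alone can pick up a wrist or sleeve edge that lies far from the centroid; restricting candidates to high-curvature contour regions, as the disclosure describes, is one way to avoid that.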

Based on the pointing direction determined by the pointing direction determination unit 220, the pointing object identification unit 222 identifies, from among the pointing object candidates, the pointing object 240 indicated by the pointing action in the input video. In doing so, the pointing object identification unit 222 may use the labels of the objects detected by the object detection unit 212 in addition to the determined pointing direction.
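One simple way to pick a candidate from a pointing direction is to choose the object whose center deviates least in angle from the pointing ray. The sketch below assumes each candidate is represented by a label and the center of its bounding box; the selection rule, the `max_angle_deg` cutoff, and the function name are assumptions, since the disclosure does not specify the rule at this point.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def select_pointed_object(
    fingertip: Point,
    direction: Point,
    candidates: Dict[str, Point],
    max_angle_deg: float = 20.0,
) -> Optional[str]:
    """Return the label of the candidate whose center has the smallest
    angular deviation from the pointing ray, or None if no candidate
    lies within `max_angle_deg` (an assumed cutoff)."""
    best_label: Optional[str] = None
    best_angle = max_angle_deg
    dir_norm = math.hypot(direction[0], direction[1])
    if dir_norm == 0:
        return None
    for label, (ox, oy) in candidates.items():
        vx, vy = ox - fingertip[0], oy - fingertip[1]
        v_norm = math.hypot(vx, vy)
        if v_norm == 0:
            continue  # candidate center coincides with the fingertip
        cos_a = (vx * direction[0] + vy * direction[1]) / (v_norm * dir_norm)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle <= best_angle:
            best_label, best_angle = label, angle
    return best_label


# Hypothetical candidates: labels with bounding-box centers.
candidates = {"hammer": (20.0, 3.0), "screwdriver": (10.0, 15.0)}
print(select_pointed_object((7.0, 2.0), (1.0, 0.0), candidates))
```

The labels produced by the object detection unit could be used to filter `candidates` before this step, for example to exclude the hand itself.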

プロセッサ２２６は、指示物体検出装置２１０の各機能部の機能を実現するための命令を実行するための演算装置であり、例えば図１に示すプロセッサ３０２と実質的に同様であるため、ここではその説明を省略する。 The processor 226 is a computing device that executes instructions for realizing the functions of each functional unit of the pointing object detection device 210. It is substantially similar to, for example, the processor 302 shown in FIG. 1, and its description is therefore omitted here.

記憶部２２８は、指示物体検出装置２１０の機能部に用いられる各種データを格納するためのストレージ部である。ここでの記憶部２２８は、例えばハードディスクドライブやソリッドステートドライブ等のローカルストレージであってもよく、クラウドのような分散型ストレージサービスであってもよい。 The storage unit 228 stores various data used by the functional units of the pointing object detection device 210. The storage unit 228 may be, for example, local storage such as a hard disk drive or a solid-state drive, or a distributed storage service such as a cloud service.

ユーザ２３５は、指示物体検出装置２１０の解析の対象となる指差し動作を行うユーザである。ユーザ２３５は、例えば工場で作業する作業員であってもよく、患者の介護を行う介護士等、任意のタスクを行うユーザであってもよい。また、本開示では、説明の便宜上、ユーザ２３５が人間の場合を一例として説明するが、本開示はこれに限定されず、ユーザ２３５は例えばチンパンジーやオランウータン等のサルであってもよく、指差し動作を行う手があれば、任意の生命体や人造物等であってもよい。 The user 235 is a user who performs a pointing action that is the subject of analysis by the pointing object detection device 210. The user 235 may be, for example, a worker working in a factory, or a user performing any task, such as a caregiver who cares for a patient. In addition, for the sake of convenience in this disclosure, a case in which the user 235 is a human will be described as an example, but the present disclosure is not limited to this, and the user 235 may be, for example, a monkey such as a chimpanzee or an orangutan, or any living organism or artificial object, etc., as long as it has a hand that can perform a pointing action.

指示物体２４０は、ユーザ２３５による指差し動作の対象となる物体である。指示物体２４０は、例えば点検の対象となる機械や装置を構成する部品、ハンマーやドライバー等の工具、テーブルの上に配置されているフォークやナイフ等、ユーザ２３５のタスクによって任意のオブジェクトであってもよい。 The pointing object 240 is an object that is the target of a pointing action by the user 235. The pointing object 240 may be any object depending on the task of the user 235, such as a part that constitutes a machine or device to be inspected, a tool such as a hammer or a screwdriver, or a fork or knife placed on a table.

ユーザ端末２５０は、例えばユーザ２３５が利用する端末装置である。ユーザ端末２５０は、例えばユーザ２３５に関するユーザ情報、ユーザ２３５の環境に関する環境情報等、指示物体検出を支援する任意の情報の入力を受け付けてもよい。また、ある実施例では、指示物体の取り扱い手順において異常が発生したと判定した場合、当該異常を示す異常通知がユーザ端末２５０に送信されてもよい。このように、ユーザ端末２５０のユーザ２３５は、検出された異常を解決する行動を取ることができる。 The user terminal 250 is, for example, a terminal device used by the user 235. The user terminal 250 may accept input of any information that assists in pointing object detection, such as user information about the user 235 and environmental information about the environment of the user 235. In addition, in one embodiment, if it is determined that an abnormality has occurred in the handling procedure of the pointing object, an abnormality notification indicating the abnormality may be transmitted to the user terminal 250. In this way, the user 235 of the user terminal 250 can take action to resolve the detected abnormality.

以上説明したように構成した指示物体検出システム２００によれば、手の画像における画素の曲率と手の中心からの距離とに基づいて指差し方向を判定し、判定した指差し方向に基づいて指差し動作の対象となる指示物体を検出することで、画角やノイズ等で手の画素が鮮明に写らない画像の場合であっても、高精度の指示物体検出が可能な頑強性が高い指示物体検出を提供することができる。 With the pointing object detection system 200 configured as described above, the pointing direction is determined based on the curvature of the pixels in the hand image and the distance from the center of the hand, and the pointing object that is the target of the pointing action is detected based on the determined pointing direction. This makes it possible to provide highly robust pointing object detection that detects the pointing object with high accuracy even when the pixels of the hand are not clearly captured because of the angle of view, noise, or the like.

次に、図３を参照して、本開示の実施例１に係る指示物体検出方法について説明する。 Next, the pointing object detection method according to the first embodiment of the present disclosure will be described with reference to FIG. 3.

図３は、本開示の実施例１に係る指示物体検出方法３６０の流れを示すフローチャートである。図３に示す指示物体検出方法３６０は、指差し動作を示す入力映像を解析することで、当該指差し動作の指示方向を判定し、判定した指示方向に基づいて指差し動作の対象となる指示物体２４０を検出するための方法であり、指示物体検出装置（例えば、図２に示す指示物体検出装置２１０）によって実施される。 Fig. 3 is a flowchart showing the flow of a pointing object detection method 360 according to the first embodiment of the present disclosure. The pointing object detection method 360 shown in Fig. 3 is a method for determining the pointing direction of a pointing motion by analyzing an input video showing the pointing motion, and detecting a pointing object 240 that is the target of the pointing motion based on the determined pointing direction, and is implemented by a pointing object detection device (e.g., the pointing object detection device 210 shown in Fig. 2).

まず、ステップＳ３６２では、撮影部（例えば、図２に示す画像入力部２２４）は、ユーザの手と、少なくとも１つの指示物体候補とを示す入力映像３６３を取得する。この入力映像３６３は、例えば静止画像であってもよく、多数の画像フレームから構成される動画であってもよい。本開示に係るある態様では、ユーザは、上述したユーザ端末（例えば図２に示すユーザ端末２５０）を介して、作業を開始する旨の指示を入力した後、撮影部はユーザの作業の撮影を開始し、入力映像３６３を取得してもよい。
撮影部によって取得される入力映像３６３は、上述した記憶部（例えば、図２に示す記憶部２２８）に格納されてもよい。 First, in step S362, the image capture unit (e.g., image input unit 224 shown in FIG. 2) acquires an input video 363 showing the user's hand and at least one pointing object candidate. This input video 363 may be, for example, a still image or a video composed of a number of image frames. In one aspect of the present disclosure, the user may input an instruction to start work via the above-mentioned user terminal (e.g., user terminal 250 shown in FIG. 2), and then the image capture unit may start capturing the user's work and acquire the input video 363.
The input video 363 acquired by the image capture unit may be stored in the above-mentioned storage unit (for example, the storage unit 228 shown in FIG. 2).

次に、ステップＳ３６４では、オブジェクト検出部（例えば、図２に示すオブジェクト検出部２１２）は、撮影部によって取得された入力映像３６３を解析し、ユーザの手と指示物体候補とを検出する。ここでの指示物体候補とは、入力映像３６３に含まれており、指差し動作の対象となる物体である可能性がある物体である。
上述したように、オブジェクト検出部２１２は、入力映像３６３に含まれる全てのオブジェクトを検出し、検出したオブジェクトのクラスを示すラベル３６５を生成してもよい。これらのラベル３６５は、例えば上述した記憶部に格納されてもよい。 Next, in step S364, an object detection unit (for example, the object detection unit 212 shown in FIG. 2) analyzes the input video 363 acquired by the image capture unit and detects the user's hand and the pointing object candidates. A pointing object candidate here is an object that is included in the input video 363 and may be the target of the pointing action.
As described above, the object detection unit 212 may detect all objects included in the input video 363 and generate labels 365 indicating the classes of the detected objects. These labels 365 may be stored in, for example, the storage unit described above.

ステップＳ３６４でのオブジェクト検出部による処理の結果、手として認識されたオブジェクトが入力映像３６３に存在すると判定された（つまり、ラベル３６５のうち、「手」とのラベルに該当するオブジェクトが検出された）場合、本処理はステップＳ３６６へ進む。一方、ステップＳ３６４でのオブジェクト検出部による処理の結果、手として認識されたオブジェクトが入力映像３６３に存在しないと判定された場合、本処理はステップＳ３６２へ戻り、撮影部が入力映像３６３の取得を継続する。 If the result of the processing by the object detection unit in step S364 is that it is determined that the object recognized as a hand is present in the input video 363 (i.e., an object corresponding to the label "hand" among the labels 365 is detected), the process proceeds to step S366. On the other hand, if the result of the processing by the object detection unit in step S364 is that it is determined that the object recognized as a hand is not present in the input video 363, the process returns to step S362, and the image capture unit continues acquiring the input video 363.

次に、ステップＳ３６６では、画像加工部（例えば、図２に示す画像加工部２１４）は、入力映像３６３の中から、ユーザの手のみを示す手画像を抽出する。例えば、画像加工部は、ユーザの手を示す画像を入力映像から切り出して（クロッピング）もよく、ユーザの手の肌の色に基づいた画像分割（ｓｋｉｎ－ｃｏｌｏｒｂａｓｅｄｉｍａｇｅｓｅｇｍｅｎｔａｔｉｏｎ）を用いてもよく、ｓｅｇｎｅｔ、ＩＣＮＥＴ、ＦＰＮ等の深層学習によるアプローチを用いてもよい。画像加工部による処理の結果、ユーザの手に属する画素の輝度値が１以上となり、ユーザの手に属さない画素の輝度値が０となる手画像が得られる。 Next, in step S366, the image processing unit (for example, the image processing unit 214 shown in FIG. 2) extracts a hand image showing only the user's hand from the input video 363. For example, the image processing unit may cut out (crop) an image showing the user's hand from the input video, may use skin-color-based image segmentation of the user's hand, or may use a deep learning approach such as SegNet, ICNet, or FPN. As a result of processing by the image processing unit, a hand image is obtained in which the luminance values of pixels belonging to the user's hand are 1 or more, and the luminance values of pixels not belonging to the user's hand are 0.

次に、ステップＳ３６８では、ジェスチャー判定部（例えば、図２に示すジェスチャー判定部２１８）は、ステップＳ３６６で抽出された手画像を解析することで、手が示すジェスチャーを判定する。ジェスチャー判定部は、例えば、ユーザの手のキーポイント（指、間接、手首、手の甲等）を特定し、特定した手のキーポイントの分布に基づいて、手が示すジェスチャーが指差し動作か否かを判定してもよい。 Next, in step S368, a gesture determination unit (e.g., gesture determination unit 218 shown in FIG. 2) determines the gesture indicated by the hand by analyzing the hand image extracted in step S366. The gesture determination unit may, for example, identify key points of the user's hand (fingers, joints, wrist, back of the hand, etc.) and determine whether the gesture indicated by the hand is a pointing motion based on the distribution of the identified key points of the hand.

ステップＳ３６８でのジェスチャー判定部の処理の結果、手が示すジェスチャーが指差し動作であると判定された場合、本処理はステップＳ３７０へ進む。一方、ステップＳ３６８でのジェスチャー判定部の処理の結果、手が示すジェスチャーが指差し動作でないと判定された場合、本処理はステップＳ３６２へ戻り、撮影部が入力映像３６３の取得を継続する。 If the result of the processing by the gesture determination unit in step S368 indicates that the gesture indicated by the hand is a pointing motion, the process proceeds to step S370. On the other hand, if the result of the processing by the gesture determination unit in step S368 indicates that the gesture indicated by the hand is not a pointing motion, the process returns to step S362, and the image capture unit continues acquiring the input video 363.

次に、ステップＳ３７０では、指示方向判定部（例えば、図２に示す指示方向判定部２２０）は、指差し動作の指示方向を判定する。より具体的には、指示方向判定部は、ユーザの手のセントロイドを特定し、このセントロイドから第１の距離基準を満たし、且つ、所定の曲率基準を満たす領域を、指し指の指先として特定し、特定した指先に属する画素の分布に基づいて、指差し動作の指示方向を判定する。
なお、指示方向判定部を用いて指示方向を判定する処理の詳細については後述する。 Next, in step S370, a pointing direction determination unit (for example, the pointing direction determination unit 220 shown in FIG. 2) determines the pointing direction of the pointing motion. More specifically, the pointing direction determination unit identifies a centroid of the user's hand, identifies an area that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion as the tip of the index finger, and determines the pointing direction of the pointing motion based on the distribution of pixels belonging to the identified tip.
The process of determining the pointing direction using the pointing direction determination unit will be described later in detail.

次に、ステップＳ３７２では、指示物体特定部（例えば、図２に示す指示物体特定部２２２）は、ステップＳ３７０で指示方向判定部によって判定された指示方向と、ステップS３６４で判定されたラベル３６５とに基づいて、入力映像３６３において指差し動作によって指示される指示物体２４０を、指示物体候補の中から特定する。ここで特定した指示物体の情報は、上述した記憶部（例えば、図２に示す記憶部２２８）に格納されてもよい。 Next, in step S372, a pointing object identification unit (e.g., the pointing object identification unit 222 shown in FIG. 2) identifies, from among the pointing object candidates, the pointing object 240 that is pointed to by the pointing action in the input video 363, based on the pointing direction determined by the pointing direction determination unit in step S370 and the labels 365 generated in step S364. Information on the pointing object identified here may be stored in the storage unit described above (e.g., the storage unit 228 shown in FIG. 2).
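The control flow of steps S362 to S372, including the two branches that return to image acquisition, can be sketched as follows. This is a minimal illustration, not the implementation of the device: each functional unit is passed in as a callable, and the callable names, their signatures, and the use of a dictionary keyed by label are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def detect_pointed_object(
    frames: Iterable[Any],
    detect_objects: Callable[[Any], Dict[str, Any]],
    extract_hand: Callable[[Any, Dict[str, Any]], Any],
    is_pointing: Callable[[Any], bool],
    estimate_direction: Callable[[Any], Tuple[float, float]],
    select_object: Callable[[Tuple[float, float], Dict[str, Any]], Any],
) -> Optional[Any]:
    """Run the S362-S372 loop over frames; return the first pointed object found."""
    for frame in frames:                                # S362: acquire input video
        labels = detect_objects(frame)                  # S364: detect hand and candidates
        if "hand" not in labels:
            continue                                    # no hand: back to S362
        hand_image = extract_hand(frame, labels)        # S366: extract the hand image
        if not is_pointing(hand_image):                 # S368: gesture determination
            continue                                    # not pointing: back to S362
        direction = estimate_direction(hand_image)      # S370: pointing direction
        return select_object(direction, labels)         # S372: identify pointing object
    return None                                         # video ended without a pointing action
```

Injecting the units as callables keeps the sketch independent of any particular detector or segmentation model.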

以上説明した指示物体検出方法３６０によれば、手の画像における画素の曲率と手の中心からの距離とに基づいて指差し方向を判定し、判定した指差し方向に基づいて指差し動作の対象となる指示物体を検出することで、画角やノイズ等で手の画素が鮮明に写らない画像の場合であっても、高精度の指示物体検出が可能な頑強性が高い指示物体検出を提供することができる。 According to the pointing object detection method 360 described above, the pointing direction is determined based on the curvature of the pixels in the hand image and the distance from the center of the hand, and the pointing object that is the target of the pointing action is detected based on the determined pointing direction. This makes it possible to provide highly robust pointing object detection that enables highly accurate pointing object detection even in cases where the pixels of the hand are not clearly visible due to the angle of view, noise, etc.

次に、図４を参照して、本開示の実施例１に係る指示方向判定処理について説明する。 Next, referring to FIG. 4, the pointing direction determination process according to the first embodiment of the present disclosure will be described.

図４は、本開示の実施例１に係る指示方向判定処理４００の流れを示す図である。図４に示す指示方向判定処理４００は、指差し動作の指示方向を判定するための処理であり、指示方向判定部（例えば図２に示す指示方向判定部２２０）によって実行される。
なお、図４に示す指示方向判定処理４００は、図３に示す指示物体検出方法３６０におけるステップＳ３７０に実質的に対応する。また、図４に示す指示方向判定処理４００は、指差し動作の指示方向を判定するための大まかな処理であり、指差し動作の指示方向を判定する処理の詳細は、図５～図９に示す指示方向判定手段の具体例を参照して説明する。 Fig. 4 is a diagram showing a flow of a pointing direction determination process 400 according to the first embodiment of the present disclosure. The pointing direction determination process 400 shown in Fig. 4 is a process for determining the pointing direction of a pointing motion, and is executed by a pointing direction determination unit (e.g., the pointing direction determination unit 220 shown in Fig. 2).
The pointing direction determination process 400 shown in Fig. 4 substantially corresponds to step S370 in the pointing object detection method 360 shown in Fig. 3. The pointing direction determination process 400 shown in Fig. 4 is an outline of the process for determining the pointing direction of a pointing motion, and the details of that process will be described with reference to the specific examples of the pointing direction determination means shown in Figs. 5 to 9.

まず、ステップＳ４１０では、指示方向判定部は、入力映像から抽出された手画像における指差し動作に用いられている指し指の指先を特定する。より具体的には、まず、指示方向判定部は、手画像における手のセントロイドを特定する。実施例1におけるセントロイドは、既存の画像処理手法や統計的手法に基づいて求められてもよい。手のセントロイドを特定した後、指示方向判定部は、このセントロイドから第１の距離基準を満たし、且つ、所定の曲率基準を満たす領域を、指し指の指先として特定する。
この第１の距離基準は、ある領域が指先として認定されるために、当該領域のセントロイドからの必要な距離の下限を指定してもよい。
同様に、この曲率基準は、ある領域が指先として認定されるために、必要な曲率の下限を指定してもよい。
このように、上述した距離基準及び曲率基準を用いることで、手のセントロイド（つまり、手の中心部）から所定の距離以上に離れており、所定の曲率以上を持つ領域を指し指の指先として特定できる。
なお、ここでの第１の距離基準及び曲率基準は、ユーザによって指定されてもよく、過去に収集されたテスト用画像に基づいて適切な値を設定するように学習された機械学習手段によって指定されてもよい。 First, in step S410, the pointing direction determination unit identifies the tip of the index finger used in the pointing motion in the hand image extracted from the input video. More specifically, the pointing direction determination unit first identifies the centroid of the hand in the hand image. The centroid in the first embodiment may be obtained based on an existing image processing method or statistical method. After identifying the centroid of the hand, the pointing direction determination unit identifies an area that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion as the tip of the index finger.
This first distance criterion may specify a lower limit on the distance a region must be from the centroid for it to qualify as a fingertip.
Similarly, the curvature criterion may specify a lower limit for the curvature required for a region to qualify as a fingertip.
In this way, by using the distance and curvature criteria described above, an area that is a predetermined distance or more away from the centroid of the hand (i.e., the center of the hand) and has a predetermined curvature or more can be identified as the tip of the index finger.
It should be noted that the first distance and curvature criteria here may be specified by a user or may be specified by a machine learning means trained to set appropriate values based on previously collected test images.
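The fingertip search of step S410 can be sketched as below. The disclosure does not say how curvature is measured, so this example assumes a k-curvature style measure: the turning angle at a contour point, computed from its neighbors k positions before and after it, in radians. The ordered `contour` input, the parameter `k`, and the rule of choosing the farthest qualifying point are likewise assumptions for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def centroid(pixels: List[Point]) -> Point:
    """Geometric center (mean coordinate) of the hand pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def turning_angle(contour: List[Point], i: int, k: int) -> float:
    """Assumed curvature measure: pi minus the angle between the vectors to the
    k-th neighbors. About 0 on a straight edge, close to pi at a sharp tip."""
    n = len(contour)
    px, py = contour[i]
    ax, ay = contour[(i - k) % n]
    bx, by = contour[(i + k) % n]
    v1, v2 = (ax - px, ay - py), (bx - px, by - py)
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    cos_a = max(-1.0, min(1.0, (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)))
    return math.pi - math.acos(cos_a)


def find_fingertip(
    hand_pixels: List[Point],
    contour: List[Point],        # ordered boundary points of the hand (assumed input)
    min_distance: float,         # first distance criterion (lower limit)
    min_curvature: float,        # curvature criterion (lower limit, radians)
    k: int = 2,
) -> Tuple[Point, Optional[Point]]:
    """Return (centroid, fingertip); fingertip is None if no point qualifies."""
    c = centroid(hand_pixels)
    best, best_dist = None, min_distance
    for i, (x, y) in enumerate(contour):
        dist = math.hypot(x - c[0], y - c[1])
        # Keep the farthest point that meets both lower limits.
        if dist >= best_dist and turning_angle(contour, i, k) >= min_curvature:
            best, best_dist = (x, y), dist
    return c, best
```

Both thresholds act as lower limits, matching the description above; their values would be set by the user or learned from test images.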

次に、ステップＳ４２０では、指示方向判定部は、ステップＳ４１０で特定した指し指の指先に属する画素の分布を判定する。ここでは、「画素の分布」とは、指先に属する画素と、他の画素との空間的関係を意味する。一例として、指示方向判定部は、ステップＳ４１０で特定した指し指の指先に属する画素と、セントロイドに属する画素との相対的な位置関係を特定してもよい。 Next, in step S420, the pointing direction determination unit determines the distribution of pixels belonging to the tip of the index finger identified in step S410. Here, "pixel distribution" means the spatial relationship between the pixels belonging to the fingertip and other pixels. As an example, the pointing direction determination unit may identify the relative positional relationship between the pixels belonging to the tip of the index finger identified in step S410 and the pixels belonging to the centroid.

次に、ステップＳ４３０では、指示方向判定部は、ステップＳ４２０で特定した画素の分布に基づいて指差し動作の指示方向を判定する。例えば、ある実施例では、指示方向判定部は、指し指の指先に属する画素と、セントロイドに属する画素とを通過する架空線の方向を指示方向としてもよい。ステップＳ４３０で判定した指示方向は、上述した記憶部（例えば、図２に示す記憶部２２８）に格納されてもよい。
なお、上述したように、画素の分布に基づいて指差し動作の指示方向を判定する処理の詳細は、図５～図９に示す指示方向判定手段の具体例を参照して説明する。 Next, in step S430, the pointing direction determination unit determines the pointing direction of the pointing motion based on the distribution of pixels identified in step S420. For example, in one embodiment, the pointing direction determination unit may determine the direction of an imaginary line passing through pixels belonging to the fingertip of the index finger and pixels belonging to the centroid as the pointing direction. The pointing direction determined in step S430 may be stored in the storage unit described above (for example, the storage unit 228 shown in FIG. 2).
As described above, the details of the process of determining the pointing direction of the pointing motion based on the distribution of pixels will be described with reference to the specific examples of the pointing direction determination means shown in FIGS. 5 to 9.
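For the simple case mentioned in step S430, where the pointing direction is the direction of the imaginary line through the centroid and the fingertip, a minimal sketch is given below. The function name and the choice to orient the line from the centroid toward the fingertip are assumptions; the angle is expressed in image coordinates, where the y axis normally points downward.

```python
import math
from typing import Tuple


def direction_from_centroid(
    centroid: Tuple[float, float], fingertip: Tuple[float, float]
) -> Tuple[Tuple[float, float], float]:
    """Return the unit vector from centroid to fingertip and its angle in degrees."""
    dx = fingertip[0] - centroid[0]
    dy = fingertip[1] - centroid[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("centroid and fingertip coincide; direction is undefined")
    return (dx / norm, dy / norm), math.degrees(math.atan2(dy, dx))
```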

以上説明した指示方向判定処理４００によれば、手の画像における画素の曲率と手の中心からの距離とに基づいて指差し方向を判定し、判定した指差し方向に基づいて指差し動作の対象となる指示物体を検出することで、画角やノイズ等で手の画素が鮮明に写らない画像の場合であっても、高精度の指示物体検出が可能な頑強性が高い指示物体検出手段を提供することができる。 According to the pointing direction determination process 400 described above, the pointing direction is determined based on the curvature of the pixels in the hand image and the distance from the center of the hand, and the pointing object that is the target of the pointing action is detected based on the determined pointing direction, thereby providing a highly robust pointing object detection means that is capable of detecting the pointing object with high accuracy even in the case of an image in which the pixels of the hand are not clearly visible due to the angle of view, noise, etc.

次に、図５～図９を参照して、指差し動作の指示方向を判定する処理の具体例について説明する。 Next, a specific example of the process for determining the direction of a pointing gesture will be described with reference to Figures 5 to 9.

図５は、本開示の実施例１に係る第１の指示方向判定手段５００の一例を示す図である。図５に示すように、手画像５０５には、ユーザの手５０２が写っており、上述したジェスチャー判定部（例えば、図２に示すジェスチャー判定部２１８）の処理の結果、手５０２が指差し動作となっていることが判定されている。 Fig. 5 is a diagram showing an example of the first pointing direction determination means 500 according to the first embodiment of the present disclosure. As shown in Fig. 5, a user's hand 502 is shown in a hand image 505, and as a result of processing by the above-mentioned gesture determination unit (e.g., the gesture determination unit 218 shown in Fig. 2), it is determined that the hand 502 is making a pointing gesture.

まず、上述したように、指示方向判定部（例えば、図２に示す指示方向判定部２２０）は、手５０２のセントロイド５１０を特定する。このセントロイド５１０とは、手５０２の質量中心であり、既存の画像処理手法や統計的手法に基づいて求められてもよい。手５０２のセントロイド５１０を特定した後、指示方向判定部は、このセントロイド５１０から第１の距離基準を満たし、且つ、所定の曲率基準を満たす領域を、指し指５０４の指先５１２として特定する。 First, as described above, the pointing direction determination unit (for example, the pointing direction determination unit 220 shown in FIG. 2) identifies the centroid 510 of the hand 502. This centroid 510 is the center of mass of the hand 502, and may be determined based on existing image processing or statistical methods. After identifying the centroid 510 of the hand 502, the pointing direction determination unit identifies an area that satisfies a first distance criterion from the centroid 510 and also satisfies a predetermined curvature criterion as the fingertip 512 of the index finger 504.

次に、指示方向判定部は、特定した指先５１２と、セントロイド５１０との距離ｄに基づいて、指先５１２を中心とする対象領域５１１を判定する。一例として、対象領域５１１は、距離ｄ／３を半径とする円形領域であってもよい。
なお、対象領域５１１の半径はハイパーパラメータであり、ユーザに設定されてもよく、既存のハイパーパラメータ最適化方法に基づいて最適化されてもよい。 Next, the pointing direction determination unit determines a target region 511 centered on the fingertip 512 based on a distance d between the identified fingertip 512 and the centroid 510. As an example, the target region 511 may be a circular region with a radius of the distance d/3.
It should be noted that the radius of the target region 511 is a hyperparameter, and may be set by the user or may be optimized based on an existing hyperparameter optimization method.

次に、指示方向判定部は、対象領域５１１において、所定の輝度基準を満たす画素を、指差し動作に用いられる手５０２の指し指５０４に属する指画素として特定する。例えば、手画像５０５は、手５０２に属する画素の輝度値が１以上となり、手５０２に属さない画素の輝度値が０となるバイナリー画像の場合、この輝度基準は「１」であってもよい。
このように、輝度基準を「１」として設定することで、対象領域５１１における明るい画素（つまり、対象領域５１１において、指し指５０４に属する画素）を特定することができる。
なお、以上では、対象領域及び画素の輝度値に基づいて指画素を特定する場合を一例として説明したが、本開示はこれに限定されず、例えばクラスタリングや回帰（recursion）等、任意の手法を用いてもよい。 Next, the pointing direction determination unit specifies pixels in the target region 511 that satisfy a predetermined luminance criterion as finger pixels that belong to the index finger 504 of the hand 502 used for the pointing motion. For example, if the hand image 505 is a binary image in which the luminance value of pixels that belong to the hand 502 is 1 or more and the luminance value of pixels that do not belong to the hand 502 is 0, this luminance criterion may be "1".
In this way, by setting the luminance criterion to "1", it is possible to identify the bright pixels in the target region 511 (that is, the pixels in the target region 511 that belong to the index finger 504).
Note that, although the above describes an example in which finger pixels are identified based on the target region and pixel luminance values, the present disclosure is not limited to this, and any method such as clustering or recursion may be used.

次に、指示方向判定部は、特定した指画素の平均画素座標５１４（ｍｅａｎｐｉｘｅｌｃｏｏｒｄｉｎａｔｅ）を計算する。この平均画素座標５１４は、例えば任意の既存の画像処理手段によって計算されてもよい。
平均画素座標５１４を計算した後、指示方向判定部は、計算した平均画素座標５１４と、指先５１２とを通る架空線５１６の方向を、指示方向として判定する。 Next, the pointing direction determination unit calculates the mean pixel coordinate 514 of the identified finger pixels. The mean pixel coordinate 514 may be calculated, for example, by any existing image processing means.
After calculating the mean pixel coordinate 514, the pointing direction determination unit determines the direction of an imaginary line 516 that passes through the calculated mean pixel coordinate 514 and the fingertip 512 as the pointing direction.
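The first pointing direction determination means can be sketched as follows, assuming the hand image is available as a binary mask stored as a list of rows (`mask[y][x]`). The function name, the `radius_divisor` parameter (3 reproduces the d/3 example), and the choice to orient the result from the mean coordinate toward the fingertip are assumptions made for this illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def first_direction(
    mask: List[List[int]],          # binary hand image, mask[y][x] >= 1 on the hand
    centroid: Point,
    fingertip: Point,
    radius_divisor: float = 3.0,    # region radius = d / radius_divisor (hyperparameter)
    luminance_min: int = 1,         # luminance criterion
) -> Optional[Tuple[Point, Point]]:
    """Return (mean finger-pixel coordinate, unit pointing direction), or None."""
    fx, fy = fingertip
    d = math.hypot(fx - centroid[0], fy - centroid[1])
    radius = d / radius_divisor
    sum_x = sum_y = 0.0
    count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            # Finger pixels: bright pixels inside the circular target region.
            if value >= luminance_min and math.hypot(x - fx, y - fy) <= radius:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    mean = (sum_x / count, sum_y / count)
    dx, dy = fx - mean[0], fy - mean[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return None
    return mean, (dx / norm, dy / norm)
```

Because only pixels near the fingertip contribute, the estimate follows the local orientation of the finger rather than the line to the hand centroid.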

図６は、本開示の実施例１に係る第２の指示方向判定手段６００の一例を示す図である。図６に示すように、手画像６０５には、ユーザの手６０２が写っており、上述したジェスチャー判定部（例えば、図２に示すジェスチャー判定部２１８）の処理の結果、手６０２が指差し動作となっていることが判定されている。 Fig. 6 is a diagram showing an example of the second pointing direction determination means 600 according to the first embodiment of the present disclosure. As shown in Fig. 6, a user's hand 602 is shown in a hand image 605, and as a result of processing by the gesture determination unit described above (e.g., the gesture determination unit 218 shown in Fig. 2), it is determined that the hand 602 is making a pointing gesture.

まず、上述したように、指示方向判定部（例えば、図２に示す指示方向判定部２２０）は、手６０２のセントロイド６１０を特定する。このセントロイド６１０とは、手６０２の質量中心であり、既存の画像処理手法や統計的手法に基づいて求められてもよい。手６０２のセントロイド６１０を特定した後、指示方向判定部は、このセントロイド６１０から第１の距離基準を満たし、且つ、所定の曲率基準を満たす領域を、指し指６０４の指先６１２として特定する。 First, as described above, the pointing direction determination unit (for example, the pointing direction determination unit 220 shown in FIG. 2) identifies the centroid 610 of the hand 602. This centroid 610 is the center of mass of the hand 602, and may be determined based on existing image processing or statistical methods. After identifying the centroid 610 of the hand 602, the pointing direction determination unit identifies an area that satisfies a first distance criterion from the centroid 610 and also satisfies a predetermined curvature criterion as the fingertip 612 of the index finger 604.

次に、指示方向判定部は、特定した指先６１２と、セントロイド６１０との距離ｄに基づいて、指先６１２を中心とする第１の対象領域６１１を判定する。一例として、第１の対象領域６１１は、距離ｄ／６を半径とする円形領域であってもよい。 Next, the pointing direction determination unit determines a first target region 611 centered on the fingertip 612 based on the distance d between the identified fingertip 612 and the centroid 610. As an example, the first target region 611 may be a circular region with a radius of the distance d/6.

次に、指示方向判定部は、第１の対象領域６１１において、所定の輝度基準を満たす画素を、指差し動作に用いられる手６０２の指し指６０４に属する指画素として特定する。例えば、手画像６０５は、手６０２に属する画素の輝度値が１以上となり、手６０２に属さない画素の輝度値が０となるバイナリー画像の場合、この輝度基準は「１」であってもよい。
このように、輝度基準を「１」として設定することで、第１の対象領域６１１における明るい画素（つまり、第１の対象領域６１１において、指し指６０４に属する画素）を特定することができる。 Next, the pointing direction determination unit specifies pixels in the first target region 611 that satisfy a predetermined luminance criterion as finger pixels belonging to the index finger 604 of the hand 602 used for the pointing motion. For example, if the hand image 605 is a binary image in which the luminance value of pixels belonging to the hand 602 is 1 or more and the luminance value of pixels not belonging to the hand 602 is 0, this luminance criterion may be "1".
In this way, by setting the luminance criterion to "1", it is possible to identify the bright pixels in the first target region 611 (i.e., the pixels in the first target region 611 that belong to the index finger 604).

次に、指示方向判定部は、特定した指画素の平均画素座標６１４（ｍｅａｎｐｉｘｅｌｃｏｏｒｄｉｎａｔｅ）を計算する。 Next, the pointing direction determination unit calculates the mean pixel coordinate 614 of the identified finger pixels.

次に、指示方向判定部は、計算した平均画素座標６１４を中心とする第２の対象領域６１３を判定する。一例として、第２の対象領域６１３は、第１の対象領域６１１と同様に、距離ｄ／６を半径とする円形領域であってもよい。 Next, the pointing direction determination unit determines a second target region 613 centered on the calculated average pixel coordinate 614. As an example, the second target region 613 may be a circular region with a radius of the distance d/6, similar to the first target region 611.

次に、指示方向判定部は、第２の対象領域の中心である平均画素座標６１４と、第１の対象領域の中心である指先６１２とを通る架空線６１６の方向を、指示方向として判定する。 Next, the pointing direction determination unit determines the direction of an imaginary line 616 that passes through the average pixel coordinate 614, which is the center of the second target area, and the fingertip 612, which is the center of the first target area, as the pointing direction.

以上説明した図６に示す第２の指示方向判定手段６００では、図５に示す第１の指示方向判定手段５００に比べて、半径が小さい対象領域を複数用いることで、指示方向を判定する精度を向上させることができる。
なお、以上では、対象領域を二つ用いる場合を一例として説明したが、本開示はこれに限定されず、３つ以上の対象領域を用いてもよい。また、以上では、対象領域を円形とした場合を一例として説明したが、本開示はこれに限定されず、例えば長方形等、任意の形状であってもよい。 The second pointing direction determination means 600 shown in FIG. 6 described above can improve the accuracy of determining the pointing direction by using multiple target regions with smaller radii than the first pointing direction determination means 500 shown in FIG. 5.
In the above, the case where two target regions are used has been described as an example, but the present disclosure is not limited to this, and three or more target regions may be used. In addition, in the above, the case where the target region is a circle has been described as an example, but the present disclosure is not limited to this, and the target region may be any shape, such as a rectangle.

図７は、本開示の実施例１に係る第３の指示方向判定手段７００の一例を示す図である。図７に示すように、手画像７０５には、ユーザの手７０２が写っており、上述したジェスチャー判定部（例えば、図２に示すジェスチャー判定部２１８）の処理の結果、手７０２が指差し動作となっていることが判定されている。 Fig. 7 is a diagram showing an example of a third pointing direction determination means 700 according to the first embodiment of the present disclosure. As shown in Fig. 7, a user's hand 702 is shown in a hand image 705, and as a result of processing by the gesture determination unit described above (e.g., the gesture determination unit 218 shown in Fig. 2), it is determined that the hand 702 is making a pointing gesture.

まず、上述したように、指示方向判定部（例えば、図２に示す指示方向判定部２２０）は、手７０２のセントロイド７１０を特定する。このセントロイド７１０とは、手７０２の質量中心であり、既存の画像処理手法や統計的手法に基づいて求められてもよい。手７０２のセントロイド７１０を特定した後、指示方向判定部は、このセントロイド７１０から第１の距離基準を満たし、且つ、所定の曲率基準を満たす領域を、指し指７０４の指先７１４として特定する。 First, as described above, the pointing direction determination unit (for example, the pointing direction determination unit 220 shown in FIG. 2) identifies the centroid 710 of the hand 702. This centroid 710 is the center of mass of the hand 702, and may be determined based on existing image processing or statistical methods. After identifying the centroid 710 of the hand 702, the pointing direction determination unit identifies an area that satisfies a first distance criterion from the centroid 710 and also satisfies a predetermined curvature criterion as the fingertip 714 of the index finger 704.

次に、指示方向判定部は、特定した指先７１４と、セントロイド７１０とを繋ぐ第１の架空線７０６を設定する。その後、指示方向判定部は、第１の架空線７０６に対して垂直であり、指先７１４に対して第２の距離基準を満たす第２の架空線７０８を設定すると共に、第１の架空線７０６に対して垂直であり、第２の架空線７０８に対して平行であり、指先７１４に対して第３の距離基準を満たす第３の架空線７１２を設定する。また、第２の架空線７０８及び第３の架空線７１２は、輝度値が１以上の領域のみにおいて判定されてもよい（つまり、指し指７０４の境界を超えて背景まで伸びることはない）。
ここでの第２の距離基準及び第３の距離基準は、互いに異なっており、ユーザ又は機械学習手段によって設定される値であってもよい。第２の距離基準及び第３の距離基準は、互いに異なることにより、第２の架空線７０８と第３の架空線７１２との間に間隔を開けることができる。 Next, the pointing direction determination unit sets a first imaginary line 706 connecting the identified fingertip 714 and the centroid 710. Thereafter, the pointing direction determination unit sets a second imaginary line 708 that is perpendicular to the first imaginary line 706 and satisfies a second distance criterion with respect to the fingertip 714, and sets a third imaginary line 712 that is perpendicular to the first imaginary line 706, parallel to the second imaginary line 708, and satisfies a third distance criterion with respect to the fingertip 714. In addition, the second imaginary line 708 and the third imaginary line 712 may be determined only in an area where the luminance value is 1 or more (i.e., they do not extend beyond the boundary of the index finger 704 to the background).
The second distance criterion and the third distance criterion here differ from each other and may be values set by a user or by a machine learning means. Because the two criteria differ, the second imaginary line 708 and the third imaginary line 712 are spaced apart from each other.

次に、指示方向判定部は、第２の架空線７０８の中心点と、第３の架空線７１２の中心点とを通る第４の架空線７１８の方向を、指示方向として判定してもよい。 Next, the pointing direction determination unit may determine the direction of a fourth imaginary line 718 that passes through the center point of the second imaginary line 708 and the center point of the third imaginary line 712 as the pointing direction.

以上説明した図７に示す第３の指示方向判定手段７００では、図５及び図６に示す第１、第２の指示方向判定手段５００、６００に比べて、必要なコンピューティング資源を抑えると共に、処理速度を向上させることができる。
なお、以上では、第１の架空線７０６に対して垂直な架空線を２つ判定した場合を一例として説明したが、本開示はこれに限定されず、第１の架空線７０６に対して垂直な架空線を例えば３つ以上としてもよい。 The third pointing direction determination means 700 shown in FIG. 7 described above can reduce the required computing resources and improve the processing speed, compared to the first and second pointing direction determination means 500, 600 shown in FIGS. 5 and 6.
Note that, although the above describes an example in which two imaginary lines perpendicular to the first imaginary line 706 are set, the present disclosure is not limited to this, and the number of imaginary lines perpendicular to the first imaginary line 706 may be, for example, three or more.

図８は、本開示の実施例１に係る第４の指示方向判定手段８００の一例を示す図である。図８に示すように、手画像８０５には、ユーザの手８０２が写っており、上述したジェスチャー判定部（例えば、図２に示すジェスチャー判定部２１８）の処理の結果、手８０２が指差し動作となっていることが判定されている。 Fig. 8 is a diagram showing an example of a fourth pointing direction determination means 800 according to the first embodiment of the present disclosure. As shown in Fig. 8, a user's hand 802 is shown in a hand image 805, and as a result of processing by the gesture determination unit described above (e.g., the gesture determination unit 218 shown in Fig. 2), it is determined that the hand 802 is making a pointing gesture.

まず、上述したように、指示方向判定部（例えば、図２に示す指示方向判定部２２０）は、手８０２のセントロイド８１０を特定する。このセントロイド８１０とは、手８０２の質量中心であり、既存の画像処理手法や統計的手法に基づいて求められてもよい。手８０２のセントロイド８１０を特定した後、指示方向判定部は、このセントロイド８１０から第１の距離基準を満たし、且つ、所定の曲率基準を満たす領域を、指し指８０４の指先８１４として特定する。 First, as described above, the pointing direction determination unit (for example, the pointing direction determination unit 220 shown in FIG. 2) identifies the centroid 810 of the hand 802. This centroid 810 is the center of mass of the hand 802, and may be determined based on existing image processing or statistical methods. After identifying the centroid 810 of the hand 802, the pointing direction determination unit identifies an area that satisfies a first distance criterion from the centroid 810 and also satisfies a predetermined curvature criterion as the fingertip 814 of the index finger 804.

次に、指示方向判定部は、特定した指先８１４と、セントロイド８１０とに基づいて、指し指８０４に属する画素を指画素として特定する。ここでは、指示方向判定部は、上述したように、対象領域内に存在し、所定の輝度基準を満たす画素を指画素として特定してもよい。 Next, the pointing direction determination unit identifies pixels belonging to the index finger 804 as finger pixels based on the identified fingertip 814 and the centroid 810. Here, as described above, the pointing direction determination unit may identify pixels that lie within the target area and satisfy a predetermined brightness criterion as finger pixels.

次に、指示方向判定部は、最小二乗法（ＬｅａｓｔＳｑｕａｒｅＬｉｎｅＦｉｔｔｉｎｇ）を用いて、指画素に基づいて指示方向８１５を判定する。ただし、本開示は最小二乗法に限定されず、指画素を特定した後、主成分分析（ＰｒｉｎｃｉｐａｌＣｏｍｐｏｎｅｎｔＡｎａｌｙｓｉｓ）を用いて指示方向８１５を特定してもよい。 Next, the pointing direction determination unit determines the pointing direction 815 based on the finger pixels using least squares fitting. However, the present disclosure is not limited to the least squares fitting, and after identifying the finger pixels, the pointing direction 815 may be identified using principal component analysis.
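As an illustration of the principal component analysis alternative mentioned above, the following is a minimal sketch that computes the principal axis of the finger pixels in closed form from their 2x2 covariance matrix. The function name pointing_direction and its arguments are assumptions made for this sketch; the disclosure does not specify an implementation. Because a principal axis has no sign, the sketch orients it from the centroid toward the fingertip.

```python
import math

def pointing_direction(finger_pixels, centroid, fingertip):
    """Return a unit vector (dx, dy) along the principal axis of the
    finger pixels, oriented from the hand centroid toward the fingertip."""
    n = len(finger_pixels)
    if n < 2:
        raise ValueError("need at least two finger pixels")
    mean_x = sum(x for x, _ in finger_pixels) / n
    mean_y = sum(y for _, y in finger_pixels) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in finger_pixels) / n
    syy = sum((y - mean_y) ** 2 for _, y in finger_pixels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in finger_pixels) / n
    # Orientation of the major axis of the covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Flip the axis if it points away from the fingertip.
    to_tip = (fingertip[0] - centroid[0], fingertip[1] - centroid[1])
    if dx * to_tip[0] + dy * to_tip[1] < 0:
        dx, dy = -dx, -dy
    return dx, dy
```

Unlike an ordinary least squares fit of y on x, this formulation also handles a finger that points straight up or down in the image.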

図９は、本開示の実施例１に係る第５の指示方向判定手段９００の一例を示す図である。図９に示すように、手画像９０５には、ユーザの手９０２が写っており、上述したジェスチャー判定部（例えば、図２に示すジェスチャー判定部２１８）の処理の結果、手９０２が指差し動作となっていることが判定されている。 Fig. 9 is a diagram illustrating an example of a fifth pointing direction determination means 900 according to the first embodiment of the present disclosure. As shown in Fig. 9, a user's hand 902 is shown in a hand image 905, and as a result of processing by the gesture determination unit described above (e.g., the gesture determination unit 218 shown in Fig. 2), it is determined that the hand 902 is making a pointing gesture.

まず、指示方向判定部は、手９０２にバウンディングボックス９１５をフィットさせる。ここでのバウンディングボックスは、例えば長方形であってもよく、楕円形であってもよく、任意の形状であってもよい。その後、指示方向判定部は、バウンディングボックス９１５に基づいて、手９０２に最良適合線９２０をフィットさせる。指示方向判定部は、この最良適合線９２０の方向を指差し動作の指示方向としてもよい。 First, the pointing direction determination unit fits a bounding box 915 to the hand 902. The bounding box here may be, for example, a rectangle, an ellipse, or any other shape. Then, the pointing direction determination unit fits a best fit line 920 to the hand 902 based on the bounding box 915. The pointing direction determination unit may determine the direction of the best fit line 920 as the pointing direction of the pointing motion.

以上、図５から図９を参照して、本開示の実施例に係る指差し動作の指示方向を判定する処理の具体例について説明したが、本開示はこれに限定されず、指差し動作の指示方向を高精度で判定できれば、任意の手段であってもよい。
例えば、もう一例として、指示方向判定部は、ユーザの手の画素の２次元の座標（横軸の座標及び縦軸の座標）を求め、これらの画素の座標の分布、平均、及び分散（ｄｉｓｔｒｉｂｕｔｉｏｎ，ｍｅａｎ，ａｎｄｖａｒｉａｎｃｅ）を解析することで、手の向き及び指示方向を判定してもよい。 Above, with reference to Figures 5 to 9, a specific example of a process for determining the direction of a pointing gesture according to an embodiment of the present disclosure has been described. However, the present disclosure is not limited to this, and any means may be used as long as it can determine the direction of a pointing gesture with high accuracy.
For example, as another example, the pointing direction determination unit may determine the two-dimensional coordinates (horizontal and vertical coordinates) of the pixels of the user's hand, and analyze the distribution, mean, and variance of these pixel coordinates to determine the orientation and pointing direction of the hand.

次に、図１０を参照して、本開示の実施例２に係る指示物体検出システムの構成について説明する。 Next, the configuration of a pointing object detection system according to Example 2 of the present disclosure will be described with reference to FIG. 10.

図１０は、本開示の実施例２に係る指示物体検出システム１０００の構成を示す図である。本開示の実施例２に係る指示物体検出システム１０００の指示物体検出装置１０１０は、物体管理部１０３０、環境認識部１０３２、ユーザ情報取得部１０３４及び更新部１０３６を含む構成となっている点において、図２に示す、本開示の実施例１に係る指示物体検出システム２００の指示物体検出装置２１０と異なる。物体管理部１０３０、環境認識部１０３２、ユーザ情報取得部１０３４及び更新部１０３６を含む点を除いて、実施例２に係る指示物体検出システム１０００の構成は、実施例１に係る指示物体検出システム２００と実質的に同様であるため、同一部分には同一の符号を付して示している。また、説明の便宜上、以下では、上述した部分の説明を省略し、指示物体検出システム１０００と指示物体検出システム２００との相違点を中心に説明する。 FIG. 10 is a diagram showing the configuration of a pointed object detection system 1000 according to a second embodiment of the present disclosure. The pointed object detection device 1010 of the pointed object detection system 1000 according to the second embodiment of the present disclosure is different from the pointed object detection device 210 of the pointed object detection system 200 according to the first embodiment of the present disclosure shown in FIG. 2 in that the pointed object detection device 1010 includes an object management unit 1030, an environment recognition unit 1032, a user information acquisition unit 1034, and an update unit 1036. Except for the inclusion of the object management unit 1030, the environment recognition unit 1032, the user information acquisition unit 1034, and the update unit 1036, the configuration of the pointed object detection system 1000 according to the second embodiment is substantially similar to that of the pointed object detection system 200 according to the first embodiment, and therefore the same parts are denoted by the same reference numerals. For convenience of explanation, the explanation of the above-mentioned parts will be omitted below, and the differences between the pointed object detection system 1000 and the pointed object detection system 200 will be mainly explained.

物体管理部１０３０は、指示物体の取り扱い手順を管理するための機能部である。なお、本開示において、取り扱い手順には、物体への加工、組立、処理、検査などの個々の作業や工程、その順序、時間的な管理項目等を含めることができる。
物体管理部１０３０は、複数の指示物体２４０がユーザ２３５の指差し動作によって指示される順番を示すユーザ物体シーケンス情報を作成し、格納することができる。また、このユーザ物体シーケンス情報は、指示物体２４０がユーザ２３５の指差し動作によって指示される順番に加えて、指示物体２４０の種類（ハンマー、釘、フォーク等）や、指示物体２４０が指示される時刻等を記録してもよい。
また、物体管理部１０３０は、複数の指示物体２４０を取り扱う順番を指定する指定物体シーケンス情報を格納してもよい。この指定物体シーケンス情報は、例えば事前に作成され、指示物体２４０を取り扱う正しい順番を規定する情報である。
物体管理部１０３０は、上述したユーザ物体シーケンス情報と、指定物体シーケンス情報とを比較することで、指示物体の取り扱いが正しいか否か（つまり、指示物体を取り扱う順番が間違っているか等）を判定することができる。 The object management unit 1030 is a functional unit for managing the handling procedure of the pointed object. In the present disclosure, the handling procedure may include individual tasks or steps such as processing, assembly, treatment, and inspection of the object, their order, and time-related management items.
The object management unit 1030 can create and store user object sequence information indicating the order in which the multiple pointing objects 240 are pointed to by the pointing action of the user 235. Furthermore, this user object sequence information may record the type of the pointing object 240 (hammer, nail, fork, etc.), the time when the pointing object 240 is pointed to, etc., in addition to the order in which the pointing objects 240 are pointed to by the pointing action of the user 235.
Furthermore, the object management unit 1030 may store designated object sequence information that specifies the order in which to handle the multiple pointing objects 240. This designated object sequence information is, for example, information that is created in advance and specifies the correct order in which to handle the pointing objects 240.
The object management unit 1030 can determine whether the pointed object is being handled correctly (for example, whether the pointed objects are handled in the wrong order) by comparing the above-mentioned user object sequence information with the designated object sequence information.

環境認識部１０３２は、ユーザ２３５や指示物体２４０の周辺環境に関する環境情報を取得するための機能部である。この環境情報は、例えば、照明条件、画像入力部２２４の位置、ユーザ２３５と指示物体２４０との距離等、周辺環境に関する任意の情報を含んでもよい。環境認識部１０３２は、これらの環境情報を特定のユーザインタフェース（ユーザ端末２５０のユーザインタフェース等）を介して入力してもよく、これらの環境情報を画像入力部２２４によって取得されている映像から判定するための機械学習手段を含んでもよい。
後述するように、環境認識部１０３２は、ユーザ２３５や指示物体２４０の周辺環境に関する環境情報を用いることで、指示方向や指示物体２４０の判定の精度を向上させると共に、ユーザ２３５の指示物体２４０に対する取り扱いをより効率良く検証することができる。 The environment recognition unit 1032 is a functional unit for acquiring environmental information related to the surrounding environment of the user 235 and the pointing object 240. This environmental information may include any information related to the surrounding environment, such as lighting conditions, the position of the image input unit 224, and the distance between the user 235 and the pointing object 240. The environment recognition unit 1032 may input the environmental information via a specific user interface (such as a user interface of the user terminal 250), and may include a machine learning means for determining the environmental information from the video acquired by the image input unit 224.
As described below, the environment recognition unit 1032 uses environmental information regarding the surrounding environment of the user 235 and the pointing object 240 to improve the accuracy of determining the pointing direction and the pointing object 240, and to verify more efficiently how the user 235 handles the pointing object 240.

ユーザ情報取得部１０３４は、ユーザ２３５に関するユーザ情報を取得するための機能部である。このユーザ情報は、例えば、ユーザの氏名、肌の色、身長、役職名、資格の有無等、ユーザ２３５に関する任意の情報を含んでもよい。ユーザ情報取得部１０３４は、これらのユーザ情報を特定のユーザインタフェース（ユーザ端末２５０のユーザインタフェース等）を介して入力してもよく、これらのユーザ情報を画像入力部２２４によって取得されている映像から判定するための機械学習手段を含んでもよい。
後述するように、ユーザ情報取得部１０３４は、ユーザ２３５に関するユーザ情報を用いることで、指示方向や指示物体２４０の判定の精度を向上させると共に、ユーザ２３５の指示物体２４０に対する取り扱いをより効率良く検証することができる。 The user information acquisition unit 1034 is a functional unit for acquiring user information regarding the user 235. This user information may include any information regarding the user 235, such as the user's name, skin color, height, job title, and the presence or absence of qualifications. The user information acquisition unit 1034 may input the user information via a specific user interface (such as the user interface of the user terminal 250), and may include a machine learning means for determining the user information from the video acquired by the image input unit 224.
As described below, the user information acquisition unit 1034 uses user information about the user 235 to improve the accuracy of determining the pointing direction and the pointing object 240, and to verify more efficiently how the user 235 handles the pointing object 240.

更新部１０３６は、上述した環境認識部１０３２やユーザ情報取得部１０３４によって取得された環境情報及びユーザ情報に基づいて、指示物体検出装置１０１０の各機能部のパラメータを更新するための機能部である。 The update unit 1036 is a functional unit for updating the parameters of each functional unit of the pointed object detection device 1010 based on the environmental information and user information acquired by the above-mentioned environment recognition unit 1032 and user information acquisition unit 1034.

以上説明した本開示の実施例２に係る指示物体検出システム１０００によれば、ユーザの情報や周辺環境の情報を考慮した上でユーザの指差し動作の対象となる指示物体２４０を検出できると共に、指示物体２４０の取り扱い手順を検証することが可能となる。 The pointed object detection system 1000 according to the second embodiment of the present disclosure described above can detect the pointed object 240 that is the target of the user's pointing action while taking into account the user's information and the surrounding environment information, and can also verify the handling procedure of the pointed object 240.

次に、図１１を参照して、本開示の実施例２に係る物体管理部について説明する。 Next, the object management unit according to the second embodiment of the present disclosure will be described with reference to FIG. 11.

図１１は、本開示の実施例２に係る物体管理部１０３０の構成の一例を示す図である。上述したように、物体管理部１０３０は、指示物体の取り扱い手順を管理するための機能部である。 FIG. 11 is a diagram showing an example of the configuration of the object management unit 1030 according to the second embodiment of the present disclosure. As described above, the object management unit 1030 is a functional unit for managing the handling procedure of the pointed object.

図１１に示すように、物体管理部１０３０は、指示物体の取り扱いを管理するための情報を格納する物体シーケンス情報データベース１１２０を含む。より具体的には、物体シーケンス情報データベース１１２０は、複数の指示物体を取り扱う正しい順番を指定する指定物体シーケンス情報１１３０と、複数の指示物体がユーザの指差し動作によって指示される順番を示すユーザ物体シーケンス情報１１４０とを格納する。
上述したように、この指定物体シーケンス情報１１３０は、事前に作成され、指示物体２４０を取り扱う正しい順番をタスク毎に規定する情報である。また、ユーザ物体シーケンス情報１１４０は、ユーザが実際に指示した指示物体の順番をタスク毎に示す情報であり、リアルタイムで指示物体が特定されることに応じて記録されてもよい。
一例として、指定物体シーケンス情報１１３０は、Task-1に含まれるSubTask-1について、「Object1,Object2, Object3」との取り扱う順番を規定してもよい。一方、ユーザの指差し動作の対象となる指示物体が指示される順番を記録した結果、物体管理部１０３０は、Task-1に含まれるSubTask-1について、「Object2, Object3,Object1」との取り扱う順番を示すユーザ物体シーケンス情報１１４０を取得してもよい。 As shown in FIG. 11, the object management unit 1030 includes an object sequence information database 1120 that stores information for managing the handling of pointing objects. More specifically, the object sequence information database 1120 stores designated object sequence information 1130 that specifies the correct order for handling multiple pointing objects, and user object sequence information 1140 that indicates the order in which multiple pointing objects are pointed to by the user's pointing action.
As described above, the designated object sequence information 1130 is information that is created in advance and that specifies, for each task, the correct order in which to handle the pointing objects 240. The user object sequence information 1140 is information that indicates, for each task, the order of the pointing objects that the user actually pointed to, and may be recorded as each pointing object is identified in real time.
As an example, the designated object sequence information 1130 may specify the handling order "Object1, Object2, Object3" for SubTask-1 included in Task-1. Meanwhile, as a result of recording the order in which the pointing objects targeted by the user's pointing action are pointed to, the object management unit 1030 may obtain user object sequence information 1140 indicating the handling order "Object2, Object3, Object1" for SubTask-1 included in Task-1.

次に、図１２を参照して、本開示の実施例２に係るユーザ物体シーケンス情報取得処理について説明する。 Next, the user object sequence information acquisition process according to the second embodiment of the present disclosure will be described with reference to FIG. 12.

図１２は、本開示の実施例２に係るユーザ物体シーケンス情報取得処理１２００の流れを示すフローチャートである。本開示の実施例２に係るユーザ物体シーケンス情報取得処理１２００は、複数の指示物体がユーザの指差し動作によって指示される順番を示すユーザ物体シーケンス情報１１４０を取得するための処理であり、物体管理部１０３０によって実施される。 FIG. 12 is a flowchart showing the flow of a user object sequence information acquisition process 1200 according to the second embodiment of the present disclosure. The user object sequence information acquisition process 1200 according to the second embodiment of the present disclosure is a process for acquiring user object sequence information 1140 indicating the order in which multiple pointing objects are pointed to by the user's pointing action, and is executed by the object management unit 1030.

まず、ステップＳ１２１０では、物体管理部１０３０は、指示物体検出システムが管理する作業場において、現在進行中の各作業を担当するユーザの情報を取得する。ここでは、物体管理部１０３０は、現在進行中の各作業を担当するユーザの情報を後述するユーザ情報データベースから取得してもよい。ここでは、ユーザの情報は、例えば作業を担当するユーザの氏名や、ユーザを一意に識別する識別子（Worker ID）等を含んでもよい。 First, in step S1210, the object management unit 1030 acquires information about the users in charge of each task currently in progress in the workplace managed by the pointed object detection system. Here, the object management unit 1030 may acquire information about the users in charge of each task currently in progress from a user information database described below. Here, the user information may include, for example, the name of the user in charge of the task, an identifier (Worker ID) that uniquely identifies the user, etc.

次に、ステップＳ１２２０では、物体管理部１０３０は、対象の作業において特定した指示物体の情報を取得する。ここで、物体管理部１０３０は、例えば図３に示す指示物体検出方法３６０によって特定された指示物体の情報を上述した記憶部（例えば、図１０に示す記憶部２２８）から取得してもよい。対象の作業について複数の指示物体が特定された場合、物体管理部１０３０は、特定された各指示物体を取得する。
ここでの指示物体の情報は、例えば指示物体を一意に識別する識別子（Object ID）や、当該指示物体が指示された時刻を含んでもよい。 Next, in step S1220, the object management unit 1030 acquires information on the pointing object identified in the target task. Here, the object management unit 1030 may acquire information on the pointing object identified by, for example, the pointing object detection method 360 shown in Fig. 3 from the above-mentioned storage unit (for example, the storage unit 228 shown in Fig. 10). When multiple pointing objects are identified for the target task, the object management unit 1030 acquires each of the identified pointing objects.
The information on the pointing object here may include, for example, an identifier (Object ID) that uniquely identifies the pointing object and the time at which the pointing object was pointed to.

次に、ステップＳ１２３０では、物体管理部１０３０は、ステップＳ１２１０で取得した各作業を担当するユーザの情報と、ステップＳ１２２０で取得した指示物体の情報とを対応付けることで、作業のタスク毎に、特定のユーザによって指示された指示物体の情報を、指示された順番で示すユーザ物体シーケンス情報１１４０を生成することができる。このユーザ物体シーケンス情報１１４０は、例えば上述した物体シーケンス情報データベース１１２０に格納されてもよい。 Next, in step S1230, the object management unit 1030 can generate user object sequence information 1140 that indicates information on the pointed objects pointed to by a specific user for each work task in the order in which they were pointed to, by associating the information on the users in charge of each work task obtained in step S1210 with the information on the pointed objects obtained in step S1220. This user object sequence information 1140 may be stored, for example, in the object sequence information database 1120 described above.

以上説明したユーザ物体シーケンス情報取得処理１２００によれば、ある物体がユーザの指差し動作の対象となる指示物体として特定される度に、特定された指示物体に関する情報（指示物体の種類、指示物体が指示される時刻等）を記録し、ユーザの情報に対応付けることで、特定のユーザが実際に指示した指示物体の順番をタスク毎に示す情報をユーザ物体シーケンス情報１１４０として取得することができる。 According to the user object sequence information acquisition process 1200 described above, each time an object is identified as a pointed object targeted by a user's pointing action, information about the identified pointed object (such as the type of pointed object and the time when the pointed object is pointed to) is recorded and associated with user information, so that information indicating the order in which the pointed objects were actually pointed to by a specific user for each task can be acquired as user object sequence information 1140.
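The association performed in steps S1210 to S1230 can be sketched as follows, assuming that each identified pointing object is available as a record carrying a Worker ID, a task identifier, an Object ID and a pointing time. The PointingEvent type, its field names and the function build_user_object_sequences are hypothetical; the disclosure does not prescribe a data layout.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class PointingEvent:
    worker_id: str    # identifier of the user in charge (S1210)
    task_id: str      # task or sub-task in which the object was pointed to
    object_id: str    # identifier of the pointing object (S1220)
    timestamp: float  # time at which the object was pointed to

def build_user_object_sequences(events):
    """Group events per (worker, task) and list object IDs in pointing order (S1230)."""
    sequences = defaultdict(list)
    for event in sorted(events, key=lambda e: e.timestamp):
        sequences[(event.worker_id, event.task_id)].append(event.object_id)
    return dict(sequences)

# Usage example with illustrative values.
events = [
    PointingEvent("W1", "Task-1/SubTask-1", "Object3", 2.0),
    PointingEvent("W1", "Task-1/SubTask-1", "Object2", 1.0),
    PointingEvent("W2", "Task-2/SubTask-1", "Object5", 1.5),
    PointingEvent("W1", "Task-1/SubTask-1", "Object1", 3.0),
]
user_sequences = build_user_object_sequences(events)
```

Sorting by timestamp before grouping keeps the result independent of the order in which events were stored.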

次に、図１３を参照して、本開示の実施例２に係る物体取り扱いシーケンス検証処理について説明する。 Next, the object handling sequence verification process according to the second embodiment of the present disclosure will be described with reference to FIG. 13.

図１３は、本開示の実施例２に係る物体取り扱いシーケンス検証処理１３００の流れを示すフローチャートである。物体取り扱いシーケンス検証処理１３００は、指示物体の取り扱い手順において異常が発生したか否かを判定するための処理であり、上述した物体管理部１０３０によって実施される。 FIG. 13 is a flowchart showing the flow of the object handling sequence verification process 1300 according to the second embodiment of the present disclosure. The object handling sequence verification process 1300 is a process for determining whether or not an abnormality has occurred in the handling procedure of the pointed object, and is performed by the object management unit 1030 described above.

まず、ステップＳ１３１０では、物体管理部１０３０は、ユーザ物体シーケンス情報と、指定物体シーケンス情報とを取得する。
ここで、物体管理部１０３０は、図１２を参照して上述したユーザ物体シーケンス情報取得処理１２００を用いてユーザ物体シーケンス情報を取得してもよい。
また、物体管理部１０３０は、指定物体シーケンス情報を、例えば管理者（工場の責任者等）に入力させることで取得してもよい。また、本開示に係るある態様では、物体管理部１０３０は、対象のタスクを過去に正しく行ったユーザの動作に基づいて作成されたユーザ物体シーケンス情報を、指定物体シーケンス情報として用いてもよい。 First, in step S1310, the object management unit 1030 acquires user object sequence information and designated object sequence information.
Here, the object management unit 1030 may obtain the user object sequence information using the user object sequence information acquisition process 1200 described above with reference to FIG. 12.
The object management unit 1030 may obtain the designated object sequence information by, for example, having an administrator (such as a factory manager) input the designated object sequence information. In an aspect of the present disclosure, the object management unit 1030 may use, as the designated object sequence information, user object sequence information created based on the actions of a user who has correctly performed a target task in the past.

次に、ステップＳ１３２０では、物体管理部１０３０は、ユーザ物体シーケンス情報と、指定物体シーケンス情報とを比較することで、指示物体の取り扱い手順が正しいか否か（つまり、指示物体を取り扱う順番が間違っているか等）を判定する。
より具体的には、物体管理部１０３０は、ユーザ物体シーケンス情報と、指定物体シーケンス情報とを比較することで、ユーザ物体シーケンス情報を検証し、ユーザ物体シーケンス情報が、指定物体シーケンス情報に対する所定の類似度基準を満たすか否かを判定する。ユーザ物体シーケンス情報が、指定物体シーケンス情報に対する所定の類似度基準を満たすと判定した場合、本処理はステップＳ１３１０に戻り、ユーザ物体シーケンス情報及び指定物体シーケンス情報の取得を継続する。ユーザ物体シーケンス情報が、指定物体シーケンス情報に対する所定の類似度基準を満たさないと判定した場合、本処理はステップＳ１３３０へ進む。
なお、この類似度基準とは、ユーザ物体シーケンス情報と、指定物体シーケンス情報とが互いに一致することを判定するための必要最小限の類似度を指定する情報であり、事前に管理者等に設定されてもよい。 Next, in step S1320, the object management unit 1030 compares the user object sequence information with the designated object sequence information to determine whether the handling procedure for the pointed object is correct (for example, whether the pointed objects are handled in the wrong order).
More specifically, the object management unit 1030 verifies the user object sequence information by comparing the user object sequence information with the designated object sequence information, and determines whether the user object sequence information satisfies a predetermined similarity criterion for the designated object sequence information. If it is determined that the user object sequence information satisfies the predetermined similarity criterion for the designated object sequence information, the process returns to step S1310 and continues to acquire the user object sequence information and the designated object sequence information. If it is determined that the user object sequence information does not satisfy the predetermined similarity criterion for the designated object sequence information, the process proceeds to step S1330.
Note that this similarity criterion is information that specifies the minimum similarity required to determine that the user object sequence information and the designated object sequence information match each other, and may be set in advance by an administrator or the like.

次に、ステップＳ１３３０では、物体管理部１０３０は、指示物体の取り扱い手順において異常や誤りが発生したと判定し、当該異常を示す異常通知を出力する。例えば、物体管理部１０３０は、異常が発生したタスクの識別情報、異常の内容（物体の取り扱い手順が誤っている等）、異常の発生時刻、タスクの担当者等の情報を含む異常通知を、例えばユーザ端末２５０に出力してもよい。 Next, in step S1330, the object management unit 1030 determines that an abnormality or error has occurred in the handling procedure of the pointed object, and outputs an abnormality notification indicating the abnormality. For example, the object management unit 1030 may output an abnormality notification to the user terminal 250, the notification including information such as the identification information of the task in which the abnormality occurred, the details of the abnormality (e.g., the handling procedure of the object is incorrect), the time the abnormality occurred, and the person in charge of the task.

一例として、指定物体シーケンス情報は、Task-1に含まれるSubTask-1について、「Object1,Object2, Object3」との取り扱う順番を規定するとする。一方、ユーザの指差し動作の対象となる指示物体が指示される順番を記録した結果、物体管理部１０３０は、Task-1に含まれるSubTask-1について、「Object2, Object3,Object1」との取り扱う順番を示すユーザ物体シーケンス情報が取得されたとする。
この場合、物体管理部１０３０は、指定物体シーケンス情報と、ユーザ物体シーケンス情報とを比較した結果、ユーザ物体シーケンス情報が指定物体シーケンス情報に一致しないと判定し、指示物体の取り扱い手順に関する異常が発生したと判定する。その後、物体管理部は、当該異常を示す異常通知を生成し、ユーザ端末２５０に通知してもよい。このように、ユーザ端末２５０のユーザは、検出された異常を解決する行動を取ることができる。 As an example, the designated object sequence information specifies the order of handling "Object1, Object2, Object3" for SubTask-1 included in Task-1. Meanwhile, as a result of recording the order in which the pointing object targeted by the user's pointing action is pointed to, the object management unit 1030 acquires user object sequence information indicating the order of handling "Object2, Object3, Object1" for SubTask-1 included in Task-1.
In this case, the object management unit 1030 compares the designated object sequence information with the user object sequence information, and determines that the user object sequence information does not match the designated object sequence information, and determines that an abnormality has occurred regarding the handling procedure of the pointed object. The object management unit may then generate an abnormality notification indicating the abnormality and notify the user terminal 250. In this way, the user of the user terminal 250 can take action to resolve the detected abnormality.
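The comparison of steps S1320 and S1330 can be sketched as follows. The disclosure leaves the similarity measure open, so this sketch assumes the ratio computed by Python's difflib.SequenceMatcher and a configurable minimum similarity, where 1.0 demands an exact match. The function name verify_sequence and the fields of the returned notification are hypothetical.

```python
from difflib import SequenceMatcher

def verify_sequence(task_id, worker_id, user_seq, designated_seq, min_similarity=1.0):
    """Return None if the user sequence meets the similarity criterion,
    otherwise a dictionary describing the abnormality (S1330)."""
    similarity = SequenceMatcher(None, list(designated_seq), list(user_seq)).ratio()
    if similarity >= min_similarity:
        return None  # criterion satisfied; processing would return to S1310
    return {
        "task_id": task_id,
        "worker_id": worker_id,
        "detail": "object handling order differs from the designated order",
        "similarity": similarity,
        "expected": list(designated_seq),
        "observed": list(user_seq),
    }

# Usage example with the sequences from the text above.
designated = ["Object1", "Object2", "Object3"]
observed = ["Object2", "Object3", "Object1"]
notification = verify_sequence("Task-1/SubTask-1", "W1", observed, designated)
```

A lower min_similarity would tolerate small deviations; whether that is acceptable depends on the task and is a choice for the administrator.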

このように、以上説明した物体取り扱いシーケンス検証処理１３００によれば、指示物体の取り扱い手順をリアルタイムで検証し、取り扱い手順に異常が発生したか否かを判定することができる。これにより、例えば工場等の作業場では、作業員の手違い等を早期に検出することができるため、作業の正確性及び効率を向上させることができる。 In this way, according to the object handling sequence verification process 1300 described above, the handling procedure for the pointed-to object can be verified in real time, and it can be determined whether or not an abnormality has occurred in the handling procedure. This allows for early detection of worker mistakes in workplaces such as factories, thereby improving the accuracy and efficiency of work.

次に、図１４を参照して、本開示の実施例２に係る環境認識部について説明する。 Next, the environment recognition unit according to the second embodiment of the present disclosure will be described with reference to FIG. 14.

図１４は、本開示の実施例２に係る環境認識部１０３２の構成の一例を示す図である。上述したように、環境認識部１０３２は、ユーザや指示物体の周辺環境に関する環境情報を取得するための機能部である。
また、図１４に示すように、環境認識部１０３２は、通信ネットワーク２３４を介して、ユーザ端末２５０と接続されてもよい。 FIG. 14 is a diagram illustrating an example of the configuration of the environment recognition unit 1032 according to the second embodiment of the present disclosure. As described above, the environment recognition unit 1032 is a functional unit for acquiring environmental information related to the surrounding environment of the user and the pointing object.
As shown in FIG. 14, the environment recognition unit 1032 may be connected to a user terminal 250 via a communication network 234.

図１４に示すように、環境認識部１０３２は、環境情報を受け付ける受付部１４１２と、環境情報を映像から判定する機械学習部１４１４と、環境情報データベース１４５０とを含む。
環境情報データベース１４５０は、ユーザや指示物体の周辺環境に関する環境情報１４５５を格納するためのデータベースである。図１４に示すように、この環境情報１４５５は、例えば、周辺環境の照明条件、撮影部の位置、ユーザと指示物体との距離等、周辺環境に関する任意の情報を含んでもよい。 As shown in FIG. 14, the environment recognition unit 1032 includes a reception unit 1412 that receives environmental information, a machine learning unit 1414 that determines the environmental information from video, and an environmental information database 1450.
The environment information database 1450 is a database for storing environment information 1455 related to the surrounding environment of the user and the pointing object. As shown in Fig. 14, the environment information 1455 may include any information related to the surrounding environment, such as the lighting conditions of the surrounding environment, the position of the image capturing unit, and the distance between the user and the pointing object.

この環境情報１４５５は、ユーザによって直接に入力されてもよく、撮影部によって取得される入力映像を機械学習部１４１４によって解析することで判定されてもよい。
例えば、環境情報１４５５は、ユーザによって直接に入力される場合、受付部１４１２は、ユーザ端末２５０のユーザインタフェース１４１０を介して入力される環境情報１４５５を、通信ネットワーク２３４を介して受け付けた後、受信した環境情報１４５５を環境情報データベース１４５０に格納してもよい。ここでのユーザインタフェース１４１０は、例えばユーザ端末から利用可能なウエブページやアプリであってもよい。
一方、環境情報１４５５は、機械学習部１４１４によって判定される場合、機械学習部１４１４は、撮影部（図１４に図示せず）によって取得される入力映像を、所定のオブジェクト検出や画像処理を実行するように学習された機械学習手段によって解析することで、環境情報１４５５を判定する。その後、機械学習部１４１４は、判定した環境情報１４５５を環境情報データベース１４５０に格納してもよい。 This environmental information 1455 may be input directly by the user, or may be determined by the machine learning unit 1414 analyzing the input video captured by the imaging unit.
For example, when the environmental information 1455 is directly input by a user, the reception unit 1412 may receive the environmental information 1455 input via the user interface 1410 of the user terminal 250 via the communication network 234, and then store the received environmental information 1455 in the environmental information database 1450. The user interface 1410 here may be, for example, a web page or an application available from the user terminal.
On the other hand, when the environmental information 1455 is determined by the machine learning unit 1414, the machine learning unit 1414 analyzes an input video acquired by an imaging unit (not shown in FIG. 14) using a machine learning means that has been trained to execute predetermined object detection and image processing, thereby determining the environmental information 1455. Thereafter, the machine learning unit 1414 may store the determined environmental information 1455 in the environmental information database 1450.

環境情報１４５５が取得され、環境情報データベース１４５０に格納された後、上述したオブジェクト検出部、画像加工部、ジェスチャー判定部、指示方向判定部、及び／又は指示物体特定部は、この環境情報１４５５を適宜に利用してもよい。これにより、オブジェクト検出部、画像加工部、ジェスチャー判定部、指示方向判定部、及び指示物体特定部によって実行される各種処理の精度を向上させることが可能となる。 After the environmental information 1455 is acquired and stored in the environmental information database 1450, the object detection unit, image processing unit, gesture determination unit, pointing direction determination unit, and/or pointing object identification unit described above may use this environmental information 1455 as appropriate. This makes it possible to improve the accuracy of the various processes executed by the object detection unit, image processing unit, gesture determination unit, pointing direction determination unit, and pointing object identification unit.

次に、図１５を参照して、本開示の実施例２に係るユーザ情報取得部について説明する。 Next, the user information acquisition unit according to the second embodiment of the present disclosure will be described with reference to FIG. 15.

図１５は、本開示の実施例２に係るユーザ情報取得部１０３４の構成の一例を示す図である。ユーザ情報取得部１０３４は、ユーザに関するユーザ情報を取得するための機能部である。
また、図１５に示すように、ユーザ情報取得部１０３４は、通信ネットワーク２３４を介して、ユーザ端末２５０と接続されてもよい。 FIG. 15 is a diagram illustrating an example of the configuration of the user information acquisition unit 1034 according to the second embodiment of the present disclosure. The user information acquisition unit 1034 is a functional unit for acquiring user information related to a user.
As shown in FIG. 15, the user information acquisition unit 1034 may be connected to a user terminal 250 via a communication network 234.

図１５に示すように、ユーザ情報取得部１０３４は、ユーザ情報を受け付ける受付部１５１２と、ユーザ情報を映像から判定する機械学習部１５１４と、ユーザ情報データベース１５５０とを含む。
ユーザ情報データベース１５５０は、ユーザに関するユーザ情報１５５５を格納するためのデータベースである。図１５に示すように、このユーザ情報１５５５は、例えば、ユーザを一意に識別するための識別子、氏名、身長、肌色、担当タスク、現在のタスク等、ユーザに関する任意の情報を含んでもよい。 As shown in FIG. 15, the user information acquisition unit 1034 includes a reception unit 1512 that receives user information, a machine learning unit 1514 that determines the user information from video, and a user information database 1550.
The user information database 1550 is a database for storing user information 1555 relating to users. As shown in Fig. 15, the user information 1555 may include any information relating to the user, such as an identifier for uniquely identifying the user, the name, height, skin color, assigned tasks, and current task.

このユーザ情報１５５５は、ユーザによって直接に入力されてもよく、撮影部によって取得される入力映像を機械学習部１５１４によって解析することで判定されてもよい。
例えば、ユーザ情報１５５５は、ユーザによって直接に入力される場合、受付部１５１２は、ユーザ端末２５０のユーザインタフェース１５１０を介して入力されるユーザ情報１５５５を、通信ネットワーク２３４を介して受け付けた後、受信したユーザ情報１５５５をユーザ情報データベース１５５０に格納してもよい。ここでのユーザインタフェース１５１０は、例えばユーザ端末から利用可能なウエブページやアプリであってもよい。
一方、ユーザ情報１５５５は、機械学習部１５１４によって判定される場合、機械学習部１５１４は、撮影部（図１４に図示せず）によって取得される入力映像を、所定のオブジェクト検出や画像処理を実行するように学習された機械学習手段によって解析することで、ユーザ情報１５５５を判定する。その後、機械学習部１５１４は、判定したユーザ情報１５５５をユーザ情報データベース１５５０に格納してもよい。 This user information 1555 may be input directly by the user, or may be determined by the machine learning unit 1514 analyzing the input video acquired by the imaging unit.
For example, when the user information 1555 is directly input by the user, the reception unit 1512 may receive the user information 1555 input via the user interface 1510 of the user terminal 250 via the communication network 234, and then store the received user information 1555 in the user information database 1550. The user interface 1510 here may be, for example, a web page or an application available from the user terminal.
On the other hand, when the user information 1555 is determined by the machine learning unit 1514, the machine learning unit 1514 analyzes an input video acquired by an imaging unit (not shown in FIG. 14) using a machine learning means that has been trained to perform predetermined object detection and image processing, thereby determining the user information 1555. Thereafter, the machine learning unit 1514 may store the determined user information 1555 in the user information database 1550.

ユーザ情報１５５５が取得され、ユーザ情報データベース１５５０に格納された後、上述したオブジェクト検出部、画像加工部、ジェスチャー判定部、指示方向判定部、及び指示物体特定部は、このユーザ情報１５５５を適宜に利用してもよい。これにより、オブジェクト検出部、画像加工部、ジェスチャー判定部、指示方向判定部、及び指示物体特定部によって実行される各種処理の精度を向上させることが可能となる。 After the user information 1555 is acquired and stored in the user information database 1550, the object detection unit, image processing unit, gesture determination unit, pointing direction determination unit, and pointing object identification unit described above may use this user information 1555 as appropriate. This makes it possible to improve the accuracy of the various processes executed by the object detection unit, image processing unit, gesture determination unit, pointing direction determination unit, and pointing object identification unit.

次に、図１６を参照して、本開示の実施例２に係る更新部について説明する。 Next, the update unit according to the second embodiment of the present disclosure will be described with reference to FIG. 16.

FIG. 16 is a diagram illustrating an example of the configuration of the update unit 1036 according to the second embodiment of the present disclosure. As described above, the update unit 1036 is a functional unit for updating the parameters of each functional unit of the pointing object detection device based on the environmental information and user information acquired by the environment recognition unit and the user information acquisition unit.

As shown in FIG. 16, the update unit 1036 is connected to the above-mentioned environmental information database 1450 and user information database 1550, and is configured to be able to access the environmental information 1455 and the user information 1555.

The update unit 1036 acquires the environmental information 1455 from the environmental information database 1450 and the user information 1555 from the user information database 1550, and then updates the parameters of each functional unit of the pointing object detection device based on the acquired environmental information 1455 and user information 1555. The update unit 1036 may perform this update periodically (for example, every minute, every five minutes, or every hour), or may perform it each time new information is added to the environmental information database 1450 or the user information database 1550.

In addition, the update based on the environmental information 1455 and the user information 1555 is a process for improving the accuracy and efficiency of the processing of each functional unit, and the parameters adjusted in this update may be determined as appropriate according to the function and performance of each functional unit.
For example, when the environmental information 1455 indicates that the surroundings of the user and the pointing object are dark, the update unit 1036 may change the parameters of the imaging unit to parameters suited to a dark environment. This makes it possible to acquire a clearer input video, thereby improving the accuracy of object detection by the object detection unit.
As another example, the update unit 1036 may change the parameters of the image processing unit to parameters suited to the user's skin color, based on the skin color of the user recorded in the user information 1555. This can improve the accuracy with which the image processing unit extracts the hand image.
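The two examples above can be illustrated with a minimal sketch. The disclosure does not specify any concrete parameters, so everything named here is a hypothetical assumption for illustration: the `CameraParams` fields, the 50 lux darkness threshold, the exposure and gain adjustments, and the HSV skin-color presets.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CameraParams:
    """Hypothetical imaging-unit parameters (not taken from the disclosure)."""
    exposure_ms: float = 10.0
    gain_db: float = 0.0

# Hypothetical HSV (lower, upper) bounds used by the image processing unit
# to extract the hand image; the values are illustrative only.
SKIN_PRESETS = {
    "light": ((0, 20, 120), (25, 150, 255)),
    "medium": ((0, 30, 80), (25, 180, 255)),
    "dark": ((0, 40, 40), (30, 255, 200)),
}

def update_camera(params: CameraParams, ambient_lux: float,
                  dark_threshold_lux: float = 50.0) -> CameraParams:
    """Return parameters suited to a dark environment when the
    environmental information reports low ambient light."""
    if ambient_lux < dark_threshold_lux:
        # Assumed adjustment: longer exposure and higher gain.
        return replace(params,
                       exposure_ms=params.exposure_ms * 4,
                       gain_db=params.gain_db + 12.0)
    return params

def select_skin_range(skin_tone: str):
    """Pick segmentation bounds from the skin color recorded in the
    user information, falling back to a default preset."""
    return SKIN_PRESETS.get(skin_tone, SKIN_PRESETS["medium"])

if __name__ == "__main__":
    print(update_camera(CameraParams(), ambient_lux=10.0))
    print(select_skin_range("dark"))
```

Returning a new parameter object rather than mutating the existing one keeps each update easy to log and to roll back, which suits an update that may run periodically.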

The update unit 1036 configured as described above can update the parameters of each functional unit of the pointing object detection device using user information and environmental information, thereby improving the accuracy and efficiency of processing by each functional unit.

Next, the configuration of a pointing object detection system according to the third embodiment of the present disclosure will be described with reference to FIG. 17.

FIG. 17 is a diagram showing the configuration of a pointing object detection system 1700 according to the third embodiment of the present disclosure. The pointing object detection device 1710 of the pointing object detection system 1700 according to the third embodiment differs from the pointing object detection device 1010 of the pointing object detection system 1000 according to the second embodiment, shown in FIG. 10, in that it includes a work management unit 1740. Except for the inclusion of the work management unit 1740, the configuration of the pointing object detection system 1700 according to the third embodiment is substantially the same as that of the pointing object detection system 1000 according to the second embodiment, and therefore the same parts are denoted by the same reference numerals. For convenience of explanation, the following description omits the parts already described and focuses on the differences between the pointing object detection system 1700 and the pointing object detection system 1000.

The work management unit 1740 is a functional unit for managing work performed by workers such as users. For example, the work management unit 1740 may manage the assignment of work to the persons in charge and the distribution of work based on, for example, the user object sequence information acquired by the object management unit 1030. As one example, the work management unit 1740 may determine the progress of each worker's work based on the user object sequence information, and assign work according to this progress. As another example, the work management unit 1740 may determine each worker's efficiency for each piece of work based on the user object sequence information, and assign work according to that efficiency.
The details of the processing performed by the work management unit 1740 will be described later, and are therefore omitted here.

The pointing object detection system 1700 according to the third embodiment of the present disclosure described above can detect the pointing object 240 that is the target of the user's pointing action, and can also manage the distribution of work according to the work status of each worker as determined from the user object sequence information.

Next, the work distribution process performed by the work management unit according to the third embodiment of the present disclosure will be described with reference to FIG. 18.

FIG. 18 is a diagram showing the flow of a work distribution process 1800 performed by the work management unit according to the third embodiment of the present disclosure. The work distribution process 1800 shown in FIG. 18 is a process for distributing work to users, and is performed by the work management unit 1740.

First, in step S1810, the work management unit 1740 acquires user object sequence information from the above-mentioned object sequence information database (for example, the object sequence information database 1120 shown in FIG. 11), and determines the progress of the target work based on the acquired user object sequence information. As described above, the user object sequence information indicates, for each task of the target work, the pointing objects pointed to by a specific user, in the order in which they were pointed to. Therefore, the progress of the work currently in progress can be determined by comparing the user object sequence information with the designated object sequence information, which indicates the correct order of actions for that work.
For example, the work management unit 1740 may compare the user object sequence information acquired from the object sequence information database with the designated object sequence information to determine the completion rate of the work (30%, 54%, 70%, etc.) as information indicating the progress of the work currently in progress.
In step S1810, the work management unit 1740 acquires at least the user object sequence information for determining the progress of the work, but the present disclosure is not limited to this, and the work management unit 1740 may acquire other information, such as the above-mentioned user information and environmental information, in addition to the user object sequence information. As described later, the acquired user information and environmental information may be used together with the user object sequence information when distributing the work.
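As an illustration of the comparison in step S1810, the following minimal sketch computes a completion rate. The disclosure does not state how the two sequences are compared, so the rule used here is an assumption: progress is the number of leading steps of the designated sequence that the user has pointed to in the correct order. The function name and the object labels are hypothetical.

```python
from typing import Sequence

def completion_rate(user_seq: Sequence[str],
                    designated_seq: Sequence[str]) -> float:
    """Return the completion rate in percent.

    Assumed rule: count how many leading entries of the user object
    sequence match the designated object sequence, and stop at the
    first mismatch.
    """
    if not designated_seq:
        raise ValueError("designated sequence must not be empty")
    done = 0
    for pointed, expected in zip(user_seq, designated_seq):
        if pointed != expected:
            break  # out-of-order or wrong object: stop counting
        done += 1
    return 100.0 * done / len(designated_seq)

if __name__ == "__main__":
    designated = ["bolt", "washer", "nut", "cover", "label",
                  "seal", "box", "tape", "tag", "shelf"]
    user = ["bolt", "washer", "nut"]
    print(completion_rate(user, designated))
```

Other comparison rules, such as a longest common subsequence, would tolerate skipped or repeated pointing actions and may suit some workflows better.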

Next, in step S1820, the work management unit 1740 may distribute the work based on the progress of the work determined in step S1810 and the work distribution rules 1825 stored in the work management unit 1740. More specifically, based on the progress of the work and the work distribution rules 1825, the work management unit 1740 may assign newly generated work to a specific user (first user), or may redistribute work that has already been assigned to a specific user (first user) to another user (second user).
The work distribution rules 1825 are rules that define how work is distributed. In other words, the work distribution rules 1825 are information that specifies what work should be assigned, at what timing, and to which user. For example, the work distribution rules 1825 may include a rule such as "if the completion rate of the user's current work is 70% or more, assign new work" or a rule such as "if the completion rate of the user's current work is 20% or less, do not assign new work".
Furthermore, the work distribution rules 1825 may include rules that take into account the above-mentioned user information and environmental information in addition to the current progress of the work. For example, the work distribution rules 1825 may determine the distribution of work based on whether the user holds a qualification, as determined from the user information, or may determine the distribution of work based on the certification conditions of the environment, etc.
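The example rules above can be expressed as a short sketch. Only the 70% and 20% thresholds and the qualification check come from the text. The `Worker` type, the function names, and the tie-breaking policy (prefer the eligible worker closest to finishing) are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Worker:
    """Hypothetical record combining progress and user information."""
    name: str
    completion_rate: float  # percent, e.g. as determined in step S1810
    qualified: bool = True  # qualification taken from the user information

def may_receive_new_work(worker: Worker,
                         assign_at: float = 70.0,
                         withhold_at: float = 20.0) -> Optional[bool]:
    """Apply the example rules; return None when no rule fires."""
    if not worker.qualified:
        return False  # user-information rule: qualification required
    if worker.completion_rate >= assign_at:
        return True   # "70% or more: assign new work"
    if worker.completion_rate <= withhold_at:
        return False  # "20% or less: do not assign new work"
    return None       # between the thresholds, the rules are silent

def pick_assignee(workers: Sequence[Worker]) -> Optional[Worker]:
    """Assumed policy: choose the eligible worker with the highest
    completion rate, or None if nobody is eligible."""
    eligible = [w for w in workers if may_receive_new_work(w)]
    return max(eligible, key=lambda w: w.completion_rate, default=None)

if __name__ == "__main__":
    team = [Worker("A", 54.0), Worker("B", 70.0),
            Worker("C", 95.0, qualified=False), Worker("D", 15.0)]
    chosen = pick_assignee(team)
    print(chosen.name if chosen else "no eligible worker")
```

Returning `None` for the range between the two thresholds makes it explicit that the example rules leave that case undecided, so an additional rule or a human supervisor can resolve it.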

Next, in step S1830, after distributing the work, the work management unit 1740 may update the user information database (for example, the user information database 1550 shown in FIG. 15) and the object sequence information database (for example, the object sequence information database shown in FIGS. 11 and 12). For example, the work management unit 1740 may update the content of the work for which each user is responsible in the user information database and the object sequence information database in accordance with the distribution of work performed in step S1820.

According to the work distribution process 1800 described above, work can be distributed efficiently to users using the information (user information, environmental information, and user object sequence information) acquired by the pointing object detection method according to the embodiments of the present disclosure.

Although the embodiments of the present invention have been described above, the present invention is not limited to the above-described embodiments, and various modifications are possible without departing from the gist of the present invention.
The configurations of the object management unit and the environment recognition unit, and the items to be managed, are likewise not limited to those described above, and various modifications are possible.

200, 1000, 1700 Pointing object detection system
210, 1010, 1710 Pointing object detection device
212 Object detection unit
214 Image processing unit
218 Gesture determination unit
220 Pointing direction determination unit
222 Pointing object identification unit
224 Image input unit
226 Processor
228 Storage unit
234 Communication network
235 User
240 Pointing object
250 User terminal
1030 Object management unit
1032 Environment recognition unit
1036 Update unit

Claims

A pointing object detection device that detects a pointing object that is a target of a pointing action, the device comprising:
an image input unit that acquires an input video including a user's hand and at least one pointing object candidate;
an object detection unit that analyzes the input video and detects the hand and the pointing object candidate;
a gesture determination unit that determines a gesture made by the hand;
a pointing direction determination unit that, when the gesture made by the hand is determined to be a pointing action, identifies a centroid of the hand, identifies, as the fingertip of the index finger used in the pointing action, a region that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion, and determines a pointing direction of the pointing action based on the distribution of pixels belonging to the identified fingertip; and
a pointing object identification unit that identifies, from among the pointing object candidates, the object pointed to by the pointing action in the input video based on the pointing direction,
wherein the pointing direction determination unit, in a hand image extracted from the input video and showing only the hand:
determines a region of interest centered on the fingertip based on a distance between the fingertip and the centroid;
identifies pixels in the region of interest that satisfy a predetermined luminance criterion as finger pixels belonging to the index finger;
calculates the average pixel coordinates of the finger pixels; and
determines, as the pointing direction, a direction of an imaginary line passing through the average pixel coordinates and the fingertip.

The pointing object detection device according to claim 1, wherein the pointing direction determination unit, in a hand image extracted from the input video and showing only the hand:
sets a first imaginary line passing through the fingertip and the centroid;
sets a second imaginary line that is perpendicular to the first imaginary line and satisfies a second distance criterion with respect to the fingertip;
sets a third imaginary line that is perpendicular to the first imaginary line, parallel to the second imaginary line, and satisfies a third distance criterion with respect to the fingertip; and
determines, as the pointing direction, a direction of a fourth imaginary line passing through a center point of the second imaginary line and a center point of the third imaginary line.

The pointing object detection device according to claim 1, wherein the pointing direction determination unit, in a hand image extracted from the input video and showing only the hand:
determines a region of interest centered on the fingertip based on a distance between the fingertip and the centroid;
identifies pixels in the region of interest that satisfy a predetermined luminance criterion as finger pixels belonging to the index finger; and
determines the pointing direction based on the finger pixels using a least squares method.

The pointing object detection device according to claim 1, wherein the pointing direction determination unit, in a hand image extracted from the input video and showing only the hand:
determines a region of interest centered on the fingertip based on a distance between the fingertip and the centroid;
identifies pixels in the region of interest that satisfy a predetermined luminance criterion as finger pixels belonging to the index finger; and
determines the pointing direction based on the finger pixels using principal component analysis.

The pointing object detection device according to claim 1, further comprising an object management unit that manages a handling procedure of the pointing object,
wherein the object management unit creates and stores user object sequence information indicating an order in which a plurality of pointing objects are pointed to by the user's pointing action.

The pointing object detection device according to claim 5, wherein the object management unit:
stores designated object sequence information that designates an order in which the plurality of pointing objects are to be handled;
verifies the user object sequence information by comparing the user object sequence information with the designated object sequence information; and
when the user object sequence information does not satisfy a predetermined similarity criterion with respect to the designated object sequence information, determines that an abnormality has occurred in the handling procedure of the pointing object and outputs an abnormality notification indicating the abnormality.

The pointing object detection device according to claim 5, further comprising:
a user information acquisition unit that acquires user information related to the user;
an environment recognition unit that acquires environmental information related to a surrounding environment of the user; and
an update unit that updates at least one parameter of the image input unit, the object detection unit, the gesture determination unit, the pointing direction determination unit, the pointing object identification unit, and the object management unit based on the user information or the environmental information.

The pointing object detection device according to claim 6, further comprising a work management unit that manages tasks performed by the user,
wherein the work management unit:
determines a progress status of a first task assigned to a first user based on the user object sequence information; and
determines a second user who will be in charge of a second task based on the determined progress status of the first task and a work distribution rule that specifies a work distribution method, and distributes the second task to the second user.

A pointing object detection method for detecting a pointing object that is a target of a pointing action, the method comprising:
acquiring an input video including a user's hand and at least one pointing object candidate;
analyzing the input video to detect the hand and the pointing object candidate;
extracting a hand image showing only the hand from the input video;
analyzing the hand image to determine a gesture made by the hand;
identifying a centroid of the hand when the gesture made by the hand is determined to be a pointing action, and identifying, as the fingertip of the index finger, a region that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion;
determining a region of interest centered on the fingertip based on a distance between the fingertip and the centroid;
identifying pixels in the region of interest that satisfy a predetermined luminance criterion as finger pixels belonging to the finger used in the pointing action;
calculating average pixel coordinates of the finger pixels;
determining, as a pointing direction of the pointing action, a direction of an imaginary line passing through the average pixel coordinates and the fingertip; and
identifying, from among the pointing object candidates, the object pointed to by the pointing action in the input video based on the pointing direction.

A pointing object detection system for detecting a pointing object that is a target of a pointing action, the system comprising:
a pointing object detection device; and
a user terminal,
wherein the pointing object detection device and the user terminal are connected via a communication network,
the pointing object detection device includes:
an image input unit that acquires an input video showing a user's hand and at least one pointing object candidate;
an object detection unit that analyzes the input video and detects the hand and the pointing object candidate;
a gesture determination unit that determines a gesture made by the hand;
a pointing direction determination unit that, when the gesture made by the hand is determined to be a pointing action, identifies a centroid of the hand, identifies, as the fingertip of the index finger used in the pointing action, a region that satisfies a first distance criterion from the centroid and also satisfies a predetermined curvature criterion, and determines a pointing direction of the pointing action based on the distribution of pixels belonging to the identified fingertip;
a pointing object identification unit that identifies, from among the pointing object candidates, the object pointed to by the pointing action in the input video based on the pointing direction; and
an object management unit that creates, based on the identified pointing object, user object sequence information indicating the order in which a plurality of pointing objects are pointed to by the user's pointing action, verifies the user object sequence information by comparing it with designated object sequence information that designates the order in which the plurality of pointing objects are to be handled, and outputs an abnormality notification to the user terminal when the user object sequence information does not satisfy a predetermined similarity criterion with respect to the designated object sequence information,
and the pointing direction determination unit, in a hand image extracted from the input video and showing only the hand:
determines a region of interest centered on the fingertip based on a distance between the fingertip and the centroid;
identifies pixels in the region of interest that satisfy a predetermined luminance criterion as finger pixels belonging to the index finger;
calculates the average pixel coordinates of the finger pixels; and
determines, as the pointing direction, a direction of an imaginary line passing through the average pixel coordinates and the fingertip.