JP2021009552A

JP2021009552A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2021009552A
Application number: JP2019122974A
Authority: JP
Inventors: 真暉近藤; Masaki Kondo
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2019-07-01
Filing date: 2019-07-01
Publication date: 2021-01-28

Abstract

To provide an information processing apparatus, an information processing method, and a program capable of correctly selecting the object intended by a user even when the plurality of objects existing in the user's field of view includes an in-object object. SOLUTION: An information processing apparatus 100 comprises a detection unit 12, a calculation unit 17, and a selection unit 18. The detection unit detects a designated area designated by an operation body that moves within the field of view of a user according to a motion of the user. For each object that overlaps the designated area in the field of view of the user, among a plurality of objects that exist in the field of view and include one or more objects contained within another object, the calculation unit calculates the degree of overlap with the designated area. The selection unit selects a target object from the plurality of objects based on the degree of overlap and the size of the object. SELECTED DRAWING: Figure 4

Description

Embodiments of the present invention relate to an information processing apparatus, an information processing method, and a program.

In recent years, information processing apparatuses such as HMDs (Head Mounted Displays) that let users experience virtual reality (VR) and augmented reality (AR) have become widespread. In this type of apparatus, when a desired object is selected from a plurality of objects existing in the user's field of view, a predetermined process is executed based on the selected object. A common user interface for selecting an object places a virtual pointer in the user's field of view and lets the user select an object by moving the pointer.

With such a conventional method, however, it may not be possible to uniquely identify which of the objects in the user's field of view the user is trying to select. For example, suppose the objects include a small object contained in a larger object (hereinafter referred to as an "in-object object"). The position of the virtual pointer then overlaps both the large object and the in-object object inside it, and it may not be possible to determine whether the user intends to select the large object or the in-object object. Improvement is therefore required.

JP-A-2002-42172
JP-A-2015-203889
JP-A-2011-203823

The problem to be solved by the present invention is to provide an information processing apparatus, an information processing method, and a program that can correctly select the object intended by the user even when the plurality of objects existing in the user's field of view includes an in-object object.

The information processing apparatus of the embodiment includes a detection unit, a calculation unit, and a selection unit. The detection unit detects a designated area designated by an operation body that moves within the user's field of view according to the user's motion. For each object that overlaps the designated area in the user's field of view, among a plurality of objects that exist in the field of view and include one or more objects contained within another object, the calculation unit calculates the degree of overlap with the designated area. The selection unit selects a target object from the plurality of objects based on the degree of overlap and the size of the object.

FIG. 1 is a diagram illustrating a problem of a conventional method of selecting an object.
FIG. 2 is a diagram showing a user wearing the information processing apparatus according to the first embodiment.
FIG. 3 is a block diagram showing a hardware configuration example of the information processing apparatus according to the first embodiment.
FIG. 4 is a block diagram showing a functional configuration example of the information processing apparatus according to the first embodiment.
FIGS. 5-1 to 5-6 are diagrams each illustrating an example of a designated area.
FIG. 6 is a diagram illustrating an outline of processing by the image interpolation unit.
FIGS. 7-1 to 7-5 are diagrams each illustrating a specific example of object selection.
FIGS. 8 and 9 are diagrams each showing an example of a method for letting the user recognize the selected object.
FIG. 10 is a diagram illustrating a process of enlarging and reducing an image.
FIG. 11 is a flowchart showing an outline of the operation of the information processing apparatus.
FIG. 12 is a block diagram showing a functional configuration example of the information processing apparatus according to the second embodiment.

Hereinafter, the information processing apparatus, the information processing method, and the program of the embodiment will be described in detail with reference to the accompanying drawings.

<Outline of Embodiment>
In an information processing apparatus such as an HMD using VR or AR technology, a common user interface for selecting a desired object from a plurality of objects in the user's field of view lets the user move a virtual pointer placed in the field of view, for example through gestures. With this pointer-based selection method, however, when the candidate objects include an in-object object, it may not be possible to uniquely identify which object the user is trying to select.

For example, as shown in FIG. 1, suppose that an image or a CG (computer graphics) part of a PC (personal computer) is in the user's field of view, and that the entire PC, the keyboard in the PC, and each key in the keyboard are each recognized as a candidate object for selection. If the user moves the pointer P to the position shown in FIG. 1, the position indicated by the pointer P overlaps the entire PC, the keyboard in the PC, and a key in the keyboard. It is therefore impossible to uniquely identify whether the object the user is trying to select is the entire PC, the keyboard, or the key.

The present embodiment therefore proposes a novel technique that makes it possible to correctly select the object intended by the user from the plurality of objects in the user's field of view. The technique uses two criteria: the degree of overlap with a designated area, which is designated by an operation body that moves within the user's field of view according to the user's motion, and the size of the object. In the following, an application to an HMD using AR technology is described as the first embodiment, followed by an application to an HMD using VR technology as the second embodiment. In the description below, components having the same function are given the same reference numerals, and duplicate descriptions are omitted as appropriate.

<First Embodiment>
The information processing apparatus according to the first embodiment is configured as an HMD using AR technology. An HMD using AR technology is commonly called "AR glasses". FIG. 2 shows a user U wearing the information processing apparatus 100 according to the first embodiment. The information processing apparatus 100 configured as AR glasses can be worn on the head of the user U and used in the same manner as ordinary glasses.

The information processing apparatus 100 is equipped with a camera 104 that captures the real space in front of the user U. The image (video) of the real space captured by the camera 104 is displayed on a display 106 placed in front of the eyes of the user U, at the position corresponding to the lenses of ordinary glasses. The information processing apparatus 100 can also generate various information as, for example, virtual CG parts, and display these CG parts on the display 106 superimposed on the real-space image captured by the camera 104. As a result, the user U wearing the information processing apparatus 100 can experience an AR space in which CG parts appear within the real space.

In the information processing apparatus 100, the real-space image captured by the camera 104 is subjected to processing such as semantic segmentation by SegNet, described in Reference 1 below. Semantic segmentation is a technique for identifying the objects in an image by assigning to each pixel a label, such as the object that the pixel represents. Objects existing in the real space in front of the user U are thereby recognized. This recognition is performed at a relatively fine granularity, so in-object objects such as the keyboard in the PC and the keys in the keyboard illustrated in FIG. 1 are also recognized. The objects recognized here become candidates for the object that the user U wearing the information processing apparatus 100 selects when causing the apparatus to execute a predetermined process.
(Reference 1) Vijay Badrinarayanan et al., "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation", arXiv:1511.00561v3 [cs.CV], 10 Oct. 2016.
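As a rough illustration of what a per-pixel labelling provides, the following sketch groups the pixels of a small label map by label and derives each object's area in pixels. The label values, the toy map, and the helper name `pixels_by_label` are hypothetical; the segmentation network itself (SegNet in the text) is not shown and its output format is assumed here to be a 2D grid of integer labels.

```python
from collections import defaultdict

def pixels_by_label(label_map):
    """Group pixel coordinates (row, col) by their segmentation label."""
    groups = defaultdict(list)
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            groups[label].append((r, c))
    return dict(groups)

# Toy 3x4 label map with made-up labels: 0 = background, 1 = keyboard, 2 = key
label_map = [
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 1, 1, 0],
]
objects = pixels_by_label(label_map)

# Object size expressed as a pixel count per label
areas = {label: len(pixels) for label, pixels in objects.items()}
```

A pixel set per object of this kind is one possible representation for the later steps, which need both the size of each object and its overlap with the designated area.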

The CG parts displayed superimposed on the real-space image captured by the camera 104 may also be generated so as to include in-object objects. These CG parts are also candidates for object selection. In other words, the field of view of the user U wearing the information processing apparatus 100 contains a plurality of objects that are candidates for selection, such as physical objects appearing in the real-space image captured by the camera 104 and CG parts generated by the information processing apparatus 100, and these objects include in-object objects.

When a desired object is selected from the plurality of objects in the field of view of the user U wearing the information processing apparatus 100, a predetermined process based on that object is executed. One example of such a process is a similar image search, which searches an image database for images showing objects similar to the selected object according to a search query that includes the selected object. The result of the process can be displayed on the display 106 superimposed on the real-space image captured by the camera 104.

As described above, to make it possible to correctly select the object intended by the user from the plurality of objects in the field of view of the user U, the information processing apparatus 100 of the present embodiment uses two criteria for object selection: the degree of overlap with the designated area, which is designated by an operation body that moves within the field of view of the user U according to the motion of the user U, and the size of the object. In the present embodiment in particular, the operation body is assumed to be the hand of the user U appearing in the real-space image captured by the camera 104. That is, the target object is selected from the plurality of objects in the field of view of the user U based on the degree of overlap with the designated area designated by the hand of the user U and on the size of the object.

FIG. 3 is a block diagram showing a hardware configuration example of the information processing apparatus 100 according to the first embodiment. As shown in FIG. 3, the information processing apparatus 100 includes, for example, a processor 101 such as a CPU (Central Processing Unit) and a memory 102 such as a RAM (Random Access Memory) or a ROM (Read Only Memory), which together constitute a computer system. The information processing apparatus 100 further includes a storage device 103 that uses, for example, a microSD card as a recording medium; the camera 104 that captures the real space in front of the user U; a microphone 105 that receives voice commands spoken by the user U; the display 106 that displays the real-space image captured by the camera 104, CG parts, and the like; and a communication I/F 107 for communicating with external devices such as a server. These are connected to the computer system described above. In addition to the configuration shown in FIG. 3, the information processing apparatus 100 may include, or be connected to, various sensing devices for detecting the movement of the user U.

FIG. 4 is a block diagram showing a functional configuration example of the information processing apparatus 100 according to the first embodiment. As shown in FIG. 4, the information processing apparatus 100 includes, as functional components, for example an imaging unit 11, a detection unit 12, an image holding unit 13, an image interpolation unit 14, an image scale changing unit 15, a display unit 16, a calculation unit 17, a selection unit 18, a feedback unit 19, an object holding unit 20, a cancel processing unit 21, and a processing execution unit 22.

Among these functional components, the imaging unit 11 is realized using the camera 104 shown in FIG. 3, and the display unit 16 is realized using the display 106 shown in FIG. 3. The image holding unit 13 and the object holding unit 20 are realized using the memory 102 and the storage device 103 shown in FIG. 3. The remaining units, namely the detection unit 12, the image interpolation unit 14, the image scale changing unit 15, the calculation unit 17, the selection unit 18, the feedback unit 19, the cancel processing unit 21, and the processing execution unit 22, are realized, for example, by the processor 101 shown in FIG. 3 using the memory 102 to execute a predetermined program stored in the storage device 103 or the like. Some or all of these units may instead be realized using dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), that is, a dedicated processor rather than a general-purpose processor 101 such as a CPU.

The imaging unit 11 captures the real space in front of the user U wearing the information processing apparatus 100. The image (video) of the real space in front of the user U captured by the imaging unit 11 is sent to the detection unit 12, the image holding unit 13, and the image interpolation unit 14.

The detection unit 12 detects the hand of the user U (an example of the "operation body") appearing in the image of the real space in front of the user U captured by the imaging unit 11, and thereby detects the designated area designated by the hand of the user U. The hand of the user U appearing in the image can be detected, for example, by the semantic segmentation described above.

The designated area is an area that the user U places over an object in order to select it, and is designated by the hand of the user U appearing in the image captured by the imaging unit 11. The extent of the designated area is determined according to the shape or the movement of the hand of the user U appearing in the image. Specific examples of the designated area assumed in the present embodiment are shown in FIGS. 5-1 to 5-6.

First, examples of a designated area determined according to the shape of the hand of the user U will be described. When the palm is open, the area from the fingertips to the wrist is the designated area R, as shown in FIG. 5-1. When only the index finger is raised, the area from the tip of the index finger to its base is the designated area R, as shown in FIG. 5-2. When the index finger and thumb form a circle, the area of that circle is the designated area R, as shown in FIG. 5-3. When the index fingers and thumbs of both hands form a rectangle, the area of that rectangle is the designated area R, as shown in FIG. 5-4. The hand shapes of the user U that designate the designated area R can include various other predetermined patterns.

The shape of the hand of the user U appearing in the image captured by the imaging unit 11 can be identified using, for example, a CNN (Convolutional Neural Network) such as VGGNet, described in Reference 2 below. That is, the shape of the hand appearing in the image can be identified by applying a CNN that has learned the above hand-shape patterns in advance to the hand region of the user U detected in the image captured by the imaging unit 11.
(Reference 2) Karen Simonyan and Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", arXiv:1409.1556v6 [cs.CV], 10 Apr. 2015.

Next, examples of a designated area determined according to the movement of the hand of the user U will be described. When the palm is open and all four fingers other than the thumb are being moved, the area from the fingertips to the bases of those four fingers is the designated area R, as shown in FIG. 5-5. When the whole palm is moved in a stroking motion, the area covered by the entire trajectory of the palm is the designated area R, as shown in FIG. 5-6. These hand movements can be identified using, for example, a classifier that handles time-series data, such as an RNN (Recurrent Neural Network).

As described above, the designated area R is an area designated by the hand of the user U appearing in the image captured by the imaging unit 11. The designated area R therefore becomes larger as the user U moves the hand closer to the information processing apparatus 100 that the user is wearing, and smaller as the hand moves away. The position information of the designated area R detected by the detection unit 12 is sent to the calculation unit 17. The position information indicating the hand region of the user U detected by the detection unit 12 is sent to the image interpolation unit 14.

The image holding unit 13 holds the images captured by the imaging unit 11 for a certain period as past images to be used in the processing by the image interpolation unit 14.

The image interpolation unit 14 interpolates the background of the region in which the hand of the user U appears in an image newly captured by the imaging unit 11, using a past image held by the image holding unit 13. In an image captured by the imaging unit 11, the hand of the user U causes occlusion, and the background behind the hand region cannot be seen. This may make it difficult for the user U to place the designated area R accurately over the desired object, or make it difficult to calculate the degree of overlap between the designated area R and an object accurately. The image interpolation unit 14 therefore interpolates the background of the hand region of the user U.

FIG. 6 is a diagram illustrating an outline of the processing by the image interpolation unit 14. It shows an example in which a region X, where the hand of the user U appears in an image Im1 newly captured by the imaging unit 11, is interpolated using a past image Im2 held by the image holding unit 13.

The image interpolation unit 14 first selects, from the past images held by the image holding unit 13, a past image Im2 that has a small displacement relative to the image Im1 and whose hand region does not overlap that of the image Im1. Specifically, the image interpolation unit 14 takes, for example, the past images whose hand regions do not overlap, divides each of them into tiles, and, for the tile images excluding the region in which the hand of the user U appears, calculates the optical flow relative to the corresponding region of the image Im1 newly captured by the imaging unit 11. The past image that contains the largest number of tile images with the smallest optical flow relative to the image Im1 is then selected as the past image Im2 used to interpolate the region X.
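The selection rule above can be sketched as a simple vote, assuming the per-tile optical flow magnitudes have already been computed. In this sketch each tile position is awarded to the past image with the smallest flow at that position, and the image that wins the most tiles is chosen. The function name `select_past_image`, the dictionary layout, and the example values are hypothetical; this is one reading of the rule, not necessarily the exact procedure of the embodiment.

```python
def select_past_image(tile_flows):
    """Pick the past image that has the smallest optical flow for the most tiles.

    tile_flows maps image_id -> {tile_index: flow_magnitude}.
    Tiles that contain the user's hand are simply left out of the inner dict.
    """
    wins = {image_id: 0 for image_id in tile_flows}
    all_tiles = set()
    for flows in tile_flows.values():
        all_tiles.update(flows)
    for tile in all_tiles:
        # Compare only the images that have a usable (hand-free) tile here
        candidates = [(flows[tile], image_id)
                      for image_id, flows in tile_flows.items() if tile in flows]
        _, best_id = min(candidates)
        wins[best_id] += 1
    return max(wins, key=wins.get)

# Made-up flow magnitudes for three past images and three tile positions
tile_flows = {
    "t-1": {0: 0.5, 1: 2.0, 2: 0.4},
    "t-2": {0: 1.5, 1: 0.3, 2: 0.9},
    "t-3": {0: 3.0, 1: 1.0},          # tile 2 is covered by the hand
}
chosen = select_past_image(tile_flows)
```

Here "t-1" has the smallest flow for tiles 0 and 2, so it is chosen over "t-2", which wins only tile 1.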

Next, the image interpolation unit 14 aligns the image Im1 with the past image Im2 using, for example, SIFT (Scale-Invariant Feature Transform) features, which have position invariance. Specifically, the image interpolation unit 14 performs SIFT feature point detection on both the image Im1 and the past image Im2 and calculates the feature value at each feature point. Feature points whose distance (difference in feature values) between the image Im1 and the past image Im2 is sufficiently small are then associated with each other, and image processing such as scaling, rotation, and translation is applied so that the associated feature points have the same coordinates. This corrects the misalignment between the image Im1 and the past image Im2.
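The text does not say how the scaling, rotation, and translation are estimated from the matched feature points. As one possible illustration, the sketch below fits a similarity transform by least squares, representing each 2D point as a complex number so that the transform is w = a*z + b. The function name `fit_similarity` and the sample points are hypothetical, and the SIFT detection and matching steps are assumed to have already produced the point pairs.

```python
def fit_similarity(src, dst):
    """Least-squares fit of w = a*z + b over matched points.

    Points are complex numbers (x + y*1j). abs(a) is the scale factor,
    the angle of a is the rotation, and b is the translation.
    """
    n = len(src)
    z_mean = sum(src) / n
    w_mean = sum(dst) / n
    num = sum((w - w_mean) * (z - z_mean).conjugate() for z, w in zip(src, dst))
    den = sum(abs(z - z_mean) ** 2 for z in src)
    a = num / den
    b = w_mean - a * z_mean
    return a, b

# Matched points: dst is src scaled by 2, rotated by 90 degrees, shifted by (3, 1)
src = [0 + 0j, 1 + 0j, 0 + 1j]
dst = [2j * z + (3 + 1j) for z in src]
a, b = fit_similarity(src, dst)
```

Applying `a * z + b` to every pixel coordinate of the past image (with suitable resampling) would then bring its matched feature points onto those of the current image.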

Next, the image interpolation unit 14 cuts out a region Y in the past image Im2 that corresponds to the region X in which the hand of the user U appears in the image Im1, replaces the region X in the image Im1 with the image of the region Y cut out from the past image Im2, and marks the contour of the region X. This interpolates the background of the region X in the image Im1 and generates an image Im3 on which a marker M representing the contour of the hand of the user U is superimposed. If the image of the region Y cut out from the past image Im2 is first subjected to predetermined image processing, such as lowering its brightness or blurring it, and then used for the replacement, the result gives the visual effect that the background is dimly visible through the hand of the user U.
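A minimal sketch of this replacement step follows, assuming the two images are already aligned and are given as 2D lists of grey levels, with the hand region given as a boolean mask. The function names, the `dim` factor, and the 4-neighbour contour rule are assumptions made for illustration; the embodiment only states that the region is replaced, optionally dimmed or blurred, and that its contour is marked.

```python
def fill_hand_region(current, past, hand_mask, dim=0.5):
    """Replace hand pixels in `current` with dimmed pixels from the aligned `past` image."""
    out = [row[:] for row in current]
    for r, mask_row in enumerate(hand_mask):
        for c, is_hand in enumerate(mask_row):
            if is_hand:
                out[r][c] = int(past[r][c] * dim)  # lowered brightness
    return out

def contour(hand_mask):
    """Hand pixels that touch a non-hand pixel or the image border (4-neighbourhood)."""
    h, w = len(hand_mask), len(hand_mask[0])
    edge = []
    for r in range(h):
        for c in range(w):
            if not hand_mask[r][c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(not (0 <= nr < h and 0 <= nc < w) or not hand_mask[nr][nc]
                   for nr, nc in neighbours):
                edge.append((r, c))
    return edge

# Toy 3x3 example: the hand covers only the centre pixel
current = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
past = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
hand_mask = [[False, False, False], [False, True, False], [False, False, False]]

filled = fill_hand_region(current, past, hand_mask)
marker_pixels = contour(hand_mask)
```

The pixels returned by `contour` are where a marker such as M would be drawn on top of the filled image.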

The image Im3 generated by the image interpolation unit 14 (the image in which the background of the hand region has been interpolated) is sent to the display unit 16 and the calculation unit 17 via the image scale changing unit 15. The image scale changing unit 15 is a functional unit that enlarges or reduces the image Im3 as needed; it is described in detail later.

表示部１６は、画像補間部１４により生成された画像Ｉｍ３（画像縮尺変更部１５により拡大・縮小の処理が行われた場合は、拡大・縮小された画像Ｉｍ３）を表示する。この画像Ｉｍ３は、図６に示したように、ユーザＵの手の輪郭を示すマーカＭが重畳されている。したがって、ユーザＵはこの表示部１６により表示される画像Ｉｍ３を見ながら、ユーザＵの視野内に存在する複数のオブジェクトの中から所望のオブジェクトに指定領域Ｒを重ね合わせるように、手を動かすことができる。 The display unit 16 displays the image Im3 generated by the image interpolation unit 14 (or the enlarged or reduced image Im3 when the image scale changing unit 15 has performed enlargement or reduction processing). As shown in FIG. 6, a marker M indicating the outline of the user U's hand is superimposed on this image Im3. Therefore, while looking at the image Im3 displayed by the display unit 16, the user U can move his or her hand so as to superimpose the designated area R on the desired object among the plurality of objects existing in the field of view of the user U.

算出部１７は、ユーザＵの視野内に存在する複数のオブジェクトのうち、ユーザＵの手により指定される指定領域Ｒと重なるオブジェクトについて、指定領域Ｒとの重畳度を算出する。この算出部１７による重畳度の算出は、ユーザＵによる所定の行為をトリガとして開始される。 The calculation unit 17 calculates the degree of superimposition with the designated area R for the object overlapping the designated area R designated by the user U among the plurality of objects existing in the field of view of the user U. The calculation of the superposition degree by the calculation unit 17 is started by a predetermined action by the user U as a trigger.

ここで、算出部１７による重畳度の算出を開始させるトリガとなるユーザＵの行為としては、様々なものが考えられる。例えば、ユーザＵが手の動きを一定時間止める行為を行ったことをトリガとして、算出部１７による重畳度の算出を開始させるようにしてもよい。これは、例えば、撮像部１１により撮像された画像に映り込むユーザＵの手の変位量が所定値以下である状態が予め設定したフレーム数分継続したかどうかを判断することにより実現できる。 Here, various actions of the user U that trigger the start of the calculation of the superposition degree by the calculation unit 17 can be considered. For example, the calculation unit 17 may start the calculation of the degree of superimposition triggered by the user U performing an act of stopping the movement of the hand for a certain period of time. This can be realized, for example, by determining whether or not the state in which the displacement amount of the user U's hand reflected in the image captured by the imaging unit 11 is equal to or less than a predetermined value continues for a preset number of frames.
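The hand-stillness trigger described above can be sketched as follows. The function name, the use of a single tracked hand position per frame, and the default values are assumptions made for illustration; the source only states that the displacement must stay at or below a predetermined value for a preset number of frames.

```python
from math import hypot


def stillness_trigger(positions, max_displacement=5.0, required_frames=3):
    """Return the frame index at which the trigger fires, or None.

    positions: per-frame (x, y) hand positions in pixels (assumed input).
    The trigger fires once the frame-to-frame displacement has been at or
    below `max_displacement` for `required_frames` consecutive frames.
    """
    still_frames = 0
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        if hypot(x1 - x0, y1 - y0) <= max_displacement:
            still_frames += 1
            if still_frames >= required_frames:
                return i
        else:
            # Any larger movement resets the count.
            still_frames = 0
    return None


track = [(0, 0), (50, 0), (51, 0), (52, 1), (52, 1)]
fired_at = stillness_trigger(track)
```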

また、事前に設定したジェスチャ、例えば、ボタンを押す動作のように手を前に出して止める、片手で指定領域Ｒを指定したときにもう一方の手で指定領域Ｒを指し示すなど、所定のジェスチャをユーザＵが行ったことをトリガとして、算出部１７による重畳度の算出を開始させるようにしてもよい。ジェスチャの検出は、例えば、撮像部１１により撮像された画像に映り込むユーザＵの手の形状や動きが、事前に設定したジェスチャと一致するか否かを判断することにより実現できる。 Alternatively, the calculation of the degree of superimposition by the calculation unit 17 may be started when the user U performs a predetermined, preset gesture, for example, putting a hand forward and stopping it as if pressing a button, or pointing at the designated area R with one hand while designating it with the other. Gesture detection can be realized, for example, by determining whether the shape and movement of the user U's hand appearing in the image captured by the imaging unit 11 match the preset gesture.

また、事前に設定した音声コマンド、例えば「ここを選択」、「これを選択」など、所定の音声コマンドをユーザＵが発話したことをトリガとして、算出部１７による重畳度の算出を開始させるようにしてもよい。音声コマンドの検出は、例えば、情報処理装置１００に搭載されたマイク１０５に入力されたユーザＵの発話を音声認識し、事前に設定した音声コマンドと一致するか否かを判断することにより実現できる。 Alternatively, the calculation of the degree of superimposition by the calculation unit 17 may be started when the user U utters a predetermined, preset voice command such as "select here" or "select this". Voice command detection can be realized, for example, by performing speech recognition on the utterance of the user U input to the microphone 105 mounted on the information processing apparatus 100 and determining whether it matches the preset voice command.

また、情報処理装置１００に視線検出センサを搭載して、この視線検出センサによりユーザＵの所定の視線移動が検出されたとき、例えば、ユーザＵの視線方向が指定領域Ｒと重なったことをトリガとして、算出部１７による重畳度の算出を開始させるようにしてもよい。 Alternatively, the information processing apparatus 100 may be equipped with a line-of-sight detection sensor, and the calculation of the degree of superimposition by the calculation unit 17 may be started when this sensor detects a predetermined line-of-sight movement of the user U, for example, when the line-of-sight direction of the user U overlaps the designated area R.

そのほか、ユーザＵの特定の表情変化、例えば、片目を瞑る（ウィンクする）といった表情変化を検出し、これをトリガとして算出部１７による重畳度の算出を開始させるようにしてもよい。これは、図５−３や図５−４に示した例のように、指で囲まれた丸や四角の領域を指定領域Ｒとした場合に、この指定領域Ｒをユーザが覗き込もうとすると自然と片目を瞑ることが多いので、自然な処理と考えられる。 In addition, a specific facial expression change of the user U, for example, a facial expression change such as closing (winking) one eye may be detected, and this may be used as a trigger to start the calculation of the superposition degree by the calculation unit 17. This is because, as in the example shown in FIGS. 5-3 and 5-4, when the circled or square area surrounded by the fingers is set as the designated area R, the user tries to look into the designated area R. Then, one eye is often closed naturally, so it is considered to be a natural process.

算出部１７による重畳度の算出は、上述のように、ユーザＵの視野内に存在する複数のオブジェクトのうち、ユーザＵの手により指定される指定領域Ｒと重なるオブジェクトを対象として行われる。ここで、指定領域Ｒと重なるオブジェクトにオブジェクト内オブジェクトが含まれる場合、そのオブジェクト内オブジェクトを内包する大きなオブジェクトについても、指定領域Ｒとの重畳度が算出される。例えば上述のＰＣの例において、指定領域Ｒがキーボード内のいずれかのキーと重なる場合、そのキーだけでなく、そのキーを内包するキーボード全体、さらにはキーボードを内包するＰＣ全体についても、指定領域Ｒとの重畳度が算出される。 As described above, the calculation unit 17 calculates the degree of superimposition for those objects, among the plurality of objects existing in the field of view of the user U, that overlap the designated area R designated by the user U's hand. Here, when the objects overlapping the designated area R include an object-in-object, the degree of superimposition with the designated area R is also calculated for the larger object that contains it. For example, in the PC example above, when the designated area R overlaps one of the keys in the keyboard, the degree of superimposition with the designated area R is calculated not only for that key but also for the entire keyboard containing the key and for the entire PC containing the keyboard.

ここで、指定領域Ｒとの重畳度は、
重畳度＝指定領域Ｒと重なるオブジェクト内の領域の大きさ／オブジェクトの大きさ
により算出される。算出部１７により算出された重畳度は、選択部１８に送られる。 Here, the degree of superimposition with the designated area R is calculated by
degree of superimposition = (size of the region within the object that overlaps the designated area R) / (size of the object).
The degree of superimposition calculated by the calculation unit 17 is sent to the selection unit 18.
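As a worked illustration of this ratio, the sketch below computes the degree of superimposition for axis-aligned rectangles. Treating the objects and the designated area R as rectangles `(x0, y0, x1, y1)` is an assumption made for brevity; the source does not restrict the shape of either, and the function names are hypothetical.

```python
def area(rect):
    """Area of a rectangle (x0, y0, x1, y1); zero if it is empty."""
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection(a, b):
    """Overlapping rectangle of a and b (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def superimposition_degree(obj, region):
    """Overlapping area within the object divided by the object's area."""
    obj_area = area(obj)
    if obj_area == 0:
        return 0.0
    return area(intersection(obj, region)) / obj_area


region_r = (0, 0, 5, 10)                                        # designated area R
keyboard_degree = superimposition_degree((0, 0, 10, 10), region_r)  # half covered
key_degree = superimposition_degree((2, 2, 4, 4), region_r)         # fully covered
```

Because the denominator is the size of the object itself, a small key fully inside R scores 1.0 while the keyboard that contains it scores only 0.5.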

選択部１８は、算出部１７により算出された重畳度とオブジェクトの大きさに基づいて、ユーザＵの視野内に存在する複数のオブジェクトの中から、ターゲットとなるオブジェクト（後述の処理実行部２２による処理に用いるオブジェクト）を選択する。具体的には、選択部１８は、指定領域Ｒと重なるオブジェクトに対して算出部１７により算出された重畳度が所定の閾値（例えば０．８）を超えるオブジェクトのうち、大きさが最大のオブジェクトを選択する。 Based on the degree of superimposition calculated by the calculation unit 17 and the size of each object, the selection unit 18 selects a target object (the object used in processing by the processing execution unit 22 described later) from among the plurality of objects existing in the field of view of the user U. Specifically, among the objects overlapping the designated area R whose degree of superimposition calculated by the calculation unit 17 exceeds a predetermined threshold value (for example, 0.8), the selection unit 18 selects the object having the largest size.
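The selection rule can be sketched as below. The tuple layout, the function name, and the sample sizes and degrees are illustrative assumptions. The strict comparison follows the wording "exceeds" in this paragraph; other passages in the description say "equal to or greater than", so the choice of operator is also an assumption.

```python
def select_target(candidates, threshold=0.8):
    """Pick the largest object whose degree of superimposition passes the threshold.

    candidates: list of (name, size, degree) tuples (hypothetical layout).
    Returns the name of the selected object, or None if nothing qualifies.
    """
    eligible = [c for c in candidates if c[2] > threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[1])[0]


# Made-up values: the designated area covers the touchpad and one key.
candidates = [
    ("PC", 20000, 0.10),
    ("keyboard", 3000, 0.30),
    ("touchpad", 600, 0.90),
    ("key", 40, 0.95),
]
target = select_target(candidates)
```

Filtering by the degree first and only then comparing sizes is what keeps the enclosing PC from winning every time it overlaps the designated area.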

ここで、選択部１８によるオブジェクト選択の具体例について、図１と同様にＰＣの例を用いて説明する。図７−１乃至図７−５は、ＰＣを題材とした選択部１８によるオブジェクト選択の具体例を説明する図である。ここでは、選択の候補となるオブジェクトは、ＰＣ全体、ＰＣ内のモニタ、ＰＣ内のタッチパッド、ＰＣ内のキーボード、キーボード内の各キーであるものとする。 Here, a specific example of object selection by the selection unit 18 will be described using the PC example, as in FIG. 1. FIGS. 7-1 to 7-5 are diagrams explaining specific examples of object selection by the selection unit 18, using a PC as the subject. Here, the objects that are candidates for selection are assumed to be the entire PC, the monitor in the PC, the touchpad in the PC, the keyboard in the PC, and each key in the keyboard.

図７−１に示す例では、ユーザＵの手により指定される指定領域ＲがＰＣ全体に重なっており、ＰＣ全体、ＰＣ内のモニタ、ＰＣ内のタッチパッド、ＰＣ内のキーボード、キーボード内の各キーの全てにおいて、指定領域Ｒとの重畳度が閾値以上となっている。この場合、選択部１８は、サイズが最も大きいＰＣ全体を、ターゲットとなるオブジェクトとして選択する。 In the example shown in FIG. 7-1, the designated area R designated by the user U's hand overlaps the entire PC, and the degree of superimposition with the designated area R is equal to or greater than the threshold value for all of the entire PC, the monitor in the PC, the touchpad in the PC, the keyboard in the PC, and each key in the keyboard. In this case, the selection unit 18 selects the entire PC, which has the largest size, as the target object.

図７−２に示す例では、ユーザＵの手により指定される指定領域Ｒが主にＰＣ内のモニタと重なっており、指定領域Ｒとの重畳度が閾値以上となっているのは、ＰＣ内のモニタのみである。この場合、選択部１８は、指定領域Ｒとの重畳度が閾値以上となっている唯一のオブジェクトであるＰＣ内のモニタを、ターゲットとなるオブジェクトとして選択する。 In the example shown in FIG. 7-2, the designated area R designated by the user U's hand mainly overlaps the monitor in the PC, and only the monitor in the PC has a degree of superimposition with the designated area R equal to or greater than the threshold value. In this case, the selection unit 18 selects the monitor in the PC, which is the only object whose degree of superimposition with the designated area R is equal to or greater than the threshold value, as the target object.

図７−３に示す例では、ユーザＵの手により指定される指定領域ＲがＰＣ内のタッチパッドとキーボード内の一部のキーと重なっており、指定領域Ｒとの重畳度が閾値以上となっているのは、これらＰＣ内のタッチパッドとキーボード内の一部のキーである。この場合、選択部１８は、ＰＣ内のタッチパッドとキーボード内の一部のキーのうち、サイズが最も大きいＰＣ内のタッチパッドを、ターゲットとなるオブジェクトとして選択する。 In the example shown in FIG. 7-3, the designated area R designated by the user U's hand overlaps the touchpad in the PC and some keys in the keyboard, and it is the touchpad and these keys whose degree of superimposition with the designated area R is equal to or greater than the threshold value. In this case, the selection unit 18 selects the touchpad in the PC, which is the largest among the touchpad and these keys, as the target object.

図７−４に示す例では、ユーザＵの手により指定される指定領域Ｒがキーボード内の一部のキーのみと重なっており、指定領域Ｒとの重畳度が閾値以上となっているのは、キーボード内のキーのみである。この場合、選択部１８は、指定領域Ｒとの重畳度が閾値以上となっている唯一のオブジェクトであるキーボード内のキーを、ターゲットとなるオブジェクトとして選択する。 In the example shown in FIG. 7-4, the designated area R designated by the user U's hand overlaps only some keys in the keyboard, and only a key in the keyboard has a degree of superimposition with the designated area R equal to or greater than the threshold value. In this case, the selection unit 18 selects that key, which is the only object whose degree of superimposition with the designated area R is equal to or greater than the threshold value, as the target object.

図７−５に示す例では、ユーザＵの手により指定される指定領域ＲがＰＣ内のキーボードとキーボード内の各キーと重なっており、指定領域Ｒとの重畳度が閾値以上となっているのは、これらＰＣ内のキーボードとキーボード内の各キーである。この場合、選択部１８は、ＰＣ内のキーボードとキーボード内の各キーのうち、サイズが最も大きいＰＣ内のキーボードを、ターゲットとなるオブジェクトとして選択する。 In the example shown in FIG. 7-5, the designated area R designated by the user U's hand overlaps the keyboard in the PC and each key in the keyboard, and it is the keyboard and its keys whose degree of superimposition with the designated area R is equal to or greater than the threshold value. In this case, the selection unit 18 selects the keyboard in the PC, which is the largest among the keyboard and its keys, as the target object.

なお、選択の候補となるオブジェクトとしてＣＧパーツよりなるオブジェクトをユーザＵの視野内に配置した場合、ユーザＵの視点によっては大きさが同等の複数のオブジェクトが重なった状態となる場合がある。ここで、ユーザＵの手により指定される指定領域Ｒがこれら複数のオブジェクトと重なり、これら複数のオブジェクトの指定領域Ｒとの重畳度が閾値以上となった場合、ユーザＵは手前側のオブジェクトを選択しようとしていると推定されるため、選択部１８は、これら複数のオブジェクトのうち、ユーザＵ（つまりユーザＵが装着している情報処理装置１００）に最も近いオブジェクトを、ターゲットとなるオブジェクトとして選択するようにしてもよい。 Note that when objects made of CG parts are arranged in the field of view of the user U as candidates for selection, a plurality of objects of equivalent size may overlap one another depending on the viewpoint of the user U. If the designated area R designated by the user U's hand overlaps these objects and their degrees of superimposition with the designated area R are equal to or greater than the threshold value, the user U is presumed to be trying to select the object on the near side. The selection unit 18 may therefore select, from among these objects, the object closest to the user U (that is, to the information processing apparatus 100 worn by the user U) as the target object.
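One way to sketch this tie-break is shown below. The dictionary keys, the function name, and in particular the `size_tolerance` used to decide that two sizes are "equivalent" are assumptions; the source does not define how equivalence is judged.

```python
def select_with_depth(candidates, threshold=0.8, size_tolerance=0.1):
    """Among the largest eligible objects of similar size, prefer the nearest.

    candidates: list of dicts with "name", "size", "degree" and "distance"
    (distance from the user); this layout is hypothetical.
    """
    eligible = [c for c in candidates if c["degree"] >= threshold]
    if not eligible:
        return None
    largest = max(c["size"] for c in eligible)
    # Objects within the tolerance of the largest size count as equivalent.
    similar = [c for c in eligible if c["size"] >= largest * (1 - size_tolerance)]
    return min(similar, key=lambda c: c["distance"])["name"]


panels = [
    {"name": "rear panel", "size": 100, "degree": 0.9, "distance": 2.0},
    {"name": "front panel", "size": 98, "degree": 0.9, "distance": 1.2},
    {"name": "knob", "size": 5, "degree": 1.0, "distance": 1.0},
]
chosen = select_with_depth(panels)
```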

選択部１８により選択されたオブジェクトの情報は、フィードバック部１９とオブジェクト保持部２０とに送られる。オブジェクト保持部２０は、選択部１８により選択されたオブジェクトの情報を保持するバッファである。 The information of the object selected by the selection unit 18 is sent to the feedback unit 19 and the object holding unit 20. The object holding unit 20 is a buffer that holds information on the object selected by the selection unit 18.

フィードバック部１９は、選択部１８によりオブジェクトが選択されたことをユーザＵに認識させるフィードバック処理を行う。このフィードバック処理としては、例えば、選択部１８により選択されたオブジェクトを視覚的にユーザＵに認識させる処理が挙げられる。具体的には、フィードバック部１９は、例えば図８に示すように、表示部１６に表示されている画像５０内で、選択部１８により選択されたオブジェクト（ここではＰＣ内のタッチパッド）の領域５１を検出し、このオブジェクトの領域５１を目立たせるハイライト処理を行うようにしてもよい。ハイライト処理としては、例えば、領域５１の輪郭を反転色にする、領域５１内の各画素の輝度を高めて光らせるなどの処理が考えられる。 The feedback unit 19 performs feedback processing for the user U to recognize that the object has been selected by the selection unit 18. Examples of this feedback process include a process of causing the user U to visually recognize the object selected by the selection unit 18. Specifically, as shown in FIG. 8, the feedback unit 19 is an area of an object (here, a touch pad in a PC) selected by the selection unit 18 in the image 50 displayed on the display unit 16. 51 may be detected and highlighting processing may be performed to make the area 51 of this object stand out. As the highlighting process, for example, processing such as inversion of the outline of the area 51 or increasing the brightness of each pixel in the area 51 to make it shine can be considered.

また、フィードバック部１９は、例えば図９に示すように、表示部１６に表示されている画像５０内に、選択部１８により選択されたオブジェクトを提示するための提示領域５２を設け、この提示領域５２内に、選択部１８により選択されたオブジェクト（ここではＰＣ内のタッチパッド）付近を切り出した切り出し画像５３を表示させるようにしてもよい。このように、選択部１８により選択されたオブジェクトを視覚的にユーザＵに認識させる処理を行うことで、ユーザＵは、表示部１６の画像５０を見ながら、オブジェクトの選択が正しく行われたかどうかを確認することができる。 Alternatively, as shown in FIG. 9, for example, the feedback unit 19 may provide, in the image 50 displayed on the display unit 16, a presentation area 52 for presenting the object selected by the selection unit 18, and display in this presentation area 52 a cut-out image 53 obtained by cutting out the vicinity of the selected object (here, the touchpad in the PC). By performing processing that lets the user U visually recognize the object selected by the selection unit 18 in this way, the user U can confirm, while looking at the image 50 on the display unit 16, whether the object was selected correctly.

また、フィードバック部１９は、選択部１８によりオブジェクトが選択されたことを、振動によりユーザＵに認識させる処理を行うようにしてもよい。例えば、フィードバック部１９は、選択部１８によりオブジェクトが選択されたタイミングで、情報処理装置１００を振動させるようにしてもよい。また、フィードバック部１９は、情報処理装置１００に対してＢｌｕｅｔｏｏｔｈ（登録商標）などの近距離無線通信により接続されるスマートウォッチなどを、選択部１８によりオブジェクトが選択されたタイミングで振動させるようにしてもよい。 The feedback unit 19 may also use vibration to let the user U recognize that an object has been selected by the selection unit 18. For example, the feedback unit 19 may vibrate the information processing apparatus 100 at the timing when the object is selected by the selection unit 18. The feedback unit 19 may also vibrate a smartwatch or similar device connected to the information processing apparatus 100 by short-range wireless communication such as Bluetooth (registered trademark) at the timing when the object is selected by the selection unit 18.

キャンセル処理部２１は、ユーザＵによる所定の行為をトリガとして、選択部１８により選択されたオブジェクトをキャンセルする処理を行う。具体的には、キャンセル処理部２１は、ユーザＵがキャンセル処理のトリガとして事前に定められた所定の行為を行った場合に、オブジェクト保持部２０が保持する最新のオブジェクトを削除する。また、キャンセル処理部２１は、ユーザＵがキャンセル処理のトリガとして事前に定められた所定の行為を行った場合に、オブジェクト保持部２０が保持するすべてのオブジェクトを削除するようにしてもよい。このキャンセル処理部２１の処理により、選択部１８により選択されたオブジェクトがユーザＵの意図したものでなかった場合に、オブジェクトの選択をやり直すことが可能となる。 The cancel processing unit 21 performs a process of canceling the object selected by the selection unit 18 triggered by a predetermined action by the user U. Specifically, the cancel processing unit 21 deletes the latest object held by the object holding unit 20 when the user U performs a predetermined action predetermined as a trigger for the cancel processing. Further, the cancel processing unit 21 may delete all the objects held by the object holding unit 20 when the user U performs a predetermined action as a trigger for the cancel processing. By the processing of the cancel processing unit 21, when the object selected by the selection unit 18 is not intended by the user U, the object can be selected again.
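The interplay between the object holding unit 20 and the cancel processing unit 21 can be sketched as a small buffer with two cancel operations. The class and method names are hypothetical; the source only states that either the latest held object or all held objects are deleted.

```python
class ObjectBuffer:
    """Minimal stand-in for the object holding unit 20 (names are assumed)."""

    def __init__(self):
        self._items = []

    def hold(self, obj):
        """Store an object selected by the selection unit."""
        self._items.append(obj)

    def cancel_latest(self):
        """Delete and return the most recently held object, if any."""
        return self._items.pop() if self._items else None

    def cancel_all(self):
        """Delete every held object."""
        self._items.clear()

    def held(self):
        """Return a copy of the objects still held."""
        return list(self._items)


buffer = ObjectBuffer()
buffer.hold("touchpad")
buffer.hold("keyboard")
removed = buffer.cancel_latest()   # the user cancels the last selection
remaining = buffer.held()
```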

ここで、キャンセル処理のトリガとなるユーザＵの行為としては、様々なものが考えられる。例えば、ユーザＵが選択部１８により選択されたオブジェクトと同じオブジェクトに指定領域Ｒを重ね合わせて、そのオブジェクトを繰り返し選択する動作を行ったことをトリガとして、キャンセル処理部２１がそのオブジェクトをオブジェクト保持部２０から削除するようにしてもよい。 Here, various actions of the user U can serve as the trigger for the cancel processing. For example, when the user U superimposes the designated area R on the same object as the one selected by the selection unit 18 and selects that object again, the cancel processing unit 21 may delete the object from the object holding unit 20.

また、事前に設定したジェスチャ、例えば、親指を立てる、人差し指でフリック動作をするなど、所定のジェスチャをユーザＵが行ったことをトリガとして、キャンセル処理部２１がオブジェクト保持部２０から最新のオブジェクトまたはすべてのオブジェクトを削除するようにしてもよい。 Alternatively, the cancel processing unit 21 may delete the latest object or all objects from the object holding unit 20 when the user U performs a predetermined, preset gesture such as raising a thumb or flicking with the index finger.

また、事前に設定した音声コマンド、例えば「キャンセル」、「やり直し」など、所定の音声コマンドをユーザＵが発話したことをトリガとして、キャンセル処理部２１がオブジェクト保持部２０から最新のオブジェクトまたはすべてのオブジェクトを削除するようにしてもよい。 Alternatively, the cancel processing unit 21 may delete the latest object or all objects from the object holding unit 20 when the user U utters a predetermined, preset voice command such as "cancel" or "redo".

なお、ユーザＵが選択しようとしているオブジェクトが、選択部１８により選択されたオブジェクトに内包される、より小さいサイズのオブジェクトであるが、この小さいオブジェクトが選択の候補となるオブジェクトとして認識されていない場合もある。このような場合には、選択部１８により選択されたオブジェクトのキャンセル処理を行った上で、例えば、「もっと細かく」といった音声コマンドをユーザＵが発話したことをトリガとして、上述のセマンティックセグメンテーションなどを用いたオブジェクトの認識をより細かい粒度で行って、よりサイズの小さいオブジェクトを選択できるようにしてもよい。 Note that the object the user U intends to select may be a smaller object contained in the object selected by the selection unit 18, and this smaller object may not have been recognized as a candidate for selection. In such a case, after the object selected by the selection unit 18 has been canceled, object recognition using the above-mentioned semantic segmentation or the like may be performed at a finer granularity, triggered, for example, by the user U uttering a voice command such as "more finely", so that a smaller object can be selected.

ところで、オブジェクトの選択に用いられる指定領域Ｒの大きさは、上述のように、ユーザＵが手を情報処理装置１００に近づけたり遠ざけたりすることにより変化するが、ユーザＵが手を動かせる範囲は限界がある。このため、選択の候補となるオブジェクトのうち、例えばサイズが極端に小さいオブジェクトなどに指定領域Ｒを正しく重ね合わせることができずに、そのようなオブジェクトを適切に選択できないといった不都合が生じる懸念もある。 As described above, the size of the designated area R used for selecting an object changes as the user U moves his or her hand closer to or farther from the information processing apparatus 100, but the range over which the user U can move the hand is limited. Consequently, the designated area R may not be superimposed correctly on some candidate objects, such as extremely small ones, and such objects may not be selected properly.

このような問題点への対策として、本実施形態に係る情報処理装置１００は、画像縮尺変更部１５を備える。画像縮尺変更部１５は、ユーザＵによる所定の行為をトリガとして、表示部１６に表示される画像（例えば図６に示した画像Ｉｍ３）のうち、ユーザＵの手（例えば図６に示した画像Ｉｍ３におけるマーカＭ）を除く画像を拡大または縮小する処理を行う。つまり、画像縮尺変更部１５は、ユーザＵによる所定の行為に応じて、ユーザＵの手（例えば図６に示した画像Ｉｍ３におけるマーカＭ）の大きさを変えることなく、表示部１６に表示される画像（例えば図６に示した画像Ｉｍ３）を拡大または縮小する処理を行う。これにより、指定領域Ｒに対するオブジェクトの相対的な大きさを変化させることができ、オブジェクトの選択をより適切に行うことができる。 As a countermeasure against such a problem, the information processing apparatus 100 according to the present embodiment includes an image scale changing unit 15. The image scale changing unit 15 is triggered by a predetermined action by the user U, and among the images displayed on the display unit 16 (for example, the image Im3 shown in FIG. 6), the hand of the user U (for example, the image shown in FIG. 6). A process of enlarging or reducing an image excluding the marker M) in Im3 is performed. That is, the image scale changing unit 15 is displayed on the display unit 16 without changing the size of the user U's hand (for example, the marker M in the image Im3 shown in FIG. 6) in response to a predetermined action by the user U. The image (for example, the image Im3 shown in FIG. 6) is enlarged or reduced. As a result, the relative size of the object with respect to the designated area R can be changed, and the object can be selected more appropriately.
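The effect of scaling the image while leaving the hand marker untouched can be sketched with rectangles. Scaling about a fixed center point, the rectangle representation, and the function name are assumptions made for illustration; the designated area R is deliberately not passed through the function because its size is kept constant.

```python
def scale_rect(rect, factor, center):
    """Scale a rectangle (x0, y0, x1, y1) about `center` by `factor`."""
    cx, cy = center
    x0, y0, x1, y1 = rect
    return (
        cx + (x0 - cx) * factor,
        cy + (y0 - cy) * factor,
        cx + (x1 - cx) * factor,
        cy + (y1 - cy) * factor,
    )


region_r = (40, 40, 60, 60)   # designated area R: unchanged by the zoom
small_key = (48, 48, 52, 52)  # object too small to frame with the hand
zoomed_key = scale_rect(small_key, 4.0, (50, 50))
```

After a fourfold enlargement the key occupies most of R instead of a small fraction of it, which is the change in relative size that the paragraph describes.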

ここで、画像の拡大・縮小処理のトリガとなるユーザＵの行為としては、様々なものが考えられる。例えば、ユーザＵの姿勢変化を検出し、その姿勢変化に応じて画像縮尺変更部１５が表示部１６に表示される画像に対する拡大・縮小処理を行うようにしてもよい。具体的には、例えば図１０に示すように、ユーザＵが頭を後ろに反らせた場合は画像の縮小処理、ユーザＵが頭を前に出した場合は画像の拡大処理を行うといった例が挙げられる。姿勢変化の検出は、例えば、撮像部１１により撮像された画像のオプティカルフローを用いてユーザＵの移動量を算出する、あるいは、情報処理装置１００がジャイロセンサなどの動きを検知するセンサを搭載している場合は、そのセンサ出力を用いてユーザＵの移動量を算出することで実現できる。 Here, various actions of the user U can serve as the trigger for the image enlargement or reduction processing. For example, a posture change of the user U may be detected, and the image scale changing unit 15 may enlarge or reduce the image displayed on the display unit 16 according to that posture change. Specifically, as shown in FIG. 10, for example, the image may be reduced when the user U tilts his or her head backward and enlarged when the user U moves the head forward. The posture change can be detected, for example, by calculating the amount of movement of the user U from the optical flow of the images captured by the imaging unit 11, or, if the information processing apparatus 100 is equipped with a motion sensor such as a gyro sensor, by calculating the amount of movement of the user U from that sensor's output.

また、情報処理装置１００の筐体に操作ダイヤルなどの物理的な操作部を設け、ユーザＵがこの操作部を操作した場合に、その操作量に応じて画像縮尺変更部１５が表示部１６に表示される画像に対する拡大・縮小処理を行うようにしてもよい。 Further, when a physical operation unit such as an operation dial is provided in the housing of the information processing device 100 and the user U operates this operation unit, the image scale change unit 15 is displayed on the display unit 16 according to the operation amount. The displayed image may be enlarged or reduced.

また、事前に設定したジェスチャ、例えば、指を大きく広げた状態の手を手前側に引いたり奥側に押し出したりするといった所定のジェスチャをユーザＵが行ったことをトリガとして、画像縮尺変更部１５が表示部１６に表示される画像に対する拡大・縮小処理を行うようにしてもよい。 In addition, the image scale changing unit 15 is triggered by the user U performing a preset gesture, for example, a predetermined gesture such as pulling the hand with the fingers widely spread toward the front side or pushing it toward the back side. May perform enlargement / reduction processing on the image displayed on the display unit 16.

また、事前に設定した音声コマンド、例えば「拡大」、「縮小」などの所定の音声コマンドをユーザＵが発話したことをトリガとして、画像縮尺変更部１５が表示部１６に表示される画像に対する拡大・縮小処理を行うようにしてもよい。 Further, when the user U utters a preset voice command, for example, a predetermined voice command such as "enlargement" or "reduction", the image scale changing unit 15 enlarges the image displayed on the display unit 16. -The reduction process may be performed.

そのほか、ユーザＵの特定の表情変化、例えば、目を小さく閉じる、目を大きく開くといった表情変化を検出し、ユーザＵが目を小さく閉じた場合は画像縮尺変更部１５が表示部１６に表示される画像を縮小し、ユーザＵが目を大きく開いた場合は画像縮尺変更部１５が表示部１６に表示される画像を拡大するようにしてもよい。また、ユーザＵの脳波を解析することによりユーザＵの意図を理解し、それに合わせて画像縮尺変更部１５が表示部１６に表示される画像に対する拡大・縮小処理を行うようにしてもよい。 In addition, a specific facial expression change of the user U, for example, a facial expression change such as closing the eyes small or opening the eyes wide is detected, and when the user U closes the eyes small, the image scale changing unit 15 is displayed on the display unit 16. When the user U opens his eyes wide, the image scale changing unit 15 may enlarge the image displayed on the display unit 16. Further, the intention of the user U may be understood by analyzing the brain waves of the user U, and the image scale changing unit 15 may perform enlargement / reduction processing on the image displayed on the display unit 16 accordingly.

処理実行部２２は、選択部１８により選択され、キャンセル処理部２１により削除されることなくオブジェクト保持部２０に保持されたオブジェクトに基づいて、所定の処理を実行する。本実施形態では、処理実行部２２が実行する所定の処理の一例として、類似画像検索を例示する。 The process execution unit 22 executes a predetermined process based on the object selected by the selection unit 18 and held in the object holding unit 20 without being deleted by the cancel processing unit 21. In the present embodiment, a similar image search is illustrated as an example of a predetermined process executed by the process execution unit 22.

類似画像検索は、クエリとなる画像を指定したとき、画像データベースに格納されている画像の中から、クエリとなる画像に類似する画像を検索する処理である。本実施形態では、処理実行部２２が、選択部１８により選択され、キャンセル処理部２１により削除されることなくオブジェクト保持部２０に保持されたオブジェクトの画像を含む検索クエリを生成する。そして、処理実行部２２は、生成した検索クエリに従って、例えば情報処理装置１００の外部の画像データベースから、選択部１８により選択されたオブジェクトに類似するオブジェクトが映った画像を検索する。 The similar image search is a process of searching for an image similar to the image to be a query from the images stored in the image database when an image to be a query is specified. In the present embodiment, the processing execution unit 22 generates a search query including an image of an object selected by the selection unit 18 and held in the object holding unit 20 without being deleted by the cancel processing unit 21. Then, the processing execution unit 22 searches for an image showing an object similar to the object selected by the selection unit 18 from, for example, an external image database of the information processing device 100 according to the generated search query.

オブジェクトの類似性は、画像から抽出されるオブジェクトの特徴量に基づいて判断される。すなわち、検索クエリに含まれるオブジェクトと、画像データベースに格納されている各画像に含まれるオブジェクトのそれぞれについて特徴量を算出し、画像データベースに格納されている画像の中から、検索クエリに含まれるオブジェクトの特徴量との距離が所定値以下のオブジェクトが映っている画像を、検索結果として取得する。 Object similarity is determined based on the features of the objects extracted from the image. That is, the feature amount is calculated for each of the object included in the search query and the object included in each image stored in the image database, and the object included in the search query is calculated from the images stored in the image database. An image showing an object whose distance from the feature amount of is less than or equal to a predetermined value is acquired as a search result.

なお、検索クエリには、複数のオブジェクトを含めることもできる。すなわち、選択部１８により選択され、キャンセル処理部２１により削除されることなくオブジェクト保持部２０に保持されたオブジェクトが複数ある場合、処理実行部２２は、これら複数のオブジェクトの画像をＡＮＤ結合、あるいはＯＲ結合した検索クエリを生成し、この検索クエリに従って画像データベースに対する検索を行う。複数のオブジェクトの画像をＡＮＤ結合した検索クエリを実行した場合は、それぞれのオブジェクトに類似したオブジェクトがすべて映っている画像が、検索結果として取得される。一方、複数のオブジェクトの画像をＯＲ結合した検索クエリを実行した場合は、複数のオブジェクトのうちのいずれかに類似するオブジェクトが映っている画像がすべて、検索結果として取得される。 Note that the search query can include multiple objects. That is, when there are a plurality of objects selected by the selection unit 18 and held in the object holding unit 20 without being deleted by the cancel processing unit 21, the processing execution unit 22 AND-joins or combines the images of these plurality of objects. A search query combined with OR is generated, and the image database is searched according to this search query. When a search query in which images of a plurality of objects are AND-combined is executed, an image showing all objects similar to each object is acquired as a search result. On the other hand, when a search query in which images of a plurality of objects are OR-combined is executed, all the images showing an object similar to any of the plurality of objects are acquired as search results.
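The distance-based similarity test and the AND/OR combination of several query objects can be sketched together. The database layout (image ID mapped to the feature vectors of the objects in that image), the Euclidean distance, and all names are assumptions; the source specifies neither the feature type nor the distance measure.

```python
from math import dist


def contains_similar(image_features, query_feature, max_distance):
    """True if any object in the image is within max_distance of the query."""
    return any(dist(f, query_feature) <= max_distance for f in image_features)


def search_images(database, query_features, max_distance, mode="AND"):
    """Return IDs of images matching all (AND) or any (OR) of the query objects.

    database: dict mapping image ID to a list of object feature vectors
    (hypothetical layout).
    """
    combine = all if mode == "AND" else any
    return [
        image_id
        for image_id, features in database.items()
        if combine(contains_similar(features, q, max_distance) for q in query_features)
    ]


database = {
    "a": [(0.0, 0.0), (10.0, 10.0)],   # contains both queried objects
    "b": [(0.1, 0.0)],                 # contains only the first
    "c": [(5.0, 5.0)],                 # contains neither
}
queries = [(0.0, 0.0), (10.0, 10.1)]
and_hits = search_images(database, queries, 0.5, mode="AND")
or_hits = search_images(database, queries, 0.5, mode="OR")
```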

このような類似画像検索は、例えば、本実施形態に係る情報処理装置１００を作業支援の用途で利用する場合に極めて有効である。すなわち、ユーザＵが情報処理装置１００を頭部に装着して所定の作業を実施した場合、実施した作業が正確に行われたかどうかを確認する必要が生じる場合がある。このような場合に、作業の結果が反映されたオブジェクトを選択し、このオブジェクトの画像を含む検索クエリに従って、過去の作業結果を画像として蓄積した画像データベースから類似画像を検索してユーザＵに提示することにより、ユーザＵは、実施した作業が正確に行われたかどうかを容易に確認することができる。 Such a similar image search is extremely effective, for example, when the information processing apparatus 100 according to the present embodiment is used for work support purposes. That is, when the user U wears the information processing device 100 on his head and performs a predetermined work, it may be necessary to confirm whether or not the performed work has been performed accurately. In such a case, select an object that reflects the work result, search for a similar image from the image database that stores the past work result as an image according to the search query that includes the image of this object, and present it to user U. By doing so, the user U can easily confirm whether or not the performed work has been performed accurately.

なお、ここでは処理実行部２２が実行する所定の処理の一例として類似画像検索を例示したが、処理実行部２２が実行する処理は類似画像検索に限らない。処理実行部２２は、類似画像検索以外にも、例えば、選択部１８により選択されたオブジェクトの情報を記録する処理、選択部１８により選択されたオブジェクトの詳細情報を外部のデータベースから取得する処理、選択部１８により選択されたオブジェクトに含まれる文字を認識する処理など、様々な処理を実行するものであってもよい。 Here, a similar image search is illustrated as an example of a predetermined process executed by the process execution unit 22, but the process executed by the process execution unit 22 is not limited to the similar image search. In addition to the similar image search, the process execution unit 22 may record, for example, information on the object selected by the selection unit 18, or acquire detailed information on the object selected by the selection unit 18 from an external database. Various processes such as a process of recognizing characters included in the object selected by the selection unit 18 may be executed.

次に、以上のように構成される本実施形態に係る情報処理装置１００の主要な動作について、図１１のフローチャートを参照して簡単に説明する。図１１は、情報処理装置１００の動作概要を示すフローチャートである。 Next, the main operation of the information processing apparatus 100 according to the present embodiment configured as described above will be briefly described with reference to the flowchart of FIG. 11. FIG. 11 is a flowchart showing an outline of the operation of the information processing apparatus 100.

情報処理装置１００の動作が開始されると、まず、検出部１２がユーザＵの視野内における指定領域Ｒを検出する（ステップＳ１０１）。そして、算出部１７が、ユーザＵの視野内に存在する複数のオブジェクトのうち、指定領域Ｒと重なるオブジェクトについて、指定領域Ｒとの重畳度を算出する（ステップＳ１０２）。 When the operation of the information processing device 100 is started, the detection unit 12 first detects the designated area R in the field of view of the user U (step S101). Then, the calculation unit 17 calculates the degree of superimposition with the designated area R for the object overlapping the designated area R among the plurality of objects existing in the field of view of the user U (step S102).

ここで、ユーザＵの視野内に存在する複数のオブジェクトの中に、指定領域Ｒとの重畳度が閾値以上のオブジェクトがない場合は（ステップＳ１０３：Ｎｏ）、そのまま処理が終了する。一方、指定領域Ｒとの重畳度が閾値以上のオブジェクトがある場合は（ステップＳ１０３：Ｙｅｓ）、指定領域Ｒとの重畳度が閾値以上のオブジェクトが１つであれば（ステップＳ１０４：Ｙｅｓ）、選択部１８が、そのオブジェクトをターゲットとなるオブジェクトとして選択する（ステップＳ１０５）。一方、指定領域Ｒとの重畳度が閾値以上のオブジェクトが複数ある場合は（ステップＳ１０４：Ｎｏ）、選択部１８は、これら複数のオブジェクトのうち、サイズが最大のオブジェクトをターゲットとなるオブジェクトとして選択する（ステップＳ１０６）。 Here, if there is no object whose degree of superimposition with the designated area R is equal to or greater than the threshold value among the plurality of objects existing in the field of view of the user U (step S103: No), the process ends as it is. On the other hand, if there is an object whose degree of superimposition with the designated area R is equal to or higher than the threshold value (step S103: Yes), and if there is one object whose degree of superimposition with the designated area R is equal to or higher than the threshold value (step S104: Yes). The selection unit 18 selects the object as the target object (step S105). On the other hand, when there are a plurality of objects whose degree of superimposition with the designated area R is equal to or greater than the threshold value (step S104: No), the selection unit 18 selects the object having the largest size among these plurality of objects as the target object. (Step S106).

その後、処理実行部２２が、ステップＳ１０５またはステップＳ１０６で選択されたオブジェクトに基づいて所定の処理を実行し（ステップＳ１０７）、一連の処理が終了する。 After that, the process execution unit 22 executes a predetermined process based on the object selected in step S105 or step S106 (step S107), and a series of processes is completed.

以上、具体的な例を挙げながら詳細に説明したように、本実施形態に係る情報処理装置１００では、検出部１２が、撮像部１１により撮像された画像内に映り込むユーザＵの手（「操作体」の一例）により指定される指定領域Ｒを検出する。そして、算出部１７が、ユーザＵの視野内に存在する複数のオブジェクトのうち、検出部１２により検出された指定領域Ｒと重なるオブジェクトについて、指定領域Ｒとの重畳度を算出する。そして、選択部１８が、算出部１７により算出された重畳度とオブジェクトの大きさとに基づいて、ユーザＵの視野内に存在する複数のオブジェクトの中から、ターゲットとなるオブジェクトを選択するようにしている。したがって、本実施形態に係る情報処理装置１００によれば、ユーザＵの視野内に存在する複数のオブジェクトにオブジェクト内オブジェクトが含まれる場合であっても、ユーザＵの意図したオブジェクトを正しく選択することができる。 As described in detail above with specific examples, in the information processing apparatus 100 according to the present embodiment, the detection unit 12 detects the designated area R designated by the hand of the user U (an example of the "operating body") appearing in the image captured by the imaging unit 11. The calculation unit 17 then calculates the degree of superimposition with the designated area R for those objects, among the plurality of objects existing in the field of view of the user U, that overlap the designated area R detected by the detection unit 12. The selection unit 18 then selects a target object from among the plurality of objects existing in the field of view of the user U based on the degree of superimposition calculated by the calculation unit 17 and the size of each object. Therefore, according to the information processing apparatus 100 of the present embodiment, the object intended by the user U can be selected correctly even when the plurality of objects existing in the field of view of the user U include an object-in-object.

In the present embodiment, the hand of the user U appearing in the real-space image captured by the camera 104 is assumed as the operating body that moves within the field of view of the user U according to the motion of the user U. However, the foot of the user U may be used as the operating body instead of the hand. In this case, the designated area R may be detected by detecting the region of the foot of the user U from the real-space image captured by the camera 104.

Further, the operating body that moves within the field of view of the user U according to the motion of the user U may be, for example, an object (tool) that the user U moves by holding it in the hand or wearing it on a hand or foot. In this case, the designated area R may be detected by detecting the region of the object (tool) used as the operating body in the real-space image captured by the camera 104, using a marker provided on that object (tool) as a landmark.

Further, in the present embodiment, the information processing apparatus 100 is configured as a so-called non-transmissive HMD that displays the real-space image captured by the camera 104 on the display 106. However, as long as it includes the camera 104 that captures the real space in front of the user U, the information processing apparatus 100 may instead be configured as a so-called transmissive HMD, in which the optical image of the real space in front of the user U passes through the display 106 and is viewed by the user U.

Further, the present embodiment is not limited to an HMD that the user U wears on the head, and can also be applied to an information terminal, such as a smartphone, that the user U can use while holding it in front of the eyes with one hand.

<Second Embodiment>
Next, the second embodiment will be described. The information processing apparatus according to the present embodiment is configured as an HMD using VR technology. In the following, the information processing apparatus according to the present embodiment is referred to as the information processing apparatus 100' to distinguish it from the information processing apparatus 100 according to the first embodiment described above.

Like the information processing apparatus 100 according to the first embodiment, the information processing apparatus 100' according to the present embodiment can be worn on the head by the user U. Unlike the information processing apparatus 100 according to the first embodiment, however, the information processing apparatus 100' presents the user U with a virtual space rendered entirely in CG. Therefore, the plurality of objects existing in the field of view of the user U, that is, the objects that are candidates for selection, are all CG parts, and these objects composed of CG parts include an object-in-object.

Further, in the present embodiment, a CG part of a hand that moves on the image of the virtual space in accordance with the movement of the hand of the user U is assumed as the operating body that moves within the field of view of the user U according to the motion of the user U. That is, a target object is selected from among the plurality of objects included in the field of view of the user U, based on the degree of superimposition with the designated area designated by the CG part of the hand and on the size of the object.

The hardware configuration of the information processing apparatus 100' according to the present embodiment is substantially the same as that of the information processing apparatus 100 according to the first embodiment illustrated in FIG. 3. However, because the information processing apparatus 100' presents the user U with a virtual space rendered entirely in CG, it need not include the camera 104. Instead, various sensing devices for detecting the movement of the hand of the user U are provided. Examples of such sensing devices include a TOF (Time-of-Flight) camera that acquires a distance image of the area in front of the user U, a data glove that the user U wears on the hand, and a myoelectric sensor that estimates movement by measuring the myoelectric potential of the user U. The camera 104 that captures the real space in front of the user U may also be used as a sensing device for detecting the movement of the hand of the user U.

FIG. 12 is a block diagram showing an example of the functional configuration of the information processing apparatus 100' according to the present embodiment. Compared with the functional configuration example of the information processing apparatus 100 according to the first embodiment shown in FIG. 4, the present embodiment includes a motion detection unit 23 and a virtual space generation unit 24 in place of the imaging unit 11, the image holding unit 13, and the image interpolation unit 14. The motion detection unit 23 is realized using the various sensing devices described above. The virtual space generation unit 24 is realized, for example, by the processor 101 shown in FIG. 3 using the memory 102 to execute a predetermined program stored in the storage device 103 or the like. The virtual space generation unit 24 may also be realized using dedicated hardware such as an ASIC or an FPGA.

The motion detection unit 23 detects the movement of the hand of the user U wearing the information processing apparatus 100' according to the present embodiment. That is, the motion detection unit 23 uses the sensor data obtained from the sensing devices described above to detect the position and shape of the hand of the user U, which change over time. For example, when a TOF camera is used as the sensing device, the TOF camera acquires a distance image (an image that holds, for each pixel, the distance to the object in front). The position and shape of the hand of the user U can therefore be detected by regarding, as the hand region, the region of the distance image whose distance falls within the range in which the hand of the user U can exist.
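As a non-limiting illustration, the hand-region detection from a distance image described above could be sketched in Python as follows. The function names `hand_mask` and `hand_position`, the nested-list image representation, the use of metres as the distance unit, and the use of the region centroid as the hand position are all assumptions made only for this sketch.

```python
from typing import List, Optional, Tuple

DepthImage = List[List[float]]  # distance per pixel; metres assumed for illustration


def hand_mask(depth: DepthImage, near: float, far: float) -> List[List[bool]]:
    """Mark pixels whose distance lies within [near, far] as hand pixels.

    [near, far] stands for the range of distances in which the hand of the
    user can exist; the concrete values are an assumption of the caller.
    """
    return [[near <= d <= far for d in row] for row in depth]


def hand_position(mask: List[List[bool]]) -> Optional[Tuple[float, float]]:
    """Return the centroid (row, column) of the hand region, or None if empty."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
    if not pixels:
        return None
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)
```

The mask gives the shape of the hand region and the centroid gives one possible notion of its position; applying both to successive distance images would yield the time series referred to above.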

A data glove or a myoelectric sensor can detect the shape of the hand of the user U. By using such a data glove or myoelectric sensor in combination with, for example, a TOF camera, the position and shape of the hand of the user U, which change over time, can be detected more accurately. The camera 104 described above may be used instead of the TOF camera.

The virtual space generation unit 24 generates the virtual space presented to the user U. As described above, a plurality of objects composed of CG parts exist in the virtual space generated by the virtual space generation unit 24, and these objects include an object-in-object. The virtual space is generated so that a CG part imitating the hand of the user U moves in accordance with the movement of the hand of the user U detected by the motion detection unit 23. In the present embodiment, the image of the virtual space generated by the virtual space generation unit 24 is displayed on the display unit 16.

Further, in the present embodiment, the CG part of the hand of the user U that moves in the virtual space generated by the virtual space generation unit 24 is used as the operating body for designating the designated area R. Therefore, based on the movement of the hand of the user U detected by the motion detection unit 23 (the position and shape of the hand of the user U, which change over time), the detection unit 12 determines the position and shape of the CG part of the hand of the user U moving in the virtual space generated by the virtual space generation unit 24, and detects the designated area R designated by that CG part in the virtual space.

Further, in the present embodiment, the calculation unit 17 calculates the degree of superimposition with the designated area R for each object (CG part) in the virtual space that overlaps the designated area R detected by the detection unit 12 on the image of the virtual space generated by the virtual space generation unit 24. The other processing is the same as in the first embodiment described above.
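As a non-limiting illustration, one way to compute a degree of superimposition on an image could be sketched in Python as follows. This passage does not restate the formula, so the sketch assumes that the degree of superimposition is the fraction of the object's on-image pixels that fall inside the designated area R; the names `rect_pixels` and `overlap_degree` and the representation of regions as sets of pixel coordinates are likewise hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column) on the image of the virtual space


def rect_pixels(top: int, left: int, height: int, width: int) -> Set[Pixel]:
    """Build the pixel set of an axis-aligned rectangle (helper for the example)."""
    return {(r, c) for r in range(top, top + height) for c in range(left, left + width)}


def overlap_degree(object_region: Set[Pixel], designated: Set[Pixel]) -> float:
    """Fraction of the object's pixels covered by the designated area R.

    This ratio is an assumed definition used only for illustration.
    """
    if not object_region:
        return 0.0
    return len(object_region & designated) / len(object_region)
```

Under this assumed definition, a small designated area placed on an inner object gives that inner object a high degree and the containing object a low one, whereas a designated area covering the containing object gives both a high degree, in which case the object size decides the selection.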

As described above, in the information processing apparatus 100' according to the present embodiment, the detection unit 12 detects the designated area R designated by the CG part of the hand (an example of the "operating body") that moves in the virtual space in accordance with the movement of the hand of the user U. The calculation unit 17 then calculates the degree of superimposition with the designated area R for each object (CG part) in the virtual space that overlaps the designated area R detected by the detection unit 12 on the image of the virtual space generated by the virtual space generation unit 24. The selection unit 18 then selects a target object from among the plurality of objects existing in the field of view of the user U, based on the degree of superimposition calculated by the calculation unit 17 and the size of the object. Therefore, the information processing apparatus 100' according to the present embodiment provides the same effect as the first embodiment described above: the object intended by the user U can be correctly selected even when the plurality of objects existing in the field of view of the user U include an object-in-object.

In the present embodiment, a CG part imitating the hand of the user U is assumed as the operating body that moves within the field of view of the user U according to the motion of the user U. However, a CG part imitating the foot of the user U may be used as the operating body instead. In this case, the movement of the foot of the user U may be detected instead of the movement of the hand, and the designated area R designated by the CG part of the foot of the user U in the virtual space may be detected.

Further, the operating body that moves within the field of view of the user U according to the motion of the user U may be a CG part other than one imitating the hand or foot of the user U, such as a robot arm that moves in the virtual space in accordance with the movement of the hand or foot of the user U.

Further, the motion by which the user U moves the CG part serving as the operating body in the virtual space is not limited to moving a hand or foot, and may be, for example, an operation of a controller connected to the information processing apparatus 100'.

<Supplementary Explanation>
Each function of the information processing apparatus 100 (100') according to the above-described embodiments can be realized, for example, by cooperation between hardware including the computer system illustrated in FIG. 3 and a program (software) executed on that computer system. The program executed on the computer system is provided as a computer program product, for example, as a file in an installable or executable format recorded on a computer-readable recording medium such as an SD card. The program executed on the computer system may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program executed on the computer system may also be provided or distributed via a network such as the Internet. The program executed on the computer may also be provided by being incorporated in advance into a program ROM or the like.

The program executed on the computer system described above has a module configuration including functional components such as the detection unit 12, the calculation unit 17, and the selection unit 18. For example, when the processor 101 reads the program from the recording medium and executes it, each of the units described above is loaded onto the memory 102 and generated on the memory 102.

Although embodiments of the present invention have been described above, the embodiments described here are presented as examples and are not intended to limit the scope of the invention. The novel embodiments described here can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The embodiments described here and their modifications are included in the scope and gist of the invention, and are also included in the invention described in the claims and its equivalents.

11 Imaging unit
12 Detection unit
13 Image holding unit
14 Image interpolation unit
15 Image scale changing unit
16 Display unit
17 Calculation unit
18 Selection unit
19 Feedback unit
20 Object holding unit
21 Cancel processing unit
22 Process execution unit
23 Motion detection unit
24 Virtual space generation unit

Claims

An information processing apparatus comprising:
a detection unit that detects a designated area designated by an operating body that moves within a user's field of view according to the user's motion;
a calculation unit that calculates, for an object that overlaps the designated area in the user's field of view among a plurality of objects that exist in the user's field of view and include one or more objects contained within another object, the degree of superimposition with the designated area; and
a selection unit that selects a target object from the plurality of objects based on the degree of superimposition and the size of the object.

The information processing device according to claim 1, wherein the selection unit selects an object having the largest size among objects whose superposition degree exceeds a predetermined threshold value.

The information processing apparatus according to claim 1, wherein, when there are a plurality of objects of the same size whose degree of superimposition exceeds a predetermined threshold value, the selection unit selects the object closest to the user.

The information processing apparatus according to any one of claims 1 to 3, wherein the range of the designated area is determined according to the shape or movement of the operating body.

The information processing device according to any one of claims 1 to 4, wherein the calculation unit starts calculating the degree of superimposition in response to a predetermined action by the user.

The information processing device according to claim 5, wherein the predetermined action includes at least one of a gesture, an utterance of a voice command, a movement of the line of sight, and a change in facial expression.

The information processing apparatus according to any one of claims 1 to 6, further comprising a cancel processing unit that cancels an object selected by the selection unit in response to a predetermined action by the user.

The information processing device according to claim 7, wherein the predetermined action includes at least one of an operation of repeatedly selecting the same object, a gesture, and an utterance of a voice command.

The information processing apparatus according to any one of claims 1 to 8, further comprising a feedback processing unit that performs feedback processing for causing the user to recognize that an object has been selected by the selection unit.

The information processing apparatus according to claim 9, wherein the feedback process includes at least one of a process of causing the user to visually recognize the object selected by the selection unit and a process of causing the user to recognize, by vibration, that an object has been selected by the selection unit.

The information processing device according to any one of claims 1 to 10, further comprising:
an imaging unit that captures the real space in front of the user; and
a display unit that displays the real-space image captured by the imaging unit, wherein
the operating body is the user's hand, foot, or tool appearing in the real-space image,
each of the plurality of objects is an object appearing in the real-space image or a CG (computer graphics) part superimposed on the real-space image,
the detection unit detects the designated area by detecting the user's hand, foot, or tool appearing in the real-space image, and
the calculation unit calculates the degree of superimposition for an object that overlaps the designated area on the real-space image.

The information processing apparatus according to claim 11, further comprising:
an image holding unit that holds a real-space image previously captured by the imaging unit; and
an image interpolation unit that interpolates the background of the region in which the operating body appears in a real-space image newly captured by the imaging unit, using the past real-space image held by the image holding unit, wherein
the selection unit calculates the degree of superimposition using the real-space image in which the background of the region in which the operating body appears has been interpolated by the image interpolation unit.

The information processing apparatus according to claim 12, wherein the display unit visualizes and displays the contour of the user's hand or foot on the real-space image in which the background of the region in which the operating body appears has been interpolated by the image interpolation unit.

The information processing device according to any one of claims 1 to 10, further comprising a display unit that displays an image of a virtual space including the plurality of objects, each of which is composed of a CG part, wherein
the operating body is a CG part of a hand or foot that moves on the image of the virtual space in accordance with the movement of the user's hand or foot,
the detection unit detects the designated area by detecting the movement of the user's hand or foot, and
the calculation unit calculates the degree of superimposition for an object that overlaps the designated area on the image of the virtual space.

The information processing apparatus according to any one of claims 11 to 14, further comprising an image scale changing unit that, in response to a predetermined action by the user, enlarges or reduces the image displayed on the display unit excluding the operating body.

The information processing device according to claim 15, wherein the predetermined action includes at least one of a posture change, an operation of an operation unit, a gesture, a voice command utterance, and a facial expression change.

The information processing apparatus according to any one of claims 1 to 16, further comprising a process execution unit that executes a predetermined process based on an object selected by the selection unit.

The information processing device according to claim 17, wherein the predetermined process is a process of searching an image database, according to a search query including the object selected by the selection unit, for an image showing an object similar to the object selected by the selection unit.

An information processing method executed in an information processing device, the method comprising:
a step of detecting a designated area designated by an operating body that moves within a user's field of view according to the user's motion;
a step of calculating, for an object that overlaps the designated area in the user's field of view among a plurality of objects that exist in the user's field of view and include one or more objects contained within another object, the degree of superimposition with the designated area; and
a step of selecting a target object from the plurality of objects based on the degree of superimposition and the size of the object.

A program for causing a computer to realize:
a function of detecting a designated area designated by an operating body that moves within a user's field of view according to the user's motion;
a function of calculating, for an object that overlaps the designated area in the user's field of view among a plurality of objects that exist in the user's field of view and include one or more objects contained within another object, the degree of superimposition with the designated area; and
a function of selecting a target object from the plurality of objects based on the degree of superimposition and the size of the object.