JP2021068293A

JP2021068293A - Object identification device

Info

Publication number: JP2021068293A
Application number: JP2019194498A
Authority: JP
Inventors: Osamu Segawa (瀬川 修)
Original assignee: Chubu Electric Power Co Inc
Current assignee: Chubu Electric Power Co Inc
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2021-04-30
Anticipated expiration: 2039-10-25
Also published as: JP7345355B2

Abstract

To provide a technique capable of easily and individually identifying a plurality of objects (including a plurality of objects having the same shape) arranged in a target area. SOLUTION: Imaging means 10 outputs imaging information indicating an imaging screen that captures an imaging area including a target area and a plurality of objects (including a plurality of objects having the same shape) arranged in the target area. Target area determination means 23 determines the target area based on the imaging information. Object detection means 24 detects categories and positions of the plurality of objects arranged in the determined target area. Object arrangement information generation means 25 generates identification information of the plurality of objects and object arrangement information indicating their positions based on the categories and positions of the plurality of objects arranged in the target area detected by the object detection means 24 and stores the generated information in storage means 30. Object identification means 27 individually identifies the plurality of objects arranged in the target area based on the categories and positions of the plurality of objects arranged in the target area detected by the object detection means 24 and the object arrangement information stored in the storage means 30. SELECTED DRAWING: Figure 1

Description

The present invention relates to an object identification device that identifies objects based on imaging information, and in particular to an object identification device that individually identifies a plurality of objects (including a plurality of objects having the same shape) arranged in a target area.

Techniques for detecting objects using deep learning (a machine learning method based on multi-layer neural networks) are being researched. For example, many end-to-end methods that simultaneously detect the position and category of a target object based on imaging information, such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once), have been proposed. These methods are based on multi-task learning, in which training of a multi-layer neural network for detecting object positions and training of a multi-layer neural network for discriminating object categories are performed simultaneously.
The object detection technology by SSD is disclosed in Non-Patent Document 1, for example, and the object detection technology by YOLO is disclosed in Non-Patent Document 2, for example.

Non-Patent Document 1: "SSD: Single Shot MultiBox Detector", Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander C. Berg (2015), https://arxiv.org/pdf/1512.02325.pdf
Non-Patent Document 2: "You Only Look Once: Unified, Real-Time Object Detection", Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi (2016), https://pjreddie.com/media/files/papers/yolo.pdf

The object detection techniques disclosed in Non-Patent Document 1 and Non-Patent Document 2 can detect the positions and categories of a plurality of objects arranged in a target area, but they cannot individually identify the plurality of objects arranged in the target area, in particular a plurality of objects having the same shape (including "substantially the same shape").
Therefore, conventionally, arrangement information (positions and IDs of the plurality of objects) indicating the arrangement state of the plurality of objects arranged in the target area has been created (registered) in advance as a two-dimensional map. The plurality of objects were then individually identified by collating the object detection information (positions and categories of the plurality of objects arranged in the target area in the imaging area), detected based on the imaging information output from the imaging means, with the arrangement information (positions and IDs of the plurality of objects) shown in the two-dimensional map.
However, it is necessary to manually create a two-dimensional map every time an object is registered, and it takes labor and time to create the two-dimensional map.
The present invention has been devised in view of these points, and its purpose is to provide a technique capable of easily and individually identifying a plurality of objects (including a plurality of objects having the same shape) arranged in a target area.

The object identification device of the present invention is used to individually identify a plurality of objects arranged in a predetermined target area, in particular a plurality of objects that includes a plurality of objects having the same shape (including "substantially the same shape").
The present invention includes an imaging means, a target area determination means, an object detection means, an object arrangement information generation means, an object identification means, and a storage means.
The imaging means outputs imaging information indicating an imaging screen that captures an imaging area including a target area and a plurality of objects (including a plurality of objects having the same shape) arranged in the target area. As the imaging means, a digital camera using a CCD or CMOS is preferably used.
The target area determination means determines the target area on the imaging screen based on the imaging information from the imaging means.
The object detection means detects the categories and positions of the plurality of objects arranged in the target area on the imaging screen, based on the imaging information from the imaging means and the target area on the imaging screen determined by the target area determination means. Preferably, the object detection means outputs object detection information indicating the categories of the plurality of objects arranged in the target area and their positions in the target area. Various known methods can be used by the target area determination means to determine the target area and by the object detection means to detect the categories and positions of objects. For example, a deep learning detection method such as SSD or YOLO can be used.
The object arrangement information generation means generates object arrangement information indicating the arrangement state of the plurality of objects in the target area, based on the categories and positions (object detection information) of the plurality of objects arranged in the target area on the imaging screen detected by the object detection means, and stores it in the storage means. The object arrangement information includes identification information (ID) for identifying each object and position information indicating the position of the object in the target area. The identification information (ID) is given to each object by an appropriate method. Preferably, identification information (ID) that includes category information indicating the category of the object is used.
The object identification means individually identifies the plurality of objects arranged in the target area on the imaging screen, based on the object detection information output from the object detection means (categories and positions of the plurality of objects arranged in the target area on the imaging screen) and the object arrangement information stored in the storage means (identification information and positions of the plurality of objects arranged in the target area). For example, it determines the correspondence between the plurality of objects arranged in the target area on the imaging screen and the plurality of objects indicated by the object arrangement information.
The present invention is configured so that it can be set to an object arrangement information generation mode and an object identification mode. When the object arrangement information generation mode is set, the object arrangement information is generated by the target area determination means, the object detection means, and the object arrangement information generation means and stored in the storage means. When the object identification mode is set, the plurality of objects arranged in the target area on the imaging screen are individually identified by the target area determination means, the object detection means, and the object identification means. Various methods can be used to make the device settable to the object arrangement information generation mode or the object identification mode. For example, the device can be configured so that the object arrangement information generation mode is set when object arrangement information generation mode setting information is input from an input means or the like, and the object identification mode is set when object identification mode setting information is input from the input means or the like. Alternatively, the device can be configured so that, when object identification start information is input from the input means or the like, the object arrangement information generation mode is set first and the object identification mode is set afterwards.
In the present invention, object arrangement information indicating the arrangement state of the plurality of objects in the target area is generated in the object arrangement information generation mode, and the plurality of objects arranged in the target area are individually identified in the object identification mode with reference to that object arrangement information. The plurality of objects arranged in the target area can therefore be easily identified individually.
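The two modes described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the class name `ObjectIdentifier`, the `(category, (x, y))` detection tuples, the sequential numbering of IDs, and the nearest-position lookup are all hypothetical choices made for the example, and the detector that produces the detections is assumed to exist elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from math import dist


class Mode(Enum):
    GENERATE = auto()  # object arrangement information generation mode
    IDENTIFY = auto()  # object identification mode


@dataclass
class ObjectIdentifier:
    mode: Mode = Mode.GENERATE
    # Stands in for the storage means: ID -> (category, (x, y)).
    arrangement: dict = field(default_factory=dict)

    def process(self, detections):
        """detections: list of (category, (x, y)) from some detector (assumed)."""
        if self.mode is Mode.GENERATE:
            # Build and store the arrangement information, then switch modes.
            self.arrangement.clear()
            counters = {}
            for category, pos in detections:
                counters[category] = counters.get(category, 0) + 1
                self.arrangement[f"{category}{counters[category]}"] = (category, pos)
            self.mode = Mode.IDENTIFY
            return list(self.arrangement)
        # IDENTIFY: pick the closest stored object of the same category.
        result = []
        for category, pos in detections:
            candidates = [(dist(pos, p), oid)
                          for oid, (c, p) in self.arrangement.items()
                          if c == category]
            result.append(min(candidates)[1] if candidates else None)
        return result
```

In this sketch the first call plays the role of the generation mode and every later call runs in the identification mode, which mirrors the variant in which generation is performed first and identification afterwards.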
In a different embodiment of the present invention, an angle-of-view change amount determination means is provided. The angle-of-view change amount determination means determines whether or not the amount of change in the angle of view between the imaging screen (M) used when generating the object arrangement information stored in the storage means and the imaging screen (P) indicated by the imaging information output from the imaging means exceeds a predetermined range. As a method for making this determination, a method of determining whether or not the similarity between the imaging screen (M) and the imaging screen (P) is low can preferably be used. For example, feature points of the imaging screen (M) and the imaging screen (P) are detected. When the similarity (for example, the cosine distance between the two vectors) of the feature vectors containing those feature points of the two screens whose correspondence matches is equal to or greater than a predetermined value, it is determined that the amount of change in the angle of view does not exceed the predetermined range; when the similarity is less than the predetermined value, it is determined that the amount of change in the angle of view exceeds the predetermined range. Of course, other methods can also be used to determine whether or not the amount of change in the angle of view exceeds the predetermined range. Note that the amount of change in the angle of view between the target area (M) on the imaging screen (M) and the target area (P) on the imaging screen (P) is equivalent to the amount of change in the angle of view between the imaging screen (M) and the imaging screen (P).
When the object identification mode is set and the angle-of-view change amount determination means determines that the amount of change in the angle of view exceeds the predetermined range, the object arrangement information is generated by the target area determination means, the object detection means, and the object arrangement information generation means and stored in the storage means.
When the imaging position or imaging angle (imaging direction) of the imaging means is changed, the deviation (in particular, in position on the imaging screen) between the arrangement state of the plurality of objects on the current imaging screen and their arrangement state on the imaging screen used when generating the object arrangement information stored in the storage means becomes large. As a result, the accuracy with which the object identification means individually identifies the plurality of objects by collating the object detection information with the object arrangement information may decrease. In this embodiment, when the amount of change in the angle of view between the imaging screen used when generating the stored object arrangement information and the imaging screen indicated by the imaging information output from the imaging means exceeds the predetermined range, the object arrangement information is regenerated and stored in the storage means. Therefore, even when the imaging position or imaging angle (imaging direction) of the imaging means is changed, the plurality of objects arranged in the target area can be reliably identified individually.
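The angle-of-view check and the regeneration step can be sketched roughly as follows. Several things here are assumptions made for illustration: the feature vectors of screens (M) and (P) are taken as already computed by some feature extractor, the measure used is cosine similarity (so that a higher value means more similar, matching the "equal to or greater than a predetermined value" condition in the text), the threshold 0.9 is arbitrary, and `regenerate` and `identify` are hypothetical callbacks standing in for the generation and identification processing.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def view_changed(features_m, features_p, threshold=0.9):
    """True when the similarity falls below the threshold (assumed value)."""
    return cosine_similarity(features_m, features_p) < threshold


def identify_step(store, features_p, detections, regenerate, identify, threshold=0.9):
    """One pass of the identification mode with the angle-of-view check.

    store: dict with keys "features" (screen M) and "arrangement".
    regenerate / identify: hypothetical callbacks supplied by the caller.
    """
    if view_changed(store["features"], features_p, threshold):
        # The view moved too far: rebuild the arrangement from the current screen.
        store["arrangement"] = regenerate(detections)
        store["features"] = features_p
    return identify(detections, store["arrangement"])
```

Keeping the feature vector of screen (M) next to the stored arrangement means that each later frame only needs one vector comparison before the usual collation.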
In a different embodiment of the present invention, the object identification means individually identifies the plurality of objects by collating the categories and positions of the plurality of objects arranged in the target area on the imaging screen, as detected by the object detection means, with the identification information and positions of the plurality of objects indicated by the object arrangement information stored in the storage means, while allowing a predetermined error range. As a method of collating object positions, preferably the coordinates of the four corner points of the bounding box (the candidate region in which the object detection means estimates an object to exist), or the width and height of the bounding box and the coordinates of its centroid, are collated.
In this embodiment, individual identification of a plurality of objects can be easily performed.
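A minimal sketch of collation with an allowed error range is shown below. The function name `match_objects`, the `(cx, cy, w, h)` box layout, the per-component tolerance `tol`, and the first-match strategy are assumptions chosen for the example; the text only requires that categories and positions be collated within a predetermined error range.

```python
def match_objects(detections, arrangement, tol=5.0):
    """Collate detections with stored arrangement information.

    detections:  list of (category, (cx, cy, w, h))
    arrangement: dict mapping ID -> (category, (cx, cy, w, h))
    tol:         allowed absolute error per box component (assumed value)
    Returns a dict mapping detection index -> matched ID.
    """
    matched = {}
    used = set()
    for index, (category, box) in enumerate(detections):
        for object_id, (stored_category, stored_box) in arrangement.items():
            if object_id in used or stored_category != category:
                continue
            # Accept the stored object when every component is within tolerance.
            if all(abs(a - b) <= tol for a, b in zip(box, stored_box)):
                matched[index] = object_id
                used.add(object_id)
                break
    return matched
```

A detection that matches no stored object is simply left out of the result, which the caller could treat as an unregistered object.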
In a different embodiment of the present invention, the object detection means detects the categories of the plurality of objects and their relative positions in the target area, and the object arrangement information generation means generates object arrangement information indicating the identification information of the plurality of objects and their relative positions in the target area. As the relative position, for example, coordinates in a coordinate system set on the imaging screen are used.
The object identification means collates the object detection information with the object arrangement information by using the relative position of the object.
In this embodiment, the collation process of the object detection information and the object arrangement information can be easily performed.
In a different embodiment of the present invention, the target area discriminating means and the object detecting means are configured by a multi-layer neural network.
In this embodiment, it is possible to easily perform the determination process of the target area and the detection process of the categories and positions of a plurality of objects arranged in the target area.

According to the present invention, a plurality of objects arranged in a target area can be easily identified individually.

A block diagram of an embodiment of the object identification device of the present invention.
A diagram showing the neural network configuration of SSD.
A diagram showing an example of a captured image.
A diagram showing the target area in the captured image shown in FIG. 2.
A diagram showing an example of an object detection result based on the captured image shown in FIG. 2.
A diagram showing an example of object detection information based on the captured image shown in FIG. 2.
A diagram showing an example of object arrangement information based on the object detection information shown in FIG. 5.
A diagram explaining the operation of individually identifying a plurality of objects based on the object detection information and the object arrangement information.
A flowchart explaining an embodiment of the operation in the object arrangement information generation mode.
A flowchart explaining an embodiment of the operation in the object identification mode.
A diagram showing a method of determining the amount of change in the angle of view using feature vectors of captured images.
A flowchart explaining another embodiment of the operation in the object identification mode.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram of an embodiment of the object identification device of the present invention.
The object identification device of the present embodiment includes an imaging means 10, a processing means 20, a storage means 30, an input means 40, a display means 50, and the like.

The imaging means 10 is preferably constituted by a digital camera using a CCD or CMOS. The imaging means 10 is arranged so that it can image an imaging area that includes the target area and the plurality of objects arranged in the target area. In the following description of the embodiment, it is assumed that the plurality of objects include at least one type of a plurality of objects having the same shape.
In the object detection process described later, the category (type) of the object is determined based on the "shape of the object". The description "same shape" is used to include "substantially the same shape" which is determined to be an object of the same category by the object detection process.

The processing means 20 is composed of a CPU or the like. The processing means 20 includes a management means 21, an imaging information input means 22, a target area determination means 23, an object detection means 24, an object arrangement information generation means 25, an angle-of-view change amount determination means 26, and an object identification means 27. In the present embodiment, the means 21 to 27 are provided in the CPU constituting the processing means 20. Of course, at least one of the means 21 to 27 may be implemented by a CPU different from the CPU constituting the processing means 20.

The processing of each of the means 21 to 27 will be described with reference to FIGS. 2 to 8.
In the following, as shown in FIG. 3, a case will be described in which three volumes 120a to 120c having the same shape, two dials 130a and 130b having the same shape, and eleven buttons 140a to 140k having the same shape are arranged on the operation panel 110 of a control console 100.
The operation panel 110 is surrounded by an edge portion 111 formed in a quadrangular shape by an upper edge portion 111a, a left edge portion 111b, a lower edge portion 111c, and a right edge portion 111d.
Reference numeral 11 denotes the imaging screen obtained when the imaging means 10 images the imaging area including the operation panel 110, the volumes 120a to 120c, the dials 130a and 130b, and the buttons 140a to 140k.
The operation panel 110 corresponds to the "target area" of the present invention, and the edge portion 111 (upper edge portion 111a to right edge portion 111d) of the operation panel 110 corresponds to the "target area edge portion (target area upper edge portion to target area right edge portion)" of the present invention. The volumes 120a to 120c, the dials 130a and 130b, and the buttons 140a to 140k correspond to the "plurality of objects arranged in the target area" of the present invention, and the volumes 120a to 120c, the dials 130a and 130b, or the buttons 140a to 140k correspond to the "plurality of objects having the same shape" of the present invention. The volume, the dial, and the button correspond to the "object categories" of the present invention.

The management means 21 manages the processing of the means 22 to 27.
The imaging information input means 22 receives the imaging information from the imaging means 10.

The target area determination means 23 determines the target area based on the imaging information from the imaging means 10 input by the imaging information input means 22.

Various methods can be used by the target area determination means 23 to determine the target area. For example, the SSD disclosed in Non-Patent Document 1 can be used.
As shown in FIG. 2, SSD is based on a multi-layer CNN (convolutional neural network) and is composed of a layer that estimates candidate regions in which objects exist and a layer that determines the category of the object in each candidate region. In the layer that estimates candidate regions, the image information is divided into a plurality of rectangular regions of predetermined sizes (default boxes), and the candidate regions (bounding boxes) are estimated while taking the offsets of the rectangular regions into account. In the layer that determines the category of the object in a candidate region, the category is determined using a separately trained CNN.
The SSD used in the target area determining means 23 is composed of a layer for estimating an existing area candidate of a target area in which a plurality of objects are arranged, and a layer for discriminating a target area in the existing area candidate. The target area determination means 23 configured by the SSD determines the target area and its position on the imaging screen in which the imaging area is imaged.
In the present embodiment, as shown in FIG. 4, the operation panel 110 surrounded by the edge portion 111 is determined as the target area 110.

The object detection means 24 detects the categories and positions of the plurality of objects arranged (displayed) in the target area on the imaging screen, based on the imaging information from the imaging means 10 input by the imaging information input means 22 and the target area determined by the target area determination means 23.
As a method for detecting the category and position of a plurality of objects arranged in the target area by the object detection means 24, various methods can be used. For example, the SSD shown in FIG. 2 can be used as in the target area determination means 23.
The SSD used in the object detection means 24 is composed of a layer that estimates the candidate region (bounding box) of each of the plurality of objects in the target area and a layer that detects the category of the object in each candidate region.
In the present embodiment, the bounding boxes shown by broken lines in FIG. 5 and the volumes 120a to 120c, the dials 130a and 130b, and the buttons 140a to 140k inside the respective bounding boxes are detected.

The object detection means 24 then outputs object detection information indicating the categories and positions of the plurality of objects arranged in the target area. As the position of an object, preferably the coordinates of the four corner points of the bounding box, or the width and height of the bounding box and the coordinates of its centroid, are used. As the coordinates, preferably xy coordinates in an xy coordinate system set in the target area are used. The "xy coordinates in the xy coordinate system set in the target area" correspond to the "relative position in the target area" of the present invention.
In this embodiment, as shown in FIG. 6, object detection information is output indicating that three volumes V, two dials D, and eleven switches S are arranged at the positions of the bounding boxes shown by broken lines.
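The two position representations mentioned above (four corner points, or width, height, and centroid) carry the same information for an axis-aligned bounding box. The following sketch converts between them; the function names and the `(x_min, y_min, x_max, y_max)` input layout are assumptions made for illustration, and the box is assumed not to be rotated.

```python
def corners_to_center(box):
    """(x_min, y_min, x_max, y_max) -> (cx, cy, width, height)."""
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    # For an axis-aligned rectangle the centroid is the midpoint of the box.
    return (x_min + width / 2, y_min + height / 2, width, height)


def center_to_corners(box):
    """(cx, cy, width, height) -> four corner points, clockwise from top-left."""
    cx, cy, width, height = box
    half_w, half_h = width / 2, height / 2
    return [
        (cx - half_w, cy - half_h),  # top-left
        (cx + half_w, cy - half_h),  # top-right
        (cx + half_w, cy + half_h),  # bottom-right
        (cx - half_w, cy + half_h),  # bottom-left
    ]
```

Either form can therefore be stored as the position in the object detection information, as long as the same form is used when the positions are later collated.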

The object arrangement information generation means 25 creates object arrangement information based on the object detection information (categories and positions of the plurality of objects arranged in the target area) output from the object detection means 24, and stores it in the storage means 30. The object arrangement information includes identification information (ID) for identifying each object and position information indicating its position. The method of assigning identification information (ID) to each object can be set as appropriate. The identification information (ID) includes category information indicating the category of the object. Preferably, objects having the same shape are given identification information that includes category information (common information) indicating the type of the object and a number (individual information). The numbers are assigned, for example, so that they increase in order along the direction in which the variance of the centroid positions of the objects is larger.
As the position of the object indicated by the object arrangement information, a position in the target area is preferably used, and more preferably, a relative position in the target area is used.
In FIG. 7, an xy coordinate system is set in which the upper edge portion 111a and the left side edge portion 111b (the upper edge portion of the target area and the left edge portion of the target area) of the operation panel 110 are the x-axis and the y-axis, respectively, in the xy coordinate system. The xy coordinates are used as the relative position.
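The numbering rule described above (numbers increasing along the direction of larger variance) could be sketched as follows. This is only one possible reading: the function name `assign_ids` is hypothetical, the variance is computed separately along the x-axis and the y-axis, and the sketch handles a single group of same-category objects at a time.

```python
from statistics import pvariance


def assign_ids(category, centroids):
    """Return {id: (x, y)}, numbered along the axis of larger variance.

    centroids is a list of (x, y) centers of gravity for objects of one
    category. IDs are formed as category + number, e.g. "V1", "V2".
    """
    if not centroids:
        return {}
    xs = [x for x, _ in centroids]
    ys = [y for _, y in centroids]
    # Pick the axis along which the centers of gravity are more spread out.
    axis = 0 if pvariance(xs) >= pvariance(ys) else 1
    ordered = sorted(centroids, key=lambda p: p[axis])
    return {f"{category}{i}": p for i, p in enumerate(ordered, start=1)}
```

For a horizontal row of objects the x variance dominates and numbering runs from left to right; for a vertical column the y variance dominates and numbering runs from top to bottom, given a y-axis that points downward from the upper edge.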

For example, as shown in FIG. 7, the object arrangement information generation means 25 assigns identification information [V1] to [V3] to the three volumes [V] in order from the left along the x-axis, and identification information [D1] and [D2] to the two dials [D] in order from the left along the x-axis. Of the eleven buttons [S], the upper seven are assigned identification information [S1] to [S7] in order from the left along the x-axis, and the four on the right are assigned identification information [S8] to [S11] in order from the top.
In FIG. 7, the positions of the volumes [V1] to [V3], the dials [D1] and [D2], and the buttons [S1] to [S11] are indicated by the width and height of each bounding box and by the point G, which is its center of gravity. The position of the point G is expressed as xy coordinates in the xy coordinate system.
A configuration in which "the position of the center of gravity is expressed as xy coordinates in an xy coordinate system set on the imaging screen or in the target area of the imaging screen" is included in the configuration of the present invention in which "the position of an object is expressed as a relative position".

The angle of view change amount determining means 26 will be described later.

The object identification means 27 individually identifies the plurality of objects arranged in the target area based on the object detection information output from the object detection means 24 (the categories and positions of the plurality of objects arranged in the target area) and the object arrangement information stored in the storage means 30 (the identification information and position of each object).
Various methods can be used as the object identification method of the object identification means 27. For example, the positions of objects having the same shape are extracted from the object detection information and from the object arrangement information. The position extracted from the object detection information is then collated with the position extracted from the object arrangement information to determine whether the two positions match. This determination allows for a predetermined error range: when the difference between the two positions is within the predetermined error range, the positions are determined to match, and when the difference exceeds the predetermined error range, they are determined not to match. When the two positions are determined to match, the object at the position extracted from the object detection information is identified as the object indicated by the identification information assigned to the object at the position extracted from the object arrangement information.

Specifically, as shown in FIG. 8, the object identification means 27 collates the object detection information output from the object detection means 24 with the object arrangement information stored in the storage means 30.
For example, the position of the volume [V] (1) included in the object detection information (the width and height of its bounding box and the xy coordinates of its center of gravity) is collated with the positions of the volumes [V1] to [V3] included in the object arrangement information, which have the same shape (same category) as the volume [V] (1), and it is determined whether the difference between them is within a predetermined range. If the differences between the position of the volume [V] (1) and the positions of the volumes [V2] and [V3] exceed the predetermined range while the difference between the position of the volume [V] (1) and the position of the volume [V1] is within the predetermined range, the volume [V] (1) included in the object detection information is identified (individually identified) as the volume [V1] included in the object arrangement information.
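The collation described above can be sketched in Python as a tolerance-bounded lookup among stored objects of the same category. The function name `identify`, the use of Euclidean distance between centers of gravity, and the choice of the nearest candidate when several fall within the tolerance are all assumptions made for illustration; the embodiment only requires that the difference lie within a predetermined range, and it also mentions bounding-box width and height, which this sketch ignores.

```python
import math


def identify(detected_pos, category, layout, tolerance):
    """Return the stored ID whose position matches detected_pos, or None.

    layout maps IDs to (category, (x, y)) entries taken from the stored
    object arrangement information. tolerance is the allowed error, in
    the same units as the coordinates.
    """
    best_id, best_dist = None, tolerance
    for obj_id, (stored_cat, stored_pos) in layout.items():
        if stored_cat != category:
            continue  # only compare against objects of the same category
        dist = math.dist(detected_pos, stored_pos)
        if dist <= best_dist:
            best_id, best_dist = obj_id, dist
    return best_id


# Hypothetical stored layout and a detection close to V1.
layout = {
    "V1": ("V", (10, 10)),
    "V2": ("V", (30, 10)),
    "V3": ("V", (50, 10)),
    "D1": ("D", (11, 10)),
}
match = identify((11, 9), "V", layout, tolerance=3)
```

Restricting the comparison to the same category keeps a dial that happens to sit near a volume from being confused with it, which mirrors the category check in the description.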

The storage means 30 is composed of a ROM, a RAM, or the like, and stores programs for executing the processes of the means 21 to 27 as well as various data. The object arrangement information created by the object arrangement information generation means 25 is stored in the object arrangement information storage unit 31 of the storage means 30.
The input means 40 is composed of a keyboard or the like and is used to input various information.
The display means 50 is composed of a liquid crystal display device, an organic EL display device, or the like, and displays various information. When the display means 50 is of a type that accepts input by touching a display portion shown on its display screen, the input means 40 may be omitted.

Next, the operation of the object identification device of the present embodiment will be described. The object identification device of the present embodiment is configured so that it can be set to an object arrangement information generation mode or an object identification mode.
For example, the device is configured to be set to the object arrangement information generation mode or the object identification mode when object arrangement information generation mode setting information or object identification mode setting information is input from the input means 40. Alternatively, the device is configured so that, when object identification start information is input from the input means 40, the object arrangement information generation mode is set first, the object arrangement information is generated and stored in the storage means 30, and the object identification mode is then set.

The operation when the object arrangement information generation mode is set will be described with reference to the flowchart shown in FIG. 9.
In step A1, imaging information captured by the imaging means 10 is input. The process of step A1 is executed by the imaging information input means 22.
In step A2, the target area is determined based on the input imaging information. The process of step A2 is executed by the target area determination means 23.
In step A3, based on the input imaging information and the target area determined in step A2, the categories and positions of the plurality of objects arranged in the target area are detected and object detection information is output. The process of step A3 is executed by the object detection means 24.
In step A4, based on the object detection information detected in step A3, which indicates the categories and positions of the objects arranged in the target area, object arrangement information is generated that includes identification information identifying each object and the position of each object. The process of step A4 is executed by the object arrangement information generation means 25.
In step A5, the object arrangement information generated in step A4 is stored in the storage means 30. The process of step A5 is executed by the object arrangement information generation means 25.
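Steps A1 to A5 form a simple linear pipeline, which the following Python sketch makes explicit. Every name here is hypothetical: the four callables stand in for the imaging information input means, the target area determination means, the object detection means, and the object arrangement information generation means, and a plain dictionary stands in for the storage means.

```python
def generate_layout(capture, find_target_area, detect_objects, build_layout, storage):
    """Run steps A1 to A5 once and return the stored arrangement information."""
    frame = capture()                          # A1: input imaging information
    area = find_target_area(frame)             # A2: determine the target area
    detections = detect_objects(frame, area)   # A3: detect categories and positions
    layout = build_layout(detections)          # A4: assign IDs and positions
    storage["layout"] = layout                 # A5: store the arrangement information
    return layout
```

Passing the stages in as callables keeps the control flow separate from any particular detector, so the same skeleton could be exercised with stub functions before a trained model is available.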

The operation when the object identification mode is set will be described with reference to the flowchart shown in FIG. 10.
In step B1, imaging information captured by the imaging means 10 is input. The process of step B1 is executed by the imaging information input means 22.
In step B2, the target area is determined based on the input imaging information. The process of step B2 is executed by the target area determination means 23.
In step B3, based on the input imaging information and the target area determined in step B2, the categories and positions of the plurality of objects arranged in the target area are detected and object detection information is output. The process of step B3 is executed by the object detection means 24.
In step B4, the object detection information detected in step B3, which indicates the categories and positions of the objects arranged in the target area, is collated with the object arrangement information stored in the storage means 30 (the identification information and position of each object, generated in step A4 of the object arrangement information generation mode), and the plurality of objects arranged in the target area that were detected in step B3 are individually identified. The process of step B4 is executed by the object identification means 27.
After the process of step B4 is completed, the process returns to step B1. The processes of steps B1 to B4 may be repeated continuously or may be executed at predetermined time intervals.

Next, a different embodiment of the present invention will be described.
When the imaging position or imaging angle (imaging direction) of the imaging means 10 is changed, the positions (xy coordinates) on the imaging screen of the plurality of objects arranged in the target area change.
If these positions change significantly, the accuracy of collating the object detection information with the object arrangement information, that is, the accuracy of individually identifying each object, may decrease.
For this reason, when the imaging position or imaging angle (imaging direction) of the imaging means 10 has been changed, it is preferable to regenerate the object arrangement information so as to prevent a decrease in the individual identification accuracy of each object.
As described later, a change in the imaging position or imaging angle (imaging direction) of the imaging means 10 can be determined from the amount of change in the angle of view of the imaging screen.

The present embodiment has the angle of view change amount determining means 26 shown in FIG. 1.
The angle of view change amount determining means 26 determines whether the amount of change in the angle of view between the imaging screen (M) used when generating the object arrangement information stored in the storage means 30 and the imaging screen (P) indicated by the imaging information output from the imaging means 10 exceeds a predetermined range. The angle of view changes with the imaging position and imaging angle (imaging direction) of the imaging means 10.
Various methods can be used by the angle of view change amount determining means 26 to determine whether the amount of change in the angle of view exceeds the predetermined range (angle of view change amount determination methods).
For example, a method of determining whether the similarity between the imaging screen (M) and the imaging screen (P) is lower than a predetermined value can be used. The similarity between the imaging screen (M) and the imaging screen (P) can be determined from the similarity between the feature points of the imaging screen (M) and those of the imaging screen (P). The feature points of the imaging screen (M) and the imaging screen (P) can be detected using a method based on local feature quantities, such as SIFT. As described later, using the feature points of the two imaging screens makes it possible to determine the similarity between them.

Here, the amount of change in the angle of view between the target area (M) on the imaging screen (M) and the target area (P) on the imaging screen (P) is equivalent to the amount of change in the angle of view between the imaging screen (M) and the imaging screen (P).
In the present embodiment, as shown in FIG. 11, the angle of view change amount determining means 26 determines whether the amount of change in the angle of view between the imaging screen (M) and the imaging screen (P) exceeds the predetermined range by determining whether the similarity between the feature points of the target area (M) on the imaging screen (M), which was used when generating the object arrangement information stored in the storage means 30, and the feature points of the target area (P) on the imaging screen (P), which is indicated by the imaging information output from the imaging means 10, is lower than a predetermined value.
As feature points of the target area, for example, the locations illustrated by broken lines in FIG. 11 are detected. Preferably, a plurality of feature points are detected.
Then, from the feature points of the two target areas, a feature vector is generated for each target area from the set of feature points whose correspondence is determined to match based on the similarity of their local feature quantities. It is then determined whether the similarity of the feature vectors of the two target areas is equal to or higher than a predetermined value.
The similarity of the feature vectors is expressed, for example, by the cosine similarity of the two vectors shown below, where <M> is the feature vector of the target area (M) and <P> is the feature vector of the target area (P).
Sim(<M>, <P>) = (<M> · <P>) / (|<M>| |<P>|)
When this value is equal to or higher than a predetermined value, the similarity of the feature vectors is high and it is determined that the amount of change in the angle of view does not exceed the predetermined range. When it is less than the predetermined value, the similarity of the feature vectors is low and it is determined that the amount of change in the angle of view exceeds the predetermined range.
Various methods other than those described above can be used for feature point detection, feature vector calculation, and similarity determination.
In addition, various other methods can be used to detect the amount of change in the angle of view, such as a method that uses optical flow to calculate the amount of change due to pixel movement as a velocity vector.
The feature points of the imaging screen (M) and the imaging screen (P) are not limited to the feature points of the target area (M) on the imaging screen (M) and the target area (P) on the imaging screen (P).
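The similarity measure given above, the dot product of the two feature vectors divided by the product of their magnitudes, can be computed as in the following Python sketch. The function names, the handling of zero vectors, and the threshold value 0.9 are assumptions for illustration; the embodiment states only that a predetermined value is used, and how the feature vectors are built from matched feature points is outside the scope of this sketch.

```python
import math


def cosine_similarity(m, p):
    """Cosine of the angle between feature vectors m and p."""
    if len(m) != len(p):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(m, p))
    norm = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in p))
    if norm == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar
    return dot / norm


def view_changed(m, p, threshold=0.9):
    """True when the similarity falls below the threshold (hypothetical value)."""
    return cosine_similarity(m, p) < threshold
```

A result of `True` from `view_changed` corresponds to the case in which the amount of change in the angle of view is judged to exceed the predetermined range.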

The operation of the object identification device of the present embodiment will be described. The object identification device of the present embodiment is configured so that it can be set to the object arrangement information generation mode or the object identification mode.
The operation when the object arrangement information generation mode is set is the same as the operation shown in FIG. 9, so its description is omitted.

The operation when the object identification mode is set will be described with reference to the flowchart shown in FIG. 12.
The operations of steps C1 and C2 are the same as the operations of steps B1 and B2 shown in FIG. 10.
In step C3, it is determined whether the amount of change in the angle of view exceeds the predetermined range. If the amount of change in the angle of view exceeds the predetermined range, the process proceeds to step C4. If it does not, the process proceeds to step C7. The process of step C3 is executed by the angle of view change amount determining means 26.
In steps C4 to C6, object arrangement information is generated (regenerated) and stored in the storage means 30 in the same manner as in steps A3 to A5 shown in FIG. 9.
In steps C7 and C8, the plurality of objects arranged in the target area are individually identified in the same manner as in steps B3 and B4 shown in FIG. 10.
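The branch at step C3 can be sketched in Python as follows. All names are hypothetical stand-ins for the corresponding means. The text here does not say what follows step C6, so the sketch assumes that a pass which regenerates the arrangement information simply ends and that identification resumes on the next pass; the actual flow in FIG. 12 may differ.

```python
def identification_step(frame, area, layout, view_changed, detect, build_layout, identify):
    """One pass of steps C3 to C8; returns (layout, identified objects or None)."""
    if view_changed(frame, area):                    # C3: angle of view changed?
        layout = build_layout(detect(frame, area))   # C4 to C6: regenerate and keep
        return layout, None                          # assumption: identify on next pass
    detections = detect(frame, area)                 # C7: detect categories and positions
    return layout, identify(detections, layout)      # C8: individual identification
```

The caller is expected to carry the returned layout into the next pass, so that a regenerated layout replaces the stale one.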

As a preferred form of use of the present invention, a monitoring device or the like is envisaged that monitors the states of a plurality of operation units (including a plurality of operation units having the same shape) arranged on, for example, the operation panel of an operation board.
When object detection means based on the SSD disclosed in Non-Patent Document 1 or the YOLO disclosed in Non-Patent Document 2 is used in such a monitoring device, a plurality of objects having the same shape in the target area cannot be individually identified, as described above.
For this reason, conventionally, arrangement information indicating the arrangement state of the plurality of objects arranged in the target area (the position and ID of each object) was created (registered) in advance as a two-dimensional map, and each object was individually identified by collating object detection information detected from the imaging information output by the imaging means (the positions and categories of the plurality of objects arranged in the target area) with the object arrangement information shown in the two-dimensional map (the position and ID of each object).
However, the two-dimensional map must be created manually each time objects are registered, which takes labor and time.
In contrast, when the object identification device of the present invention is used, the object arrangement information is generated automatically, so there is no need to manually create a two-dimensional map showing the arrangement state of the objects arranged in the target area.
In addition, each time imaging information is input, the plurality of operation units arranged on the operation panel are individually identified by collating the object detection information with the object arrangement information, so the operation unit operated by a worker and the states of the operation units can be easily identified.

The present invention is not limited to the configurations described in the embodiments, and various changes, additions, and deletions are possible.
Although the processing means 20 is composed of the management means 21 through the object identification means 27, the configuration of the processing means 20 is not limited to this. For example, unnecessary means can be removed, and a plurality of means can be integrated.
The configurations of the target area determination means 23, the object detection means 24, the object arrangement information generation means 25, the angle of view change amount determining means 26, and the object identification means 27 are not limited to those described in the embodiments and can be changed in various ways without departing from the gist of the present invention.
The processes shown in FIGS. 9, 10, and 12 can be changed in various ways.

10 Imaging means
11 Imaging screen
20 Processing means
21 Management means
22 Imaging information input means
23 Target area determination means
24 Object detection means
25 Object arrangement information generation means
26 Angle of view change amount determining means
27 Object identification means
30 Storage means
31 Object arrangement information storage unit
40 Input means
50 Display means
100 Operation board
110 Operation panel (target area)
111 Edge portion (target area edge portion)
111a Upper edge portion (target area upper edge portion)
111b Left edge portion (target area left edge portion)
111c Lower edge portion (target area lower edge portion)
111d Right edge portion (target area right edge portion)
120a to 120c Volumes
130a, 130b Dials
140a to 140k Buttons

Claims

An object identification device that identifies a plurality of objects, including a plurality of objects having the same shape, arranged in a target area, wherein
an imaging means, a target area determination means, an object detection means, an object arrangement information generation means, an object identification means, and a storage means are provided,
the imaging means outputs imaging information indicating an imaging screen obtained by imaging an imaging area that includes the target area and the plurality of objects arranged in the target area,
the target area determination means determines the target area on the imaging screen based on the imaging information output from the imaging means,
the object detection means detects the categories and positions of the plurality of objects arranged in the target area on the imaging screen, based on the imaging information output from the imaging means and the target area on the imaging screen determined by the target area determination means,
the object arrangement information generation means generates object arrangement information indicating identification information and positions of the plurality of objects, based on the categories and positions of the plurality of objects arranged in the target area on the imaging screen detected by the object detection means, and stores the object arrangement information in the storage means,
the object identification means individually identifies the plurality of objects arranged in the target area on the imaging screen, based on the categories and positions of the plurality of objects arranged in the target area on the imaging screen detected by the object detection means and on the identification information and positions of the plurality of objects indicated by the object arrangement information stored in the storage means,
the device can be set to an object arrangement information generation mode and an object identification mode,
when the object arrangement information generation mode is set, the object arrangement information is generated by the target area determination means, the object detection means, and the object arrangement information generation means and is stored in the storage means, and
when the object identification mode is set, the plurality of objects arranged in the target area on the imaging screen are individually identified by the target area determination means, the object detection means, and the object identification means.

The object identification device according to claim 1, wherein
an angle of view change amount determining means is provided,
the angle of view change amount determining means determines whether the amount of change in the angle of view between the imaging screen used when generating the object arrangement information stored in the storage means and the imaging screen indicated by the imaging information output from the imaging means exceeds a predetermined range, and
when the object identification mode is set and the angle of view change amount determining means determines that the amount of change in the angle of view exceeds the predetermined range, the object arrangement information is generated by the target area determination means, the object detection means, and the object arrangement information generation means and is stored in the storage means.

The object identification device according to claim 1 or 2, wherein
the object identification means individually identifies the plurality of objects by collating, while allowing a predetermined error range, the categories and positions of the objects arranged in the target area on the imaging screen detected by the object detection means with the identification information and positions of the objects indicated by the object arrangement information stored in the storage means.

The object identification device according to any one of claims 1 to 3, wherein
the object detection means detects the categories of the plurality of objects and their relative positions in the target area, and
the object arrangement information generation means generates object arrangement information indicating the identification information of the plurality of objects and their relative positions in the target area.

The object identification device according to any one of claims 1 to 4, wherein
the target area determination means and the object detection means are composed of a multi-layer neural network.