JPWO2019235050A1 - Object detection method and object detection device

Info

Publication number: JPWO2019235050A1
Application number: JP2020523539A
Authority: JP
Inventors: 大気関井
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2018-06-05
Filing date: 2019-04-03
Publication date: 2021-06-17
Anticipated expiration: 2039-04-03
Also published as: JP7036208B2; WO2019235050A1

Abstract

The object detection method includes: a candidate detection step (S1) of detecting candidates for regions that a plurality of objects occupy on an image; a relationship score detection step (S2) of, for each detected candidate, taking that candidate as one candidate, referring to other candidates within a preset range around the one candidate, and detecting a relationship score representing the likelihood of a predetermined relationship of the one candidate to the other candidates; an object determination step (S3) of determining the regions of the plurality of objects from the detected candidates based on a predetermined algorithm; and a relationship determination step (S4) of determining the relationships among the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the determined regions.

Description

The present invention relates to an object detection method and an object detection device for detecting, from an image, a plurality of objects and the relationships among them.

In recent years, techniques have been proposed for detecting a plurality of objects from an image and estimating the relationships that exist among them. For example, Non-Patent Document 1 uses a plurality of CNNs (Convolutional Neural Networks) and narrows down object candidates and relationship candidates in stages to estimate a plurality of objects and the relationships among them. More specifically, a CNN for detecting persons from an image, a CNN for detecting objects other than persons (for example, an umbrella), and a CNN for detecting the relationships between them are prepared. Each CNN extracts candidates for persons, objects, and relationships; the extracted candidates are then integrated, and a further convolution is performed to narrow them down.

Non-Patent Document 1: Ji Zhang, et al., “Relationship Proposal Networks”, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference, July 21-26, 2017

However, the technique of Non-Patent Document 1 requires intermediate processing (integration and convolution) to narrow down the candidates. Moreover, because convolution is performed by the plurality of CNNs and then performed again after the candidates are integrated, the convolution processing is redundant. With the technique of Non-Patent Document 1, it is therefore difficult to estimate (detect) a plurality of objects and the relationships among them from an image at high speed.

The present invention has been made to solve the above problem, and its object is to provide an object detection method and an object detection device capable of detecting a plurality of objects and their relationships from an image at high speed.

An object detection method according to one aspect of the present invention is an object detection method for detecting, from an image, a plurality of objects and the relationships among the plurality of objects, and includes: a candidate detection step of detecting candidates for regions that the plurality of objects occupy on the image; a relationship score detection step of, for each detected candidate, taking that candidate as one candidate, referring to other candidates within a preset range around the one candidate, and detecting a relationship score representing the likelihood of a predetermined relationship of the one candidate to the other candidates; an object determination step of determining the regions of the plurality of objects from the detected candidates based on a predetermined algorithm; and a relationship determination step of determining the relationships among the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the determined regions.

An object detection device according to another aspect of the present invention is an object detection device for detecting, from an image, a plurality of objects and the relationships among the plurality of objects, and includes: a candidate/relationship score detection unit that detects candidates for regions that the plurality of objects occupy on the image and, for each detected candidate, takes that candidate as one candidate, refers to other candidates within a preset range around the one candidate, and detects a relationship score representing the likelihood of a predetermined relationship of the one candidate to the other candidates; an object determination unit that determines the regions of the plurality of objects from the detected candidates based on a predetermined algorithm; and a relationship determination unit that determines the relationships among the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the determined regions.

Because candidates for the regions that a plurality of objects occupy on the image are detected, and relationship scores representing the likelihood of relationships between the detected candidates are detected, the detection of the candidates and the detection of the relationship scores can be performed simultaneously (in parallel, in one pass), for example by neural computation using a single CNN. This makes it possible to detect a plurality of objects and the relationships among them at high speed based on the relationship scores.

FIG. 1 is a block diagram showing the schematic configuration of an object detection device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram schematically showing the configuration of the candidate/relationship score detection unit of the object detection device.
FIG. 3 is a flowchart showing the processing flow of the object detection method performed by the object detection device.
FIG. 4 is an explanatory diagram schematically showing the procedure by which the candidate/relationship score detection unit detects candidates.
FIG. 5 is an explanatory diagram schematically showing an example of detecting a plurality of candidates for a specific grid.
FIG. 6 is an explanatory diagram schematically showing an example of detecting a plurality of candidates for other grids.
FIG. 7 is an explanatory diagram schematically showing, for each grid, the class with the highest class score.
FIG. 8 is an explanatory diagram schematically showing the procedure by which the candidate/relationship score detection unit detects relationship scores.
FIG. 9 is an explanatory diagram showing the feature map output from the candidate/relationship score detection unit.
FIG. 10 is an explanatory diagram showing the state in which the object determination unit of the object detection device has determined the regions of a plurality of objects from a plurality of candidates.
FIG. 11 is an explanatory diagram representing, as a graph, the relationships between two candidates selected from the candidates corresponding to the regions shown in FIG. 10.
FIG. 12 is an explanatory diagram schematically showing the relationships between two candidates before and after threshold processing.
FIG. 13 is an explanatory diagram schematically showing the relationships between two candidates before and after the relationship scores are narrowed down.
FIG. 14 is a block diagram showing another configuration of the object detection device.
FIG. 15 is a flowchart showing the processing flow of the object detection method performed by the object detection device of FIG. 14.

An embodiment of the present invention is described below with reference to the drawings. The present invention is not limited to the following description.

[Configuration of the object detection device]
FIG. 1 is a block diagram showing the schematic configuration of the object detection device 1 of the present embodiment. The object detection device 1 is composed of, for example, a terminal device such as a personal computer, and is connected via a communication line (wired or wireless) to at least one surveillance camera 10 installed in a store. When data of an image (moving or still) captured by the surveillance camera 10 is input to the object detection device 1, the object detection device 1 executes processing according to the object detection method described later, thereby detecting a plurality of objects in the image and the relationships among those objects.

In this specification, a "relationship" includes at least one of: an association or state between a plurality of objects, and an action or operation of one object on another. Thus, for example, when a person in an image is holding an umbrella, the state of "holding" or the action "to hold" corresponds to such a relationship. Actions or states such as (a person) "putting up" (an umbrella), or A "belonging to" B, are also included in the relationships referred to here.

The object detection device 1 does not necessarily have to be connected to the surveillance camera 10; it only needs to be able to acquire image data from outside. For example, the object detection device 1 may acquire the image data needed for object detection by receiving, from another terminal device, an e-mail with an image data file attached, or by using a reading device to read image data recorded on a recording medium. The details of the object detection device 1 are described below.

The object detection device 1 includes a control unit 2, a storage unit 3, an input unit 4, a display unit 5, a communication unit 6, and a detection processing unit 7.

The control unit 2 is composed of, for example, a central processing unit (CPU), operates according to an operation program stored in the storage unit 3, and controls the operation of each unit of the object detection device 1.

The storage unit 3 is a memory that stores the operation program, image data acquired by the surveillance camera 10, and the like. The storage unit 3 is composed of, for example, a hard disk, but may instead be composed of a recording medium selected as appropriate from a RAM (Random Access Memory), a ROM (Read Only Memory), an optical disk, a magneto-optical disk, a non-volatile memory, and the like.

The input unit 4 is composed of, for example, a keyboard, a mouse, a touch pad, or a touch panel, and accepts various instruction inputs from the user. The display unit 5 is a device that displays various kinds of information, such as images acquired by the surveillance camera 10 and results detected by the detection processing unit 7 described later (for example, relationships between objects), and is composed of, for example, a liquid crystal display device. The communication unit 6 is an interface for communicating with the outside (including the surveillance camera 10) and includes input/output terminals and the like. When, for example, the surveillance camera 10 and the object detection device 1 communicate wirelessly (for example, to send and receive image data), the communication unit 6 may include an antenna, a transmission/reception circuit, a modulation circuit, a demodulation circuit, and the like.

The detection processing unit 7 is a block that performs object detection processing for detecting, from an image, a plurality of objects and the relationships among them, and includes a candidate/relationship score detection unit 7a, an object determination unit 7b, and a relationship determination unit 7c. The detection processing unit 7 is composed of, for example, a GPU (Graphics Processing Unit), an arithmetic unit specialized for real-time image processing, but it may instead be composed of the same CPU as the control unit 2, a separate CPU, or some other arithmetic unit.

The candidate/relationship score detection unit 7a is composed of, for example, a CNN. When an image is input, it detects candidates for the regions that a plurality of objects occupy on the image (rectangular frames, also called bounding boxes) and detects relationship scores representing the likelihood (certainty, confidence) of a predetermined relationship between the detected candidates. For detecting the candidates, the technique described in, for example, Joseph Redmon, et al., "You Only Look Once: Unified, Real-Time Object Detection", Computer Vision and Pattern Recognition (cs.CV), Submitted on 8 Jun 2015 (v1), last revised 9 May 2016 (hereinafter abbreviated as "YOLO") can be adopted. However, "YOLO" does not describe the detection of the relationship scores, or the determination of relationships based on those scores, at all.

The candidate/relationship score detection unit 7a also uses the CNN to detect at least one of the following parameters: the relative position of a candidate with respect to each of a plurality of grids set on the image (for example, the position of the candidate's center relative to the grid's center); the relative size of the candidate with respect to the image (its height and width); a detection score representing the likelihood of the candidate with respect to an object; and a class score representing the likelihood of the class (type) of the object in a grid. Here, the likelihood of a candidate with respect to an object means the certainty that the candidate indicates an object and/or the certainty that the candidate fits the object. The likelihood of an object's class means the certainty of the class to which the object in the grid belongs. The "YOLO" technique can be adopted for detecting each of these parameters (the relative position of a candidate and so on).

The CNN constituting the candidate/relationship score detection unit 7a of the present embodiment is described in more detail here.

FIG. 2 is an explanatory diagram schematically showing the configuration of the candidate/relationship score detection unit 7a (CNN). The CNN includes an input layer 11, a convolution layer 12, a pooling layer 13, and an output layer 14. At least one pair of a convolution layer 12 and a pooling layer 13 is required, and there may be several such pairs.

Each of the layers constituting the CNN has a plurality of nodes (or units), and at least some of the nodes in adjacent layers are connected by edges. A neural network is an information processing system that imitates the human neural network, and each node represents an engineering model of a neuron, corresponding to a human nerve cell. Each layer has a function called an activation function (response function), and each edge has a weight. The value output from a node in a given layer is therefore calculated from the values of the nodes in the previous layer, the edge weights, and the layer's activation function. The edge weights can be changed by learning.
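As a minimal sketch of the node computation described above, the following Python snippet computes the outputs of one layer from the previous layer's node values, the edge weights, and an activation function. The function names, the choice of a sigmoid activation, and the sample numbers are illustrative assumptions; the source does not name a specific activation function, and bias terms are omitted because the text does not mention them.

```python
import math


def sigmoid(x: float) -> float:
    """Example activation function (an assumption; the source names none)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(prev_values, weights, activation=sigmoid):
    """Compute one layer's node outputs.

    weights[j][i] is the weight of the edge from previous-layer node i
    to node j of this layer.
    """
    return [
        activation(sum(w * v for w, v in zip(row, prev_values)))
        for row in weights
    ]


# Hypothetical values: two previous-layer nodes feeding two nodes.
prev = [1.0, 2.0]
weights = [[0.5, -0.25],   # node 0: 0.5*1.0 - 0.25*2.0 = 0.0
           [1.0, 1.0]]     # node 1: 1.0*1.0 + 1.0*2.0 = 3.0
out = layer_forward(prev, weights)
print(out)
```

Changing an entry of `weights` changes the output of the corresponding node, which is what learning exploits.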

The data (pixel value) of each pixel constituting one image is input to the corresponding node of the input layer 11. The convolution layer 12 applies filter processing to the values output from predetermined nodes of the previous layer to obtain a feature map. The pooling layer 13 further reduces the feature map output from the convolution layer 12 to obtain a new feature map. The output layer 14 is the final layer of the CNN and outputs the parameters described above (region candidates, relationship scores, relative positions of candidates, relative sizes of candidates, detection scores, and class scores) from the node values of the previous layer, the edge weights, and the activation function of the output layer 14. The output layer 14 may be a convolution layer that performs filter processing, or a fully connected layer that combines the outputs from all nodes of the preceding layer and performs a predetermined operation to output each parameter.
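The filtering and reduction performed by the convolution and pooling layers can be illustrated with a short, dependency-free Python sketch. The kernel values, the "valid" (no padding) convolution, and the use of 2 × 2 max pooling are assumptions chosen for illustration; the source only states that the convolution layer filters and the pooling layer reduces the feature map.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) to get a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(kh) for dx in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Reduce the feature map by taking the maximum of each size x size block."""
    return [
        [
            max(fmap[y + dy][x + dx]
                for dy in range(size) for dx in range(size))
            for x in range(0, len(fmap[0]) - size + 1, size)
        ]
        for y in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 5x5 "image" whose pixel value is 5*row + col.
image = [[5 * r + c for c in range(5)] for r in range(5)]
kernel = [[1, 0],
          [0, -1]]                      # responds to diagonal differences
fmap = conv2d_valid(image, kernel)      # 4x4 feature map
pooled = max_pool(fmap)                 # 2x2 feature map after pooling
print(len(fmap), len(fmap[0]), pooled)
```

Each convolution/pooling pair shrinks the map, so stacking several pairs yields the coarse H × W grid on which the output layer operates.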

As the learning algorithm for the candidate/relationship score detection unit 7a (CNN), error backpropagation can be used, for example. In this approach, image data with correct answers is used, and the weights of each layer (edge) are changed in order from the output layer 14 side toward the input layer 11 side using the steepest descent method, so as to minimize the squared error between the output values obtained from the output layer 14 when the image data is input and the values representing the correct answers. By training the CNN in advance in this way, when an image containing a plurality of objects to be detected is input to the CNN, the parameters described above are output from the CNN and can thus be detected (estimated).
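The training idea above (steepest descent on a squared error, with gradients propagated from the output side back toward the input side) can be sketched on a deliberately tiny model. The two-weight chain `y = w2 * (w1 * x)`, the learning rate, and the training sample are all hypothetical; a real CNN would have many weights and nonlinear activations.

```python
def train_step(w1, w2, x, target, lr):
    """One steepest-descent update for the chain y = w2 * (w1 * x)."""
    # Forward pass.
    h = w1 * x
    y = w2 * h
    loss = (y - target) ** 2          # squared error against the correct answer

    # Backward pass: start at the output and move toward the input.
    dy = 2.0 * (y - target)           # dLoss/dy
    dw2 = dy * h                      # gradient for the output-side weight
    dh = dy * w2                      # error propagated to the hidden value
    dw1 = dh * x                      # gradient for the input-side weight

    return w1 - lr * dw1, w2 - lr * dw2, loss


w1, w2 = 0.5, 0.5                     # hypothetical initial weights
losses = []
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=2.0, lr=0.1)
    losses.append(loss)

print(losses[0], losses[-1], w1 * w2)
```

The loss shrinks as the product `w1 * w2` approaches the target, which is the behaviour the squared-error criterion is meant to produce.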

The object determination unit 7b determines the regions of a plurality of objects from the plurality of object candidates detected by the candidate/relationship score detection unit 7a, based on a predetermined algorithm. The relationship determination unit 7c determines the relationships among the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the regions (of the plurality of objects) determined by the object determination unit 7b. The details of determining the object regions and the relationships between objects are given in the description of the object detection method below.

[Object detection method]
Next, the object detection method of the present embodiment is described. FIG. 3 is a flowchart showing the processing flow of the object detection method performed by the object detection device 1 of FIG. 1. The object detection method of the present embodiment includes a candidate detection step (S1), a relationship score detection step (S2), an object determination step (S3), and a relationship determination step (S4). These are described in more detail below.
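To make the division of labour among the four steps concrete, here is a rough Python sketch of S3 and S4 operating on the outputs of S1 and S2. Everything in it is an assumption for illustration: the candidate names, the scores, and in particular the use of simple score thresholds as stand-ins for the "predetermined algorithm" of S3 and for the decision rule of S4, neither of which is specified in this part of the text.

```python
def determine_objects(candidates, min_score):
    """S3 stand-in: keep candidates whose detection score passes a threshold."""
    return {cid for cid, score in candidates.items() if score >= min_score}


def determine_relationships(kept, relation_scores, min_relation):
    """S4 stand-in: look up the scores already computed in S2, but only for
    pairs in which both candidates survived S3."""
    return sorted(
        (a, b, s)
        for (a, b), s in relation_scores.items()
        if a in kept and b in kept and s >= min_relation
    )


# Hypothetical outputs of S1 (candidates with detection scores) ...
candidates = {"person": 0.9, "umbrella": 0.8, "clutter": 0.2}
# ... and of S2 (relationship score of the first candidate toward the second).
relation_scores = {
    ("person", "umbrella"): 0.7,
    ("person", "clutter"): 0.9,
    ("umbrella", "person"): 0.1,
}

kept = determine_objects(candidates, min_score=0.5)
relations = determine_relationships(kept, relation_scores, min_relation=0.5)
print(relations)
```

The point of the sketch is that S4 performs no new image processing: the relationship scores for all candidate pairs already exist after S2, so S4 reduces to a lookup over the pairs that S3 retained.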

(S1; candidate detection step)
In S1, the candidate/relationship score detection unit 7a of the detection processing unit 7 detects candidates for the regions that a plurality of objects occupy on the image by neural computation (CNN) based on a predetermined algorithm. The details are as follows.

FIG. 4 schematically shows the procedure by which the candidate/relationship score detection unit 7a detects candidates. As shown in the figure, with H and W being integers of 2 or more, the candidate/relationship score detection unit 7a divides the input image into H × W grids (4 × 4 = 16 in FIG. 4, as an example) and, for each grid, detects two candidates B, common to all classes, together with class scores. The input image here means the image input to the output layer 14 of the CNN (see FIG. 2), that is, the image obtained after the convolution layer 12 and the pooling layer 13 have extracted features from the image input to the input layer 11. In the following description, this input image is also referred to simply as the image.

Let K be the number of candidates B detected for each grid. Although K = 2 in FIG. 4, K can be set to a value other than 2. The value of K may also be set per class (to different values).

In S1, the candidate/relationship score detection unit 7a further detects the relative position of each candidate B with respect to each of the plurality of grids set on the image, the relative size of the candidate B with respect to the image, the detection score of the candidate B, and the class scores for each grid. Here, the relative position of a candidate B with respect to a grid is the position of the candidate's center relative to the grid's center in the vertical direction (H direction) and the horizontal direction (W direction), and is therefore two-dimensional information. The relative size of a candidate B is its vertical and horizontal size relative to the vertical and horizontal size of the entire image and, like the relative position, is two-dimensional information. The detection score of a candidate B is information on the certainty that the candidate B indicates an object in the image and/or the certainty that the candidate B fits the object, and can be regarded as P-dimensional information with P = 1 or 2. Finally, with C denoting the number of classes, C class scores are detected for each grid, so the class scores are C-dimensional information.
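As an illustration of how the relative position and relative size might be turned back into a box in pixel coordinates, consider the following Python sketch. It assumes 1-based grid indices (h, w), offsets expressed in units of one grid's height and width, and sizes expressed as fractions of the image; these conventions, the function name, and the sample values are assumptions rather than details given in the source.

```python
def decode_box(h, w, dy, dx, rel_h, rel_w, H, W, img_h, img_w):
    """Return (x_min, y_min, x_max, y_max) in pixels for one candidate.

    (h, w)         1-based grid position
    (dy, dx)       offset of the box center from the grid center, in grid units
    (rel_h, rel_w) box height and width as fractions of the image size
    """
    cell_h, cell_w = img_h / H, img_w / W
    center_y = (h - 0.5 + dy) * cell_h    # grid center plus vertical offset
    center_x = (w - 0.5 + dx) * cell_w    # grid center plus horizontal offset
    box_h, box_w = rel_h * img_h, rel_w * img_w
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


# Hypothetical candidate centered exactly on grid (1, 2) of a 4x4 layout
# over a 400x400 image, a quarter of the image tall and half of it wide.
box = decode_box(h=1, w=2, dy=0.0, dx=0.0, rel_h=0.25, rel_w=0.5,
                 H=4, W=4, img_h=400, img_w=400)
print(box)
```

Because the size is relative to the whole image, a candidate anchored in one grid can extend well beyond that grid, as in the umbrella example of FIG. 5.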

Therefore, leaving aside the detection of the relationship scores described later, the feature map output from the candidate/relationship score detection unit 7a (the output layer 14 of the CNN) is a third-order tensor expressed as H × W × {K(2 + 2 + P) + C}.
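The size of this tensor follows directly from the per-grid parameters: each of the K candidates contributes 2 (position) + 2 (size) + P (detection score) values, and each grid contributes C class scores. The Python sketch below computes the channel count and splits one grid's vector into its parts. The ordering of values within the vector is an assumption made for illustration; the source gives only the total size.

```python
def channels_per_grid(K, P, C):
    """Values per grid: K candidates x (2 position + 2 size + P score) + C classes."""
    return K * (2 + 2 + P) + C


def split_grid_vector(vec, K, P, C):
    """Split one grid's flat vector into candidates and class scores.

    Assumed layout: [candidate 1 | candidate 2 | ... | class scores].
    """
    stride = 2 + 2 + P
    candidates = []
    for k in range(K):
        s = k * stride
        candidates.append({
            "position": vec[s:s + 2],
            "size": vec[s + 2:s + 4],
            "detection": vec[s + 4:s + 4 + P],
        })
    return candidates, vec[K * stride:]


H, W, K, P, C = 4, 4, 2, 1, 3          # hypothetical settings
shape = (H, W, channels_per_grid(K, P, C))
cands, class_scores = split_grid_vector(list(range(shape[2])), K, P, C)
print(shape, cands, class_scores)
```

With these example settings, each grid carries 13 values and the whole output has 4 × 4 × 13 entries.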

FIG. 5 schematically shows an example of detecting a plurality of candidates B for a specific grid. Although FIG. 4 shows the candidates B with a fixed height and width, the CNN constituting the candidate/relationship score detection unit 7a has been trained in advance using predetermined image patterns, so the height and width of a detected candidate B can vary according to the image or the objects in it. For example, let the position of each grid be denoted (h, w), where h is the coordinate in the H direction and w is the coordinate in the W direction. Focusing on the grid at (h, w) = (1, 2) (see the hatched grid), the candidate/relationship score detection unit 7a detects, as candidates B for object regions, two candidates B₁ and B₂ whose centers lie within that grid (two candidates are detected because K = 2 here). The example of FIG. 5 shows the case in which the candidate/relationship score detection unit 7a detects, as candidate B₂, a frame that fits an object (for example, an umbrella).

FIG. 6 schematically shows an example of detecting a plurality of candidates B for other grids. Here, examples are shown for the grids at (h, w) = (3, 2), (3, 4), and (4, 3). Candidates B are thus detected for the other grids in the same way as in FIG. 5, and ultimately a plurality of candidates B is detected for every grid.

FIG. 7 schematically shows, for each grid, the class with the highest class score, based on the class scores detected for each grid (different classes are indicated by different hatching). As an example, the figure distinguishes the region of grids in which the class score for "umbrella" is highest, the region of grids in which the class score for "person holding an umbrella" is highest, and the region of grids in which the class score for "other person" is highest. Because the candidate/relationship score detection unit 7a detects class scores for each grid, the most likely class in each grid can be determined from those scores.
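Picking the most likely class in each grid amounts to an argmax over that grid's C class scores. A minimal Python sketch follows; the class names come from the example above, while the 2 × 2 layout and the score values are made up for illustration.

```python
def best_class_per_grid(class_scores, class_names):
    """For each grid, return the name of the class with the highest score."""
    return [
        [class_names[max(range(len(scores)), key=scores.__getitem__)]
         for scores in row]
        for row in class_scores
    ]


class_names = ["umbrella", "person holding an umbrella", "other person"]

# Hypothetical class scores for a 2x2 arrangement of grids.
class_scores = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
]

best = best_class_per_grid(class_scores, class_names)
for row in best:
    print(row)
```

Rendering each resulting label with its own hatching would give a map of the kind shown in FIG. 7.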

(S2; relationship score detection step)
In S2, the candidate/relationship score detection unit 7a detects relationship scores representing the likelihood of relationships between candidates by neural computation (CNN) based on a predetermined algorithm. The relationship score detection step S2 is executed in parallel with the candidate detection step S1 described above. Here, the candidate detection step S1 and the relationship score detection step S2 are collectively called the candidate/relationship score detection step S10. The details of the relationship score detection step S2 are described below.

FIG. 8 schematically shows the procedure by which the candidate/relationship score detection unit 7a detects relationship scores. Here, one candidate detected in any grid i within a range of H'×W' (where H' < H and W' < W) is denoted B(k, i), where k is an integer from 1 to K, i is an integer from 1 to H'×W', and grid i is the i-th grid within the H'×W' range. Likewise, another candidate detected in any grid j within the H'×W' range is denoted B(k, j), where j is an integer from 1 to H'×W' and grid j is the j-th grid within the H'×W' range.

As shown in the figure, the candidate/relationship score detection unit 7a extracts one candidate B(k, i) from the candidates B detected in S1, refers in turn to the other candidates B(k, j), B(k, j+1), ... within the preset H'×W' range around that one candidate B(k, i), and detects the relationship score of the one candidate B(k, i) with respect to each of the other candidates B(k, j), B(k, j+1), .... This processing is performed for every candidate B detected in S1 (changing, in turn, which candidate is extracted as the one candidate). Because the CNN constituting the candidate/relationship score detection unit 7a has been trained in advance using predetermined image patterns, the relationship score of the one candidate B(k, i) with respect to another candidate B(k, j) can also be detected by neural computation in the CNN. For convenience, FIG. 8 shows the one candidate B(k, i) and the other candidates B(k, j), B(k, j+1), ... with a fixed size, but, as described above, their vertical and horizontal sizes vary according to the image or the object.
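
The set of grids consulted for one candidate can be sketched as a window lookup. This sketch assumes that the preset H'×W' range is centered on the grid containing the one candidate, that H' and W' are odd, and that the window is clipped at the image border; the specification only states that the range is preset around the candidate, so these details and the name `neighbor_cells` are illustrative assumptions.

```python
from typing import Iterator, Tuple


def neighbor_cells(h: int, w: int, grid_h: int, grid_w: int,
                   win_h: int, win_w: int) -> Iterator[Tuple[int, int]]:
    """Yield the grid cells inside a win_h x win_w window centered on (h, w).

    Indices are zero-based and the window is clipped to the grid
    (centering and clipping are assumptions of this sketch).
    """
    half_h, half_w = win_h // 2, win_w // 2
    for nh in range(max(0, h - half_h), min(grid_h, h + half_h + 1)):
        for nw in range(max(0, w - half_w), min(grid_w, w + half_w + 1)):
            yield nh, nw
```

Restricting the lookup to a local window keeps the number of relationship scores per grid fixed at H'×W'×K×K instead of growing with the full grid size.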

Here, let R = 1 denote the case where a certain relationship R exists between two candidates, and R = 0 the case where it does not. The presence or absence of the relationship R of one candidate B(k, i) in grid i with respect to another candidate B(k, j) in grid j, and its probability (relationship score P), are then expressed, for example, as follows. The existence of the relationship R of the one candidate B(k, i) in grid i with respect to the other candidate B(k, j) in grid j is written R = 1 | B(k, i), B(k, j), and the probability that the relationship R exists (relationship score P) is written P{R = 1 | B(k, i), B(k, j)}. Similarly, the absence of the relationship R is written R = 0 | B(k, i), B(k, j), and the probability that the relationship R does not exist (relationship score P) is written P{R = 0 | B(k, i), B(k, j)}. The candidate/relationship score detection unit 7a detects, by the CNN, at least the relationship score P for the case where the relationship R exists between candidates (R = 1). Both P{R = 1 | B(k, i), B(k, j)} and P{R = 0 | B(k, i), B(k, j)} take values from 0 to 1, and P{R = 1 | B(k, i), B(k, j)} + P{R = 0 | B(k, i), B(k, j)} = 1.
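
One common way to obtain a pair of complementary probabilities of this kind is to pass a single raw network output through a logistic sigmoid. The specification does not state which output activation is used, so the sigmoid and the name `relation_probabilities` are assumptions of this sketch.

```python
import math
from typing import Tuple


def relation_probabilities(logit: float) -> Tuple[float, float]:
    """Map one raw network output to (P{R=1 | ...}, P{R=0 | ...}).

    The sigmoid is an assumed choice of activation; it guarantees that
    both values lie in [0, 1] and that they sum to 1.
    """
    p_exists = 1.0 / (1.0 + math.exp(-logit))
    return p_exists, 1.0 - p_exists
```

With this parameterization only P{R = 1 | B(k, i), B(k, j)} has to be predicted, which is consistent with the statement that at least the score for R = 1 is detected.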

Further, when one candidate B(k, i) corresponds to a "person" region on the image and another candidate B(k, j) corresponds to an "umbrella" region, an example of the relationship R of the one candidate B(k, i) with respect to the other candidate B(k, j) is "hold", as in "a person holds an umbrella". In this case, the probability (relationship score P) that the relationship R ("hold") of the one candidate B(k, i) in grid i with respect to the other candidate B(k, j) in grid j exists can be written P{hold = 1 | B(k, i), B(k, j)}.

FIG. 9 shows the feature map output from the candidate/relationship score detection unit 7a (the output layer 14 of the CNN). For the relationship score P described above (here considering the score for R = 1), there are K candidates B(k, i) and K candidates B(k, j) for each grid within the H'×W' range, so the total number of relationship scores P obtained within the H'×W' range is H'×W'×K×K (= H'W'KK). In the present embodiment, the candidate/relationship score detection unit 7a uses a single CNN to perform the detection of candidates in S1 (neural computation by the CNN) and the detection of relationship scores P in S2 (neural computation by the CNN) together (simultaneously, in parallel). As a result, the feature map output from the candidate/relationship score detection unit 7a is, as shown in the figure, a third-order tensor of size H × W × {(K(2+2+P) + C) + H'W'KK}.
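
The channel depth of this tensor can be checked with simple arithmetic. The sketch below takes the third per-candidate term, written P in the expression above, as a parameter `p`, because this passage does not give its numeric value; the function names and the example values are hypothetical.

```python
def relation_scores_per_cell(k: int, win_h: int, win_w: int) -> int:
    """Number of relationship scores attached to one grid cell: H' * W' * K * K."""
    return win_h * win_w * k * k


def output_channels(k: int, c: int, win_h: int, win_w: int, p: int) -> int:
    """Depth of the output tensor: (K * (2 + 2 + p) + C) + H' * W' * K * K.

    k: candidates per grid cell (K); c: number of classes (C);
    win_h, win_w: size of the preset range (H', W');
    p: third per-candidate term of the expression (value assumed by the caller).
    """
    return (k * (2 + 2 + p) + c) + relation_scores_per_cell(k, win_h, win_w)
```

For example, with K = 2, C = 3, H' = W' = 3 and p = 1 (all illustrative values), each grid cell carries 2 × 5 + 3 = 13 detection channels and 3 × 3 × 2 × 2 = 36 relationship channels, 49 in total.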

(S3; Object determination step)
In S3, the object determination unit 7b determines the regions of the plurality of objects from the candidates B detected in S1, based on a predetermined algorithm. Here, the predetermined algorithm is Non-Maximum Suppression (NMS). NMS is an algorithm that suppresses duplicates: among overlapping regions (candidates B) recognized as the same class, it removes the region (or candidate) with the lower score (the detection score or class score obtained in S1) and keeps the region (or candidate) with the higher score. Whether two regions (or candidates) "overlap" can be judged from the IoU (Intersection over Union) value and a threshold. For two candidates p and q, the IoU value is the ratio {(area of the region included in both candidate p and candidate q) / (area of the region included in at least one of candidates p and q)}; when this IoU value is equal to or greater than the threshold, the two candidates p and q can be judged to "overlap".
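
The IoU test and the class-wise suppression described above can be sketched as follows. Boxes are assumed to be axis-aligned and given as (x1, y1, x2, y2); the `Detection` type, the greedy highest-score-first order, and the default threshold of 0.5 are illustrative assumptions rather than values from the specification.

```python
from typing import List, NamedTuple, Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


class Detection(NamedTuple):
    box: BoxXYXY
    score: float
    label: str


def iou(p: BoxXYXY, q: BoxXYXY) -> float:
    """Area covered by both boxes divided by area covered by at least one."""
    inter_w = max(0.0, min(p[2], q[2]) - max(p[0], q[0]))
    inter_h = max(0.0, min(p[3], q[3]) - max(p[1], q[1]))
    inter = inter_w * inter_h
    union = (p[2] - p[0]) * (p[3] - p[1]) + (q[2] - q[0]) * (q[3] - q[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(candidates: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: visit candidates from highest to lowest score and drop any
    candidate that overlaps (IoU >= threshold) a kept one of the same class."""
    kept: List[Detection] = []
    for cand in sorted(candidates, key=lambda d: d.score, reverse=True):
        overlaps_kept = any(
            k.label == cand.label and iou(k.box, cand.box) >= iou_threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(cand)
    return kept
```

Candidates of different classes are never suppressed against each other here, so an "umbrella" frame that heavily overlaps a "person" frame survives.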

FIG. 10 shows the state in which the object determination unit 7b has applied NMS to the plurality of candidates shown in FIG. 6 (see the broken-line frames) and determined the regions B_A, B_B, B_C, and B_D of the plurality of objects. On the image, the regions B_A, B_B, B_C, and B_D correspond, respectively, to the region of the "person holding an umbrella", the region of the "umbrella", the region of the "face of the person facing the person holding the umbrella", and the region of the "whole person facing the person holding the umbrella". In this way, applying NMS narrows the overlapping region candidates down to appropriate ones, and the remaining candidates can be appropriately determined as the regions of the objects on the image.

(S4; Relationship determination step)
In S4, the relationship determination unit 7c determines the relationships between the plurality of objects, based on the relationship scores P detected in S2 for the pairs of candidates corresponding to the regions determined in S3. More specifically, the processing is as follows.

First, the relationship determination unit 7c represents, as a graph, the relationships for the pairs of candidates corresponding to the regions determined in S3. FIG. 11 is a graph representation of the relationships between pairs of candidates selected from the candidates B_a, B_b, B_c, and B_d, which correspond to the regions B_A, B_B, B_C, and B_D shown in FIG. 10. In FIG. 11, the relationship between two candidates is indicated by an arrow, and the thickness (line width) of the arrow corresponds to the value of the relationship score P. The candidate at the start point of an arrow corresponds to the one candidate B(k, i) detected in any grid i within the H'×W' range shown in FIG. 8, and the candidate at the end point of the arrow corresponds to the other candidate B(k, j) detected in any grid j within the H'×W' range.

In FIG. 11, for example, the arrow from candidate B_a to candidate B_b indicates that candidate B_a ("person") has a relationship with candidate B_b ("umbrella") (the relationship that the person "holds" the umbrella), and that the relationship score P representing the likelihood of this "hold" relationship is the maximum (because this arrow has the largest line width). For any two candidates selected from the candidates B_a, B_b, B_c, and B_d, the relationship score P between them has already been detected in the relationship score detection step S2, so the relationship determination unit 7c can graph the relationship between the two candidates using that relationship score P, as in FIG. 11.
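
Building the graph of FIG. 11 can be sketched as selecting, from all relationship scores detected in S2, only those whose two endpoints both survived the object determination step. The dictionary layout keyed by (start candidate, end candidate) and the name `relation_graph` are assumptions of this sketch.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (candidate at arrow start, candidate at arrow end)


def relation_graph(all_scores: Dict[Edge, float],
                   kept_ids: Iterable[str]) -> Dict[Edge, float]:
    """Keep only directed edges whose start and end candidates were both
    selected as object regions (for example, by NMS)."""
    kept = set(kept_ids)
    return {
        (src, dst): p
        for (src, dst), p in all_scores.items()
        if src in kept and dst in kept and src != dst
    }
```

No new network evaluation is needed at this stage; the scores are simply looked up from the output already produced in S2.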

When there are a plurality of relationship scores P, as in FIG. 11 (that is, when FIG. 11 contains a plurality of arrows), the relationship determination unit 7c applies threshold processing to the relationship scores P and excludes those below the threshold. FIG. 12 schematically shows the relationships between candidates before and after the threshold processing. Excluding the relationship scores P below the threshold leaves only the arrows whose relationship scores P are equal to or greater than the threshold. This narrows the plurality of candidate pairs down to the pairs for which a relationship needs to be determined. In the figure, as a result of the threshold processing, only the relationship score P1 obtained from candidate B_a to candidate B_b and the relationship score P2 obtained from candidate B_b to candidate B_a remain, showing that the pairs for which a relationship needs to be determined have been narrowed down to the pair of candidates B_a and B_b.
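
The threshold processing of FIG. 12 can be sketched as a simple filter over the directed relationship scores. The dictionary layout, the name `filter_by_threshold`, and the example threshold are illustrative assumptions.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (candidate at arrow start, candidate at arrow end)


def filter_by_threshold(scores: Dict[Edge, float], threshold: float) -> Dict[Edge, float]:
    """Exclude relationship scores below the threshold; scores equal to or
    greater than the threshold are kept."""
    return {edge: p for edge, p in scores.items() if p >= threshold}
```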

When, in this way, the relationship scores P obtained for a pair of two candidates B_a and B_b include both a relationship score P1 obtained from one candidate B_a to the other candidate B_b and a relationship score P2 obtained from the other candidate B_b to the one candidate B_a, the relationship determination unit 7c excludes the smaller of the two relationship scores. FIG. 13 schematically shows the relationship between the two candidates before and after the relationship scores P are narrowed down. By excluding the smaller of the relationship scores P1 and P2 obtained for the same two candidates B_a and B_b (the relationship score P2 in the figure), only the more reliable relationship score P1 remains. The relationship between the two candidates B_a and B_b, that is, the relationship between the two objects corresponding to the two candidates B_a and B_b, can then be appropriately determined based on the remaining relationship score P1. In the example of the figure, the relationship determination unit 7c determines, based on the remaining relationship score P1, that the relationship between the object corresponding to candidate B_a ("person") and the object corresponding to candidate B_b ("umbrella") is that the person "holds" the umbrella.
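
The narrowing step of FIG. 13 can be sketched as keeping, for every pair that has scores in both directions, only the larger one. The specification does not say what happens when the two scores are equal, so the tie-break used here (keep the direction whose start name sorts first) is an assumption, as is the name `keep_stronger_direction`.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (candidate at arrow start, candidate at arrow end)


def keep_stronger_direction(scores: Dict[Edge, float]) -> Dict[Edge, float]:
    """For each pair with scores in both directions, drop the smaller score.

    Edges that exist in only one direction are kept unchanged. Ties are
    broken by keeping the edge whose start name sorts first (assumed rule).
    """
    kept: Dict[Edge, float] = {}
    for (src, dst), p in scores.items():
        reverse = scores.get((dst, src))
        if reverse is None or p > reverse or (p == reverse and src < dst):
            kept[(src, dst)] = p
    return kept
```

Applied to the example of FIG. 13, a score P1 from B_a to B_b that exceeds the score P2 from B_b to B_a leaves only the edge from "person" to "umbrella", which fixes the direction of the "hold" relationship.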

[Effects]
As described above, in the object detection method of the present embodiment, the candidate/relationship score detection unit 7a detects candidates B for the regions that a plurality of objects occupy on the image (S1) and, for each detected candidate B, taking that candidate as one candidate B(k, i), detects the relationship score P with respect to the other candidates B(k, j) within the preset H'×W' range (S2). The object determination unit 7b then determines the regions of the plurality of objects from the plurality of candidates B based on a predetermined algorithm (for example, NMS) (S3), and the relationship determination unit 7c determines the relationships between the plurality of objects based on the relationship scores P (S4). The detection of the candidates B and the detection of the relationship scores P can be performed together (simultaneously, in parallel, non-stepwise, in a single shot) by the CNN constituting the candidate/relationship score detection unit 7a, that is, by a single CNN. Consequently, the intermediate processing for candidate selection (integration and convolution) required in the conventional technique, which uses a plurality of CNNs to extract candidates such as persons, integrate the extracted candidates, and convolve again, is unnecessary, and the convolution processing is not redundant. This makes it possible to detect a plurality of objects and their relationships from an image at high speed based on the relationship scores. As a result, the object detection method and the object detection device 1 of the present embodiment can readily be applied to applications that require such high-speed detection.

In particular, because the candidate/relationship score detection unit 7a performs the detection of the candidates B in S1 and the detection of the relationship scores P in S2 in parallel by neural computation, only one type of CNN is needed to detect the plurality of objects and their relationships. Compared with the conventional technique described above, the plurality of objects and their relationships can therefore be detected with a simpler configuration and at higher speed.

Further, the neural computation described above is computation by a CNN. In this case, the detection of the candidates B and the detection of the relationship scores P can be performed simultaneously in parallel by a single CNN, so the plurality of objects and their relationships can reliably be detected at high speed based on the detected relationship scores P.

The neural network constituting the candidate/relationship score detection unit 7a is not limited to a CNN. It may be an MLP (multilayer perceptron), in which the nodes of each layer are fully connected, or another neural network such as an RNN (Recurrent Neural Network). However, recognizing (detecting) relationships requires modeling (learning) far more complex appearances (images) than, for example, recognizing general object types. A CNN is therefore preferable, because its processing is simple and it enables recognition with a small amount of training data.

Further, the candidate/relationship score detection unit 7a additionally detects at least one of the following parameters: the relative position of the candidate B with respect to each of the plurality of grids set on the image, the relative size of the candidate B with respect to the image, the detection score of the candidate B with respect to the object, and the class score of the object in the grid. In this case, the candidates B can be detected by making effective use of the conventional "YOLO" technique, and the regions of the objects can be determined from the plurality of candidates B by NMS processing based on the detection scores and the like.

[Other configurations of object detection device]
FIG. 14 is a block diagram showing another configuration of the object detection device 1. The object detection device 1 of FIG. 14 is identical to the configuration of FIG. 1 except that the detection processing unit 7 further includes a description creation unit 7d. FIG. 15 is a flowchart showing the processing flow of the object detection method performed by the object detection device 1 of FIG. 14; it is identical to the flowchart of FIG. 3 except that a description creation step (S5) is added after the relationship determination step (S4). The parts that differ from FIGS. 1 and 3 are described below.

The description creation unit 7d creates a description of the image based on the regions of the plurality of objects determined in S3 and the relationships between the plurality of objects determined in S4. The description creation unit 7d is composed of a neural network, such as a CNN or an RNN, that performs neural computation. The neural network of the description creation unit 7d is configured separately from (and is distinct from) the neural network of the candidate/relationship score detection unit 7a.

For example, when the object determination unit 7b determines, by NMS, the region B_A of an object ("person") and the region B_B of an object ("umbrella") on the image (see FIG. 10), and the relationship determination unit 7c determines the relationship (for example, "hold") between the candidates B_a and B_b corresponding to the two objects B_A and B_B (see FIG. 13), the description creation unit 7d creates the description "a person holds an umbrella" (S5; description creation step). The data of the created description is stored, for example, in the storage unit 3, and is read out and used as needed. For example, the description (data) is displayed on the display unit 5 together with the image, or is transmitted to an external terminal together with the image data via the communication unit 6.

Because the description creation unit 7d creates a description of the image based on the regions of the plurality of objects and their relationships as described above, an application that presents the description to the user together with the image can be realized. This improves the convenience of the object detection device 1. In addition, the description creation unit 7d can easily create the description by neural computation using a CNN or the like.

[Programs and recording media]
The object detection device 1 described in the present embodiment can be configured, for example, as a computer (PC) on which a predetermined program (application software) is installed. When the computer (for example, the control unit 2 serving as a CPU) reads and executes the program, each unit of the object detection device 1 operates and the processes (steps) described above are executed. Such a program is acquired, for example, by downloading it from outside via a network, and is stored in the storage unit 3. Alternatively, the program may be recorded on a computer-readable recording medium such as a CD-ROM (Compact Disk-Read Only Memory), read from the recording medium by the computer, and stored in the storage unit 3.

[Other]
The object detection method and the object detection device of the present embodiment described above may also be expressed as follows. The content described in the present embodiment also includes a program and a recording medium expressed as follows.

1. An object detection method for detecting, from an image, a plurality of objects and a relationship between the plurality of objects, the method comprising: a candidate detection step of detecting candidates for regions that the plurality of objects occupy on the image; a relationship score detection step of, for each detected candidate, taking the candidate as one candidate, referring to other candidates within a preset range around the one candidate, and detecting a relationship score representing the likelihood of a predetermined relationship of the one candidate with respect to the other candidates; an object determination step of determining the regions of the plurality of objects from the detected candidates based on a predetermined algorithm; and a relationship determination step of determining the relationship between the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the determined regions.

2. The object detection method according to item 1, wherein the candidate detection step and the relationship score detection step are executed in parallel by neural computation.

3. The object detection method according to item 2, wherein the neural computation is computation by a CNN.

4. The object detection method according to any one of items 1 to 3, wherein the candidate detection step further detects at least one of the following parameters: a relative position of the candidate with respect to each of a plurality of grids set on the image, a relative size of the candidate with respect to the image, a detection score representing the likelihood of the candidate with respect to the object, and a class score representing the likelihood of a class of the object in the grid.

5. The object detection method according to item 4, wherein the predetermined algorithm in the object determination step is Non-Maximum Suppression.

6. The object detection method according to any one of items 1 to 5, wherein, in the relationship determination step, when a plurality of the relationship scores exist, relationship scores below a threshold are excluded.

7. The object detection method according to any one of items 1 to 6, wherein, in the relationship determination step, when the relationship scores obtained for a pair of two of the candidates include a relationship score obtained from one candidate of the pair to the other candidate and a relationship score obtained from the other candidate of the pair to the one candidate, the smaller of the two relationship scores is excluded.

8. The object detection method according to any one of items 1 to 7, further comprising a description creation step of creating a description of the image based on the determined regions of the plurality of objects and the relationship between the plurality of objects.

9. The object detection method according to item 8, wherein, in the description creation step, the description is created by neural computation.

10. The object detection method according to any one of items 1 to 9, wherein the relationship between the plurality of objects includes at least one of an association or state between the plurality of objects and an action or operation of one object on another object.

１１．画像から複数の物体と前記複数の物体の関係とを検出する物体検出装置であって、前記複数の物体が前記画像上で占める領域の候補を検出するとともに、検出した前記候補ごとに、前記候補を一の候補として、前記一の候補の周囲の予め設定された範囲内の他の候補を参照して、前記一の候補の前記他の候補に対する所定の関係の尤度を表す関係スコアを検出する候補／関係スコア検出部と、所定のアルゴリズムに基づいて、検出した前記候補から前記複数の物体の前記領域を決定する物体決定部と、決定した前記領域に対応する候補のペアについて検出された前記関係スコアに基づいて、前記複数の物体の前記関係を決定する関係決定部とを含むことを特徴とする物体検出装置。 11. An object detection device that detects a plurality of objects and the relationship between the plurality of objects from an image, detects candidates for an area occupied by the plurality of objects on the image, and detects the candidates for each of the detected candidates. With reference to other candidates within a preset range around the one candidate, a relationship score representing the likelihood of a predetermined relationship of the one candidate with respect to the other candidate is detected. A pair of a candidate / relationship score detection unit, an object determination unit that determines the region of the plurality of objects from the detected candidates, and a candidate pair corresponding to the determined region are detected based on a predetermined algorithm. An object detection device including a relationship determination unit that determines the relationship between the plurality of objects based on the relationship score.

１２．前記候補／関係スコア検出部は、前記候補の検出と前記関係スコアの検出とをニューロ演算によって並列して行うことを特徴とする前記１１に記載の物体検出装置。 12. 11. The object detection device according to 11, wherein the candidate / relationship score detection unit performs detection of the candidate and detection of the relationship score in parallel by a neuro operation.

13. The object detection device according to item 12, wherein the neuro operation is an operation performed by a CNN.

14. The object detection device according to any one of items 11 to 13, wherein the candidate/relationship score detection unit further detects at least one of the following parameters: the relative position of the candidate with respect to each of a plurality of grids set on the image, the size of the candidate relative to the image, a detection score representing the likelihood of the candidate with respect to the object, and a class score representing the likelihood of the class of the object in the grid.

15. The object detection device according to item 14, wherein the predetermined algorithm used by the object determination unit to determine the regions is Non-Maximum Suppression.

16. The object detection device according to any one of items 11 to 15, wherein the relationship determination unit excludes relationship scores below a threshold value when a plurality of relationship scores exist.

17. The object detection device according to any one of items 11 to 16, wherein, when the relationship scores obtained for a pair of two candidates include both a relationship score obtained from one candidate of the pair to the other candidate and a relationship score obtained from the other candidate to the one candidate, the relationship determination unit excludes the smaller of the two relationship scores.

18. The object detection device according to any one of items 11 to 17, further comprising an explanatory text creating unit that creates explanatory text for the image based on the determined regions of the plurality of objects and the relationship between the plurality of objects.

19. The object detection device according to item 18, wherein the explanatory text creating unit creates the explanatory text by a neuro operation.

20. The object detection device according to any one of items 11 to 19, wherein the relationship between the plurality of objects includes at least one of an association or state between the plurality of objects, and an action or operation of one object on the other object.

21. A program that causes a computer to execute the object detection method according to any one of items 1 to 10.

22. A computer-readable recording medium on which the program according to item 21 is recorded.
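
As an illustration only, the post-processing described in items 15 to 17 (Non-Maximum Suppression over the candidates, then exclusion of relationship scores below a threshold and of the smaller score in each bidirectional pair) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `Candidate`, `iou`, `non_maximum_suppression`, and `determine_relations`, the box format, the single relationship type, the tie-breaking rule, and both threshold values of 0.5 are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


@dataclass(frozen=True)
class Candidate:
    box: Box       # region the candidate occupies on the image
    score: float   # detection score of the candidate


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(cands: List[Candidate],
                            iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the kept candidates, highest detection score first."""
    order = sorted(range(len(cands)), key=lambda i: cands[i].score, reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap a kept one too strongly.
        if all(iou(cands[i].box, cands[k].box) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


def determine_relations(rel_scores: Dict[Tuple[int, int], float],
                        kept: List[int],
                        score_threshold: float = 0.5) -> Dict[Tuple[int, int], float]:
    """Filter relationship scores keyed by (source index, target index)."""
    kept_set = set(kept)
    # Use only pairs whose candidates were both determined as object regions,
    # and exclude scores below the threshold (item 16).
    surviving = {
        (i, j): s for (i, j), s in rel_scores.items()
        if i in kept_set and j in kept_set and s >= score_threshold
    }
    result: Dict[Tuple[int, int], float] = {}
    for (i, j), s in surviving.items():
        reverse = surviving.get((j, i))
        # When both directions exist, exclude the smaller score (item 17).
        # Assumption: on a tie, the direction with the lower source index wins.
        if reverse is not None and (reverse > s or (reverse == s and j < i)):
            continue
        result[(i, j)] = s
    return result
```

Here the relationship scores are assumed to be available as a dictionary over ordered candidate pairs; a caller would pass the indices returned by `non_maximum_suppression` as `kept` so that only pairs of determined regions contribute to the final relationships.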

Although embodiments of the present invention have been described above, the scope of the present invention is not limited to them, and the invention can be extended or modified without departing from its gist.

The present invention is applicable to applications that analyze human behavior from images, such as surveillance camera systems, and to applications that estimate, at high speed, relationships existing among a plurality of objects in an image.

1 Object detection device
7a Candidate/relationship score detection unit
7b Object determination unit
7c Relationship determination unit
7d Explanatory text creating unit

Claims

An object detection method for detecting, from an image, a plurality of objects and a relationship between the plurality of objects, the method comprising:
a candidate detection step of detecting candidates for regions occupied by the plurality of objects on the image;
a relationship score detection step of, for each detected candidate, taking that candidate as one candidate, referring to other candidates within a preset range around the one candidate, and detecting a relationship score representing the likelihood of a predetermined relationship of the one candidate with respect to the other candidates;
an object determination step of determining the regions of the plurality of objects from the detected candidates based on a predetermined algorithm; and
a relationship determination step of determining the relationship between the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the determined regions.

The object detection method according to claim 1, wherein the candidate detection step and the relationship score detection step are executed in parallel by a neuro operation.

The object detection method according to claim 2, wherein the neuro operation is an operation by CNN.

The object detection method according to any one of claims 1 to 3, wherein the candidate detection step further detects at least one of the following parameters: the relative position of the candidate with respect to each of a plurality of grids set on the image, the size of the candidate relative to the image, a detection score representing the likelihood of the candidate with respect to the object, and a class score representing the likelihood of the class of the object in the grid.

The object detection method according to claim 4, wherein the predetermined algorithm in the object determination step is Non-Maximum Suppression.

The object detection method according to any one of claims 1 to 5, wherein, in the relationship determination step, relationship scores below a threshold value are excluded when a plurality of relationship scores exist.

The object detection method according to any one of claims 1 to 6, wherein, in the relationship determination step, when the relationship scores obtained for a pair of two candidates include both a relationship score obtained from one candidate of the pair to the other candidate and a relationship score obtained from the other candidate to the one candidate, the smaller of the two relationship scores is excluded.

The object detection method according to any one of claims 1 to 7, further comprising an explanatory text creation step of creating explanatory text for the image based on the determined regions of the plurality of objects and the relationship between the plurality of objects.

The object detection method according to claim 8, wherein, in the explanatory text creation step, the explanatory text is created by a neuro operation.

The object detection method according to any one of claims 1 to 9, wherein the relationship between the plurality of objects includes at least one of an association or state between the plurality of objects, and an action or operation of one object on the other object.

An object detection device that detects, from an image, a plurality of objects and a relationship between the plurality of objects, the device comprising:
a candidate/relationship score detection unit that detects candidates for regions occupied by the plurality of objects on the image and that, for each detected candidate, takes that candidate as one candidate, refers to other candidates within a preset range around the one candidate, and detects a relationship score representing the likelihood of a predetermined relationship of the one candidate with respect to the other candidates;
an object determination unit that determines the regions of the plurality of objects from the detected candidates based on a predetermined algorithm; and
a relationship determination unit that determines the relationship between the plurality of objects based on the relationship scores detected for pairs of candidates corresponding to the determined regions.

The object detection device according to claim 11, wherein the candidate / relationship score detection unit performs detection of the candidate and detection of the relationship score in parallel by a neuro operation.

The object detection device according to claim 12, wherein the neuro operation is an operation by CNN.

The object detection device according to any one of claims 11 to 13, wherein the candidate/relationship score detection unit further detects at least one of the following parameters: the relative position of the candidate with respect to each of a plurality of grids set on the image, the size of the candidate relative to the image, a detection score representing the likelihood of the candidate with respect to the object, and a class score representing the likelihood of the class of the object in the grid.

The object detection device according to claim 14, wherein the predetermined algorithm used by the object determination unit to determine the region is Non-Maximum Suppression.

The object detection device according to any one of claims 11 to 15, wherein the relationship determination unit excludes relationship scores below a threshold value when a plurality of relationship scores are present.

The object detection device according to any one of claims 11 to 16, wherein, when the relationship scores obtained for a pair of two candidates include both a relationship score obtained from one candidate of the pair to the other candidate and a relationship score obtained from the other candidate to the one candidate, the relationship determination unit excludes the smaller of the two relationship scores.

The object detection device according to any one of claims 11 to 17, further comprising an explanatory text creating unit that creates explanatory text for the image based on the determined regions of the plurality of objects and the relationship between the plurality of objects.

The object detection device according to claim 18, wherein the explanatory text creating unit creates the explanatory text by a neuro operation.

The object detection device according to any one of claims 11 to 19, wherein the relationship between the plurality of objects includes at least one of an association or state between the plurality of objects, and an action or operation of one object on the other object.