JP2018156416A

JP2018156416A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2018156416A
Application number: JP2017052937A
Authority: JP
Inventors: 建志入江; Kenji Irie
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2018-10-04

Abstract

PROBLEM TO BE SOLVED: To provide an object detection method with high processing speed and high accuracy, so that the detection result can be checked quickly when object detection is performed on an image photographed with a camera-equipped device such as a smartphone. SOLUTION: A touch operation on an input image displayed on a touch panel is accepted as the designation of a position. Starting from the designated position, an area is enlarged until an area including the object to be detected is specified, and information on the specified area is output. SELECTED DRAWING: Figure 1

Description

本発明は、情報処理装置、情報処理方法、プログラムに関する。 The present invention relates to an information processing apparatus, an information processing method, and a program.

Object detection methods that recognize what an object in an image is and where it is located in the image have long been studied. In recent years in particular, many object detection methods applying the machine learning technique called deep learning have been proposed.

非特許文献１には、従来のディープラーニングを応用した物体検出手法と比較して物体の大きさをより正確に、良好な精度で検出することを特徴とする物体検出手法について開示されている。 Non-Patent Document 1 discloses an object detection method characterized by detecting the size of an object more accurately and with good accuracy compared to a conventional object detection method applying deep learning.

Donggeun Yoo et al., “Action-Driven Object Detection with Top-Down Visual Attentions”

しかし、非特許文献１には、検出対象画像の解像度やアスペクト比を変更した多くの画像を作成し、それらのすべての画像に対して、物体を検出するためにニューラルネットワークによって多くの回数処理されるため、検出速度が遅いという課題がある。特に、スマートフォンなどのカメラ付きデバイスで撮影した画像に対して物体検出を行い、その結果を確認する時などには、利便性が低くなるという課題がある。また、検出対象の物体によっては、検出精度が低いという課題がある。 However, in Non-Patent Document 1, a number of images in which the resolution and aspect ratio of a detection target image are changed are created, and all of these images are processed many times by a neural network in order to detect an object. Therefore, there is a problem that the detection speed is slow. In particular, when object detection is performed on an image captured by a camera-equipped device such as a smartphone and the result is confirmed, there is a problem that convenience is lowered. Further, depending on the object to be detected, there is a problem that the detection accuracy is low.

そのため、例えばスマートフォンなどのカメラ付きデバイスで撮影した画像に対して物体検出を行った時に、物体検出結果を素早く確認するため、物体検出処理をより速く処理することが望まれる。また、さらに精度良く物体を検出することも望まれる。 Therefore, for example, when object detection is performed on an image captured with a camera-equipped device such as a smartphone, it is desired to perform the object detection process faster in order to quickly confirm the object detection result. It is also desired to detect an object with higher accuracy.

そこで、本発明は、処理速度および精度の高い物体検出手法を提供することを目的とする。 Accordingly, an object of the present invention is to provide an object detection method with high processing speed and accuracy.

本発明の情報処理装置は、 The information processing apparatus of the present invention

本発明によれば、処理速度および精度の高い物体検出手法を提供することが可能となる。 According to the present invention, it is possible to provide an object detection method with high processing speed and accuracy.

FIG. 1 is a diagram showing an example of the system configuration of the object detection system in the embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the hardware configuration of the mobile terminal 100 and the object detection server 200 in the embodiment of the present invention.

FIG. 3 is a diagram showing an example of the functional configuration of the object detection system in the embodiment of the present invention.

FIG. 4 is a flowchart showing an example of the process of performing object detection on a captured image and displaying the detection result in the embodiment of the present invention.

FIG. 5 is a flowchart showing an example of the process of detecting an object using the object position hint in the embodiment of the present invention.

以下、図面を参照して、本発明の実施形態を詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

図１は、本発明の実施形態における情報処理システム（物体検出システム）の構成の一例を示す図である。 FIG. 1 is a diagram illustrating an example of a configuration of an information processing system (object detection system) according to an embodiment of the present invention.

物体検出システムは、スマートフォンなど撮像手段を備える携帯端末１００、物体検出サーバ２００が、ＬＡＮ３００を介して接続される構成となっている。 The object detection system has a configuration in which a portable terminal 100 including an imaging unit such as a smartphone and an object detection server 200 are connected via a LAN 300.

携帯端末１００は、デバイスに備え付けられラカメラ（撮像手段）を使って物体検出処理の対象となる画像を撮影する。撮影した画像に対して、利用者の物体位置ヒント指示を受け付け、画像と物体位置ヒントとを物体検出サーバ２００へ送信する。そして、物体検出サーバ２００から物体検出結果の受信を行う。 The portable terminal 100 captures an image to be subjected to object detection processing using a camera (imaging means) provided in the device. A user's object position hint instruction is received for the captured image, and the image and the object position hint are transmitted to the object detection server 200. Then, the object detection result is received from the object detection server 200.

物体検出サーバ２００は、携帯端末１００から送信された画像と、物体位置ヒントとを利用して物体検出処理を行い、物体検出結果を携帯端末１００へ送信する。 The object detection server 200 performs an object detection process using the image transmitted from the mobile terminal 100 and the object position hint, and transmits the object detection result to the mobile terminal 100.

図２は、本発明の実施形態における携帯端末１００、物体検出サーバ２００に適用可能な情報処理装置のハードウェア構成の一例を示すブロック図である。 FIG. 2 is a block diagram illustrating an example of a hardware configuration of an information processing apparatus applicable to the mobile terminal 100 and the object detection server 200 according to the embodiment of the present invention.

As shown in FIG. 2, in the mobile terminal 100 and the object detection server 200, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an input controller 205, a video controller 206, a memory controller 207, and a communication I/F controller 208 are connected via a system bus 204.

ＣＰＵ２０１は、システムバス２０４に接続される各デバイスやコントローラを統括的に制御する。 The CPU 201 comprehensively controls each device and controller connected to the system bus 204.

The ROM 202 or the external memory 211 holds the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs executed by the CPU 201, as well as the computer-readable and executable programs for realizing the information processing method and the various data they require (including data tables).

ＲＡＭ２０３は、ＣＰＵ２０１の主メモリ、ワークエリア等として機能する。ＣＰＵ２０１は、処理の実行に際して必要なプログラム等をＲＯＭ２０２あるいは外部メモリ２１１からＲＡＭ２０３にロードし、ロードしたプログラムを実行することで各種動作を実現する。 The RAM 203 functions as a main memory, work area, and the like for the CPU 201. The CPU 201 implements various operations by loading a program or the like necessary for executing the processing from the ROM 202 or the external memory 211 to the RAM 203 and executing the loaded program.

入力コントローラ２０５は、キーボード２０９や不図示のマウス等のポインティングデバイス等の入力装置からの入力を制御する。入力装置がタッチパネルの場合、ユーザがタッチパネルに表示されたアイコンやカーソルやボタンに合わせて押下（指等でタッチ）することにより、各種の指示を行うことができることとする。 The input controller 205 controls input from an input device such as a keyboard 209 or a pointing device such as a mouse (not shown). When the input device is a touch panel, the user can perform various instructions by pressing (touching with a finger or the like) in accordance with an icon, a cursor, or a button displayed on the touch panel.

また、タッチパネルは、マルチタッチスクリーンなどの、複数の指でタッチされた位置を検出することが可能なタッチパネルであってもよい。 The touch panel may be a touch panel capable of detecting a position touched with a plurality of fingers, such as a multi-touch screen.

The video controller 206 controls display on an external output device such as the display 210. The display also includes the display of a notebook computer that is integrated with the main body. The external output device is not limited to a display and may be, for example, a projector. A device that can accept the touch operation described above also serves as an input device.

Note that the video controller 206 can control a video memory (VRAM) used for display control. A part of the RAM 203 can be used as the video memory area, or a dedicated video memory can be provided separately.

メモリコントローラ２０７は、外部メモリ２１１へのアクセスを制御する。外部メモリとしては、ブートプログラム、各種アプリケーション、フォントデータ、ユーザファイル、編集ファイル、および各種データ等を記憶する外部記憶装置（ハードディスク）、フレキシブルディスク（ＦＤ）、或いはＰＣＭＣＩＡカードスロットにアダプタを介して接続されるコンパクトフラッシュ（登録商標）メモリ等を利用可能である。 The memory controller 207 controls access to the external memory 211. The external memory is connected via an adapter to an external storage device (hard disk), flexible disk (FD), or PCMCIA card slot that stores boot programs, various applications, font data, user files, editing files, and various data. A compact flash (registered trademark) memory or the like can be used.

The communication I/F controller 208 connects to and communicates with external devices via a network, and executes communication control processing on the network. For example, communication using TCP/IP, communication over telephone lines such as ISDN, and communication over 3G mobile phone lines are possible.

Note that the CPU 201 enables display on the display 210 by, for example, rasterizing outline fonts into a display information area in the RAM 203. The CPU 201 also enables user instructions with a mouse cursor (not shown) or the like on the display 210.

Next, an example of the functional configuration of the various apparatuses according to the embodiment of the present invention will be described with reference to FIG. 3.

物体位置ヒント受付部１０１は、画像に対してユーザにより指定された位置を取得する機能を備える。ステップＳ４０２においてユーザから位置の指定を受け付け、当該指定された位置を取得する（Ｓ４０４）処理を実行する。 The object position hint receiving unit 101 has a function of acquiring a position designated by the user with respect to an image. In step S402, designation of the position is received from the user, and the designated position is acquired (S404).

送信制御部１０２は、カメラ撮影部１０５により撮影された画像と、物体位置ヒント受付部１０１により指定された位置に係る情報とを物体検出サーバ２００に送信する機能を備える。 The transmission control unit 102 has a function of transmitting, to the object detection server 200, an image photographed by the camera photographing unit 105 and information related to the position designated by the object position hint accepting unit 101.

受信制御部１０３は、物体検出サーバ２００の物体検出処理部２０２による処理結果を受信する機能を備える。 The reception control unit 103 has a function of receiving a processing result by the object detection processing unit 202 of the object detection server 200.

物体検出結果確認部１０４は、受信制御部１０３において受信した結果に基づき、生成された矩形などの情報を表示する機能をそなえる。詳細はステップＳ４０７において説明する。 The object detection result confirmation unit 104 has a function of displaying information such as a generated rectangle based on the result received by the reception control unit 103. Details will be described in step S407.

受信制御部２０１は送信制御部１０２から送信されたデータを受信（入力）する（ステップＳ４０４の処理である）。 The reception control unit 201 receives (inputs) the data transmitted from the transmission control unit 102 (the process in step S404).

The object detection processing unit 202 is a functional unit that executes step S405, that is, the process shown in the flowchart of FIG. 5. By enlarging an area with the designated position as the starting point, it specifies the area that contains the object to be detected and outputs information indicating the specified area. Specifically, it generates a rectangle centered on the designated position and determines whether the object to be detected is contained in the generated rectangle. If the object is determined to be contained, the rectangle is specified as the area containing the object. If the object is determined not to be contained, a rectangle obtained by enlarging that rectangle is generated. The details are described later using the flowchart.

送信制御部２０３は、物体検出処理部２０２における処理の結果を携帯端末１００に送信する機能を備える。 The transmission control unit 203 has a function of transmitting the processing result in the object detection processing unit 202 to the mobile terminal 100.

次に図４、図５のフローチャートを用いて、本発明の実施形態における物体検出システムが実行する処理について説明する。 Next, processing executed by the object detection system according to the embodiment of the present invention will be described using the flowcharts of FIGS. 4 and 5.

The flowchart of FIG. 4 shows the object detection process that the CPUs 201 of the mobile terminal 100 and the object detection server 200 carry out by reading and executing predetermined control programs.

In step S401, the CPU 201 of the mobile terminal 100 receives a shooting instruction from the user and captures an image with the built-in camera.

ステップＳ４０２では、携帯端末１００のＣＰＵ２０１は、ステップＳ４０１で撮影した画像を表示部に表示し、利用者からの物体位置ヒントの指示（位置の指定）を受け付ける。物体位置ヒントは、例えば、表示されている画像に利用者が指でタッチ操作を行うなどの操作によって指示される。 In step S402, the CPU 201 of the mobile terminal 100 displays the image captured in step S401 on the display unit, and accepts an object position hint instruction (position specification) from the user. The object position hint is instructed by an operation such as a user performing a touch operation on the displayed image with a finger.

Alternatively, the user may be guided to align the object with a predetermined position (area) when shooting, and that predetermined position may be used as the designated position.

This embodiment assumes that the object position hint is accepted on a mobile terminal with a camera function, such as a smartphone. However, the captured image may instead be displayed on a PC or the like, and the object position hint may be accepted through a mouse click or similar operation.

このように、ユーザから物体位置または物体位置のヒント（目安）の指定を受け付ける構成であれば、いずれの態様であってもよい。 As described above, any configuration may be used as long as the specification of the object position or the hint (reference) of the object position is received from the user.

ステップＳ４０３では、携帯端末１００のＣＰＵ２０１は、ステップＳ４０１で撮影した画像とステップＳ４０２で受け付けた物体位置ヒントとを、物体検出サーバ２００へ送信する。 In step S403, the CPU 201 of the mobile terminal 100 transmits the image captured in step S401 and the object position hint received in step S402 to the object detection server 200.

ステップＳ４０４では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ４０３で送信された画像と物体位置ヒント（ユーザにより指定された位置を示す情報）を受信（取得）する。 In step S404, the CPU 201 of the object detection server 200 receives (acquires) the image and the object position hint (information indicating the position designated by the user) transmitted in step S403.
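The exchange in steps S403 and S404 can be pictured with a small sketch. The source says only that the image and the object position hint are sent together; the JSON layout, the base64 encoding, and the names `build_request` and `parse_request` below are assumptions made for illustration, not the wire format of the described system.

```python
import base64
import json


def build_request(image_bytes: bytes, hint_x: int, hint_y: int) -> str:
    """Terminal side (S403): bundle the image and the hint into one message.

    The field names "image" and "hint" are hypothetical.
    """
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "hint": {"x": hint_x, "y": hint_y},
    })


def parse_request(body: str):
    """Server side (S404): recover the image bytes and the (x, y) hint."""
    msg = json.loads(body)
    image_bytes = base64.b64decode(msg["image"])
    hint = (msg["hint"]["x"], msg["hint"]["y"])
    return image_bytes, hint
```

Keeping the hint in the same message as the image means the server never has to match a hint to an image that arrived separately.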

ステップＳ４０５では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ４０４で受信した画像と物体位置ヒントを用いて、画像に対し、物体検出処理を行う。ステップＳ４０５の詳細な処理については、図５のフローチャートを用いて後述する。 In step S405, the CPU 201 of the object detection server 200 performs object detection processing on the image using the image received in step S404 and the object position hint. Detailed processing in step S405 will be described later with reference to the flowchart of FIG.

ステップＳ４０６では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ４０５の物体検出処理によって得られた物体検出結果とステップＳ４０４で受信した画像とを携帯端末１００へ送信する。物体検出結果は、物体の画像上での位置と大きさを示す、矩形の左上の点の座標および右下の点の座標と、で表現される。物体が検出されなかった場合は、左上の点と右下の点の座標が全て０として表現されるとする。 In step S406, the CPU 201 of the object detection server 200 transmits the object detection result obtained by the object detection process in step S405 and the image received in step S404 to the mobile terminal 100. The object detection result is expressed by the coordinates of the upper left point and the lower right point of the rectangle indicating the position and size of the object on the image. If no object is detected, the coordinates of the upper left point and the lower right point are all expressed as 0.
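The result format described for step S406 can be sketched as a small value type. The class name `DetectionResult`, its field names, and the `detected` property are hypothetical; only the two-corner representation and the all-zero convention for "no object" come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionResult:
    """Rectangle given by its upper left (x1, y1) and lower right (x2, y2) points."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def detected(self) -> bool:
        # The text encodes "no object detected" as all four coordinates being 0.
        return (self.x1, self.y1, self.x2, self.y2) != (0, 0, 0, 0)


# Sentinel value a server could return when nothing was found.
NOT_DETECTED = DetectionResult(0, 0, 0, 0)
```

A receiver would check `result.detected` before drawing the rectangle in step S407.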

In step S407, the CPU 201 of the mobile terminal 100 receives the object detection result and the image transmitted in step S406, draws the rectangle given by the object detection result on the image (for example, as a red line), and displays the image.

次に図５のフローチャートを用いて、本発明の実施形態における物体検出サーバ２００が実行する処理について説明する。 Next, processing executed by the object detection server 200 according to the embodiment of the present invention will be described using the flowchart of FIG.

図５のフローチャートは、物体検出サーバ２００のＣＰＵ２０１が所定の制御プログラムを読み出して実行する処理であり、画像と物体位置ヒントを利用して物体検出処理を行う処理を示すフローチャートである。 The flowchart in FIG. 5 is a process in which the CPU 201 of the object detection server 200 reads out and executes a predetermined control program, and is a flowchart showing a process in which an object detection process is performed using an image and an object position hint.

In step S501, the CPU 201 of the object detection server 200 generates, from the coordinates of the object position hint, the coordinate information of the rectangle that is the initial processing target. The coordinate information holds two points: the upper left point of the rectangle, obtained by subtracting 5 from the X and Y coordinates of the object position hint, and the lower right point, obtained by adding 5 to the X and Y coordinates of the object position hint. The initial rectangle therefore has a width and height of 10 and is centered on the object position hint. The initial rectangle size described in this embodiment and the correction amounts used in S505 and later steps are only examples, and can be set as appropriate according to the object to be detected.
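As a minimal sketch of step S501, the initial rectangle can be computed directly from the hint. The function name `initial_rect` and the `margin` parameter are hypothetical; the value 5 is the example value from the text.

```python
def initial_rect(hint_x: int, hint_y: int, margin: int = 5) -> tuple:
    """Return (x1, y1, x2, y2) for the initial rectangle centered on the hint.

    The text does not mention clamping at this step, so a hint closer than
    `margin` pixels to the image edge yields coordinates outside the image.
    """
    return (hint_x - margin, hint_y - margin, hint_x + margin, hint_y + margin)
```

For a hint at (100, 80) this gives the rectangle (95, 75, 105, 85), which is 10 wide and 10 high as stated above.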

In step S502, the CPU 201 of the object detection server 200 generates an image (partial image) consisting of the area indicated by the coordinate information of the processing target rectangle. The generated partial image is input to AttentionNet, the object detection method described in Non-Patent Document 1, and the identification results for the upper left point and the lower right point of the partial image are acquired. The width and height of the partial image are also stored in a memory area.

Here, AttentionNet is a deep neural network that, depending on whether the size of the detection target object has been captured accurately, outputs one of down, right, lower right, stop, or FALSE for the upper left point of the image, and one of up, left, upper left, stop, or FALSE for the lower right point of the image.

“Stop” is output when the target image (the partial image in this embodiment) contains the detected object at exactly the right size, that is, when the object just fits in the target image. If stop is output for both the upper left and the lower right points, a rectangle circumscribing the detected object has been generated.

ＦＡＬＳＥは、対象画像に、検出物体は含まれていないことを示している。 FALSE indicates that the detection object is not included in the target image.

Any result other than stop or FALSE indicates that moving the coordinates in that direction, which changes (shrinks) the area of the target image, would enclose the detection target object at a more accurate size (that is, would produce a rectangle circumscribing the object).

また、ここで利用するＡｔｔｅｎｔｉｏｎＮｅｔは、非特許文献１で示されるように、検出対象物体を検出可能なように、事前に学習されたディープニューラルネットワークを用いることとする。 In addition, as shown in Non-Patent Document 1, the AttentionNet used here uses a deep neural network learned in advance so that a detection target object can be detected.
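The label sets described above can be written down as lookup tables. This is only a sketch: the string names for the labels and the helper `suggested_move` are hypothetical, and the displacements assume pixel coordinates in which Y increases downward.

```python
# Unit displacement (dx, dy) that each directional label suggests for its corner.
# Moving a corner this way shrinks the rectangle toward the object.
TOP_LEFT_MOVES = {"down": (0, 1), "right": (1, 0), "down_right": (1, 1)}
BOTTOM_RIGHT_MOVES = {"up": (0, -1), "left": (-1, 0), "up_left": (-1, -1)}


def suggested_move(corner: str, label: str):
    """Return the (dx, dy) a label suggests, or None for "stop" and "false".

    "stop" means the corner already sits on the object's bounding box;
    "false" means the object is not contained in the partial image.
    """
    if corner == "top_left":
        return TOP_LEFT_MOVES.get(label)
    if corner == "bottom_right":
        return BOTTOM_RIGHT_MOVES.get(label)
    raise ValueError(f"unknown corner: {corner!r}")
```

The tables show why the two corners have mirrored label sets: every directional label moves its corner inward.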

ステップＳ５０３では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ５０２で取得した左上の点の識別結果が右下または停止、かつ、右下の点の識別結果が左上または停止であれば、検出対象物体が収まる矩形が生成できたと判定して、現在の処理対象矩形の座標情報を物体検出結果とし、処理を終了する。そうでなければステップＳ５０４へ進む。 In step S503, the CPU 201 of the object detection server 200, if the identification result of the upper left point acquired in step S502 is lower right or stopped, and if the identification result of the lower right point is upper left or stopped, the detection target object is It is determined that a rectangle that can be accommodated has been generated, and the coordinate information of the current processing target rectangle is used as the object detection result, and the process ends. Otherwise, the process proceeds to step S504.

Note that in this embodiment the rectangle that fits the detection target object is found by enlarging the rectangle step by step, so the enlargement may overshoot “stop”. In other words, the rectangle may grow past the one that exactly circumscribes the detection target object. For this reason, the determination in S503 is YES not only for “stop” but also when the identification result of the upper left point is lower right and the identification result of the lower right point is upper left.

As explained below for S504, the possibility of false detection is high while the rectangle is small. The control may therefore be arranged so that “stop” is not treated as YES, and YES is determined only when the identification result of the upper left point is lower right and the identification result of the lower right point is upper left.
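The S503 determination, including the stricter variant just described, can be sketched as a single predicate. The function name `is_detected`, the `accept_stop` flag, and the label strings are hypothetical names chosen for illustration.

```python
def is_detected(top_left: str, bottom_right: str, accept_stop: bool = True) -> bool:
    """Return True when the current rectangle is accepted as the detection result.

    With accept_stop=True, "stop" and the overshoot labels both count as YES.
    With accept_stop=False, only the overshoot labels count, which corresponds
    to the variant that distrusts "stop" while the rectangle is still small.
    """
    ok_top_left = {"down_right"}
    ok_bottom_right = {"up_left"}
    if accept_stop:
        ok_top_left.add("stop")
        ok_bottom_right.add("stop")
    return top_left in ok_top_left and bottom_right in ok_bottom_right
```

Both corners must agree: a rectangle whose upper left point reports stop while the lower right point still reports FALSE is not yet accepted.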

ステップＳ５０４では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ５０２で取得した左上の点の識別結果が、ＦＡＬＳＥであれば、ステップＳ５０５へ進む。そうでなければ、ステップＳ５０６へ進む。 In step S504, if the identification result of the upper left point acquired in step S502 is FALSE, the CPU 201 of the object detection server 200 proceeds to step S505. Otherwise, the process proceeds to step S506.

As noted in the description of S503 above, the possibility of false detection is high while the rectangle is small (for example, smaller than a predetermined size), so the control may be arranged so that the determination in S504 is also YES in the case of “stop”.

In step S505, the CPU 201 of the object detection server 200 subtracts 5 from the X and Y coordinates of the upper left point in the coordinate information of the processing target rectangle, corrects the X coordinate to 0 if it is smaller than 0 and the Y coordinate to 0 if it is smaller than 0, and proceeds to step S510.

In step S506, the CPU 201 of the object detection server 200 proceeds to step S507 if the identification result of the upper left point acquired in step S502 is down. Otherwise, the process proceeds to step S508.

In step S507, the CPU 201 of the object detection server 200 subtracts 5 from the X coordinate of the upper left point in the coordinate information of the processing target rectangle, corrects the X coordinate to 0 if it is smaller than 0, and proceeds to step S510. An identification result of down for the upper left point means that the rectangle already contains the detection target object in the Y direction but must be enlarged further in the X direction to contain it. The rectangle is therefore enlarged in the X direction.

ステップＳ５０８では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ５０２で取得した左上の点の識別結果が、右であれば、ステップＳ５０９へ進む。そうでなければ、ステップＳ５１０へ進む。 In step S508, the CPU 201 of the object detection server 200 proceeds to step S509 if the identification result of the upper left point acquired in step S502 is right. Otherwise, the process proceeds to step S510.

In step S509, the CPU 201 of the object detection server 200 subtracts 5 from the Y coordinate of the upper left point in the coordinate information of the processing target rectangle, corrects the Y coordinate to 0 if it is smaller than 0, and proceeds to step S510. An identification result of right for the upper left point means that the rectangle already contains the detection target object in the X direction but must be enlarged further in the Y direction to contain it. The rectangle is therefore enlarged in the Y direction.
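Steps S504 to S509 amount to one update of the upper left point. The sketch below assumes hypothetical label strings and a hypothetical function name `expand_top_left`; the step of 5 is the example value from the text.

```python
def expand_top_left(x1: int, y1: int, label: str, step: int = 5) -> tuple:
    """Move the upper left point outward according to its identification result.

    "false" -> move in both X and Y (S505)
    "down"  -> move in X only       (S507)
    "right" -> move in Y only       (S509)
    Any other label leaves the point unchanged.
    """
    if label in ("false", "down"):
        x1 = max(x1 - step, 0)  # a coordinate below 0 is corrected to 0
    if label in ("false", "right"):
        y1 = max(y1 - step, 0)
    return x1, y1
```

Only the coordinate that was actually changed is clamped, matching the per-step wording above.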

ステップＳ５１０では、物体検出サーバ２００のＣＰＵ２０１は、ステップＳ５０２で取得した右下の点の識別結果が、ＦＡＬＳＥであれば、ステップＳ５１１へ進む。そうでなければ、ステップＳ５１２へ進む。 In step S510, if the result of identifying the lower right point acquired in step S502 is FALSE, the CPU 201 of the object detection server 200 proceeds to step S511. Otherwise, the process proceeds to step S512.

なお、上述のＳ５０４の処理と同様に、矩形が小さいうち（例えば所定のサイズに満たない場合）は誤検出の可能性も高いため、「停止」の場合にもＹＥＳと判定するよう制御してもよい。 As in the above-described processing of S504, while the rectangle is small (for example, when it is less than the predetermined size), there is a high possibility of erroneous detection. Also good.

In step S511, the CPU 201 of the object detection server 200 adds 5 to the X and Y coordinates of the lower right point in the coordinate information of the processing target rectangle. If the X coordinate is greater than or equal to the image width, it is corrected to the image width minus 1; if the Y coordinate is greater than or equal to the image height, it is corrected to the image height minus 1. The process then proceeds to step S516.

ステップＳ５１２では、物体検出サーバ２００のＣＰＵ２０１は、テップＳ５０２で取得した右下の点の識別結果が、上であれば、ステップＳ５１３へ進む。そうでなければ、ステップＳ５１４へ進む。 In step S512, the CPU 201 of the object detection server 200 proceeds to step S513 if the identification result of the lower right point acquired in step S502 is up. Otherwise, the process proceeds to step S514.

In step S513, the CPU 201 of the object detection server 200 adds 5 to the X coordinate of the lower right point in the coordinate information of the processing target rectangle. If the X coordinate is greater than or equal to the image width, it is corrected to the image width minus 1. The process then proceeds to step S516.

An identification result of up for the lower right point means that the rectangle already contains the detection target object in the Y direction but must be enlarged further in the X direction to contain it. The rectangle is therefore enlarged in the X direction.

ステップＳ５１４では、物体検出サーバ２００のＣＰＵ２０１は、テップＳ５０２で取得した右下の点の識別結果が、左であれば、ステップＳ５１５へ進む。そうでなければ、ステップＳ５０２へ進む。 In step S514, if the identification result of the lower right point acquired in step S502 is left, the CPU 201 of the object detection server 200 proceeds to step S515. Otherwise, the process proceeds to step S502.

In step S515, the CPU 201 of the object detection server 200 adds 5 to the Y coordinate of the lower right point in the coordinate information of the processing target rectangle. If the Y coordinate is greater than or equal to the image height, it is corrected to the image height minus 1. The process then proceeds to step S516.

An identification result of left for the lower right point means that the rectangle already contains the detection target object in the X direction but must be enlarged further in the Y direction to contain it. The rectangle is therefore enlarged in the Y direction.
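Steps S510 to S515 mirror the upper left update for the lower right point. As before, the function name `expand_bottom_right` and the label strings are hypothetical, and 5 is the example step from the text.

```python
def expand_bottom_right(x2: int, y2: int, label: str,
                        width: int, height: int, step: int = 5) -> tuple:
    """Move the lower right point outward according to its identification result.

    "false" -> move in both X and Y (S511)
    "up"    -> move in X only       (S513)
    "left"  -> move in Y only       (S515)
    Any other label leaves the point unchanged.
    """
    if label in ("false", "up"):
        x2 = min(x2 + step, width - 1)   # X >= width is corrected to width - 1
    if label in ("false", "left"):
        y2 = min(y2 + step, height - 1)  # Y >= height is corrected to height - 1
    return x2, y2
```

For a 640 x 480 image, a lower right point at (638, 85) with label "up" is clamped to (639, 85).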

In step S516, the CPU 201 of the object detection server 200 compares the width and height of the partial image stored in the memory area in step S502 with the width and height of the area indicated by the current coordinate information of the processing target rectangle. If both the width and the height are equal (that is, the size is the same before and after the coordinate corrections in S505 and the subsequent steps, which happens, for example, when the rectangle is already as large as the whole image), it determines that no object could be detected and ends the process, returning as the object detection result coordinate information in which the X and Y coordinates of the upper left and lower right points are all 0.

As described above, the processing of the flowchart in FIG. 5 uses the object position hint to enlarge the rectangle information gradually while repeatedly inputting the image of the area specified by the rectangle to the AttentionNet described in Non-Patent Document 1. This makes it possible to detect the size of the object quickly and more accurately.
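The overall idea of growing a rectangle outward from the position hint can be sketched as below. This is a deliberately simplified illustration, not the flowchart of FIG. 5: AttentionNet is replaced by a caller-supplied predicate `contains_object`, the rectangle grows symmetrically rather than per corner, and the function name, the initial half-size of 10, and the tuple layout are all hypothetical. The step of 5 and the all-zero result for "not detected" follow the description.

```python
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed layout


def grow_from_hint(
    hint: Tuple[int, int],
    image_size: Tuple[int, int],
    contains_object: Callable[[Box], bool],
    half: int = 10,  # assumed initial half-size of the rectangle
    step: int = 5,   # enlargement per iteration
) -> Box:
    """Enlarge a rectangle centered on the hint until the predicate accepts it."""
    width, height = image_size
    x, y = hint
    while True:
        # Clamp the rectangle so it never leaves the image.
        box = (max(x - half, 0), max(y - half, 0),
               min(x + half, width - 1), min(y + half, height - 1))
        if contains_object(box):
            return box
        if box == (0, 0, width - 1, height - 1):
            # The rectangle already covers the whole image: report "not detected".
            return (0, 0, 0, 0)
        half += step
```

Starting near the object means only a few small crops are evaluated before the rectangle covers it, which is the source of the speed advantage claimed for the position hint.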

Although an embodiment as an information processing apparatus has been described above, the present invention can also be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The program according to the present invention is a program that enables a computer to execute the processing methods of the flowcharts shown in FIGS. 5 and 6, and the storage medium according to the present invention stores a program that enables a computer to execute the processing methods of FIGS. 5 and 6. The program according to the present invention may be a separate program for each processing method of each apparatus in FIGS. 5 and 6.

As described above, it goes without saying that the object of the present invention is also achieved by supplying, to a system or an apparatus, a recording medium on which a program implementing the functions of the above-described embodiment is recorded, and by having a computer (or a CPU or MPU) of that system or apparatus read out and execute the program stored on the recording medium.

In this case, the program itself read from the recording medium realizes the novel function of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

Examples of recording media that can be used to supply the program include a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a DVD-ROM, a magnetic tape, a nonvolatile memory card, a ROM, an EEPROM, and a silicon disk.

It also goes without saying that the present invention covers not only the case where the functions of the above-described embodiment are realized by the computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiment are realized by that processing.

Furthermore, it goes without saying that the present invention also covers the case where the program read from the recording medium is written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention is also applicable when it is achieved by supplying a program to a system or an apparatus. In this case, the system or apparatus can enjoy the effects of the present invention by reading into it a recording medium that stores a program for achieving the present invention.

Furthermore, the system or apparatus can also enjoy the effects of the present invention by downloading and reading a program for achieving the present invention from a server, a database, or the like on a network by means of a communication program. All configurations that combine the above-described embodiments and their modifications are also included in the present invention.

100 Information processing apparatus
200 Object detection server
300 LAN

Claims

An information processing system comprising:
position acquisition means for acquiring a position designated by a user with respect to an image;
area specifying means for specifying an area including an object to be detected by enlarging an area starting from the position acquired by the position acquisition means; and
output means for outputting information indicating the area specified by the area specifying means.

The information processing system according to claim 1, further comprising:
rectangle generating means for generating a rectangle centered on the position acquired by the position acquisition means; and
determining means for determining whether the object to be detected is included in the rectangle generated by the rectangle generating means,
wherein, when it is determined that the object to be detected is included in the generated rectangle, the rectangle is specified as the area including the object to be detected.

The information processing system, wherein the rectangle generating means further generates a rectangle obtained by enlarging the generated rectangle when the determining means determines that the object to be detected is not included in the generated rectangle.

The information processing system according to any one of the above, further comprising display means for displaying the image on a touch panel,
wherein the position acquisition means accepts the designation of the position by accepting a touch operation on the image displayed on the touch panel by the display means, and acquires the position whose designation was accepted.

An information processing method comprising:
a position acquisition step of acquiring a position designated by a user with respect to an image;
an area specifying step of specifying an area including an object to be detected by enlarging an area starting from the position acquired in the position acquisition step; and
an output step of outputting information indicating the area specified in the area specifying step.

A program executable in an information processing apparatus, the program causing the information processing apparatus to function as:
position acquisition means for acquiring a position designated by a user with respect to an image;
area specifying means for specifying an area including an object to be detected by enlarging an area starting from the position acquired by the position acquisition means; and
output means for outputting information indicating the area specified by the area specifying means.