JP2019139497A

JP2019139497A - Image processing system and image processing method

Info

Publication number: JP2019139497A
Application number: JP2018022173A
Authority: JP
Inventors: 孝海小西; Takami Konishi
Original assignee: Hitachi Solutions Create Ltd
Current assignee: Hitachi Solutions Create Ltd
Priority date: 2018-02-09
Filing date: 2018-02-09
Publication date: 2019-08-22
Anticipated expiration: 2038-02-09
Also published as: JP6948959B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing system and an image processing method that enable learning data for object detection AI to be created without human intervention. SOLUTION: An image processing system includes an arithmetic device that executes predetermined processing and a storage device connected to the arithmetic device. The arithmetic device divides an input image according to a predetermined grid pattern; estimates, for each divided region, the object appearing in it and the accuracy of that estimate; excludes objects whose estimated accuracy is below a predetermined threshold; and, among the objects that were not excluded, combines adjacent regions in which the same type of object was estimated to define a whole grid. SELECTED DRAWING: Figure 6

Description

The present invention relates to an image processing apparatus and an image processing method for generating learning data for object detection AI.

With advances in object detection technology, deep-learning-based AI (Artificial Intelligence) for object detection can now identify the types of multiple objects appearing in an image (dogs, cats, cars, etc.) and obtain their positions in the image quickly and accurately.

Non-Patent Literature 1: Real-Time Object Detection, [retrieved January 6, 2018], Internet <URL: https://pjreddie.com/darknet/yolo/>
Non-Patent Literature 2: SSD: Single Shot MultiBox Detector, [retrieved January 6, 2018], Internet <URL: https://github.com/weiliu89/caffe/tree/ssd>

Improving object detection accuracy requires training on a large number of images together with records that describe the type and position of every object appearing in each image. Tens of thousands of such samples may be needed, and creating them by hand is costly.

Selective Search is a conventional technique for mechanically extracting regions that appear to contain objects. It is an algorithm that selects candidate regions by grouping similar regions at the pixel level. Because Selective Search selects candidate regions mechanically based on color similarity, it may fail to extract objects properly. Moreover, it only selects candidate regions and cannot identify what each candidate region contains. Selective Search alone therefore cannot generate learning data for object detection AI.

An object of the present invention is to make it possible to create learning data for object detection AI without manual work.

A representative example of the invention disclosed in the present application is as follows. An image processing system includes an arithmetic device that executes predetermined processing and a storage device connected to the arithmetic device. The arithmetic device divides an input image according to a predetermined grid pattern, estimates the object appearing in each divided region together with the accuracy of that estimate, excludes objects whose estimated accuracy is below a predetermined threshold, and, among the objects that were not excluded, combines adjacent regions in which the same type of object was estimated to define a whole grid.

According to one aspect of the present invention, learning data for object detection AI can be created without manual work. Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiment.

FIG. 1 is a block diagram showing the configuration of an object detection apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the region detection result file.
FIG. 3 is a flowchart of the processing executed by the central processing unit.
FIG. 4 is a detailed flowchart of the object detection process executed by the region detection processing unit.
FIG. 5 shows the format of the grid pattern file.
FIG. 6 is a detailed flowchart of the grid search process.
FIG. 7 is a diagram showing an example of center grid calculation.
FIG. 8 is a diagram showing an example of average grid calculation.
FIG. 9 is a detailed flowchart of the grid search process.
FIG. 10 is a diagram explaining the merge process.
FIG. 11 is a diagram explaining the center grid calculation process.

Embodiments of the present invention will now be described with reference to the drawings.

In this specification, AI that identifies the type (dog, cat, car, etc.) of a single object in a single image is called image recognition, and AI that identifies the type and position of each of multiple objects appearing in a single image is called object detection.

FIG. 1 is a block diagram showing the configuration of the object detection apparatus according to the embodiment of the present invention.

The object detection apparatus extracts the type of each object contained in an input image and its position in the image. It is implemented as a computer system comprising a central processing unit 010, a data memory 020, a program memory 030, a display device 040, image recognition AI trained data 050, a grid pattern file 060, a pre-detection image 070, a region detection result file 080, a keyboard 090, and a pointing device 100, all of which are interconnected with the central processing unit 010.

The central processing unit 010 includes an image recognition AI trained data reading unit 011, a pre-detection image reading unit 012, a region detection processing unit 013, and a region detection result output unit 014. Each of these units is realized by the central processing unit 010 executing a predetermined program. Part of the processing that the object detection apparatus performs by executing programs may instead be performed by hardware (for example, an FPGA).

In the central processing unit 010, the image recognition AI trained data reading unit 011 first reads an image recognition AI file. Image recognition AI here means AI that has been trained to identify the objects the user wants recognized. Examples include publicly distributed pretrained models such as VGG16 and Inception V3.

Using this AI, the region detection processing unit 013 detects objects in the image read by the pre-detection image reading unit 012. The region detection result output unit 014 writes the object types identified by the region detection processing unit 013, together with their positions in the image, to a file. The method by which the region detection processing unit 013 identifies object types and positions is described in detail later.

The data memory 020 stores data used by the processing units of the central processing unit 010. Specifically, it stores prediction image data 021 and image recognition AI trained data 022.

The image recognition AI trained data 050 is a file that implements the image recognition AI; the user of the object detection apparatus of this embodiment preferably prepares it in advance.

Unlike learning data for object detection, learning data for image recognition AI can be prepared at low cost: the images to be distinguished are simply sorted into directories (for example, one directory containing only dog images, one only cat images, and one only images of people) and the AI is trained on them. In this embodiment, the user can create learning data for object detection using only an image recognition AI.
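As an illustration of the one-directory-per-class layout, the following Python sketch collects (image path, class name) pairs, taking each class name from its directory name. It is an assumption for illustration only: the helper `list_labeled_images` and the accepted file extensions are hypothetical and not part of the disclosure.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the actual training data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return (image path, class name) pairs, one subdirectory of root per class."""
    pairs = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            # The directory name itself serves as the label, so no per-image annotation is needed.
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((image_path, class_dir.name))
    return pairs
```

For a layout such as `train/dog/*.jpg` and `train/cat/*.png`, `list_labeled_images("train")` yields every image paired with `"dog"` or `"cat"`, which is all the labeling an image recognition model needs.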

The grid pattern file 060 specifies the sizes used to divide the pre-detection image 070. The user of the object detection apparatus of this embodiment preferably creates it in advance, and can modify it.

The program executed by the central processing unit 010 is provided to the object detection apparatus via removable media (CD-ROM, flash memory, etc.) or a network and is stored in a nonvolatile auxiliary storage device, which is a non-transitory storage medium. The object detection apparatus therefore preferably has an interface for reading data from removable media.

The object detection apparatus is a computer system configured on one physical computer or on multiple logically or physically configured computers, and it may run on a virtual machine built on multiple physical computer resources.

FIG. 2 is a diagram showing a configuration example of the region detection result file 080, that is, the format of the region detection result file 080 output by the object detection apparatus.

The region detection result file 080 is a file (for example, in CSV format) storing records that each contain an image file name 201, an object type 202, the object's upper-left X coordinate 203 and upper-left Y coordinate 204, the object width 205, and the object height 206. The image file name 201 is the name of the file in which the region was detected. The object type 202 holds an integer from 0 to N, each value denoting an object type (0 = dog, 1 = cat, 2 = person, etc.). The upper-left X coordinate 203 and upper-left Y coordinate 204 are the coordinates of the upper-left corner of the rectangle enclosing the object in the image. The width 205 and height 206 give the size of that rectangle (its horizontal and vertical extent from the upper-left corner to the lower-right corner). The region detection result file 080 may be used to train a deep learning model for object detection, such as SSD or YOLOv2, in order to speed up object detection.
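A minimal sketch of writing records of the region detection result file 080, assuming CSV without a header row, boxes held internally as (left, top, right, bottom) pixel coordinates, and a hypothetical `CLASS_IDS` table built from the examples above (0 = dog, 1 = cat, 2 = person). Width and height are derived as right minus left and bottom minus top.

```python
import csv
import io

# Hypothetical class-ID table built from the examples in the text.
CLASS_IDS = {"dog": 0, "cat": 1, "person": 2}


def write_detection_records(records, stream):
    """Write rows: file name, class ID, upper-left X, upper-left Y, width, height."""
    writer = csv.writer(stream, lineterminator="\n")
    for file_name, class_name, (left, top, right, bottom) in records:
        writer.writerow([file_name, CLASS_IDS[class_name],
                         left, top, right - left, bottom - top])


buf = io.StringIO()
write_detection_records([("street.jpg", "person", (120, 40, 180, 200))], buf)
print(buf.getvalue(), end="")  # street.jpg,2,120,40,60,160
```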

<System Operation>
FIG. 3 is a flowchart of the processing executed by the central processing unit 010.

First, the image recognition AI trained data reading unit 011 reads the image recognition AI trained data 050 (301).

Next, the pre-detection image reading unit 012 reads the pre-detection images 070 and stores the number of image files read in the variable ImgNum (302).

Next, the region detection processing unit 013 detects objects in each image read (303), and the region detection result output unit 014 writes the detection results to the region detection result file 080 (304).
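The flow of steps 302 to 304 can be outlined as below. This is a hypothetical Python sketch: `run_pipeline` and the injected `detect` callable (standing in for the grid search and merge of FIG. 4, with the trained data of step 301 assumed to be already loaded inside it) are illustrative names, not part of the disclosure.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Detection = Tuple[str, Box]       # (class name, box)


def run_pipeline(image_names: Iterable[str],
                 detect: Callable[[str], List[Detection]]) -> List[Tuple[str, str, Box]]:
    """Outline of steps 302-304 for a list of image file names."""
    names = list(image_names)
    img_num = len(names)          # step 302: number of images read (ImgNum)
    records = []
    for i in range(img_num):      # step 303: object detection per image
        for class_name, box in detect(names[i]):
            records.append((names[i], class_name, box))
    return records                # step 304: these rows would be written to file 080
```

Injecting `detect` keeps the outline independent of any particular image recognition model.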

FIG. 4 is a detailed flowchart of the object detection process 303 executed by the region detection processing unit 013.

In this embodiment, a grid is a frame used to search the image. First, in step 401, the grid pattern file 060 is read and the search frames are stored in memory.

The grid pattern file 060 can use, for example, the format shown in FIG. 5. It is a file (for example, in CSV format) describing a grid width (W) 501 and height (H) 502 for each pattern. Each grid pattern differs from every other pattern in at least its width or its height. Grid sizes in the grid pattern file 060 may be expressed as a ratio of the image size, in pixels, in centimeters, or in similar units. The user of the object detection apparatus of this embodiment can change the grid sizes according to the size of the pre-detection image 070 and the proportion of the image occupied by the objects to be detected. Objects roughly two to four times the grid size described in the grid pattern file 060 can be detected.
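A sketch of reading the grid pattern file, assuming a header-less `W,H` CSV and sizes given as ratios of the image size (one of the unit options mentioned above). Both function names are hypothetical. The duplicate check enforces the rule that each pattern differs from the others in width or height.

```python
import csv
import io


def load_grid_patterns(text):
    """Parse header-less 'W,H' rows into (width, height) tuples."""
    patterns = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue                                  # ignore blank lines
        width, height = float(row[0]), float(row[1])
        if (width, height) in patterns:
            raise ValueError(f"duplicate grid pattern {width} x {height}")
        patterns.append((width, height))
    return patterns


def to_pixels(pattern, image_width, image_height):
    """Convert a pattern given as a ratio of the image size into whole pixels."""
    width, height = pattern
    return max(1, round(width * image_width)), max(1, round(height * image_height))


patterns = load_grid_patterns("0.25,0.5\n0.5,0.25\n")
print([to_pixels(p, 640, 480) for p in patterns])  # [(160, 240), (320, 120)]
```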

Next, a grid search is performed for each of the grid patterns read from the grid pattern file 060 to detect object types and regions (402). The grid search process 402 is described in detail with reference to FIG. 6.

After grid searches have been performed with all grid patterns, the results obtained by the grid search process 402 for each grid pattern are averaged and merged to improve region accuracy (403). The merge process 403 is described in detail with reference to FIG. 10.

Using the data obtained in step 403, which specifies object types and regions, as learning data for object detection AI reduces the cost of creating learning data.

FIG. 6 is a detailed flowchart of the grid search process 402. In FIG. 6, the right side shows the processing flow and the left side shows an example of the image being processed.

First, the image read in the pre-detection image reading step 302 is divided into cells of width W and height H, as given by the grid pattern read from the grid pattern file 060 in step 401. In the example of FIG. 6, the pre-detection image 070 is divided into nine cells 6011 to 6019, each of width W and height H (601).
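Step 601 can be sketched as computing cell rectangles in row-major order, so that for a 3 x 3 split the indices 0 to 8 correspond to cells 6011 to 6019. Clipping partial cells at the right and bottom edges is an assumption; the text does not say how an image whose size is not a multiple of W and H is handled.

```python
def split_into_grids(image_width, image_height, grid_w, grid_h):
    """Return (left, top, right, bottom) cell boxes covering the image, row by row."""
    boxes = []
    for top in range(0, image_height, grid_h):
        for left in range(0, image_width, grid_w):
            # Cells on the right and bottom borders are clipped to the image size.
            boxes.append((left, top,
                          min(left + grid_w, image_width),
                          min(top + grid_h, image_height)))
    return boxes


cells = split_into_grids(300, 300, 100, 100)
print(len(cells), cells[0], cells[8])  # 9 (0, 0, 100, 100) (200, 200, 300, 300)
```

Each box can be passed to Pillow's `Image.crop`, which expects the same (left, upper, right, lower) order, to obtain the cell image fed to the image recognition AI in step 602.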

Each divided image is input to the image recognition AI, which predicts the type of object appearing in the cell and the probability that the cell contains that object (602). In step 602, cells 6011, 6012, 6014, and 6015 are predicted to contain a car with probabilities of 85%, 90%, 90%, and 90%, respectively. Similarly, cell 6013 is predicted to contain a traffic signal with a probability of 99%, cells 6016 and 6019 to contain a person with probabilities of 75% and 85%, respectively, and cells 6017 and 6018 to contain a dog with a probability of 5%.

Any cell whose predicted probability is below a specific threshold is judged not to contain an object of the predicted type (603). For example, with a threshold of 50%, the dog probabilities of cells 6017 and 6018 are below the threshold, so these cells are judged not to contain the predicted object (a dog) and are excluded from detection.
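Step 603 amounts to a simple filter. In this sketch the predictions are assumed to be stored as a hypothetical mapping from row-major cell index to (label, probability); the data reproduce the FIG. 6 example with a 50% threshold.

```python
def filter_predictions(predictions, threshold=0.5):
    """Drop cells whose predicted probability is below the threshold (step 603).

    predictions maps a row-major cell index to (label, probability)."""
    return {cell: (label, p) for cell, (label, p) in predictions.items() if p >= threshold}


# FIG. 6 example: indices 0-8 correspond to cells 6011-6019.
predictions = {0: ("car", 0.85), 1: ("car", 0.90), 2: ("traffic_signal", 0.99),
               3: ("car", 0.90), 4: ("car", 0.90), 5: ("person", 0.75),
               6: ("dog", 0.05), 7: ("dog", 0.05), 8: ("person", 0.85)}
kept = filter_predictions(predictions)
print(sorted(kept))  # [0, 1, 2, 3, 4, 5, 8]
```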

Next, when multiple adjacent cells are judged to contain the same type of object, the center position of those cells is computed (604). For example, adjacent cells 6011, 6012, 6014, and 6015 contain the same type of object (a car), so one center grid 6041 is computed from them. Similarly, adjacent cells 6016 and 6019 contain the same type of object (a person), so one center grid 6042 is computed from them. The traffic signal is detected in cell 6013 alone, so the center grid 6043 coincides with that cell. The center grid calculation is explained with reference to FIG. 7.

Adjacent cells containing the same object are then combined into one grid to obtain a whole grid (605). For example, cells 6011, 6012, 6014, and 6015 are combined into the whole grid 6051 for the car, and cells 6016 and 6019 into the whole grid 6052 for the person. Because the traffic signal is detected in cell 6013 alone, the whole grid 6053 coincides with the center grid 6043.
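The grouping behind steps 604 and 605 can be sketched as a flood fill over same-label cells, followed by taking the union rectangle of each group as its whole grid. The text says only "adjacent"; treating cells as adjacent when they share an edge (4-neighbour connectivity, diagonals excluded) is an assumption of this sketch, as are the function names.

```python
from collections import deque


def group_adjacent(kept, cols):
    """Group row-major cell indices whose labels match and that share an edge.

    kept maps cell index -> (label, probability) after thresholding.
    Returns a list of (label, sorted member indices)."""
    seen, groups = set(), []
    for start in sorted(kept):
        if start in seen:
            continue
        label = kept[start][0]
        seen.add(start)
        queue, members = deque([start]), []
        while queue:                                  # breadth-first flood fill
            cell = queue.popleft()
            members.append(cell)
            row, col = divmod(cell, cols)
            for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
                n = r * cols + c
                if r >= 0 and 0 <= c < cols and n in kept and n not in seen and kept[n][0] == label:
                    seen.add(n)
                    queue.append(n)
        groups.append((label, sorted(members)))
    return groups


def whole_grid(boxes):
    """Union rectangle of the member cells, as (left, top, right, bottom)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


kept = {0: ("car", 0.85), 1: ("car", 0.90), 2: ("traffic_signal", 0.99), 3: ("car", 0.90),
        4: ("car", 0.90), 5: ("person", 0.75), 8: ("person", 0.85)}
print(group_adjacent(kept, cols=3))
# [('car', [0, 1, 3, 4]), ('traffic_signal', [2]), ('person', [5, 8])]
```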

Finally, the average of the center grid and the whole grid is computed (606). In many cases the corners of the whole grid contain no part of the object, so averaging the center grid with the whole grid shrinks the outer frame. For example, as shown in FIG. 11, averaging the center grid 1101 and the whole grid 1102 yields an average grid 1103 from which the margins contained in the whole grid have been removed. The average grid calculation is explained with reference to FIG. 8.
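The center grid (604) and average grid (606) are edge-wise means, which the following sketch computes for the car in the FIG. 6 example. Boxes are stored as (left, top, right, bottom) and the 100-pixel cell size is a hypothetical value chosen for illustration.

```python
def center_grid(boxes):
    """Edge-wise mean of the member cells (FIG. 7): C(e) = sum of G_k(e) / n."""
    n = len(boxes)
    return tuple(sum(b[i] for b in boxes) / n for i in range(4))


def average_grid(whole, center):
    """Edge-wise mean of the whole grid and the center grid (FIG. 8)."""
    return tuple((w + c) / 2 for w, c in zip(whole, center))


# Car cells 6011, 6012, 6014, 6015 with a hypothetical 100-pixel grid.
car_cells = [(0, 0, 100, 100), (100, 0, 200, 100), (0, 100, 100, 200), (100, 100, 200, 200)]
center = center_grid(car_cells)                   # (50.0, 50.0, 150.0, 150.0)
average = average_grid((0, 0, 200, 200), center)  # (25.0, 25.0, 175.0, 175.0)
```

The average grid trims 25 pixels from each side of the 200 x 200 whole grid, which is the margin-removal effect shown in FIG. 11. For a single cell, as with the traffic signal, all three grids coincide.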

Although FIG. 6 describes computing the average grid from the whole grid and the center grid, the average grid may be omitted and only the whole grid obtained by merging the divided cells. In that case the region containing the object is located less precisely, but the presence or absence of the object can still be detected reliably.

FIG. 7 is a diagram showing an example of center grid calculation.

The same type of object has been detected in grids G1, G2, G3, and G4. Each grid is a rectangle described by a top coordinate (upper edge), a left coordinate (left edge), a bottom coordinate (lower edge), and a right coordinate (right edge). Each edge coordinate of the grid C at the center of G1 to G4 is obtained by summing the corresponding top, left, right, or bottom coordinates of all the grids and dividing by the number of grids.

FIG. 7 shows the formulas. G1(top) to G4(top) are the Y coordinates of the upper edges of grids 701 to 704, and their average is the Y coordinate C(top) of the upper edge of the center grid. Similarly, G1(left) to G4(left) are the X coordinates of the left edges of grids 701 to 704, and their average is the X coordinate C(left) of the left edge of the center grid. G1(right) to G4(right) are the X coordinates of the right edges of grids 701 to 704, and their average is the X coordinate C(right) of the right edge of the center grid. G1(bottom) to G4(bottom) are the Y coordinates of the lower edges of grids 701 to 704, and their average is the Y coordinate C(bottom) of the lower edge of the center grid.
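Written as a single formula, with n the number of same-type grids (n = 4 for G1 to G4 in FIG. 7) and e ranging over the four edges:

```latex
\begin{aligned}
C(e) &= \frac{1}{n}\sum_{k=1}^{n} G_k(e), \qquad e \in \{\mathrm{top}, \mathrm{left}, \mathrm{right}, \mathrm{bottom}\},\\
C(\mathrm{top}) &= \frac{G_1(\mathrm{top}) + G_2(\mathrm{top}) + G_3(\mathrm{top}) + G_4(\mathrm{top})}{4} \quad (n = 4).
\end{aligned}
```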

FIG. 8 is a diagram showing an example of average grid calculation.

Each edge coordinate of the grid 802, obtained by averaging the positions of the whole grid 801 and the center grid 803, is the sum of the corresponding top, left, right, or bottom coordinates of the whole grid 801 and the center grid 803 divided by 2.

FIG. 8 shows a calculation example. G(top) is the Y coordinate of the upper edge of the whole grid, C(top) is the Y coordinate of the upper edge of the center grid, and their average is the Y coordinate M(top) of the upper edge of the average grid. Similarly, G(left) and C(left) are the X coordinates of the left edges of the whole grid and the center grid, and their average is the X coordinate M(left) of the left edge of the average grid. G(right) and C(right) are the X coordinates of the right edges of the whole grid and the center grid, and their average is the X coordinate M(right) of the right edge of the average grid. G(bottom) and C(bottom) are the Y coordinates of the lower edges of the whole grid and the center grid, and their average is the Y coordinate M(bottom) of the lower edge of the average grid.
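In the same notation, with G the whole grid, C the center grid, and M the average grid, each edge of M lies halfway between the corresponding edges of G and C:

```latex
M(e) = \frac{G(e) + C(e)}{2}, \qquad e \in \{\mathrm{top}, \mathrm{left}, \mathrm{right}, \mathrm{bottom}\}
```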

FIG. 9 is a flowchart of the same processing as FIG. 6, but with a smaller grid size. Compared with FIG. 6, it detects small objects (traffic signals, dogs, cats, etc.) more easily but large objects (cars, etc.) less easily. It is therefore preferable to detect large objects with large grids and small objects with small grids.

FIG. 10 is a diagram explaining the merge process 403.

As shown in FIG. 10, searching the image for regions with each of grid patterns 1 to N yields, for each pattern, an average grid for every detected object (car, traffic signal, person).

Next, the average grids obtained by the grid searches are integrated. For example, the average grids for each detected object are first overlaid, and it is determined whether their overlapping area exceeds a predetermined threshold. If it does, the average grids are judged to detect the same object, and the average of each of their four edge coordinates (top, left, right, bottom) is computed as the region detection result. The average may be calculated as a simple arithmetic mean,

The computed region detection result (the four edge coordinates) is then output to the region detection result file 080.

Specifically, because multiple average grids in which the car was detected overlap, the four average grids in which the car was detected are merged. Likewise, the average grids in which the traffic signal was detected are merged, and the average grids in which the person was detected are merged. Merging removes unnecessary regions around each object and improves region analysis performance.

If the overlapping area is smaller than the predetermined threshold, it is preferable to judge that multiple objects of the same type have been detected and to treat each average grid as a separate region.
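A sketch of the merge process 403 under stated assumptions: the text says only that the overlapping area must exceed a predetermined threshold, so this sketch substitutes intersection over union (IoU) as the overlap measure, clusters same-label boxes greedily in a single pass (so the result can depend on input order), and uses the simple arithmetic mean of each edge. The function names and the 0.5 default threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def merge_average_grids(detections, threshold=0.5):
    """Greedily cluster same-label boxes whose IoU with a cluster member exceeds
    the threshold, then replace each cluster by the edge-wise arithmetic mean."""
    clusters = []                                   # list of (label, [boxes])
    for label, box in detections:
        for c_label, members in clusters:
            if c_label == label and any(iou(box, m) > threshold for m in members):
                members.append(box)
                break
        else:                                       # no overlapping cluster found
            clusters.append((label, [box]))         # treat as a separate object
    return [(label, tuple(sum(b[i] for b in members) / len(members) for i in range(4)))
            for label, members in clusters]


merged = merge_average_grids([("car", (0, 0, 100, 100)), ("car", (10, 10, 110, 110)),
                              ("car", (300, 300, 400, 400)), ("person", (5, 5, 105, 105))])
```

Here the first two car boxes (IoU of about 0.68) merge into (5.0, 5.0, 105.0, 105.0), while the distant car box is kept as a second car, matching the rule that non-overlapping same-type grids are separate objects.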

As described above, according to the embodiment of the present invention, the image processing system divides an input image according to a predetermined grid pattern, estimates the object appearing in each divided region and the accuracy of that estimate, excludes objects whose estimated accuracy is below a predetermined threshold, and, among the objects that were not excluded, combines adjacent regions in which the same type of object was estimated to define a whole grid. AI can therefore create learning data that previously had to be created by manually describing the type and position of each object, which reduces the cost of creating learning data and improves its accuracy. In addition, using deep learning both to identify object types and to select object candidates makes it possible to detect objects that Selective Search would miss.

The image processing system also defines a center grid located at the center of the adjacent regions in which the same type of object was estimated and, for each object for which a center grid was defined, defines an average grid between the center grid and the whole grid. This removes margins and suppresses the loss of learning accuracy caused by other objects appearing in the background.

Multiple rectangles differing in at least one of width and height are prepared as grid patterns for dividing the image, and the image processing system determines the whole grid, center grid, and average grid for the regions obtained by dividing the input image with each of these grid patterns. Objects of various shapes (for example, tall or wide) can therefore be detected appropriately.

The image processing system also integrates the average grids determined using the multiple grid patterns to identify the region in which each object exists, so objects of various shapes can be detected appropriately.

The image processing system integrates the average grids by averaging the vertex coordinates of the average grid rectangles determined using the multiple grid patterns, so objects of various shapes can be detected appropriately with little computation.

The present invention is not limited to the embodiment described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiment has been described in detail to explain the present invention clearly, and the present invention is not necessarily limited to one having all of the described configurations. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. Other configurations may also be added to, deleted from, or substituted for part of the configuration of each embodiment.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or entirely in hardware, for example by designing them as an integrated circuit. Alternatively, they may be realized in software by a processor interpreting and executing programs that implement the respective functions.

Information such as the programs, tables, and files that implement each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation. Not all control lines and information lines required for an actual implementation are necessarily shown. In practice, almost all of the components may be considered to be interconnected.

010 Central processing unit
011 Data reading unit
012 Reading unit for images before region detection
013 Region detection processing unit
014 Region detection result output unit
020 Data memory
021 Image data for prediction
022 Trained data for image recognition AI
030 Program memory
040 Display device
050 Trained data for image recognition AI
060 Grid pattern file
070 Image before region detection
080 Region detection result file
090 Keyboard
100 Pointing device

Claims

An image processing system, comprising:
an arithmetic device that executes predetermined processing; and a storage device connected to the arithmetic device,
wherein the arithmetic device:
divides an input image by a predetermined grid pattern;
estimates, for each of the divided regions, an object appearing in the region and the accuracy of the estimate;
excludes any object whose estimated accuracy is less than a predetermined threshold; and
defines an overall grid by combining adjacent regions in which the same type of object is estimated among the objects that are not excluded.
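The following Python sketch outlines the steps of claim 1 for one grid pattern. It assumes the trained image recognition AI has already produced the per-region estimates (label and accuracy). It also assumes that adjacency means sharing an edge (4-connectivity) and that the overall grid is the bounding rectangle of the combined regions; the claim text fixes neither detail. `overall_grids` and its parameters are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: int
    y1: int
    x2: int
    y2: int


def overall_grids(predictions, cell_w, cell_h, threshold):
    """predictions: dict (row, col) -> (label, accuracy) for each grid cell.

    Returns a list of (label, cells, Rect) tuples, one per group of
    edge-adjacent cells that share a label and have accuracy >= threshold.
    The Rect is the (unclipped) bounding rectangle of the group.
    """
    # Exclude estimates whose accuracy is below the threshold.
    kept = {rc: lab for rc, (lab, acc) in predictions.items() if acc >= threshold}
    seen = set()
    groups = []
    for start in sorted(kept):
        if start in seen:
            continue
        label = kept[start]
        seen.add(start)
        queue = deque([start])
        cells = []
        # Breadth-first search over 4-connected cells with the same label.
        while queue:
            r, c = queue.popleft()
            cells.append((r, c))
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in kept and nb not in seen and kept[nb] == label:
                    seen.add(nb)
                    queue.append(nb)
        rows = [r for r, _ in cells]
        cols = [c for _, c in cells]
        box = Rect(min(cols) * cell_w, min(rows) * cell_h,
                   (max(cols) + 1) * cell_w, (max(rows) + 1) * cell_h)
        groups.append((label, sorted(cells), box))
    return groups
```

Because the bounding rectangle is not clipped, a caller working with edge cells of non-uniform size may want to clip it to the image border.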

The image processing system according to claim 1,
wherein the arithmetic device:
defines a center grid placed at the center position of the adjacent regions in which the same type of object is estimated; and
defines, for each object for which the center grid is defined, an average grid between the center grid and the overall grid.
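Here is one possible reading of claim 2, sketched in Python. The center grid is taken to be a single cell-sized rectangle centered on the overall grid. The average grid is taken to be the rectangle whose vertices lie midway between the corresponding vertices of the center grid and the overall grid. Both interpretations are assumptions, since the claim states only that the average grid lies "between" the two. `center_grid` and `average_grid` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def center_grid(overall, cell_w, cell_h):
    """Cell-sized rectangle centered on the center of the overall grid
    (assumed interpretation of 'center position of the adjacent regions')."""
    cx = (overall.x1 + overall.x2) / 2
    cy = (overall.y1 + overall.y2) / 2
    return Rect(cx - cell_w / 2, cy - cell_h / 2, cx + cell_w / 2, cy + cell_h / 2)


def average_grid(center, overall):
    """Rectangle whose vertices are the midpoints of the corresponding
    vertices of the center grid and the overall grid (assumed)."""
    return Rect(
        (center.x1 + overall.x1) / 2,
        (center.y1 + overall.y1) / 2,
        (center.x2 + overall.x2) / 2,
        (center.y2 + overall.y2) / 2,
    )
```

For an overall grid spanning (0, 0) to (192, 128) with 64 x 64 cells, this gives a center grid from (64, 32) to (128, 96) and an average grid from (32, 16) to (160, 112). The average grid shrinks the overall grid toward the object's center.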

The image processing system according to claim 2,
wherein a plurality of grid patterns whose rectangles have different widths and heights are provided for dividing the image, and
the arithmetic device executes processing that determines the overall grid, the center grid, and the average grid for each region obtained by dividing the input image by the plurality of grid patterns.

The image processing system according to claim 3,
wherein the arithmetic device integrates the average grids defined using the plurality of grid patterns to identify the region where the object exists.

The image processing system according to claim 4,
wherein the arithmetic device integrates the average grids by calculating the average of the coordinates of each vertex of the rectangles of the average grids determined using the plurality of grid patterns.

An image processing method executed by an image processing system,
the image processing system including an arithmetic device that executes predetermined processing and a storage device connected to the arithmetic device,
the method comprising:
dividing, by the arithmetic device, an input image by a predetermined grid pattern;
estimating, by the arithmetic device, for each of the divided regions, an object appearing in the region and the accuracy of the estimate;
excluding, by the arithmetic device, any object whose estimated accuracy is less than a predetermined threshold; and
defining, by the arithmetic device, an overall grid by combining adjacent regions in which the same type of object is estimated among the objects that are not excluded.

The image processing method according to claim 6,
wherein the arithmetic device defines a center grid placed at the center position of the adjacent regions in which the same type of object is estimated, and
the arithmetic device defines, for each object for which the center grid is defined, an average grid between the center grid and the overall grid.

The image processing method according to claim 7,
wherein a plurality of grid patterns whose rectangles have different widths and heights are provided for dividing the image, and
the arithmetic device executes processing that determines the overall grid, the center grid, and the average grid for each region obtained by dividing the input image by the plurality of grid patterns.

The image processing method according to claim 8,
wherein the arithmetic device identifies the region where the object exists by integrating the average grids determined using the plurality of grid patterns.

The image processing method according to claim 9,
wherein the arithmetic device integrates the average grids by calculating the average of the coordinates of each vertex of the rectangles of the average grids defined using the plurality of grid patterns.