JP6939065B2

JP6939065B2 - Image recognition computer program, image recognition device and image recognition method

Info

Publication number: JP6939065B2
Application number: JP2017091201A
Authority: JP
Inventors: 明燮鄭; 馬場　孝之; 孝之馬場; 上原　祐介; 祐介上原; 信浩宮▲崎▼
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-05-01
Filing date: 2017-05-01
Publication date: 2021-09-22
Anticipated expiration: 2037-05-01
Also published as: JP2018190132A

Description

The present invention relates to, for example, an image recognition computer program, an image recognition device, and an image recognition method for recognizing a subject represented in an image.

Surveillance cameras may be installed inside buildings or outdoors for purposes such as crime prevention, disaster prevention, or information gathering. To achieve such purposes based on the images obtained by these cameras, it is preferable that a subject to be recognized, such as a person or a vehicle, can be accurately detected from the images. To this end, a technique has been proposed that extracts the region in which a subject is represented in a frame of a target video by obtaining the background difference region between a frame of a background image and the frame of the target video (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 06-251151

Classifiers based on machine learning, such as AdaBoost, support vector machines, or neural networks, are known as one technique for detecting a subject in an image. To enable such a classifier to detect a subject with high accuracy, various images in which the subject is represented and various images in which the subject is not represented, all captured under the installation environment of the camera, are collected as teacher images for training the classifier.

To collect such teacher images, each image is checked to determine whether the subject is represented in it, and, if so, a frame indicating the region in which the subject is represented, identification information, and the like are attached to the image. If an operator performs this work for every image, the man-hours become enormous, so it is preferable that the work be automated as far as possible. However, even when a technique such as that described in Patent Document 1 is used to determine automatically whether a subject is represented in an image, something other than the subject may be erroneously detected as the subject, or detection of the subject may fail. In such cases, the resulting set of teacher images is not suitable for training the classifier.

In one aspect, an object of the present invention is to provide an image recognition computer program capable of identifying an image in which the region representing a subject may have been erroneously detected.

According to one embodiment, an image recognition computer program is provided. The program includes instructions that cause a computer to detect a region in which a subject is represented from each of a plurality of time-series images obtained by an imaging unit, and to identify, among the plurality of images, an image requiring attention in which the region representing the subject may have been erroneously detected, according to either the degree of change along the time series in the aspect ratio of the region representing the subject or the difference between the foreground value and the background value of each pixel in that region.

According to one aspect, an image in which the region representing a subject may have been erroneously detected can be identified.

A hardware configuration diagram of the image recognition device according to one embodiment.
A functional block diagram of the CPU relating to the image recognition processing.
An explanatory diagram outlining the pixel value difference calculation.
An explanatory diagram outlining the aspect ratio calculation for the subject area.
A diagram showing an example of the detection information table.
A diagram showing an example of an image displayed on the user interface.
(a) and (b) are diagrams showing examples of images displayed on the user interface according to a modified example.
An operation flowchart of the image recognition processing.

Hereinafter, an image recognition device, and an image recognition method and an image recognition computer program used in the image recognition device, will be described with reference to the drawings. This image recognition device is used, for example, to create a set of teacher images for training a machine-learning-based classifier for subject recognition. To that end, the image recognition device detects, from each of a series of images obtained by a camera under the installation environment in which the classifier will be used, a subject area in which the subject to be detected is represented. The image recognition device then calculates a statistical representative value of the difference in pixel values between the foreground and the background for the pixels in the subject area, and the aspect ratio of the subject area. Based on the statistical representative value of the pixel value differences, or on the degree of change in the aspect ratio of the subject area along the time series, the image recognition device identifies an image in which the subject area may have been erroneously detected as an image requiring attention. In this specification, erroneous detection of the subject area covers not only the case in which something other than the subject is represented in the subject area, but also the case in which only part of the subject is represented in the subject area.

FIG. 1 is a hardware configuration diagram of an image recognition device according to one embodiment. As shown in FIG. 1, the image recognition device 1 includes a camera 2, a communication interface 3, a user interface 4, a memory 5, a storage medium access device 6, and a CPU (Central Processing Unit) 7.

The camera 2 is an example of an imaging unit; it captures a predetermined shooting range and generates an image of that range. For this purpose, the camera 2 has a two-dimensional detector composed of an array of solid-state image sensors sensitive to visible light, such as CCD or CMOS sensors, and an imaging optical system that forms an image of the shooting range on the two-dimensional detector. The camera 2 generates an image at a fixed shooting cycle (for example, every 1/30 second). Each time the camera 2 generates an image, it outputs the image to the communication interface 3 via the communication network.

In the present embodiment, the image generated by the camera 2 is a monochrome image in which the value of each pixel is a luminance value. However, the image generated by the camera 2 may instead be a color image in which the value of each pixel is expressed in the RGB color space or another color space (for example, the HLS color space or the YPbPr color space).

The communication interface 3 includes a communication interface for connecting to a communication network conforming to a communication standard such as Ethernet (registered trademark), and its control circuit. The communication interface 3 receives an image from the camera 2 via the communication network and passes the received image to the CPU 7. The communication interface 3 may also output a set of teacher images for training a machine learning system, received from the CPU 7, to another device (not shown) via the communication network.

The user interface 4 has, for example, input devices such as a keyboard and a mouse, and a display device such as a liquid crystal display. Alternatively, the user interface 4 may have a device in which the input device and the display device are integrated, such as a touch panel. The user interface 4 displays on the display device, for example, an image that has undergone image recognition processing and its image recognition result, both received from the CPU 7. The user interface 4 also outputs to the CPU 7 information representing the circumscribed rectangle of the subject area, identification information of the subject, and the like, as corrected through the operations of a user who has checked the displayed image and image recognition result.

The memory 5 is an example of a storage unit and includes, for example, a readable and writable semiconductor memory and a read-only semiconductor memory. The memory 5 stores, for example, various data used for the image recognition processing executed by the CPU 7, recognition results, and the like. The memory 5 may also store images acquired from the camera 2 for a certain period.

The storage medium access device 6 is another example of a storage unit and is a device that accesses a storage medium 8 such as a magnetic disk, a semiconductor memory card, or an optical storage medium. The storage medium access device 6 reads, for example, a computer program for the image recognition processing that is stored in the storage medium 8 and executed on the CPU 7, and passes it to the CPU 7. Alternatively, the storage medium access device 6 may receive the created set of teacher images from the CPU 7 and write the set to the storage medium 8.

The CPU 7 is an example of a control unit and includes, for example, one or more processors and their peripheral circuits. The CPU 7 controls the entire image recognition device 1. Each time the CPU 7 receives an image from the camera 2 via the communication interface 3, it stores the image in the memory 5 in association with the date and time at which the image was generated. The CPU 7 then executes the image recognition processing on each of the plurality of images that are candidates for inclusion in the set of teacher images.

In the present embodiment, the plurality of images subjected to the image recognition processing are a series of images that are consecutive in time, generated by continuous shooting by the camera 2. The plurality of images subjected to the image recognition processing may include more than one such series. The subject to be recognized in the image recognition processing is a person. However, the subject to be recognized is not limited to a person and may be, for example, a vehicle or an animal other than a person.

FIG. 2 is a functional block diagram of the CPU 7 relating to the image recognition processing. As shown in FIG. 2, the CPU 7 includes a subject detection unit 11, a pixel value difference calculation unit 12, an aspect ratio calculation unit 13, a caution determination unit 14, and a presentation unit 15.

Each of these units of the CPU 7 is, for example, a functional module realized by a computer program executed on a processor of the CPU 7. Alternatively, these units may be implemented in the image recognition device 1 as firmware.

The subject detection unit 11 detects a subject area, which is a region in which the subject is represented, from each of the plurality of images subjected to the image recognition processing. Since the subject detection unit 11 performs the same processing on each of the images, the processing for a single image is described below.

The subject detection unit 11 detects the subject area by, for example, performing background subtraction on the image. In this case, the subject detection unit 11 calculates the absolute value of the difference in luminance between each pair of corresponding pixels in the image and the background image, and extracts the pixels for which this absolute difference is equal to or greater than a predetermined threshold. The subject detection unit 11 then takes the set of extracted pixels as the subject area. The background image is generated, for example, by having the camera 2 capture its shooting range when no subject is present in that range, and is stored in the memory 5 in advance.
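The thresholded background subtraction described above can be sketched as follows. This is a minimal illustration, not the implementation in the patent: the function name `detect_subject_mask`, the nested-list image representation, and the threshold value of 30 are all assumptions made for the example.

```python
def detect_subject_mask(image, background, threshold):
    """Return a binary mask that is 1 where |image - background| >= threshold.

    `image` and `background` are same-sized 2D lists of luminance values.
    """
    return [
        [1 if abs(p - b) >= threshold else 0 for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


# Hypothetical 3x3 luminance images; the threshold of 30 is an arbitrary choice.
image = [[10, 10, 200], [10, 180, 190], [10, 10, 10]]
background = [[10, 12, 11], [9, 10, 10], [10, 10, 10]]
mask = detect_subject_mask(image, background, threshold=30)
print(mask)  # [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
```

The resulting mask is the same kind of binary image that the description later uses to represent a detected subject area.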

The subject detection unit 11 may also perform a labeling process on the set of extracted pixels and treat each group of mutually connected pixels as one subject area. This allows the subject detection unit 11 to detect, for each subject, the subject area in which that subject is represented, even when a single image contains several subjects. In addition, before the labeling process, the subject detection unit 11 may apply morphological dilation and erosion operations and use the resulting region as the subject area. This allows the subject detection unit 11 to include in the subject area pixels that are isolated even though they show part of the subject.
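As an illustration of the labeling step, the sketch below assigns one label to each group of connected foreground pixels using a breadth-first flood fill. The text does not state which connectivity is used, so 4-connectivity is an assumption here, as is the function name `label_regions`.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected groups of nonzero pixels; return (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            # Start a new region and flood-fill it from this seed pixel.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


mask = [[1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
labels, count = label_regions(mask)
print(count)   # 2
print(labels)  # [[1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 2, 2]]
```

Each nonzero label then corresponds to one candidate subject area.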

The subject detection unit 11 may also detect the subject area from the image using a general-purpose classifier for subject detection that has not been optimized for the installation environment of the camera 2. In this case, the subject detection unit 11 sets a window on the image, calculates from the values of the pixels in the window the features to be input to the general-purpose classifier (for example, Haar-like features), and inputs those features to the general-purpose classifier to determine whether the subject appears in the window. The subject detection unit 11 may then detect the subject area by, for example, performing background subtraction again on a window determined to contain the subject. By repeating this processing while changing the position of the window, the subject detection unit 11 can detect the subject area of a subject appearing at any position in the image.

As the general-purpose classifier, for example, an AdaBoost classifier, a support vector machine, or a multilayer perceptron can be used. However, a general-purpose classifier detects the subject less accurately than a classifier optimized for the installation environment of the camera 2. The general-purpose classifier is therefore preferably tuned so that it is unlikely to miss a subject area when the subject is actually present, even at the cost of a somewhat higher likelihood of erroneously detecting a region showing something other than the subject as a subject area.

When a plurality of subject areas are detected in an image, the subject detection unit 11 may keep the subject area with the largest area and delete the other subject areas as erroneous detections. Alternatively, the subject detection unit 11 may delete, as erroneous detections, subject areas whose area is less than a predetermined area threshold.
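The two area-based filters described above can be sketched as follows, assuming the candidate subject areas are given as a label map in which 0 means "no subject" and each positive integer identifies one region. The function names and the example threshold are hypothetical.

```python
from collections import Counter


def region_areas(labels):
    """Count the pixels belonging to each nonzero label."""
    return Counter(v for row in labels for v in row if v != 0)


def keep_largest(labels):
    """Return the set containing only the label with the largest area."""
    areas = region_areas(labels)
    return {max(areas, key=areas.get)} if areas else set()


def keep_at_least(labels, min_area):
    """Return the labels whose area is not below the area threshold."""
    return {lab for lab, area in region_areas(labels).items() if area >= min_area}


labels = [[1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 2, 2]]
print(keep_largest(labels))      # {2}
print(keep_at_least(labels, 3))  # {2}
```

Regions whose labels are not returned would be treated as erroneous detections and discarded.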

The detected subject area is represented, for example, by a binary image in which pixels inside the subject area have a different value from pixels outside it. For each image, the subject detection unit 11 notifies the pixel value difference calculation unit 12 and the aspect ratio calculation unit 13 of the detected subject area.

For each image in which a subject area has been detected, among the plurality of images subjected to the image recognition processing, the pixel value difference calculation unit 12 calculates a statistical representative value of the difference in pixel values between the foreground and the background within the subject area. This statistical representative value is one of the features used to identify an image requiring attention, that is, an image in which the subject area may have been erroneously detected. Since the pixel value difference calculation unit 12 performs the same processing on each of the images, the processing for a single image is described below.

For example, when the subject area of the immediately preceding image has been confirmed, the pixel value difference calculation unit 12 sets a pixel value difference calculation frame that has the same size as the circumscribed rectangle of the subject area set for the preceding image, and positions it according to the position of the subject area in the image of interest. The pixel value difference calculation unit 12 then calculates the difference in pixel values between the foreground and the background for each pixel of the subject area inside the pixel value difference calculation frame. For example, the pixel value difference calculation unit 12 positions the frame so that its upper edge coincides with the upper edge of the subject area and its horizontal center coincides with the horizontal coordinate of the centroid of the subject area. As a result, even if the subject area has not been detected accurately in the image of interest, the pixel value difference calculation unit 12 can appropriately set the region over which the pixel value difference is calculated.

If the subject area detected in the immediately preceding image has not been confirmed by the operator, the pixel value difference calculation unit 12 may use the circumscribed rectangle of the subject area in the image of interest itself as the pixel value difference calculation frame.
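The placement rule for the pixel value difference calculation frame can be sketched as below. This is an illustrative reading of the text rather than the patented implementation: the function name, the `(left, top, width, height)` return convention, counting widths inclusively in pixels, and rounding the left edge to the nearest integer are all assumptions.

```python
def difference_frame(subject_pixels, prev_frame_size=None):
    """Return (left, top, width, height) of the difference calculation frame.

    subject_pixels: list of (x, y) coordinates of the detected subject area.
    prev_frame_size: (width, height) of the confirmed circumscribed rectangle
        from the preceding image, or None if that area was not confirmed.
    """
    xs = [x for x, _ in subject_pixels]
    ys = [y for _, y in subject_pixels]
    top = min(ys)  # upper edge of the frame matches the subject's upper edge
    if prev_frame_size is None:
        # Fall back to the circumscribed rectangle of the current subject area.
        left = min(xs)
        return left, top, max(xs) - left + 1, max(ys) - top + 1
    width, height = prev_frame_size
    centroid_x = sum(xs) / len(xs)
    # Center the frame horizontally on the centroid of the subject area.
    left = round(centroid_x - (width - 1) / 2)
    return left, top, width, height


pixels = [(2, 1), (3, 1), (4, 1), (3, 2)]
print(difference_frame(pixels, prev_frame_size=(5, 4)))  # (1, 1, 5, 4)
print(difference_frame(pixels))                          # (2, 1, 3, 2)
```

Reusing the previous frame size keeps the measurement region stable when the current detection has lost or gained part of the subject.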

In the present embodiment, the pixel value difference calculation unit 12 divides the subject area into a plurality of sub-regions. In doing so, it divides the subject area based on, for example, the value of each pixel in the foreground of the subject area (that is, the value of each pixel within the subject area in the image of interest), so that pixels with similar values fall into the same sub-region. To divide the subject area into sub-regions, the pixel value difference calculation unit 12 can use a region segmentation method such as simple region growing. When the image generated by the camera 2 is a color image, the pixel value difference calculation unit 12 may use a region segmentation method that relies on color information, such as the Maximally Stable Color Regions (MSCR) method.
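One way to picture the sub-region division is the seeded region-growing sketch below. The text names simple region growing only as an example and gives no details, so the similarity rule (a pixel joins a region if its value is within `tolerance` of the seed pixel's value), the 4-connectivity, and all names are assumptions.

```python
from collections import deque


def split_into_subregions(image, mask, tolerance):
    """Group subject pixels with similar values; return (sub_labels, count)."""
    h, w = len(image), len(image[0])
    sub = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or sub[y][x]:
                continue
            # Each unassigned subject pixel seeds a new sub-region.
            count += 1
            seed = image[y][x]
            sub[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                            and not sub[ny][nx]
                            and abs(image[ny][nx] - seed) <= tolerance):
                        sub[ny][nx] = count
                        queue.append((ny, nx))
    return sub, count


image = [[100, 102, 200], [101, 198, 201]]
mask = [[1, 1, 1], [1, 1, 1]]
sub, count = split_into_subregions(image, mask, tolerance=10)
print(sub)  # [[1, 1, 2], [1, 2, 2]]
```

Here the dark pixels (around 100) and the bright pixels (around 200) end up in separate sub-regions.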

For each sub-region, the pixel value difference calculation unit 12 calculates, for each pixel in that sub-region, the absolute value of the difference between the foreground and background pixel values. The foreground pixel value is the value of the corresponding pixel in the image of interest, and the background pixel value is the value of the corresponding pixel in the background image. In the present embodiment, the pixel value used to calculate the absolute difference can be the luminance value. However, when the image generated by the camera 2 is a color image, the pixel value used to calculate the absolute difference may be the value of any one color component, the hue, or the saturation.

For each sub-region, the pixel value difference calculation unit 12 calculates the mean of the absolute pixel value differences over the pixels in that sub-region as the statistical representative value of the difference in pixel values between the foreground and the background.
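The per-sub-region mean of absolute differences can be sketched as follows, assuming the sub-regions are given as a label map (0 for pixels outside the subject area). The function name and the sample values are hypothetical.

```python
def mean_abs_difference_per_subregion(image, background, sub_labels):
    """Return {sub-region label: mean |foreground - background|}."""
    totals, counts = {}, {}
    for img_row, bg_row, lab_row in zip(image, background, sub_labels):
        for p, b, lab in zip(img_row, bg_row, lab_row):
            if lab == 0:
                continue  # pixel is outside the subject area
            totals[lab] = totals.get(lab, 0) + abs(p - b)
            counts[lab] = counts.get(lab, 0) + 1
    return {lab: totals[lab] / counts[lab] for lab in totals}


image = [[100, 102, 200], [101, 198, 202]]
background = [[90, 90, 100], [90, 100, 100]]
sub_labels = [[1, 1, 2], [1, 2, 2]]
values = mean_abs_difference_per_subregion(image, background, sub_labels)
print(values)  # {1: 11.0, 2: 100.0}
```

In this example sub-region 1 differs little from the background, which is the situation the later threshold test is meant to flag.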

FIG. 3 is an explanatory diagram outlining the pixel value difference calculation. As shown in FIG. 3, the subject area 310 is identified on the difference image 302 obtained by background subtraction between the image of interest 300 and the background image 301. The subject area 310 is divided into a plurality of sub-regions 311, and the statistical representative value mdi of the difference in pixel values between the foreground and the background is calculated for each sub-region 311.

According to a modified example, the pixel value difference calculation unit 12 may calculate, for each sub-region, the mode or the median of the absolute luminance differences of the pixels in that sub-region as the statistical representative value of the difference in pixel values between the foreground and the background. According to another modified example, the pixel value difference calculation unit 12 may calculate the mean, median, or mode of the absolute pixel value differences between the foreground and the background over the entire subject area as the statistical representative value.

For each image, the pixel value difference calculation unit 12 notifies the caution determination unit 14 of the statistical representative value of the pixel value difference for each sub-region of the subject area in that image. When a single statistical representative value has been calculated for the subject area as a whole, the pixel value difference calculation unit 12 notifies the caution determination unit 14 of that value.

For each image in which a subject area has been detected, among the plurality of images subjected to the image recognition processing, the aspect ratio calculation unit 13 calculates the aspect ratio of the subject area. Since the aspect ratio calculation unit 13 performs the same processing on each of the images, the processing for a single image is described below.

The aspect ratio calculation unit 13 sets a circumscribed rectangle around the subject area and calculates the ratio of the height of that rectangle to its width as the aspect ratio of the subject area. The range of aspect ratios that a subject (for example, a person) can take in an image is anticipated in advance from the shape of the subject. In addition, the aspect ratio of the subject area may change sharply over the time series if, in some image, part of the subject is missing from the subject area or, conversely, the subject area includes something other than the subject. The change in the aspect ratio of the subject area over the time series is therefore another of the features used to identify an image requiring attention.

FIG. 4 is an explanatory diagram outlining the aspect ratio calculation for the subject area. For the subject area 401 in the image 400, the circumscribed rectangle 402 is set so as to touch the upper, lower, left, and right edges of the subject area 401. The aspect ratio R of the subject area 401 is then calculated as the ratio of the height H of the circumscribed rectangle 402 to its width W. That is, if the coordinates of the pixel SP at the upper-left corner of the circumscribed rectangle 402 are (x1, y1) and the coordinates of the pixel EP at the lower-right corner are (x2, y2), the aspect ratio R is expressed by the following equation.

R = H / W = (y2 - y1) / (x2 - x1)
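The aspect ratio calculation can be sketched as follows. The formula in the source is not legible, so this example assumes R = (y2 - y1) / (x2 - x1) from the definitions of SP and EP; whether the original counts the extents inclusively (adding 1 to each) is not known. The function names are hypothetical.

```python
def circumscribed_rectangle(mask):
    """Return (x1, y1, x2, y2): corners of the tightest box around nonzero pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def aspect_ratio(x1, y1, x2, y2):
    """Height-to-width ratio of the rectangle (assumes x2 > x1)."""
    return (y2 - y1) / (x2 - x1)


# Hypothetical 5x3 binary subject area, taller than it is wide.
mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 0], [1, 0, 1]]
rect = circumscribed_rectangle(mask)
print(rect)                 # (0, 0, 2, 4)
print(aspect_ratio(*rect))  # 2.0
```

Tracking this value from image to image gives the time-series feature used when identifying images requiring attention.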

For each image, the aspect ratio calculation unit 13 notifies the caution determination unit 14 of the aspect ratio of the subject area in that image.

For each image in which a subject area has been detected, among the plurality of images subjected to the image recognition processing, the caution determination unit 14 determines whether the image is an image requiring attention, based on the statistical representative value of the difference in pixel values between the foreground and the background of the subject area and on the aspect ratio of the subject area. Since the caution determination unit 14 performs the same processing on each of the images, the processing for a single image is described below.

For example, the smaller the difference in pixel values between a subject in an image and its background, the harder it is to detect the subject accurately. Consequently, the smaller the statistical representative value of the difference in pixel values between the foreground and the background of the subject area (for example, the mean, median, or mode of the absolute differences), the more likely it is that detection has failed for part of the subject or, conversely, that an object other than the subject has been erroneously detected as the subject. In addition, when part of the subject area contains something other than the subject, the statistical representative value of the difference in pixel values between the foreground and the background may be small for that part.

そこで、要注意判定部１４は、着目する画像の被写体領域について複数のサブ領域が設定されている場合、サブ領域ごとに、そのサブ領域について算出された前景と背景間の画素値の差の統計的代表値を画素値差閾値Th1と比較する。そして要注意判定部１４は、何れかのサブ領域について、その統計的代表値が画素値差閾値Th1未満となる場合、そのサブ領域が含まれる被写体領域は、正しく検出されていない可能性があると判定する。すなわち、被写体の一部のみが被写体領域に含まれていたり、あるいは、被写体とは異なる他の物体が被写体領域に含まれている可能性がある。そこで要注意判定部１４は、着目する画像を要注意画像と判定する。このように、要注意判定部１４は、サブ領域ごとに前景と背景間の画素値の差の統計的代表値を画素値差閾値Th1と比較することで、被写体領域の一部に他の物体が含まれている場合でも、その被写体領域を含む画像を要注意画像として特定できる。 Therefore, when a plurality of sub-regions are set for the subject area of the image of interest, the caution determination unit 14 compares, for each sub-region, the statistical representative value of the difference in pixel values between the foreground and the background calculated for that sub-region with the pixel value difference threshold Th1. When the statistical representative value of any sub-region is less than the pixel value difference threshold Th1, the caution determination unit 14 determines that the subject area containing that sub-region may not have been detected correctly. That is, only part of the subject may be included in the subject area, or an object different from the subject may be included in the subject area. The caution determination unit 14 therefore determines that the image of interest is an image requiring attention. By comparing the statistical representative value of the difference in pixel values between the foreground and the background with the pixel value difference threshold Th1 for each sub-region in this way, the caution determination unit 14 can identify an image containing the subject area as an image requiring attention even when another object is included in only part of that subject area.

なお、着目する画像の被写体領域全体について前景と背景間の画素値の差の統計的代表値が算出されている場合、要注意判定部１４は、その統計的代表値が画素値差閾値Th1未満となる場合、着目する画像を要注意画像と判定すればよい。 When the statistical representative value of the difference in pixel values between the foreground and the background has been calculated for the entire subject area of the image of interest, the caution determination unit 14 may determine that the image of interest is an image requiring attention when that statistical representative value is less than the pixel value difference threshold Th1.
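
The comparison with Th1 described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names and the input layout are hypothetical, and the mean of the absolute differences is used as the statistical representative value although the text equally allows the median or the mode.

```python
from statistics import mean


def representative_value(foreground, background):
    """Mean absolute foreground/background difference for one (sub-)region.

    `foreground` and `background` are equal-length sequences of pixel values
    taken at the same positions (hypothetical input layout).
    """
    return mean(abs(f - b) for f, b in zip(foreground, background))


def needs_attention_by_pixel_diff(representative_values, th1):
    """True when any sub-region's representative value is below Th1.

    Pass a one-element list when the value was computed for the whole
    subject area instead of per sub-region.
    """
    return any(value < th1 for value in representative_values)


# Illustrative numbers only: the second sub-region has low contrast.
values = [
    representative_value([120, 130, 125], [90, 95, 100]),
    representative_value([101, 99, 100], [100, 100, 100]),
]
flagged = needs_attention_by_pixel_diff(values, th1=9.0)
```

Because a single low-contrast sub-region is enough to flag the image, the check uses `any` rather than an average over sub-regions, which mirrors the "any of the sub-regions" wording of the text.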

なお、画素値差閾値Th1は、例えば、カメラ２の撮影範囲内に被写体が存在しない状態でカメラ２がその撮影範囲を互いに異なるタイミングで撮影することで得られた２枚の背景画像の対応画素間の差分値D(i,j)の標準偏差σの3倍とすることができる。 The pixel value difference threshold Th1 can be set to, for example, three times the standard deviation σ of the difference values D(i,j) between corresponding pixels of two background images obtained by the camera 2 capturing its shooting range at different timings while no subject is present in that range.
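
As a rough sketch of how such a threshold could be derived, the following Python computes three times the standard deviation of the per-pixel differences between two background images. The function name, the nested-list image layout, and the use of the population standard deviation are assumptions made for this illustration; the text does not say whether a population or sample estimate is intended.

```python
from statistics import pstdev


def pixel_diff_threshold(background_a, background_b, k=3.0):
    """Th1 as k times the standard deviation of D(i, j).

    `background_a` and `background_b` are same-sized images given as lists
    of rows of pixel values (hypothetical layout). D(i, j) is the difference
    between the pixels at row i, column j of the two images.
    """
    diffs = [
        a - b
        for row_a, row_b in zip(background_a, background_b)
        for a, b in zip(row_a, row_b)
    ]
    return k * pstdev(diffs)


# Illustrative 2x2 background images captured at different times.
bg_first = [[10, 12], [11, 13]]
bg_second = [[11, 11], [11, 13]]
th1 = pixel_diff_threshold(bg_first, bg_second)
```

With this choice, differences that stay within the ordinary frame-to-frame noise of the background fall below Th1, so a region whose foreground barely differs from the background is treated as unreliable.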

また、時間的に連続する複数の画像間で、被写体の形状はあまり変化しないと想定される。したがって、各画像において、被写体領域が被写体を正しく表している場合には、被写体領域の縦横比は、それら複数の画像間であまり変化しないと想定される。一方、何れかの画像において被写体領域が被写体を正確に表していなければ、その画像と前後の画像間で、被写体領域の縦横比が相対的に大きく変化することがある。 In addition, it is assumed that the shape of the subject does not change much between a plurality of images that are continuous in time. Therefore, in each image, when the subject area correctly represents the subject, it is assumed that the aspect ratio of the subject area does not change much between the plurality of images. On the other hand, if the subject area does not accurately represent the subject in any of the images, the aspect ratio of the subject area may change relatively significantly between the image and the images before and after the image.

そこで要注意判定部１４は、着目する画像の被写体領域の縦横比R_nと、着目する画像の直前の被写体領域の縦横比R_n-1間の差Rvを縦横比の時系列に沿った変化度合いとして算出し、その変化度合いRvを所定の形状閾値Th2と比較する。そして要注意判定部１４は、その変化度合いRvが所定の形状閾値Th2よりも大きければ、着目する画像を要注意画像と判定する。なお、変化度合いRvは、例えば、次式に従って算出される。

Therefore, the caution determination unit 14 calculates the difference Rv between the aspect ratio R_n of the subject area of the image of interest and the aspect ratio R_{n-1} of the subject area of the immediately preceding image as the degree of change of the aspect ratio along the time series, and compares the degree of change Rv with a predetermined shape threshold Th2. If the degree of change Rv is larger than the predetermined shape threshold Th2, the caution determination unit 14 determines that the image of interest is an image requiring attention. The degree of change Rv is calculated according to, for example, the following equation.
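
The equation is not reproduced in this text. Claim 3 compares the absolute value of the difference between the aspect ratio of an image and that of the immediately preceding image with a threshold, so one form consistent with that claim would be the following (a reconstruction from the claim wording, not a quotation of the original equation):

```latex
R_v = \left| R_n - R_{n-1} \right|
```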

また、形状閾値Th2は、想定される被写体の縦横比に応じて予め設定される。 Further, the shape threshold value Th2 is set in advance according to the expected aspect ratio of the subject.

なお、着目する画像の直前の画像において被写体領域が検出されていない場合、被写体領域の縦横比の時系列に沿った変化度合いが算出されないので、要注意判定部１４は、着目する画像を要注意画像と判定してもよい。また、要注意判定部１４は、被写体領域が検出されなかった画像を、要注意画像でないとしてもよい。 If no subject area was detected in the image immediately preceding the image of interest, the degree of change of the aspect ratio of the subject area along the time series cannot be calculated, so the caution determination unit 14 may determine that the image of interest is an image requiring attention. Further, the caution determination unit 14 may treat an image in which no subject area was detected as not being an image requiring attention.
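
The shape check and its two special cases can be sketched as below. This is an illustrative reading under stated assumptions: the function and parameter names are hypothetical, `None` stands for "no subject area detected", and the degree of change is taken as the absolute difference of consecutive aspect ratios, following the wording of claim 3.

```python
from typing import Optional


def needs_attention_by_shape(
    r_n: Optional[float],
    r_prev: Optional[float],
    th2: float,
    flag_when_no_previous: bool = True,
) -> bool:
    """Decide from the aspect ratio change whether an image needs attention.

    r_n    -- aspect ratio of the subject area in the image of interest
    r_prev -- aspect ratio in the immediately preceding image
    th2    -- shape threshold Th2
    """
    if r_n is None:
        # No subject area in this image: may be treated as not requiring attention.
        return False
    if r_prev is None:
        # Change cannot be computed; the text allows flagging the image.
        return flag_when_no_previous
    return abs(r_n - r_prev) > th2
```

The `flag_when_no_previous` switch exists only because the text says the image "may" be flagged in that case; a caller that prefers fewer manual checks could pass `False`.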

要注意判定部１４は、画像ごとに、検出された被写体領域の外接矩形の位置及び範囲と、被写体領域内のサブ領域ごとの画素値の差の統計的代表値と、縦横比と、要注意画像か否かを表すフラグとを、メモリ５に記憶される検出情報テーブルに書き込む。 For each image, the caution determination unit 14 determines the position and range of the detected circumscribing rectangle of the subject area, the statistical representative value of the difference between the pixel values for each sub-area in the subject area, the aspect ratio, and the caution. A flag indicating whether or not it is an image is written in the detection information table stored in the memory 5.

図５は、検出情報テーブルの一例を示す図である。検出情報テーブル５００には、行ごとに、一つの画像についての検出情報が表される。すなわち、検出情報テーブル５００の左端の欄には、画像の生成順序を表すフレーム番号が示される。左から２番目の欄には、検出された被写体領域の外接矩形の左上端画素の座標SPn(x1,y1)及び右下端画素の座標EPn(x2,y2)が示される。左から３番目の欄には、被写体領域内のサブ領域ごとの前景と背景間の画素値の差の統計的代表値(md1, md2, ...)が示される。また、右から２番目の欄には、縦横比Rnが示される。そして右端の欄には、要注意画像か否かのフラグが示される。 FIG. 5 is a diagram showing an example of the detection information table. In the detection information table 500, the detection information for one image is displayed for each row. That is, in the leftmost column of the detection information table 500, a frame number indicating the image generation order is shown. In the second column from the left, the coordinates SPn (x1, y1) of the upper left pixel of the circumscribing rectangle of the detected subject area and the coordinates EPn (x2, y2) of the lower right pixel are shown. In the third column from the left, statistical representative values (md1, md2, ...) of the difference in pixel values between the foreground and the background for each sub-domain in the subject area are shown. The aspect ratio Rn is shown in the second column from the right. And in the rightmost column, a flag indicating whether or not the image needs attention is shown.
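
One way to picture a row of the detection information table is the following Python data structure. The class name, field names, and sample values are hypothetical and chosen only to mirror the five columns described for FIG. 5.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectionRecord:
    """One row of the detection information table (illustrative layout)."""

    frame_number: int               # generation order of the image
    top_left: Tuple[int, int]       # SPn(x1, y1) of the circumscribed rectangle
    bottom_right: Tuple[int, int]   # EPn(x2, y2) of the circumscribed rectangle
    sub_region_diffs: List[float]   # md1, md2, ... per sub-region
    aspect_ratio: float             # Rn
    needs_attention: bool           # flag for an image requiring attention


# Sample rows with made-up values.
table: List[DetectionRecord] = [
    DetectionRecord(0, (40, 30), (80, 150), [21.5, 19.0, 23.2], 3.0, False),
    DetectionRecord(1, (42, 30), (120, 150), [20.8, 3.1, 22.7], 1.5, True),
]

# Frames an operator would need to review by hand.
frames_to_review = [row.frame_number for row in table if row.needs_attention]
```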

提示部１５は、画像認識処理の対象となる複数の画像のそれぞれを、例えば、生成時間順に、ユーザインターフェース４に順次表示させる。その際、提示部１５は、検出情報テーブルを参照して、要注意画像でない画像については、被写体領域の外接矩形をその画像に重畳して表示させる。一方、提示部１５は、検出情報テーブルを参照して、要注意画像である画像については、被写体領域が検出されていても、被写体領域の外接矩形を表示しない。提示部１５は、代わりに、被写体領域の近傍に、要注意画像であることを表すマーク、例えば、矢印を表示させてもよい。 The presentation unit 15 causes the user interface 4 to sequentially display each of the plurality of images to be subjected to the image recognition process, for example, in the order of generation time. At that time, the presentation unit 15 refers to the detection information table and displays an image that is not a caution image by superimposing the circumscribed rectangle of the subject area on the image. On the other hand, the presentation unit 15 refers to the detection information table and does not display the circumscribed rectangle of the subject area even if the subject area is detected for the image which is an image requiring attention. Instead, the presentation unit 15 may display a mark indicating that the image requires attention, for example, an arrow, in the vicinity of the subject area.

図６は、ユーザインターフェース４に表示される画像の一例を示す図である。この例では、時刻T0〜T3に得られた４枚の画像６００〜６０３のうち、時刻T3に得られた画像６０３のみが要注意画像であるとする。なお、画像６００〜６０３は、例えば、生成時間順に一枚ずつ表示される。 FIG. 6 is a diagram showing an example of an image displayed on the user interface 4. In this example, of the four images 600 to 603 obtained at times T0 to T3, only the image 603 obtained at time T3 is assumed to be a caution image. The images 600 to 603 are displayed one by one in the order of generation time, for example.

この場合、画像６００〜６０２については、それぞれ、検出された被写体領域の外接矩形６１０〜６１２が、画像６００〜６０２とともに表示される。これにより、作業者は、画像６００〜６０２に関しては、外接矩形６１０〜６１２を参照して、被写体が正しく検出されていることを確認すればよい。なお、提示部１５は、ユーザインターフェース４から、画像６００〜６０２のそれぞれごとに、作業者が確認したことを表す操作信号を受け付けてもよい。そして提示部１５は、その操作信号を受け付けると、外接矩形に被写体の識別情報を付してもよい。あるいは、作業者が、各画像に写っている被写体を確認して、各画像に被写体の識別情報を付してもよい。また、同一の被写体ごとに、その被写体の識別情報と、その被写体が写っている画像の数と、各画像における検出情報とを対応付けた被写体データが作成されてもよい。 In this case, for the images 600 to 602, the circumscribed rectangles 610 to 612 of the detected subject areas are displayed together with the images 600 to 602, respectively. As a result, for the images 600 to 602, the operator only needs to refer to the circumscribed rectangles 610 to 612 and confirm that the subject has been detected correctly. The presentation unit 15 may receive, from the user interface 4, an operation signal indicating that the operator has confirmed each of the images 600 to 602. When the presentation unit 15 receives the operation signal, it may attach the identification information of the subject to the circumscribed rectangle. Alternatively, the operator may check the subject appearing in each image and attach the identification information of the subject to each image. Further, for each distinct subject, subject data may be created that associates the identification information of the subject, the number of images in which the subject appears, and the detection information in each image.

一方、要注意画像である画像６０３については、被写体領域の外接矩形は表示されない。その代り、被写体領域６２０の近傍に、要注意画像であることを表す矢印６１３が表示される。そして画像６０３について、作業者が、ユーザインターフェース４を操作して、被写体領域の外接矩形を設定し、設定された外接矩形の位置及び範囲を表す情報がＣＰＵ７に渡される。ＣＰＵ７が、ユーザにより設定された外接矩形の位置及び範囲を表す情報を受け取ると、提示部１５は、その情報で表される外接矩形を、被写体領域の本当の外接矩形とする。そして提示部１５は、検出情報テーブルに示された、要注意画像６０３の外接矩形の位置及び範囲を表す情報を、作業者により設定された外接矩形の位置及び範囲を表す情報で更新してもよい。さらに、作業者は、ユーザインターフェース４を操作して、設定した外接矩形に被写体の識別情報を付してもよい。 On the other hand, for the image 603, which is an image requiring attention, the circumscribed rectangle of the subject area is not displayed. Instead, an arrow 613 indicating that the image requires attention is displayed in the vicinity of the subject area 620. For the image 603, the operator operates the user interface 4 to set the circumscribed rectangle of the subject area, and information representing the position and range of the set circumscribed rectangle is passed to the CPU 7. When the CPU 7 receives the information representing the position and range of the circumscribed rectangle set by the user, the presentation unit 15 treats the circumscribed rectangle represented by that information as the true circumscribed rectangle of the subject area. The presentation unit 15 may then update the information in the detection information table that represents the position and range of the circumscribed rectangle of the image 603 requiring attention with the information representing the position and range of the circumscribed rectangle set by the operator. Further, the operator may operate the user interface 4 to attach the identification information of the subject to the set circumscribed rectangle.

なお、変形例によれば、提示部１５は、要注意画像でない画像が連続する場合、その連続する画像を早送り表示させてもよい。あるいは、提示部１５は、その連続する画像のそれぞれの被写体領域及びその外接矩形を一度に表示させてもよい。例えば、提示部１５は、その連続する画像のそれぞれから被写体領域を切り出して、一つの画像に合成することで、連続する画像のそれぞれの被写体領域及び外接矩形を一つの画像に表してもよい。あるいは、提示部１５は、連続する画像のそれぞれを縮小してサムネイル画像を生成し、それらサムネイル画像をユーザインターフェース４に同時に表示させてもよい。その際、提示部１５は、各サムネイル画像上に、被写体領域を囲う外接矩形を表示すればよい。 According to a modification, when images that are not images requiring attention occur consecutively, the presentation unit 15 may display those consecutive images in fast-forward. Alternatively, the presentation unit 15 may display the subject areas of the consecutive images and their circumscribed rectangles all at once. For example, the presentation unit 15 may cut out the subject area from each of the consecutive images and combine them into one image, so that the subject areas and circumscribed rectangles of the consecutive images are shown in a single image. Alternatively, the presentation unit 15 may reduce each of the consecutive images to generate thumbnail images and display those thumbnail images on the user interface 4 at the same time. In that case, the presentation unit 15 may display, on each thumbnail image, the circumscribed rectangle surrounding the subject area.

ただし、この変形例の場合も、提示部１５は、要注意画像については、個々の要注意画像ごとに、ユーザインターフェース４に表示することが好ましい。これにより、作業者は、要注意画像でない画像については、一度に複数の画像について検出された被写体領域が正しいか否か確認することができ、一方、要注意画像を見易い状態で、被写体領域の外接矩形を設定する操作を行うことができる。 However, also in this modification, it is preferable that the presentation unit 15 display images requiring attention on the user interface 4 one at a time. As a result, for images that are not images requiring attention, the operator can check at once whether the subject areas detected in a plurality of images are correct, while for an image requiring attention the operator can set the circumscribed rectangle of the subject area with the image displayed in an easily viewable state.

図７（ａ）及び図７（ｂ）は、この変形例による、ユーザインターフェース４に表示される画像の一例を示す図である。この例では、時刻T0〜T2に得られた３枚の画像７００〜７０２は、何れも、要注意画像でないとする。 7 (a) and 7 (b) are diagrams showing an example of an image displayed on the user interface 4 according to this modified example. In this example, it is assumed that none of the three images 700 to 702 obtained at times T0 to T2 are images requiring attention.

図７（ａ）に示される例では、画像７００〜７０２は、被写体領域の外枠７１０〜７１２とともに早送り表示される。その際、作業者が、ユーザインターフェース４を介して早送り表示を停止する操作を行わない限り、提示部１５は、各画像の被写体領域が正しいことが確認されたとみなして、各画像について、一定期間（例えば、1〜5秒）表示後に次の画像を表示してもよい。 In the example shown in FIG. 7A, the images 700 to 702 are displayed in fast-forward together with the outer frames 710 to 712 of the subject areas. In that case, unless the operator performs an operation via the user interface 4 to stop the fast-forward display, the presentation unit 15 may regard the subject area of each image as having been confirmed to be correct and may display the next image after each image has been displayed for a fixed period (for example, 1 to 5 seconds).

また、図７（ｂ）に示される例では、画像７００〜７０２のそれぞれから検出された被写体領域７２０〜７２２が一つの画像７３０上に合成され、その画像７３０が表示される。そのため、作業者は、一枚の画像７３０を確認するだけで、３枚の画像７００〜７０２のそれぞれの被写体領域が正しいことを確認できる。 Further, in the example shown in FIG. 7B, the subject areas 720 to 722 detected from each of the images 700 to 702 are combined on one image 730, and the image 730 is displayed. Therefore, the operator can confirm that the subject areas of the three images 700 to 702 are correct only by checking one image 730.

図８は、ＣＰＵ７により実行される、画像認識処理の動作フローチャートである。ＣＰＵ７は、画像ごとに、下記の動作フローチャートに従って画像認識処理を実行すればよい。 FIG. 8 is an operation flowchart of the image recognition process executed by the CPU 7. The CPU 7 may execute the image recognition process for each image according to the following operation flowchart.

被写体検出部１１は、画像上で検出対象となる被写体が表されている被写体領域を検出する（ステップＳ１０１）。 The subject detection unit 11 detects a subject area in which the subject to be detected is represented on the image (step S101).
画素値差算出部１２は、被写体領域を複数のサブ領域に分割する（ステップＳ１０２）。そして画素値差算出部１２は、サブ領域ごとに、前景と背景間の画素値の差の統計的代表値を算出する（ステップＳ１０３）。 The pixel value difference calculation unit 12 divides the subject area into a plurality of sub-regions (step S102). Then, the pixel value difference calculation unit 12 calculates a statistical representative value of the difference in pixel values between the foreground and the background for each sub-region (step S103).

また、縦横比算出部１３は、被写体領域の縦横比を算出する（ステップＳ１０４）。 Further, the aspect ratio calculation unit 13 calculates the aspect ratio of the subject area (step S104).

要注意判定部１４は、被写体領域のサブ領域ごとの前景と背景間の画素値の差の統計的代表値、または、縦横比の時系列に沿った変化度合いが、画像が要注意画像となる条件を満たすか否か判定する（ステップＳ１０５）。画素値の差の統計的代表値及び縦横比の時系列に沿った変化度合いの何れも要注意画像となる条件を満たさない場合（ステップＳ１０５−Ｎｏ）、提示部１５は、ユーザインターフェース４に画像とともに被写体領域の外接矩形を表示させる（ステップＳ１０６）。一方、画素値の差の統計的代表値または縦横比の時系列に沿った変化度合いの何れかが要注意画像となる条件を満たす場合（ステップＳ１０５−Ｙｅｓ）、着目する画像は要注意画像である。そこで提示部１５は、ユーザインターフェース４に、画像とともに要注意画像であることを表すマークを表示させる（ステップＳ１０７）。そして提示部１５は、ユーザインターフェース４を介した被写体領域の外接矩形の設定情報及び被写体の識別情報を受け付け、その設定情報にしたがって、被写体領域の外接矩形を設定するとともに、その外接矩形に被写体の識別情報を付す（ステップＳ１０８）。 The caution determination unit 14 determines whether the statistical representative value of the difference in pixel values between the foreground and the background for each sub-region of the subject area, or the degree of change of the aspect ratio along the time series, satisfies the condition under which the image is an image requiring attention (step S105). When neither the statistical representative value of the difference in pixel values nor the degree of change of the aspect ratio along the time series satisfies that condition (step S105-No), the presentation unit 15 causes the user interface 4 to display the circumscribed rectangle of the subject area together with the image (step S106). On the other hand, when either the statistical representative value of the difference in pixel values or the degree of change of the aspect ratio along the time series satisfies that condition (step S105-Yes), the image of interest is an image requiring attention. The presentation unit 15 therefore causes the user interface 4 to display, together with the image, a mark indicating that it is an image requiring attention (step S107). The presentation unit 15 then receives, via the user interface 4, setting information for the circumscribed rectangle of the subject area and identification information of the subject, sets the circumscribed rectangle of the subject area according to the setting information, and attaches the identification information of the subject to that circumscribed rectangle (step S108).
ステップＳ１０６またはＳ１０８の後、ＣＰＵ７は、画像認識処理を終了する。なお、ＣＰＵ７は、ステップＳ１０２及びＳ１０３の処理と、ステップＳ１０４の処理の順序を入れ替えてもよく、あるいは、ステップＳ１０２及びＳ１０３の処理と、ステップＳ１０４の処理を並列に実行してもよい。 After step S106 or S108, the CPU 7 ends the image recognition process. The CPU 7 may swap the order of the processing of steps S102 and S103 and the processing of step S104, or may execute the processing of steps S102 and S103 and the processing of step S104 in parallel.
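
The decision part of this flow (steps S105 to S107) can be condensed into a short Python sketch. It assumes the per-sub-region representative values (S103) and the aspect ratios (S104) have already been computed; the function name and the returned action strings are hypothetical, and a missing previous aspect ratio is treated here as "no shape change", which is only one of the options the text allows.

```python
def process_image(sub_region_values, r_n, r_prev, th1, th2):
    """Return the display action for one image.

    sub_region_values -- representative values from step S103
    r_n, r_prev       -- aspect ratios of this and the preceding image (S104)
    th1, th2          -- pixel value difference and shape thresholds
    """
    # Step S105: either condition makes the image one requiring attention.
    low_contrast = any(value < th1 for value in sub_region_values)
    shape_jump = r_prev is not None and abs(r_n - r_prev) > th2

    if low_contrast or shape_jump:
        # Step S107: show only a mark; step S108 then waits for the operator
        # to set the circumscribed rectangle and the identification information.
        return "show_mark"
    # Step S106: show the image with the detected circumscribed rectangle.
    return "show_rectangle"
```

Since steps S102 to S103 and step S104 do not depend on each other, the two inputs to this function can be produced in either order or in parallel, as the text notes.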

以上に説明してきたように、この画像認識装置は、着目する画像の被写体領域の前景と背景間の画素値の差の統計的代表値、及び、被写体領域の縦横比の時系列に沿った変化度合いを算出する。そしてこの画像認識装置は、その統計的代表値または縦横比の時系列に沿った変化度合いに応じて、着目する画像が、被写体領域が誤検出されている可能性がある要注意画像か否か判定する。そのため、この画像認識装置は、識別器学習用の教師画像のセットを生成する際に、要注意画像を自動的に特定できる。また、この画像認識装置は、要注意画像を作業者に確認させる際、被写体領域の外接矩形を表示しないことで、その外接矩形を修正する手間を削減することができる。 As described above, this image recognition device calculates the statistical representative value of the difference in pixel values between the foreground and the background of the subject area of the image of interest, and the degree of change of the aspect ratio of the subject area along the time series. According to that statistical representative value or the degree of change of the aspect ratio along the time series, the image recognition device then determines whether or not the image of interest is an image requiring attention, that is, an image in which the subject area may have been erroneously detected. Therefore, this image recognition device can automatically identify images requiring attention when generating a set of teacher images for training a classifier. Further, by not displaying the circumscribed rectangle of the subject area when having the operator check an image requiring attention, this image recognition device can reduce the effort of correcting that circumscribed rectangle.

なお、変形例によれば、要注意判定部１４は、被写体領域の前景と背景間の画素値の差の統計的代表値、及び、被写体領域の縦横比の時系列に沿った変化度合いのうちの何れか一方のみに基づいて、着目する画像が要注意画像か否か判定してもよい。この場合、画素値差算出部１２及び縦横比算出部１３のうち、算出される値が要注意判定部１４で使用されない方については省略されてもよい。 According to the modified example, the caution determination unit 14 has a statistical representative value of the difference in pixel values between the foreground and the background of the subject area, and the degree of change of the aspect ratio of the subject area along the time series. It may be determined whether or not the image of interest is an image requiring attention based on only one of the above. In this case, of the pixel value difference calculation unit 12 and the aspect ratio calculation unit 13, the one in which the calculated value is not used in the caution determination unit 14 may be omitted.

また他の変形例によれば、縦横比算出部１３は、被写体領域の外接矩形を設定する代わりに、被写体領域の外接楕円を設定してもよい。そして縦横比算出部１３は、その外接楕円の短軸方向の直径に対する長軸方向の直径の比を、縦横比として算出してもよい。 According to another modification, the aspect ratio calculation unit 13 may set the circumscribed ellipse of the subject area instead of setting the circumscribed rectangle of the subject area. Then, the aspect ratio calculation unit 13 may calculate the ratio of the diameter in the major axis direction to the diameter in the minor axis direction of the circumscribed ellipse as the aspect ratio.
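
For the ellipse variant, the aspect ratio reduces to a ratio of two diameters. The helper below is a small sketch with a hypothetical name; how the circumscribed ellipse itself is fitted is not described in the text and is left out.

```python
def ellipse_aspect_ratio(diameter_a, diameter_b):
    """Ratio of the major-axis diameter to the minor-axis diameter.

    The two diameters may be passed in either order; the larger one is
    treated as the major axis, so the result is always at least 1.
    """
    major = max(diameter_a, diameter_b)
    minor = min(diameter_a, diameter_b)
    if minor <= 0:
        raise ValueError("diameters must be positive")
    return major / minor
```

Because the major and minor axes are defined by length rather than by image direction, this ratio does not distinguish a tall subject from a wide one, unlike a height-to-width ratio of a circumscribed rectangle.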

さらに他の変形例によれば、一つの画像から複数の被写体領域が検出された場合、画素値差算出部１２、縦横比算出部１３及び要注意判定部１４は、被写体領域ごとに、上記の実施形態または変形例による処理を行ってもよい。そして要注意判定部１４は、複数の被写体領域のうちの何れか一つについて、画像が要注意画像に該当する判定条件が満たされる場合、その画像を要注意画像としてもよい。 According to still another modification, when a plurality of subject areas are detected from one image, the pixel value difference calculation unit 12, the aspect ratio calculation unit 13, and the caution determination unit 14 are described above for each subject area. The processing according to the embodiment or the modification may be performed. Then, when the determination condition that the image corresponds to the caution image is satisfied for any one of the plurality of subject areas, the caution determination unit 14 may use the image as the caution image.
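
The multi-region variant can be sketched as an "any region" rule. The names below are hypothetical, and the sketch assumes each region already carries the aspect ratio of the corresponding region in the preceding image (or `None` if there is none); how regions are matched between consecutive images is not described in the text.

```python
def region_needs_attention(sub_values, r_n, r_prev, th1, th2):
    """Apply the Th1 and Th2 conditions to a single subject area."""
    low_contrast = any(value < th1 for value in sub_values)
    shape_jump = r_prev is not None and abs(r_n - r_prev) > th2
    return low_contrast or shape_jump


def image_needs_attention(regions, th1, th2):
    """True if any subject area in the image satisfies a condition.

    `regions` is a list of (sub_values, r_n, r_prev) tuples, one per
    detected subject area.
    """
    return any(
        region_needs_attention(sub_values, r_n, r_prev, th1, th2)
        for sub_values, r_n, r_prev in regions
    )
```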

さらに他の変形例によれば、提示部１５は省略されてもよい。この場合には、例えば、作業者は、一連の画像について画像認識処理が終了した後に、検出情報テーブルを参照して、ユーザインターフェース４を操作することで、特定された要注意画像をユーザインターフェース４に表示させてもよい。そして作業者は、表示された要注意画像を参照して、ユーザインターフェース４を操作することで、その要注意画像に被写体領域の外接矩形及び被写体の識別情報を付してもよい。 According to still another modification, the presentation unit 15 may be omitted. In this case, for example, after the image recognition process has finished for a series of images, the operator may refer to the detection information table and operate the user interface 4 to display the identified images requiring attention on the user interface 4. The operator may then refer to a displayed image requiring attention and operate the user interface 4 to attach the circumscribed rectangle of the subject area and the identification information of the subject to that image.

さらに、上記の実施形態または変形例による画像認識装置のＣＰＵが有する各部の機能をコンピュータに実現させるコンピュータプログラムは、コンピュータによって読取り可能な記録媒体に記憶された形で提供されてもよい。なお、コンピュータによって読取り可能な記録媒体は、例えば、磁気記録媒体、光記録媒体、又は半導体メモリとすることができる。 Further, a computer program for realizing the functions of each part of the CPU of the image recognition device according to the above embodiment or the modification in the computer may be provided in a form stored in a recording medium readable by the computer. The recording medium that can be read by a computer can be, for example, a magnetic recording medium, an optical recording medium, or a semiconductor memory.

ここに挙げられた全ての例及び特定の用語は、読者が、本発明及び当該技術の促進に対する本発明者により寄与された概念を理解することを助ける、教示的な目的において意図されたものであり、本発明の優位性及び劣等性を示すことに関する、本明細書の如何なる例の構成、そのような特定の挙げられた例及び条件に限定しないように解釈されるべきものである。本発明の実施形態は詳細に説明されているが、本発明の精神及び範囲から外れることなく、様々な変更、置換及び修正をこれに加えることが可能であることを理解されたい。 All examples and specific terms given herein are intended for teaching purposes, to help the reader understand the invention and the concepts contributed by the inventor to furthering the art, and should be construed as not being limited to the configuration of any example in this specification, or to such specifically recited examples and conditions, relating to showing the superiority or inferiority of the invention. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made thereto without departing from the spirit and scope of the invention.

１画像認識装置 1 Image recognition device
２カメラ 2 Camera
３通信インターフェース 3 Communication interface
４ユーザインターフェース 4 User interface
５メモリ 5 Memory
６記憶媒体アクセス装置 6 Storage medium access device
７ＣＰＵ 7 CPU
８記憶媒体 8 Storage medium
１１被写体検出部 11 Subject detection unit
１２画素値差算出部 12 Pixel value difference calculation unit
１３縦横比算出部 13 Aspect ratio calculation unit
１４要注意判定部 14 Caution determination unit
１５提示部 15 Presentation unit

Claims

A computer program for image recognition that causes a computer to execute:
detecting, from each of a plurality of time-series images obtained by an imaging unit, a region in which a subject is represented;
identifying, among the plurality of images, an image requiring attention in which the region may have been erroneously detected, according to the degree of change in the aspect ratio of the region along the time series or the absolute value of the difference between the foreground value and the background value of each pixel in the region; and
displaying, on a display device, each image other than the image requiring attention among the plurality of images with the outer frame of the region detected from that image superimposed on it, while displaying the image requiring attention among the plurality of images on the display device without superimposing the outer frame of the region detected from that image.

The computer program for image recognition according to claim 1, further causing the computer to execute, for each image among the plurality of images in which the region in which the subject is represented has been detected, dividing the region into a plurality of sub-regions and calculating, for each of the plurality of sub-regions, a statistical representative value of the absolute value of the difference between the foreground value and the background value of each pixel in the sub-region,
wherein identifying the image requiring attention includes identifying, as the image requiring attention, an image in which the region has been detected and for which the statistical representative value for any of the plurality of sub-regions is less than a first threshold.

The computer program for image recognition according to claim 1 or 2, further causing the computer to execute calculating the aspect ratio of the region for each image among the plurality of images in which the region in which the subject is represented has been detected,
wherein identifying the image requiring attention includes identifying, as the image requiring attention, an image in which the region has been detected and for which the absolute value of the difference between the aspect ratio of that image and the aspect ratio of the immediately preceding image in the time series is larger than a second threshold.

An image recognition device comprising:
a subject detection unit that detects, from each of a plurality of time-series images obtained by an imaging unit, a region in which a subject is represented;
a caution determination unit that identifies, among the plurality of images, an image requiring attention in which the region may have been erroneously detected, according to the degree of change in the aspect ratio of the region along the time series or the absolute value of the difference between the foreground value and the background value of each pixel in the region; and
a presentation unit that displays, on a display device, each image other than the image requiring attention among the plurality of images with the outer frame of the region detected from that image superimposed on it, while displaying the image requiring attention among the plurality of images on the display device without superimposing the outer frame of the region detected from that image.

An image recognition method comprising:
detecting, from each of a plurality of time-series images obtained by an imaging unit, a region in which a subject is represented;
identifying, among the plurality of images, an image requiring attention in which the region may have been erroneously detected, according to the degree of change in the aspect ratio of the region along the time series or the absolute value of the difference between the foreground value and the background value of each pixel in the region; and
displaying, on a display device, each image other than the image requiring attention among the plurality of images with the outer frame of the region detected from that image superimposed on it, while displaying the image requiring attention among the plurality of images on the display device without superimposing the outer frame of the region detected from that image.