JP2017158033A - Periphery monitoring system for work machine

Info

Publication number: JP2017158033A
Application number: JP2016039424A
Authority: JP
Inventors: 晋相澤; Susumu Aizawa
Original assignee: Sumitomo Heavy Industries Ltd
Current assignee: Sumitomo Heavy Industries Ltd
Priority date: 2016-03-01
Filing date: 2016-03-01
Publication date: 2017-09-07

Abstract

PROBLEM TO BE SOLVED: To provide a periphery monitoring system capable of detecting a person present in the periphery of a shovel.

SOLUTION: A periphery monitoring system 100 detects a person present in the periphery of a shovel using a captured image of an imaging apparatus 40 fitted to the shovel. The system includes: an extraction part 31 that extracts a plurality of predetermined image portions from the captured image as discerning processing target images; and a discerning part 32 that discerns, by image recognition processing, whether an image included in a discerning processing target image is a person image. When the number of discerning processing target images extracted by the extraction part 31 is larger than a predetermined criterion, the captured image is determined to be unsuitable for person detection.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a periphery monitoring system for a work machine that monitors the surroundings of the work machine.

A human body detection device is known that has an image sensor and a heat-sensing thermopile array, makes the imaging range and the heat detection range overlap, and restricts the face extraction range to only the region that the thermopile array output indicates is likely to be a human body, thereby reducing unnecessary computation during image identification processing (see Patent Document 1).

Patent Document 1: JP 2006-059015 A

On the other hand, it is desirable to provide a periphery monitoring system for a work machine that can detect people around the work machine with high reliability.

A periphery monitoring system for a work machine according to an embodiment of the present invention detects a person present around the work machine using a captured image from an imaging device attached to the work machine. The system includes an extraction unit that extracts a plurality of predetermined image portions from the captured image as identification processing target images, and an identification unit that identifies, by image recognition processing, whether an image included in an identification processing target image is an image of a person. When the number of identification processing target images extracted by the extraction unit is larger than a predetermined reference, the system determines that the captured image is in a state unsuitable for person detection.
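
The suitability determination described above can be sketched as a simple count comparison. This is a minimal illustration only: the function name and the parameter `max_allowed` are hypothetical, and the source does not state a concrete value for the predetermined reference.

```python
def is_unsuitable_for_person_detection(num_target_images: int, max_allowed: int) -> bool:
    """Return True when more target images were extracted than the reference allows.

    `max_allowed` stands in for the "predetermined reference"; its value is an
    assumption and would have to be tuned for a real system.
    """
    return num_target_images > max_allowed
```

For example, with a hypothetical reference of 50, a frame yielding 51 target images would be flagged as unsuitable, while a frame yielding exactly 50 would not.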

The above means provides a periphery monitoring system for a work machine that can detect people around the work machine with high reliability.

A side view of a shovel on which the periphery monitoring system according to an embodiment of the present invention is mounted.
A functional block diagram showing a configuration example of the periphery monitoring system.
Examples of images captured by the rear camera.
A schematic diagram showing an example of the geometric relationship used when cutting out an identification processing target image from a captured image.
A top view of the real space behind the shovel.
A diagram showing the flow of processing for generating a normalized image from a captured image.
A diagram showing the relationship among a captured image, an identification processing target image region, and a normalized image.
A diagram showing the relationship between an identification processing target image region and an identification processing unsuitable region.
A diagram showing examples of normalized images.
A diagram explaining the relationship between the rear horizontal distance from the rear camera to a virtual plane region in real space and the size of the head image portion in the normalized image.
A schematic diagram showing another example of the geometric relationship used when cutting out an identification processing target image from a captured image.
A diagram showing an example of a feature image in a captured image.
A flowchart showing the flow of an example of image extraction processing.
A flowchart showing the flow of another example of image extraction processing.
A flowchart showing the flow of yet another example of image extraction processing.
A flowchart showing the flow of yet another example of image extraction processing.
A flowchart showing the flow of an example of person detection suitability determination processing.
A frequency distribution chart of the helmet degree.

FIG. 1 is a side view of a shovel (excavator) as a work machine (construction machine) on which a periphery monitoring system 100 according to an embodiment of the present invention is mounted. An upper swing body 3 is mounted on a lower traveling body 1 of the shovel via a swing mechanism 2. A boom 4 is attached to the upper swing body 3. An arm 5 is attached to the tip of the boom 4, and a bucket 6 is attached to the tip of the arm 5. The boom 4, the arm 5, and the bucket 6 constitute an excavation attachment and are hydraulically driven by a boom cylinder 7, an arm cylinder 8, and a bucket cylinder 9, respectively. The upper swing body 3 is provided with a cabin 10 and carries a power source such as an engine. An imaging device 40 is attached to the upper part of the upper swing body 3. Specifically, a rear camera 40B, a left side camera 40L, and a right side camera 40R are attached to the upper rear end, the upper left end, and the upper right end of the upper swing body 3, respectively. A controller 30 and an output device 50 are installed in the cabin 10.

FIG. 2 is a functional block diagram showing a configuration example of the periphery monitoring system 100. The periphery monitoring system 100 mainly includes the controller 30, the imaging device 40, an input device 42, the output device 50, and a machine control device 51. In this embodiment, the imaging device 40, the input device 42, the output device 50, and the machine control device 51 are connected to the controller 30 via a CAN (Controller Area Network).

The controller 30 is a control device that performs drive control of the shovel. In this embodiment, the controller 30 is an arithmetic processing unit including a CPU and an internal memory, and realizes various functions by causing the CPU to execute a drive control program stored in the internal memory.

The controller 30 also determines whether a person is present around the shovel based on the outputs of various devices, and controls various devices according to the determination result. Specifically, the controller 30 receives the outputs of the imaging device 40 and the input device 42 and executes software programs corresponding to an extraction unit 31, an identification unit 32, a tracking unit 33, and a control unit 35. According to the execution result, it outputs a control command to the machine control device 51 to perform drive control of the shovel, or causes the output device 50 to output various information. The controller 30 may be a control device dedicated to image processing.

The imaging device 40 captures images of the surroundings of the shovel and outputs the captured images to the controller 30. In this embodiment, the imaging device 40 is a wide-angle camera that employs an imaging element such as a CCD, and is mounted on the upper part of the upper swing body 3 so that its optical axis points obliquely downward.

The input device 42 is a device that receives input from the operator. In this embodiment, the input device 42 includes operation devices (operation levers, operation pedals, and the like), a gate lock lever, a button installed at the tip of an operation device, buttons attached to the in-vehicle display, a touch panel, and the like.

The output device 50 is a device that outputs various types of information, and includes, for example, an in-vehicle display that displays image information, an in-vehicle speaker that outputs audio information, an alarm buzzer, an alarm lamp, and the like. In this embodiment, the output device 50 outputs information in response to control commands from the controller 30.

The machine control device 51 is a device that controls the movement of the shovel, and includes, for example, a control valve that controls the flow of hydraulic oil in the hydraulic system, a gate lock valve, an engine control device, and the like.

The extraction unit 31 is a functional element that extracts identification processing target images from a captured image captured by the imaging device 40. Specifically, the extraction unit 31 extracts identification processing target images by image processing with a relatively small amount of computation (hereinafter, "front-stage image recognition processing"), which extracts simple features based on local luminance gradients or edges, geometric features obtained by a Hough transform or the like, features relating to the area or aspect ratio of regions segmented on the basis of luminance, and the like. An identification processing target image is an image portion (a part of the captured image) that is subject to subsequent image processing, and includes a human candidate image. A human candidate image is an image portion (a part of the captured image) that is considered highly likely to be a human image. The captured image may be a color image or a grayscale image. The extraction unit 31 may have several kinds of functions for converting a color image to grayscale. The human candidate images may be regarded as differing from one another in their degree of human-likeness, or in a level indicating that degree. That degree, or the level indicating it, can also be treated as an evaluation value. The extraction unit 31 may also consist of multiple stages, each of which narrows down the candidates. For example, it may be configured as a first extraction unit at the front stage and a second extraction unit at the rear stage connected in series.

The identification unit 32 is a functional element that identifies whether a human candidate image included in an identification processing target image extracted by the extraction unit 31 is a human image. Specifically, the identification unit 32 identifies whether the human candidate image is a human image by image processing with a relatively large amount of computation (hereinafter, "rear-stage image recognition processing"), such as image recognition processing that uses an image feature description, typified by HOG (Histograms of Oriented Gradients) features, together with a classifier generated by machine learning. The proportion of human candidate images that the identification unit 32 identifies as human images increases as the extraction unit 31 extracts identification processing target images more accurately. When a captured image of the desired quality cannot be obtained, for example in an environment unsuitable for imaging such as at night or in bad weather, the identification unit 32 may identify all human candidate images in the identification processing target images extracted by the extraction unit 31 as human images. This is to prevent a person from going undetected.
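
The division of labor between the rear-stage identification and the fail-safe behavior described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the patented implementation: the function name, the `classifier` callback, and the `image_usable` flag are all hypothetical, and how image usability is decided is left outside the sketch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def identify_persons(
    candidates: Sequence[T],
    classifier: Callable[[T], bool],
    image_usable: bool = True,
) -> List[T]:
    """Return the candidates that are treated as persons.

    `classifier` stands in for the costly rear-stage recognition (for example
    a HOG-based classifier); it is a hypothetical callback.
    When the image is not usable (night, bad weather), every candidate is
    treated as a person so that nobody is missed.
    """
    if not image_usable:
        return list(candidates)
    return [c for c in candidates if classifier(c)]
```

Treating every candidate as a person trades more false alarms for fewer missed detections, which matches the stated goal of preventing detection omissions.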

Next, with reference to FIG. 3, how a human image appears in an image of the area behind the shovel captured by the rear camera 40B will be described. The two captured images in FIG. 3 are examples of images captured by the rear camera 40B. The dotted circles in FIG. 3 indicate the presence of human images and are not displayed in the actual captured images.

The rear camera 40B is a wide-angle camera and is mounted at a height from which it looks down on a person obliquely from above. For this reason, how a human image appears in the captured image varies greatly depending on the direction of the person as seen from the rear camera 40B. For example, a human image in the captured image appears more tilted the closer it is to the left or right edge of the captured image. This is due to image tilt (image collapse) caused by the wide-angle lens. In addition, the closer the person is to the rear camera 40B, the larger the head appears, and the legs fall into the blind spot created by the shovel's body and cannot be seen. These effects are caused by the installation position of the rear camera 40B. It is therefore difficult to identify a human image contained in a captured image by image processing without first processing the captured image in some way.

Therefore, the periphery monitoring system 100 according to the embodiment of the present invention facilitates the identification of a human image contained in an identification processing target image by normalizing the identification processing target image. Here, "normalization" means converting the identification processing target image into an image of a predetermined size and a predetermined shape. In this embodiment, an identification processing target image, which can take various shapes in the captured image, is converted into a rectangular image of a predetermined size by a projective transformation. For example, a projective transformation matrix with eight variables is used.

Here, with reference to FIGS. 4 to 6, an example of the processing by which the periphery monitoring system 100 normalizes an identification processing target image (hereinafter, "normalization processing") will be described. FIG. 4 is a schematic diagram showing an example of the geometric relationship that the extraction unit 31 uses when cutting out an identification processing target image from a captured image.

The box BX in FIG. 4 is a virtual three-dimensional object in real space; in this embodiment it is a virtual rectangular parallelepiped defined by eight vertices A to H. The point Pr is a reference point set in advance for referring to an identification processing target image. In this embodiment, the reference point Pr is a point set in advance as the assumed standing position of a person, and is located at the center of the quadrilateral ABCD defined by the four vertices A to D. The size of the box BX is set based on a person's orientation, stride, height, and the like. In this embodiment, the quadrilaterals ABCD and EFGH are squares with a side length of, for example, 800 mm, and the height of the rectangular parallelepiped is, for example, 1800 mm. That is, the box BX is a rectangular parallelepiped 800 mm wide, 800 mm deep, and 1800 mm high.

The quadrilateral ABGH defined by the four vertices A, B, G, and H forms a virtual plane region TR that corresponds to the region of the identification processing target image in the captured image. The quadrilateral ABGH serving as the virtual plane region TR is inclined with respect to the virtual ground, which is a horizontal plane.

In this embodiment, the box BX, a virtual rectangular parallelepiped, is used to define the relationship between the reference point Pr and the virtual plane region TR. However, as long as a virtual plane region TR that faces the imaging device 40 and is inclined with respect to the virtual ground can be determined in association with an arbitrary reference point Pr, other geometric relationships, such as a relationship using another virtual three-dimensional object, may be adopted, and other mathematical relationships such as functions or conversion tables may also be adopted.

FIG. 5 is a top view of the real space behind the shovel, and shows the positional relationship between the rear camera 40B and the virtual plane regions TR1 and TR2 when the virtual plane regions TR1 and TR2 are referenced using the reference points Pr1 and Pr2. In this embodiment, a reference point Pr can be placed at each lattice point of a virtual grid on the virtual ground. However, the reference points Pr may instead be placed irregularly on the virtual ground, or at equal intervals on line segments extending radially from the projection of the rear camera 40B onto the virtual ground. For example, the line segments extend radially at 1-degree increments, and the reference points Pr are placed on each line segment at 100 mm intervals.
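
The radial placement of reference points mentioned above can be sketched as follows. The 1-degree angular step and the 100 mm spacing come from the text; the angular sweep (`start_deg`, `end_deg`), the maximum range, the function name, and the coordinate convention are assumptions made only for illustration.

```python
import math
from typing import List, Tuple


def radial_reference_points(
    camera_xy: Tuple[float, float],
    max_range_mm: float,
    angle_step_deg: float = 1.0,
    spacing_mm: float = 100.0,
    start_deg: float = 0.0,    # assumed sweep start
    end_deg: float = 180.0,    # assumed sweep end
) -> List[Tuple[float, float]]:
    """Place reference points on rays from the camera's ground projection."""
    cx, cy = camera_xy
    n_angles = int(round((end_deg - start_deg) / angle_step_deg)) + 1
    n_radii = int(max_range_mm // spacing_mm)
    points = []
    for i in range(n_angles):
        theta = math.radians(start_deg + i * angle_step_deg)
        for k in range(1, n_radii + 1):
            r = k * spacing_mm
            points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points
```

Compared with a regular grid, a radial layout keeps the spacing along each viewing direction constant, at the cost of sparser coverage far from the camera.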

As shown in FIGS. 4 and 5, the first face of the box BX, defined by the quadrilateral ABFE (see FIG. 4), is arranged so as to directly face the rear camera 40B when the virtual plane region TR1 is referenced using the reference point Pr1. That is, the line segment connecting the rear camera 40B and the reference point Pr1 is orthogonal, in top view, to the first face of the box BX placed in association with the reference point Pr1. Similarly, the first face of the box BX is arranged so as to directly face the rear camera 40B when the virtual plane region TR2 is referenced using the reference point Pr2. That is, the line segment connecting the rear camera 40B and the reference point Pr2 is orthogonal, in top view, to the first face of the box BX placed in association with the reference point Pr2. This relationship holds regardless of which lattice point the reference point Pr is placed on. In other words, the box BX is always arranged so that its first face directly faces the rear camera 40B.
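
The geometry described above, a box centered on Pr whose first face ABFE squarely faces the camera in top view, can be worked out with a few lines of vector arithmetic. The sketch below assumes a ground-plane coordinate system in millimeters with z pointing up, and assumes that E, F, G, and H lie directly above A, B, C, and D; which of A and B is on the left is not stated in the text and is chosen arbitrarily here. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]


def box_vertices(
    camera_xy: Tuple[float, float],
    pr_xy: Tuple[float, float],
    side_mm: float = 800.0,
    height_mm: float = 1800.0,
) -> Dict[str, Point3]:
    """Vertices A..H of box BX, with face ABFE facing the camera in top view."""
    dx, dy = pr_xy[0] - camera_xy[0], pr_xy[1] - camera_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        raise ValueError("reference point coincides with the camera projection")
    ux, uy = dx / dist, dy / dist  # unit vector from camera toward Pr
    vx, vy = -uy, ux               # unit vector perpendicular to it
    h = side_mm / 2.0
    px, py = pr_xy
    base = {
        "A": (px - h * ux - h * vx, py - h * uy - h * vy),  # near edge
        "B": (px - h * ux + h * vx, py - h * uy + h * vy),  # near edge
        "C": (px + h * ux + h * vx, py + h * uy + h * vy),  # far edge
        "D": (px + h * ux - h * vx, py + h * uy - h * vy),  # far edge
    }
    verts: Dict[str, Point3] = {k: (x, y, 0.0) for k, (x, y) in base.items()}
    for top, bottom in zip("EFGH", "ABCD"):
        x, y = base[bottom]
        verts[top] = (x, y, height_mm)
    return verts


def virtual_plane_region(verts: Dict[str, Point3]) -> List[Point3]:
    """Corners of the inclined virtual plane region TR (quadrilateral ABGH)."""
    return [verts[k] for k in "ABGH"]
```

Because A and B sit on the near bottom edge and G and H on the far top edge, the quadrilateral ABGH leans away from the camera, which is what makes the region inclined relative to the virtual ground.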

FIG. 6 is a diagram showing the flow of processing for generating a normalized image from a captured image. Specifically, FIG. 6(A) is an example of an image captured by the rear camera 40B and shows the box BX placed in association with the reference point Pr in real space. FIG. 6(B) shows the region of the identification processing target image cut out of the captured image (hereinafter, "identification processing target image region TRg"), and corresponds to the virtual plane region TR shown in the captured image of FIG. 6(A). FIG. 6(C) shows a normalized image TRgt obtained by normalizing the identification processing target image having the identification processing target image region TRg.

As shown in FIG. 6(A), the box BX placed in real space in association with the reference point Pr1 determines the position of the virtual plane region TR in real space, which in turn determines the identification processing target image region TRg on the captured image corresponding to the virtual plane region TR.

Thus, once the position of the reference point Pr in real space is determined, the position of the virtual plane region TR in real space is uniquely determined, and the identification processing target image region TRg in the captured image is also uniquely determined. The extraction unit 31 can then normalize the identification processing target image having the identification processing target image region TRg to generate a normalized image TRgt of a predetermined size. In this embodiment, the size of the normalized image TRgt is, for example, 64 pixels high by 32 pixels wide.
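
A normalization step of this kind can be sketched with an eight-parameter projective transformation solved from four corner correspondences. The code below is a dependency-free illustration under stated assumptions: the image is a list of pixel rows, the quadrilateral corners are given in the order top-left, top-right, bottom-right, bottom-left, sampling is nearest-neighbor, and pixels that fall outside the captured image are left at 0. None of these choices are specified in the source, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return the 8 parameters h0..h7 that map each src point to its dst point."""
    a, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * X, -y * X])
        b.append(X)
        a.append([0, 0, 0, x, y, 1, -x * Y, -y * Y])
        b.append(Y)
    return solve_linear(a, b)


def apply_h(h: Sequence[float], x: float, y: float) -> Point:
    """Apply the projective transformation to one point."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def normalize_region(image, quad: Sequence[Point], width: int = 32, height: int = 64):
    """Warp the quadrilateral `quad` of `image` into a width x height rectangle."""
    rect = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    h = homography(rect, quad)  # maps output pixel -> source pixel
    out = [[0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            x, y = apply_h(h, u, v)
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < len(image) and 0 <= xi < len(image[0]):
                out[v][u] = image[yi][xi]
    return out
```

Mapping from the output rectangle back into the captured image (rather than forward) ensures every pixel of the normalized image receives a value, so the result has no holes even when the source region is small.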

FIG. 7 is a diagram showing the relationship among a captured image, an identification processing target image region, and a normalized image. Specifically, FIG. 7(A1) shows an identification processing target image region TRg3 in the captured image, and FIG. 7(A2) shows the normalized image TRgt3 of the identification processing target image having the identification processing target image region TRg3. FIG. 7(B1) shows an identification processing target image region TRg4 in the captured image, and FIG. 7(B2) shows the normalized image TRgt4 of the identification processing target image having the identification processing target image region TRg4. Similarly, FIG. 7(C1) shows an identification processing target image region TRg5 in the captured image, and FIG. 7(C2) shows the normalized image TRgt5 of the identification processing target image having the identification processing target image region TRg5.

As shown in FIG. 7, the identification processing target image region TRg5 in the captured image is larger than the identification processing target image region TRg4. This is because the distance between the rear camera 40B and the virtual plane region corresponding to TRg5 is smaller than the distance between the rear camera 40B and the virtual plane region corresponding to TRg4. Similarly, the identification processing target image region TRg4 is larger than the identification processing target image region TRg3, because the distance between the rear camera 40B and the virtual plane region corresponding to TRg4 is smaller than the distance between the rear camera 40B and the virtual plane region corresponding to TRg3. That is, the larger the distance between the corresponding virtual plane region and the rear camera 40B, the smaller the identification processing target image region in the captured image. The normalized images TRgt3, TRgt4, and TRgt5, on the other hand, are all rectangular images of the same size.

In this way, the extraction unit 31 normalizes identification processing target images, which can take various shapes and sizes in the captured image, into rectangular images of a predetermined size, and can thereby normalize human candidate images that contain human images. Specifically, the extraction unit 31 places the image portion estimated to be the head of the human candidate image (hereinafter, "head image portion") in a predetermined region of the normalized image. It places the image portion estimated to be the torso of the human candidate image (hereinafter, "torso image portion") in another predetermined region of the normalized image, and the image portion estimated to be the legs of the human candidate image (hereinafter, "leg image portion") in yet another predetermined region. The extraction unit 31 can also obtain the normalized image with the tilt (image collapse) of the human candidate image relative to the shape of the normalized image suppressed.

Next, with reference to FIG. 8, the normalization processing will be described for the case where an identification processing target image region includes an image region that is unsuitable for identification and adversely affects the identification of human images (hereinafter, "identification processing unsuitable region"). An identification processing unsuitable region is a known region in which a human image cannot exist, and includes, for example, a region in which the shovel's body appears (hereinafter, "vehicle body reflection region") and a region that extends beyond the captured image (hereinafter, "protruding region"). FIG. 8 is a diagram showing the relationship between an identification processing target image region and identification processing unsuitable regions, and corresponds to FIG. 7(C1) and FIG. 7(C2). In the left part of FIG. 8, the region hatched with lines sloping down to the right corresponds to the protruding region R1, and the region hatched with lines sloping down to the left corresponds to the vehicle body reflection region R2.

In this embodiment, when the identification processing target image region TRg5 includes the protruding region R1 and part of the vehicle body reflection region R2, the extraction unit 31 masks those identification processing unsuitable regions and then generates the normalized image TRgt5 of the identification processing target image having the identification processing target image region TRg5. Alternatively, the extraction unit 31 may generate the normalized image TRgt5 first and then mask the portions of the normalized image TRgt5 that correspond to the identification processing unsuitable regions.

図８右図は、正規化画像ＴＲｇｔ５を示す。また、図８右図において、右下がりの斜線ハッチング領域は、はみ出し領域Ｒ１に対応するマスク領域Ｍ１を表し、左下がりの斜線ハッチング領域は、車体映り込み領域Ｒ２の一部に対応するマスク領域Ｍ２を表す。 The right diagram of FIG. 8 shows the normalized image TRgt5. In the right diagram of FIG. 8, the region hatched with lines descending to the right represents the mask region M1 corresponding to the protrusion region R1, and the region hatched with lines descending to the left represents the mask region M2 corresponding to a part of the vehicle body reflection region R2.

このようにして、抽出部３１は、識別処理不適領域の画像をマスク処理することで、識別処理不適領域の画像が識別部３２による識別処理に影響を及ぼすのを防止する。このマスク処理により、識別部３２は、識別処理不適領域の画像の影響を受けることなく、正規化画像におけるマスク領域以外の領域の画像を用いて人画像であるかを識別できる。なお、抽出部３１は、マスク処理以外の他の任意の公知方法で、識別処理不適領域の画像が識別部３２による識別処理に影響を及ぼさないようにしてもよい。 In this way, the extraction unit 31 masks the image of the identification process inappropriate area, thereby preventing the image of the identification process inappropriate area from affecting the identification process performed by the identification unit 32. By this mask processing, the identification unit 32 can identify whether the image is a human image using an image of a region other than the mask region in the normalized image without being affected by the image of the identification processing inappropriate region. Note that the extraction unit 31 may use any known method other than the mask process so that the image in the identification process inappropriate region does not affect the identification process performed by the identification unit 32.
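
The mask processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the image is a plain list of pixel rows, each inappropriate region is simplified to an axis-aligned rectangle (the regions R1 and R2 in FIG. 8 need not be rectangular), and the names `mask_inappropriate_regions` and `MASK_VALUE` are hypothetical.

```python
MASK_VALUE = 0  # assumed fill value for masked pixels (hypothetical)


def mask_inappropriate_regions(image, regions, mask_value=MASK_VALUE):
    """Return a copy of `image` with every pixel inside `regions` overwritten.

    `image` is a list of rows of grayscale values. Each region is a
    hypothetical (top, left, bottom, right) rectangle with bottom and right
    exclusive; parts of a rectangle outside the image are ignored.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    masked = [row[:] for row in image]  # leave the input image untouched
    for top, left, bottom, right in regions:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = mask_value
    return masked


# Toy 4x4 image with stand-ins for the two kinds of inappropriate region.
image = [[9] * 4 for _ in range(4)]
r1 = (0, 3, 4, 4)  # stand-in for the protrusion region R1 (right column)
r2 = (3, 0, 4, 4)  # stand-in for part of the vehicle body reflection region R2 (bottom row)
masked = mask_inappropriate_regions(image, [r1, r2])
```

A classifier that skips pixels equal to `MASK_VALUE` would then see only the unmasked part of the normalized image. Whether masking is applied before or after normalization is left open, as in the text.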

次に、図９を参照し、抽出部３１が生成する正規化画像の特徴について説明する。なお、図９は、正規化画像の例を示す図である。また、図９に示す１４枚の正規化画像は、図の左端に近い正規化画像ほど、後方カメラ４０Ｂから近い位置に存在する人候補の画像を含み、図の右端に近い正規化画像ほど、後方カメラ４０Ｂから遠い位置に存在する人候補の画像を含む。 Next, features of the normalized images generated by the extraction unit 31 are described with reference to FIG. 9. FIG. 9 shows examples of normalized images. Of the 14 normalized images shown in FIG. 9, those nearer the left end of the figure contain images of human candidates located closer to the rear camera 40B, and those nearer the right end contain images of human candidates located farther from the rear camera 40B.

図９に示すように、抽出部３１は、実空間における仮想平面領域ＴＲと後方カメラ４０Ｂとの間の後方水平距離（図５に示すＹ軸方向の水平距離）に関係なく、何れの正規化画像内においてもほぼ同じ割合で頭部画像部分、胴体部画像部分、脚部画像部分等を配置できる。そのため、抽出部３１は、識別部３２が識別処理を実行する際の演算負荷を低減でき、且つ、その識別結果の信頼性を向上できる。なお、上述の後方水平距離は、実空間における仮想平面領域ＴＲと後方カメラ４０Ｂとの間の位置関係に関する情報の一例であり、抽出部３１は、抽出した識別処理対象画像にその情報を付加する。また、上述の位置関係に関する情報は、仮想平面領域ＴＲに対応する参照点Ｐｒと後方カメラ４０Ｂとを結ぶ線分の後方カメラ４０Ｂの光軸に対する上面視角度等を含む。 As shown in FIG. 9, the extraction unit 31 performs any normalization regardless of the rear horizontal distance (the horizontal distance in the Y-axis direction shown in FIG. 5) between the virtual plane region TR and the rear camera 40B in the real space. In the image, the head image portion, the torso image portion, the leg image portion, and the like can be arranged at substantially the same ratio. Therefore, the extraction unit 31 can reduce the calculation load when the identification unit 32 executes the identification process, and can improve the reliability of the identification result. The rear horizontal distance described above is an example of information regarding the positional relationship between the virtual plane region TR and the rear camera 40B in the real space, and the extraction unit 31 adds the information to the extracted identification processing target image. . Further, the information on the positional relationship described above includes a top view angle with respect to the optical axis of the rear camera 40B of a line segment connecting the reference point Pr corresponding to the virtual plane region TR and the rear camera 40B.

次に、図１０を参照し、実空間における仮想平面領域ＴＲと後方カメラ４０Ｂとの間の後方水平距離と、正規化画像における頭部画像部分の大きさとの関係について説明する。なお、図１０上図は、後方カメラ４０Ｂからの後方水平距離がそれぞれ異なる３つの参照点Ｐｒ１０、Ｐｒ１１、Ｐｒ１２のところに人が存在する場合の頭部画像部分の大きさＬ１０、Ｌ１１、Ｌ１２を示す図であり、横軸が後方水平距離に対応する。また、図１０下図は、後方水平距離と頭部画像部分の大きさの関係を示すグラフであり、縦軸が頭部画像部分の大きさに対応し、横軸が後方水平距離に対応する。なお、図１０上図及び図１０下図の横軸は共通である。また、本実施例は、カメラ高さを２１００ｍｍとし、頭部ＨＤの中心の地面からの高さを１６００ｍｍとし、頭部の直径を２５０ｍｍとする。 Next, with reference to FIG. 10, the relationship between the rear horizontal distance between the virtual plane region TR and the rear camera 40B in the real space and the size of the head image portion in the normalized image is described. The upper diagram of FIG. 10 shows the sizes L10, L11, and L12 of the head image portion when a person is present at each of three reference points Pr10, Pr11, and Pr12, which differ in rear horizontal distance from the rear camera 40B; its horizontal axis corresponds to the rear horizontal distance. The lower diagram of FIG. 10 is a graph of the relationship between the rear horizontal distance and the size of the head image portion; its vertical axis corresponds to the size of the head image portion, and its horizontal axis corresponds to the rear horizontal distance. The upper and lower diagrams of FIG. 10 share the same horizontal axis. In this embodiment, the camera height is 2100 mm, the height of the center of the head HD above the ground is 1600 mm, and the head diameter is 250 mm.

図１０上図に示すように、参照点Ｐｒ１０で示す位置に人が存在する場合、頭部画像部分の大きさＬ１０は、後方カメラ４０Ｂから見た頭部ＨＤの仮想平面領域ＴＲ１０への投影像の大きさに相当する。同様に、参照点Ｐｒ１１、Ｐｒ１２で示す位置に人が存在する場合、頭部画像部分の大きさＬ１１、Ｌ１２は、後方カメラ４０Ｂから見た頭部ＨＤの仮想平面領域ＴＲ１１、ＴＲ１２への投影像の大きさに相当する。なお、正規化画像における頭部画像部分の大きさは投影像の大きさに伴って変化する。 As shown in the upper diagram of FIG. 10, when a person is present at the position indicated by the reference point Pr10, the size L10 of the head image portion corresponds to the size of the projection of the head HD onto the virtual plane region TR10 as viewed from the rear camera 40B. Similarly, when a person is present at the positions indicated by the reference points Pr11 and Pr12, the sizes L11 and L12 of the head image portion correspond to the sizes of the projections of the head HD onto the virtual plane regions TR11 and TR12 as viewed from the rear camera 40B. The size of the head image portion in the normalized image varies with the size of the projection.

そして、図１０下図に示すように、正規化画像における頭部画像部分の大きさは、後方水平距離がＤ１（例えば７００ｍｍ）以上ではほぼ同じ大きさを維持するが、後方水平距離がＤ１を下回ったところで急激に増大する。 As shown in the lower diagram of FIG. 10, the size of the head image portion in the normalized image remains substantially constant when the rear horizontal distance is D1 (for example, 700 mm) or more, but increases sharply once the rear horizontal distance falls below D1.

そこで、識別部３２は、後方水平距離に応じて識別処理の内容を変更する。例えば、識別部３２は、教師あり学習（機械学習）の手法を用いる場合、所定の後方水平距離（例えば６５０ｍｍ）を境に、識別処理で用いる学習サンプルをグループ分けする。具体的には、近距離用グループと遠距離用グループに学習サンプルを分けるようにする。この構成により、識別部３２は、より高精度に人画像を識別できる。 Therefore, the identification unit 32 changes the content of the identification process according to the rear horizontal distance. For example, when using a supervised learning (machine learning) technique, the identification unit 32 groups learning samples used in the identification process at a predetermined rear horizontal distance (for example, 650 mm). Specifically, the learning samples are divided into a short distance group and a long distance group. With this configuration, the identification unit 32 can identify the human image with higher accuracy.
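
The grouping of learning samples by rear horizontal distance can be sketched as follows. This is an illustrative sketch only: the function names, the group labels, and the choice to treat a distance exactly on the boundary as long distance are assumptions; the text gives only the example boundary of 650 mm.

```python
GROUP_BOUNDARY_MM = 650  # example boundary distance given in the text


def sample_group(rear_horizontal_distance_mm, boundary_mm=GROUP_BOUNDARY_MM):
    """Return the learning-sample group for a target at the given distance.

    Assumption: a distance exactly on the boundary counts as long distance.
    """
    return "short" if rear_horizontal_distance_mm < boundary_mm else "long"


def split_samples(samples, boundary_mm=GROUP_BOUNDARY_MM):
    """Split (distance_mm, normalized_image) pairs into the two groups."""
    groups = {"short": [], "long": []}
    for distance_mm, image in samples:
        groups[sample_group(distance_mm, boundary_mm)].append(image)
    return groups


# Placeholder strings stand in for normalized images.
groups = split_samples([(400, "img_a"), (650, "img_b"), (3000, "img_c")])
```

At identification time, the rear horizontal distance attached to an identification processing target image would select which group's classifier to apply, for example via `sample_group(distance)`.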

以上の構成により、周辺監視システム１００は、撮像装置４０の方向を向き且つ水平面である仮想地面に対して傾斜する仮想平面領域ＴＲに対応する識別処理対象画像領域ＴＲｇから正規化画像ＴＲｇｔを生成する。そのため、人の高さ方向及び奥行き方向の見え方を考慮した正規化を実現できる。その結果、人を斜め上から撮像するように建設機械に取り付けられる撮像装置４０の撮像画像を用いた場合であっても建設機械の周囲に存在する人をより確実に検知できる。特に、人が撮像装置４０に接近した場合であっても、撮像画像上の十分な大きさの領域を占める識別処理対象画像から正規化画像を生成できるため、その人を確実に検知できる。 With the above configuration, the periphery monitoring system 100 generates the normalized image TRgt from the identification processing target image region TRg corresponding to the virtual plane region TR that faces the imaging device 40 and is inclined with respect to the virtual ground that is a horizontal plane. . Therefore, normalization in consideration of how the person looks in the height direction and depth direction can be realized. As a result, even when a captured image of the imaging device 40 attached to the construction machine so as to capture an image of a person from above is used, a person existing around the construction machine can be detected more reliably. In particular, even when a person approaches the imaging device 40, the normalized image can be generated from the identification processing target image that occupies a sufficiently large area on the captured image, so that the person can be reliably detected.

また、周辺監視システム１００は、実空間における仮想直方体であるボックスＢＸの４つの頂点Ａ、Ｂ、Ｇ、Ｈで形成される矩形領域として仮想平面領域ＴＲを定義する。そのため、実空間における参照点Ｐｒと仮想平面領域ＴＲとを幾何学的に対応付けることができ、さらには、実空間における仮想平面領域ＴＲと撮像画像における識別処理対象画像領域ＴＲｇとを幾何学的に対応付けることができる。 In addition, the periphery monitoring system 100 defines the virtual plane region TR as a rectangular region formed by the four vertices A, B, G, and H of the box BX, which is a virtual rectangular parallelepiped in the real space. The reference point Pr and the virtual plane region TR in the real space can therefore be geometrically associated with each other, and the virtual plane region TR in the real space can in turn be geometrically associated with the identification processing target image region TRg in the captured image.

また、抽出部３１は、識別処理対象画像領域ＴＲｇに含まれる識別処理不適領域の画像をマスク処理する。そのため、識別部３２は、車体映り込み領域Ｒ２を含む識別処理不適領域の画像の影響を受けることなく、正規化画像におけるマスク領域以外の領域の画像を用いて人画像であるかを識別できる。 In addition, the extraction unit 31 performs mask processing on the image of the identification processing inappropriate area included in the identification processing target image area TRg. Therefore, the identification unit 32 can identify whether the image is a human image by using an image of a region other than the mask region in the normalized image without being affected by the image of the identification inappropriate region including the vehicle body reflection region R2.

また、抽出部３１は、識別処理対象画像を抽出した場合、仮想平面領域ＴＲと撮像装置４０との位置関係に関する情報として両者間の後方水平距離をその識別処理対象画像に付加する。そして、識別部３２は、その後方水平距離に応じて識別処理の内容を変更する。具体的には、識別部３２は、所定の後方水平距離（例えば６５０ｍｍ）を境に、識別処理で用いる学習サンプルをグループ分けする。この構成により、識別部３２は、より高精度に人画像を識別できる。 In addition, when the extraction unit 31 extracts an identification processing target image, it attaches to that image the rear horizontal distance between the virtual plane region TR and the imaging device 40 as information on their positional relationship. The identification unit 32 then changes the content of the identification processing according to that rear horizontal distance. Specifically, the identification unit 32 divides the learning samples used in the identification processing into groups at a predetermined rear horizontal distance (for example, 650 mm). With this configuration, the identification unit 32 can identify a human image with higher accuracy.

また、抽出部３１は、参照点Ｐｒ毎に識別処理対象画像を抽出可能である。また、識別処理対象画像領域ＴＲｇのそれぞれは、対応する仮想平面領域ＴＲを介して、人の想定立ち位置として予め設定される参照点Ｐｒの１つに関連付けられる。そのため、周辺監視システム１００は、人が存在する可能性が高い参照点Ｐｒを任意の方法で抽出することで、人候補画像を含む可能性が高い識別処理対象画像を抽出できる。この場合、人候補画像を含む可能性が低い識別処理対象画像に対して、比較的演算量の多い画像処理による識別処理が施されてしまうのを防止でき、人検知処理の高速化を実現できる。 In addition, the extraction unit 31 can extract an identification processing target image for each reference point Pr. Each identification processing target image region TRg is associated, via the corresponding virtual plane region TR, with one of the reference points Pr preset as assumed standing positions of a person. The periphery monitoring system 100 can therefore extract identification processing target images that are likely to contain a human candidate image by using any method to select the reference points Pr at which a person is likely to be present. This prevents identification processing that relies on comparatively computation-heavy image processing from being applied to identification processing target images that are unlikely to contain a human candidate image, and thus speeds up the human detection processing.

次に、図１１及び図１２を参照し、人候補画像を含む可能性が高い識別処理対象画像を抽出部３１が抽出する処理の一例について説明する。なお、図１１は、抽出部３１が撮像画像から識別処理対象画像を切り出す際に用いる幾何学的関係の一例を示す概略図であり、図４に対応する。また、図１２は、撮像画像における特徴画像の一例を示す図である。なお、特徴画像は、人の特徴的な部分を表す画像であり、望ましくは、実空間における地面からの高さが変化し難い部分を表す画像である。そのため、特徴画像は、例えば、ヘルメットの画像、肩の画像、頭の画像、人に取り付けられる反射板若しくはマーカの画像等を含む。 Next, an example of processing in which the extraction unit 31 extracts an identification processing target image that is likely to contain a human candidate image is described with reference to FIGS. 11 and 12. FIG. 11 is a schematic diagram showing an example of the geometric relationship that the extraction unit 31 uses when cutting out an identification processing target image from a captured image, and corresponds to FIG. 4. FIG. 12 shows an example of a feature image in a captured image. A feature image is an image that represents a characteristic part of a person, preferably a part whose height above the ground in the real space is unlikely to change. Examples of feature images therefore include an image of a helmet, an image of a shoulder, an image of a head, and an image of a reflector or marker attached to a person.

本実施例では、抽出部３１は、前段画像認識処理によって、撮像画像におけるヘルメット画像（厳密にはヘルメットであると推定できる画像）を見つけ出す。ショベルの周囲で作業する人はヘルメットを着用していると考えられるためである。そして、抽出部３１は、見つけ出したヘルメット画像の位置から最も関連性の高い参照点Ｐｒを導き出す。その上で、抽出部３１は、その参照点Ｐｒに対応する識別処理対象画像を抽出する。 In the present embodiment, the extraction unit 31 uses pre-stage image recognition processing to find a helmet image (strictly speaking, an image that can be presumed to show a helmet) in the captured image. This is because a person working around the excavator is assumed to be wearing a helmet. The extraction unit 31 then derives the most relevant reference point Pr from the position of the helmet image it has found, and extracts the identification processing target image corresponding to that reference point Pr.

具体的には、抽出部３１は、図１１に示す幾何学的関係を利用し、撮像画像におけるヘルメット画像の位置から関連性の高い参照点Ｐｒを導き出す。なお、図１１の幾何学的関係は、実空間における仮想頭部位置ＨＰを定める点で図４の幾何学的関係と相違するが、その他の点で共通する。 Specifically, the extraction unit 31 uses the geometric relationship shown in FIG. 11 to derive a highly relevant reference point Pr from the position of the helmet image in the captured image. The geometric relationship of FIG. 11 differs from that of FIG. 4 in that it defines a virtual head position HP in the real space, but is otherwise the same.

仮想頭部位置ＨＰは、参照点Ｐｒ上に存在すると想定される人の頭部位置を表し、参照点Ｐｒの真上に配置される。本実施例では、参照点Ｐｒ上の高さ１７００ｍｍのところに配置される。そのため、実空間における仮想頭部位置ＨＰが決まれば、実空間における参照点Ｐｒの位置が一意に決まり、実空間における仮想平面領域ＴＲの位置も一意に決まる。また、撮像画像における識別処理対象画像領域ＴＲｇも一意に決まる。そして、抽出部３１は、識別処理対象画像領域ＴＲｇを有する識別処理対象画像を正規化して所定サイズの正規化画像ＴＲｇｔを生成できる。 The virtual head position HP represents the head position of a person assumed to be present on the reference point Pr, and is disposed immediately above the reference point Pr. In this embodiment, it is arranged at a height of 1700 mm on the reference point Pr. Therefore, if the virtual head position HP in the real space is determined, the position of the reference point Pr in the real space is uniquely determined, and the position of the virtual plane region TR in the real space is also uniquely determined. Further, the identification processing target image region TRg in the captured image is also uniquely determined. Then, the extraction unit 31 can generate a normalized image TRgt having a predetermined size by normalizing the identification processing target image having the identification processing target image region TRg.

逆に、実空間における参照点Ｐｒの位置が決まれば、実空間における仮想頭部位置ＨＰが一意に決まり、実空間における仮想頭部位置ＨＰに対応する撮像画像上の頭部画像位置ＡＰも一意に決まる。そのため、頭部画像位置ＡＰは、予め設定されている参照点Ｐｒのそれぞれに対応付けて予め設定され得る。なお、頭部画像位置ＡＰは、参照点Ｐｒからリアルタイムに導き出されてもよい。 Conversely, once the position of the reference point Pr in the real space is determined, the virtual head position HP in the real space is uniquely determined, and so is the head image position AP on the captured image that corresponds to the virtual head position HP. A head image position AP can therefore be preset in association with each of the preset reference points Pr. Alternatively, the head image position AP may be derived from the reference point Pr in real time.

そこで、抽出部３１は、前段画像認識処理により後方カメラ４０Ｂの撮像画像内でヘルメット画像を探索する。図１２上図は、抽出部３１がヘルメット画像ＨＲｇを見つけ出した状態を示す。そして、抽出部３１は、ヘルメット画像ＨＲｇを見つけ出した場合、その代表位置ＲＰを決定する。なお、代表位置ＲＰは、ヘルメット画像ＨＲｇの大きさ、形状等から導き出される位置である。本実施例では、代表位置ＲＰは、ヘルメット画像ＨＲｇを含むヘルメット画像領域の中心画素の位置である。図１２下図は、図１２上図における白線で区切られた矩形画像領域であるヘルメット画像領域の拡大図であり、そのヘルメット画像領域の中心画素の位置が代表位置ＲＰであることを示す。 Therefore, the extraction unit 31 searches for the helmet image in the captured image of the rear camera 40B by the pre-stage image recognition process. The upper part of FIG. 12 shows a state where the extraction unit 31 has found the helmet image HRg. And the extraction part 31 determines the representative position RP, when the helmet image HRg is found. The representative position RP is a position derived from the size, shape, etc. of the helmet image HRg. In this embodiment, the representative position RP is the position of the central pixel in the helmet image area including the helmet image HRg. The lower diagram in FIG. 12 is an enlarged view of the helmet image region, which is a rectangular image region separated by white lines in the upper diagram in FIG. 12, and shows that the position of the central pixel in the helmet image region is the representative position RP.

その後、抽出部３１は、例えば最近傍探索アルゴリズムを用いて代表位置ＲＰの最も近傍にある頭部画像位置ＡＰを導き出す。図１２下図は、代表位置ＲＰの近くに６つの頭部画像位置ＡＰ１〜ＡＰ６が予め設定されており、そのうちの頭部画像位置ＡＰ５が代表位置ＲＰの最も近傍にある頭部画像位置ＡＰであることを示す。 Thereafter, the extraction unit 31 derives the head image position AP nearest to the representative position RP, for example by using a nearest neighbor search algorithm. The lower diagram of FIG. 12 shows that six head image positions AP1 to AP6 are preset near the representative position RP and that, of these, the head image position AP5 is the one nearest to the representative position RP.
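
The nearest neighbor step can be illustrated with a brute-force search over the preset head image positions. This is a sketch under assumptions: the pixel coordinates below are made up for illustration (they are not taken from FIG. 12), the function name is hypothetical, and a real system with many preset positions might use a spatial index instead of a linear scan.

```python
def nearest_head_image_position(rp, head_positions):
    """Return the name of the preset head image position AP closest to RP.

    `rp` is an (x, y) pixel position; `head_positions` maps a name to (x, y).
    Squared Euclidean distance is enough for picking the minimum.
    """
    def squared_distance(name):
        x, y = head_positions[name]
        return (x - rp[0]) ** 2 + (y - rp[1]) ** 2

    return min(head_positions, key=squared_distance)


# Hypothetical coordinates for six preset positions around RP.
head_positions = {
    "AP1": (80, 40), "AP2": (100, 40), "AP3": (120, 40),
    "AP4": (80, 60), "AP5": (100, 60), "AP6": (120, 60),
}
representative_position = (105, 62)
nearest = nearest_head_image_position(representative_position, head_positions)
```

With these made-up coordinates the search selects `"AP5"`, mirroring the situation described for the lower diagram of FIG. 12.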

そして、抽出部３１は、図１１に示す幾何学的関係を利用し、導き出した最近傍の頭部画像位置ＡＰから、仮想頭部位置ＨＰ、参照点Ｐｒ、仮想平面領域ＴＲを辿って、対応する識別処理対象画像領域ＴＲｇを抽出する。その後、抽出部３１は、抽出した識別処理対象画像領域ＴＲｇを有する識別処理対象画像を正規化して正規化画像ＴＲｇｔを生成する。 The extraction unit 31 then uses the geometric relationship shown in FIG. 11 to trace from the derived nearest head image position AP through the virtual head position HP, the reference point Pr, and the virtual plane region TR, and extracts the corresponding identification processing target image region TRg. Thereafter, the extraction unit 31 normalizes the identification processing target image having the extracted identification processing target image region TRg to generate a normalized image TRgt.

このようにして、抽出部３１は、撮像画像における人の特徴画像の位置であるヘルメット画像ＨＲｇの代表位置ＲＰと、予め設定された頭部画像位置ＡＰの１つ（頭部画像位置ＡＰ５）とを対応付けることで識別処理対象画像を抽出する。 In this way, the extraction unit 31 extracts an identification processing target image by associating the representative position RP of the helmet image HRg, which is the position of a feature image of a person in the captured image, with one of the preset head image positions AP (here, the head image position AP5).

なお、抽出部３１は、図１１に示す幾何学的関係を利用する代わりに、頭部画像位置ＡＰと参照点Ｐｒ、仮想平面領域ＴＲ、又は識別処理対象画像領域ＴＲｇとを直接的に対応付ける参照テーブルを利用し、頭部画像位置ＡＰに対応する識別処理対象画像を抽出してもよい。 Instead of using the geometric relationship shown in FIG. 11, the extraction unit 31 may extract the identification processing target image corresponding to a head image position AP by using a reference table that directly associates the head image position AP with the reference point Pr, the virtual plane region TR, or the identification processing target image region TRg.

また、抽出部３１は、山登り法、Mean-shift法等の最近傍探索アルゴリズム以外の他の公知のアルゴリズムを用いて代表位置ＲＰから参照点Ｐｒを導き出してもよい。例えば、山登り法を用いる場合、抽出部３１は、代表位置ＲＰの近傍にある複数の頭部画像位置ＡＰを導き出し、代表位置ＲＰとそれら複数の頭部画像位置ＡＰのそれぞれに対応する参照点Ｐｒとを紐付ける。このとき、抽出部３１は、代表位置ＲＰと頭部画像位置ＡＰが近いほど重みが大きくなるように参照点Ｐｒに重みを付ける。そして、複数の参照点Ｐｒの重みの分布を山登りし、重みの極大点に最も近い重みを有する参照点Ｐｒから識別処理対象画像領域ＴＲｇを抽出する。 The extraction unit 31 may also derive the reference point Pr from the representative position RP using a known algorithm other than the nearest neighbor search algorithm, such as the hill-climbing method or the Mean-shift method. For example, when the hill-climbing method is used, the extraction unit 31 derives a plurality of head image positions AP in the vicinity of the representative position RP and links the representative position RP to the reference points Pr corresponding to each of those head image positions AP. In doing so, the extraction unit 31 weights the reference points Pr so that the weight is larger the closer the head image position AP is to the representative position RP. It then climbs the distribution of the weights of the reference points Pr and extracts the identification processing target image region TRg from the reference point Pr whose weight is closest to the local maximum of the weights.

次に、図１３を参照し、コントローラ３０の抽出部３１が識別処理対象画像を抽出する処理（以下、「画像抽出処理」とする。）の一例について説明する。なお、図１３は、画像抽出処理の一例の流れを示すフローチャートである。抽出部３１は、例えば、撮像画像を取得する度にこの画像抽出処理を実行する。 Next, an example of the processing in which the extraction unit 31 of the controller 30 extracts an identification processing target image (hereinafter referred to as “image extraction processing”) is described with reference to FIG. 13. FIG. 13 is a flowchart showing the flow of an example of the image extraction processing. The extraction unit 31 executes this image extraction processing, for example, each time it acquires a captured image.

最初に、抽出部３１は、撮像画像内でヘルメット画像を探索する（ステップＳＴ１）。本実施例では、抽出部３１は、前段画像認識処理により後方カメラ４０Ｂの撮像画像をラスタスキャンしてヘルメット画像を見つけ出す。 First, the extraction unit 31 searches for a helmet image in the captured image (step ST1). In the present embodiment, the extraction unit 31 performs a raster scan on the image captured by the rear camera 40B by the preceding image recognition process to find a helmet image.

撮像画像でヘルメット画像ＨＲｇを見つけ出した場合（ステップＳＴ１のＹＥＳ）、抽出部３１は、ヘルメット画像ＨＲｇの代表位置ＲＰを取得する（ステップＳＴ２）。 When the helmet image HRg is found from the captured image (YES in step ST1), the extraction unit 31 acquires the representative position RP of the helmet image HRg (step ST2).

その後、抽出部３１は、取得した代表位置ＲＰの最近傍にある頭部画像位置ＡＰを取得する（ステップＳＴ３）。 Thereafter, the extraction unit 31 acquires a head image position AP that is closest to the acquired representative position RP (step ST3).

その後、抽出部３１は、取得した頭部画像位置ＡＰに対応する識別処理対象画像を抽出する（ステップＳＴ４）。本実施例では、抽出部３１は、図１１に示す幾何学的関係を利用し、撮像画像における頭部画像位置ＡＰ、実空間における仮想頭部位置ＨＰ、実空間における人の想定立ち位置としての参照点Ｐｒ、及び、実空間における仮想平面領域ＴＲの対応関係を辿って識別処理対象画像を抽出する。 Thereafter, the extraction unit 31 extracts an identification processing target image corresponding to the acquired head image position AP (step ST4). In the present embodiment, the extraction unit 31 uses the geometrical relationship shown in FIG. 11 and uses the head image position AP in the captured image, the virtual head position HP in the real space, and the assumed standing position of the person in the real space. The identification processing target image is extracted by following the correspondence between the reference point Pr and the virtual plane region TR in the real space.

なお、抽出部３１は、撮像画像でヘルメット画像ＨＲｇを見つけ出さなかった場合には（ステップＳＴ１のＮＯ）、識別処理対象画像を抽出することなく、処理をステップＳＴ５に移行させる。 If the helmet image HRg is not found in the captured image (NO in step ST1), the extraction unit 31 shifts the process to step ST5 without extracting the identification processing target image.

その後、抽出部３１は、撮像画像の全体にわたってヘルメット画像を探索したかを判定する（ステップＳＴ５）。 Thereafter, the extraction unit 31 determines whether it has searched the entire captured image for a helmet image (step ST5).

撮像画像の全体を未だ探索していないと判定した場合（ステップＳＴ５のＮＯ）、抽出部３１は、撮像画像の別の領域に対し、ステップＳＴ１〜ステップＳＴ４の処理を実行する。 If it is determined that the entire captured image has not yet been searched (NO in step ST5), the extraction unit 31 performs the processing in steps ST1 to ST4 on another region of the captured image.

一方、撮像画像の全体にわたるヘルメット画像の探索を完了したと判定した場合（ステップＳＴ５のＹＥＳ）、抽出部３１は今回の画像抽出処理を終了させる。 On the other hand, when it is determined that the search for the helmet image over the entire captured image has been completed (YES in step ST5), the extraction unit 31 ends the current image extraction process.

このように、抽出部３１は、最初にヘルメット画像ＨＲｇを見つけ出し、見つけ出したヘルメット画像ＨＲｇの代表位置ＲＰから、頭部画像位置ＡＰ、仮想頭部位置ＨＰ、参照点（想定立ち位置）Ｐｒ、仮想平面領域ＴＲを経て識別処理対象画像領域ＴＲｇを特定する。そして、特定した識別処理対象画像領域ＴＲｇを有する識別処理対象画像を抽出して正規化することで、所定サイズの正規化画像ＴＲｇｔを生成できる。 In this way, the extraction unit 31 first finds the helmet image HRg and then, starting from the representative position RP of that helmet image HRg, identifies the identification processing target image region TRg by way of the head image position AP, the virtual head position HP, the reference point (assumed standing position) Pr, and the virtual plane region TR. By extracting and normalizing the identification processing target image having the identified identification processing target image region TRg, it can generate a normalized image TRgt of a predetermined size.
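
The flow of FIG. 13 (steps ST1 to ST5) can be summarized in the following sketch. It is not the implementation of the embodiment: the helmet detector and the lookup from a head image position AP to its identification processing target image are passed in as callables, and all function names, window names, and coordinates are hypothetical.

```python
def nearest_head_position(rp, head_positions):
    """ST3: name of the preset head image position AP closest to RP."""
    return min(
        head_positions,
        key=lambda name: (head_positions[name][0] - rp[0]) ** 2
        + (head_positions[name][1] - rp[1]) ** 2,
    )


def image_extraction_fig13(windows, find_helmet, head_positions, target_for):
    """Walk the captured image window by window, following ST1 to ST5.

    find_helmet(window) -> (x, y) representative position RP, or None.
    target_for(ap_name) -> identification processing target image for that AP.
    """
    targets = []
    for window in windows:                 # the loop ends when ST5 is YES
        rp = find_helmet(window)           # ST1 (search) and ST2 (get RP)
        if rp is None:                     # NO at ST1: nothing to extract
            continue
        ap_name = nearest_head_position(rp, head_positions)  # ST3
        targets.append(target_for(ap_name))                  # ST4
    return targets


# Toy data: three scan windows, one of which contains a helmet near AP5.
head_positions = {"AP4": (80, 60), "AP5": (100, 60), "AP6": (120, 60)}
detections = {"w0": None, "w1": (105, 62), "w2": None}
targets = image_extraction_fig13(
    ["w0", "w1", "w2"],
    find_helmet=detections.get,
    head_positions=head_positions,
    target_for=lambda name: "TRg for " + name,
)
```

Passing the detector and the lookup as parameters keeps the control flow separate from the pre-stage image recognition processing and from the geometric relationship of FIG. 11, neither of which is modeled here.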

次に、図１４を参照し、画像抽出処理の別の一例について説明する。なお、図１４は、画像抽出処理の別の一例の流れを示すフローチャートである。 Next, another example of the image extraction processing is described with reference to FIG. 14. FIG. 14 is a flowchart showing the flow of this other example of the image extraction processing.

最初に、抽出部３１は、頭部画像位置ＡＰの１つを取得する（ステップＳＴ１１）。その後、抽出部３１は、その頭部画像位置ＡＰに対応するヘルメット画像領域を取得する（ステップＳＴ１２）。本実施例では、ヘルメット画像領域は、頭部画像位置ＡＰのそれぞれについて予め設定された所定サイズの画像領域である。 First, the extraction unit 31 acquires one of the head image positions AP (step ST11). Thereafter, the extraction unit 31 acquires a helmet image region corresponding to the head image position AP (step ST12). In the present embodiment, the helmet image area is an image area of a predetermined size that is preset for each of the head image positions AP.

その後、抽出部３１は、ヘルメット画像領域内でヘルメット画像を探索する（ステップＳＴ１３）。本実施例では、抽出部３１は、前段画像認識処理によりヘルメット画像領域内をラスタスキャンしてヘルメット画像を見つけ出す。 Thereafter, the extraction unit 31 searches for a helmet image within the helmet image region (step ST13). In the present embodiment, the extraction unit 31 raster scans the inside of the helmet image area by the preceding image recognition process and finds out the helmet image.

ヘルメット画像領域内でヘルメット画像ＨＲｇを見つけ出した場合（ステップＳＴ１３のＹＥＳ）、抽出部３１は、そのときの頭部画像位置ＡＰに対応する識別処理対象画像を抽出する（ステップＳＴ１４）。本実施例では、抽出部３１は、図１１に示す幾何学的関係を利用し、撮像画像における頭部画像位置ＡＰ、実空間における仮想頭部位置ＨＰ、実空間における人の想定立ち位置としての参照点Ｐｒ、及び、実空間における仮想平面領域ＴＲの対応関係を辿って識別処理対象画像を抽出する。 When the helmet image HRg is found in the helmet image area (YES in step ST13), the extraction unit 31 extracts an identification processing target image corresponding to the head image position AP at that time (step ST14). In the present embodiment, the extraction unit 31 uses the geometrical relationship shown in FIG. 11 and uses the head image position AP in the captured image, the virtual head position HP in the real space, and the assumed standing position of the person in the real space. The identification processing target image is extracted by following the correspondence between the reference point Pr and the virtual plane region TR in the real space.

なお、抽出部３１は、ヘルメット画像領域内でヘルメット画像ＨＲｇを見つけ出さなかった場合には（ステップＳＴ１３のＮＯ）、識別処理対象画像を抽出することなく、処理をステップＳＴ１５に移行させる。 If the helmet image HRg is not found in the helmet image area (NO in step ST13), the extraction unit 31 shifts the process to step ST15 without extracting the identification process target image.

その後、抽出部３１は、全ての頭部画像位置ＡＰを取得したかを判定する（ステップＳＴ１５）。そして、全ての頭部画像位置ＡＰを未だ取得していないと判定した場合（ステップＳＴ１５のＮＯ）、抽出部３１は、未取得の別の頭部画像位置ＡＰを取得し、ステップＳＴ１１〜ステップＳＴ１４の処理を実行する。一方、全ての頭部画像位置ＡＰを取得し終わったと判定した場合（ステップＳＴ１５のＹＥＳ）、抽出部３１は今回の画像抽出処理を終了させる。 Thereafter, the extraction unit 31 determines whether it has acquired all of the head image positions AP (step ST15). When it determines that not all head image positions AP have been acquired yet (NO in step ST15), the extraction unit 31 acquires another head image position AP that has not yet been acquired and executes the processing of steps ST11 to ST14. When it determines that all head image positions AP have been acquired (YES in step ST15), the extraction unit 31 ends the current image extraction processing.

このように、抽出部３１は、最初に頭部画像位置ＡＰの１つを取得し、取得した頭部画像位置ＡＰに対応するヘルメット画像領域でヘルメット画像ＨＲｇを見つけ出した場合に、そのときの頭部画像位置ＡＰから、仮想頭部位置ＨＰ、参照点（想定立ち位置）Ｐｒ、仮想平面領域ＴＲを経て、識別処理対象画像領域ＴＲｇを特定する。そして、特定した識別処理対象画像領域ＴＲｇを有する識別処理対象画像を抽出して正規化することで、所定サイズの正規化画像ＴＲｇｔを生成できる。 In this way, the extraction unit 31 first acquires one of the head image positions AP and, when it finds a helmet image HRg in the helmet image region corresponding to that head image position AP, identifies the identification processing target image region TRg from that head image position AP by way of the virtual head position HP, the reference point (assumed standing position) Pr, and the virtual plane region TR. By extracting and normalizing the identification processing target image having the identified identification processing target image region TRg, it can generate a normalized image TRgt of a predetermined size.
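
The flow of FIG. 14 (steps ST11 to ST15) reverses the order of the search: it iterates over the preset head image positions AP and looks for a helmet only inside the helmet image region assigned to each one. The sketch below shows that control flow only; the callables, names, and toy data are assumptions introduced for illustration.

```python
def image_extraction_fig14(head_positions, helmet_region_for,
                           contains_helmet, target_for):
    """Visit every preset head image position AP, following ST11 to ST15.

    helmet_region_for(ap_name) -> preset helmet image region for that AP.
    contains_helmet(region)    -> True if a helmet image is found in it.
    target_for(ap_name)        -> identification processing target image.
    """
    targets = []
    for ap_name in head_positions:               # ST11; ends when ST15 is YES
        region = helmet_region_for(ap_name)      # ST12
        if contains_helmet(region):              # ST13
            targets.append(target_for(ap_name))  # ST14
    return targets


# Toy data: only the region assigned to AP2 contains a helmet.
ap_names = ["AP1", "AP2", "AP3"]
regions = {"AP1": "region-1", "AP2": "region-2", "AP3": "region-3"}
regions_with_helmet = {"region-2"}
fig14_targets = image_extraction_fig14(
    ap_names,
    helmet_region_for=regions.get,
    contains_helmet=lambda region: region in regions_with_helmet,
    target_for=lambda name: "TRg for " + name,
)
```

Because each helmet image region is tied to one head image position AP, no nearest neighbor step is needed in this variant; the trade-off is that every preset position is visited on every captured image.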

以上の構成により、周辺監視システム１００の抽出部３１は、撮像画像における特徴画像としてのヘルメット画像を見つけ出し、そのヘルメット画像の代表位置ＲＰと所定画像位置としての頭部画像位置ＡＰの１つとを対応付けることで識別処理対象画像を抽出する。そのため、簡易なシステム構成で後段画像認識処理の対象となる画像部分を絞り込むことができる。 With the above configuration, the extraction unit 31 of the periphery monitoring system 100 finds a helmet image as a feature image in the captured image, and associates the representative position RP of the helmet image with one of the head image positions AP as a predetermined image position. Thus, the identification processing target image is extracted. Therefore, it is possible to narrow down the image portion to be subjected to the subsequent image recognition processing with a simple system configuration.

なお、抽出部３１は、最初に撮像画像からヘルメット画像ＨＲｇを見つけ出し、そのヘルメット画像ＨＲｇの代表位置ＲＰに対応する頭部画像位置ＡＰの１つを導き出し、その頭部画像位置ＡＰの１つに対応する識別処理対象画像を抽出してもよい。或いは、抽出部３１は、最初に頭部画像位置ＡＰの１つを取得し、その頭部画像位置ＡＰの１つに対応する特徴画像の位置を含む所定領域であるヘルメット画像領域内にヘルメット画像が存在する場合に、その頭部画像位置ＡＰの１つに対応する識別処理対象画像を抽出してもよい。 The extraction unit 31 may first find the helmet image HRg in the captured image, derive the one head image position AP that corresponds to the representative position RP of that helmet image HRg, and extract the identification processing target image corresponding to that head image position AP. Alternatively, the extraction unit 31 may first acquire one of the head image positions AP and, when a helmet image is present in the helmet image region, which is a predetermined region containing the position of the feature image corresponding to that head image position AP, extract the identification processing target image corresponding to that head image position AP.

また、抽出部３１は、図１１に示すような所定の幾何学的関係を利用し、撮像画像におけるヘルメット画像の代表位置ＲＰから識別処理対象画像を抽出してもよい。この場合、所定の幾何学的関係は、撮像画像における識別処理対象画像領域ＴＲｇと、識別処理対象画像領域ＴＲｇに対応する実空間における仮想平面領域ＴＲと、仮想平面領域ＴＲに対応する実空間における参照点Ｐｒ（人の想定立ち位置）と、参照点Ｐｒに対応する仮想頭部位置ＨＰ（人の想定立ち位置に対応する人の特徴的な部分の実空間における位置である仮想特徴位置）と、仮想頭部位置ＨＰに対応する撮像画像における頭部画像位置ＡＰ（仮想特徴位置に対応する撮像画像における所定画像位置）との幾何学的関係を表す。 The extraction unit 31 may also extract the identification processing target image from the representative position RP of the helmet image in the captured image by using a predetermined geometric relationship such as that shown in FIG. 11. In this case, the predetermined geometric relationship is the relationship among the following: the identification processing target image region TRg in the captured image; the virtual plane region TR in the real space corresponding to the identification processing target image region TRg; the reference point Pr in the real space (the assumed standing position of a person) corresponding to the virtual plane region TR; the virtual head position HP corresponding to the reference point Pr (a virtual feature position, that is, the position in the real space of a characteristic part of the person corresponding to the assumed standing position); and the head image position AP in the captured image corresponding to the virtual head position HP (a predetermined image position in the captured image corresponding to the virtual feature position).

次に、図１５を参照し、画像抽出処理の更に別の一例について説明する。なお、図１５は、画像抽出処理の更に別の一例の流れを示すフローチャートである。図１５の画像抽出処理は、正規化画像を生成した後でその正規化画像に特徴画像としてのヘルメット画像が含まれるか否かを判定する点で図１３及び図１４のそれぞれにおける画像抽出処理と異なる。図１３及び図１４のそれぞれにおける画像抽出処理はヘルメット画像を見つけた後でそのヘルメット画像を含む画像部分を正規化するためである。 Next, still another example of the image extraction processing is described with reference to FIG. 15. FIG. 15 is a flowchart showing the flow of this further example of the image extraction processing. The image extraction processing of FIG. 15 differs from that of FIG. 13 and FIG. 14 in that it generates a normalized image first and then determines whether that normalized image contains a helmet image as a feature image. This is because the image extraction processing of FIG. 13 and FIG. 14 finds a helmet image first and then normalizes the image portion containing that helmet image.

In this embodiment, the extraction unit 31 normalizes each of a plurality of predetermined image portions in the captured image to generate a plurality of normalized images, and extracts, as identification processing target images, those normalized images that contain a helmet image. The plurality of predetermined image portions are, for example, a plurality of identification processing target image regions TRg defined in advance on the captured image. Each identification processing target image region TRg (see FIG. 6) corresponds to a virtual plane region TR in real space, and each virtual plane region TR corresponds to a reference point Pr in real space. The identification unit 32 then identifies whether the person candidate image contained in an identification processing target image extracted by the extraction unit 31 is a person image.

First, the extraction unit 31 generates a normalized image from one of the predetermined image portions (step ST21). In this embodiment, one of the identification processing target image regions TRg, corresponding to one of the preset reference points Pr in the captured image of the rear camera 40B, is taken as the predetermined image portion, and that portion is converted into a rectangular image of a predetermined size by projective transformation.

The extraction unit 31 then determines whether the generated normalized image contains a helmet image (step ST22). In this embodiment, the extraction unit 31 finds the helmet image by raster-scanning the normalized image in the pre-stage image recognition process.

When it determines that the normalized image contains a helmet image (YES in step ST22), the extraction unit 31 extracts that normalized image as an identification processing target image (step ST23). At this point, the identification unit 32 identifies whether the person candidate image contained in the extracted identification processing target image is a person image. That is, before the extraction unit 31 performs the next normalization, the identification unit 32 identifies whether the image contained in the normalized image judged to contain a helmet image is a person image. The periphery monitoring system 100 can therefore execute the image extraction process and the identification process over the entire captured image as long as it has enough memory to store one normalized image. Alternatively, the identification unit 32 may wait until the extraction unit 31 has extracted a plurality of normalized images as identification processing target images and then identify whether the person candidate image contained in each of them is a person image.

When it determines that the normalized image does not contain a helmet image (NO in step ST22), the extraction unit 31 proceeds without extracting that normalized image as an identification processing target image.

The extraction unit 31 then determines whether normalized images have been generated from all of the predetermined image portions (step ST24). In this embodiment, it determines whether normalized images have been generated from all of the predetermined image portions, that is, from the identification processing target image regions TRg corresponding to each of the reference points Pr in the captured image of the rear camera 40B.

When it determines that normalized images have not yet been generated from all of the predetermined image portions (NO in step ST24), the extraction unit 31 executes steps ST21 to ST23 on another predetermined image portion.

When it determines that normalized images have been generated from all of the predetermined image portions (YES in step ST24), the extraction unit 31 ends the current image extraction process.
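The loop of steps ST21 to ST24 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in the patent: the function name is hypothetical, and the callables `normalize` (standing in for the projective transformation) and `contains_helmet` (standing in for the pre-stage image recognition process) are placeholders supplied by the caller. It is written as a generator so that each identification processing target image can be handed to the identification step before the next normalization, which is why only one normalized image needs to be held at a time.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Region = TypeVar("Region")   # a predetermined image portion (e.g. a region TRg)
Image = TypeVar("Image")     # a normalized image


def extract_identification_targets(
    regions: Iterable[Region],
    normalize: Callable[[Region], Image],        # hypothetical: projective transform
    contains_helmet: Callable[[Image], bool],    # hypothetical: pre-stage recognition
) -> Iterator[Image]:
    """Yield normalized images that contain a helmet image, one at a time."""
    for region in regions:                 # repeat until all portions are done (ST24)
        normalized = normalize(region)     # ST21: generate a normalized image
        if contains_helmet(normalized):    # ST22: helmet image present?
            yield normalized               # ST23: extract as identification target
        # ST22 NO: move on without extracting this normalized image
```

A caller would iterate over the generator and pass each yielded image to its own identification routine; collecting the results into a list first corresponds to the batch variant mentioned in the text.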

In the example of FIG. 15, the periphery monitoring system 100 determines whether a normalized image contains a helmet image as soon as that one normalized image has been generated. However, it may instead generate a plurality of normalized images and then determine collectively whether each of them contains a helmet image. It may also generate all of the normalized images first and then determine collectively whether each of them contains a helmet image.

With this image extraction process, the periphery monitoring system 100 can reduce the number of memory accesses during the person detection process. This is because the process of finding a helmet image and the process of extracting an identification processing target image can be executed at the same time.

The periphery monitoring system 100 that adopts the image extraction process of FIG. 15 can also reduce the required memory capacity compared with the case where the image extraction process of FIG. 13 or FIG. 14 is adopted. The process of FIG. 13 or FIG. 14 requires at least enough memory to store an image portion before normalization (an image larger than the normalized image), whereas the process of FIG. 15 generates the normalized image first and can therefore be executed with only enough memory to store a normalized image.

The periphery monitoring system 100 that adopts the image extraction process of FIG. 15 determines whether a helmet image is contained in a normalized image. Compared with a configuration that determines whether a helmet image is contained in an image before normalization, as in the image extraction process of FIG. 13 or FIG. 14, it can therefore shorten the processing time required for that determination, because the variation in size and distortion of the helmet images to be searched for is smaller. As a result, the time required to extract the identification processing target images can be shortened. For the same reason, the accuracy of determining whether a helmet image is contained can also be improved.

The periphery monitoring system 100 that adopts the image extraction process of FIG. 15 can also detect people more reliably even when the head position varies widely relative to the foot position (for example, when the captured image contains an image of a person who is crouching). This is because it determines whether a helmet image is contained in the normalized image. In other words, unlike the image extraction process of FIG. 13 or FIG. 14, which specifies the identification processing target image region using the head image position associated with the position of the helmet image, the identification processing target image region serving as the predetermined image portion to be normalized is set independently of the head image position.

Next, still another example of the image extraction process will be described with reference to FIG. 16. FIG. 16 is a flowchart showing the flow of this example. The image extraction process of FIG. 16 differs from that of FIG. 15 in that it determines whether a helmet image as a feature image is contained at the stage where only a part of the predetermined image portion has been normalized, using the partially normalized image (hereinafter, "partial normalized image"). The process of FIG. 15, by contrast, makes this determination only after the whole of one predetermined image portion has been normalized.

First, the extraction unit 31 generates a partial normalized image from a part of the predetermined image portion (step ST31). Once the partial normalized image has been generated, the extraction unit 31 determines whether it contains a helmet image (step ST32). In this embodiment, the extraction unit 31 normalizes the part of the predetermined image portion that corresponds to the upper half of the normalized image to generate the partial normalized image, and then finds the helmet image by raster-scanning that partial normalized image in the pre-stage image recognition process.

When it determines that the partial normalized image contains a helmet image (YES in step ST32), the extraction unit 31 normalizes the remaining part of the predetermined image portion to generate a normalized image (step ST33). In this embodiment, the extraction unit 31 normalizes the remaining part of the predetermined image portion, which corresponds to the lower half of the normalized image, to complete the normalized image. The extraction unit 31 then extracts that normalized image as an identification processing target image (step ST34). At this point, the identification unit 32 identifies whether the person candidate image contained in the extracted identification processing target image is a person image. Alternatively, the identification unit 32 may wait until the extraction unit 31 has extracted a plurality of normalized images as identification processing target images and then identify whether the person candidate image contained in each of them is a person image.

When it determines that the partial normalized image does not contain a helmet image (NO in step ST32), the extraction unit 31 proceeds without normalizing the remaining part of the predetermined image portion, and thus without generating a normalized image.

The extraction unit 31 then determines whether partial normalized images have been generated from all of the predetermined image portions (step ST35). In this embodiment, it determines whether partial normalized images have been generated from all of the predetermined image portions, that is, from the identification processing target image regions TRg corresponding to each of the reference points Pr in the captured image of the rear camera 40B.

When it determines that partial normalized images have not yet been generated from all of the predetermined image portions (NO in step ST35), the extraction unit 31 executes steps ST31 to ST34 on another predetermined image portion.

When it determines that partial normalized images have been generated from all of the predetermined image portions (YES in step ST35), the extraction unit 31 ends the current image extraction process.
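The early-exit behavior of steps ST31 to ST35 can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the patent's implementation: images are modeled as lists of pixel rows, and the function name and the callables `normalize_upper`, `normalize_lower`, and `contains_helmet` are hypothetical placeholders for the partial projective transformation and the pre-stage image recognition process.

```python
from typing import Callable, Iterable, Iterator, List

Row = List[int]
Image = List[Row]   # assumption: an image is a list of pixel rows


def extract_with_partial_normalization(
    regions: Iterable[Image],
    normalize_upper: Callable[[Image], Image],   # hypothetical: upper-half transform
    normalize_lower: Callable[[Image], Image],   # hypothetical: lower-half transform
    contains_helmet: Callable[[Image], bool],    # hypothetical: pre-stage recognition
) -> Iterator[Image]:
    """Yield full normalized images, normalizing the rest only when needed."""
    for region in regions:                   # repeat until all portions are done (ST35)
        upper = normalize_upper(region)      # ST31: partial normalized image
        if not contains_helmet(upper):       # ST32 NO: skip the remaining part
            continue
        lower = normalize_lower(region)      # ST33: normalize the remaining part
        yield upper + lower                  # ST34: extract the full normalized image
```

In this sketch the lower-half transform runs only for regions whose upper half contains a helmet image, which is where the additional time saving over the FIG. 15 flow comes from.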

In the example of FIG. 16, the periphery monitoring system 100 generates the partial normalized image by normalizing the part of the predetermined image portion that corresponds to the upper half of the normalized image. This is because the helmet image is highly likely to be contained in the upper half of the normalized image (the part corresponding to the upper body of a person). However, the partial normalized image may instead be generated from the part of the predetermined image portion that corresponds to a region smaller than the upper half of the normalized image, for example the upper third. Alternatively, the periphery monitoring system 100 may generate the partial normalized image from a part corresponding to a different region of the normalized image. For example, when searching for a shoulder image as the feature image, it may normalize the part of the predetermined image portion corresponding to the right half of the normalized image, and when searching for a marker image as the feature image, it may normalize the part corresponding to the central half of the normalized image.

With this image extraction process, the periphery monitoring system 100 that adopts the process of FIG. 16 can achieve the same effects as when the process of FIG. 15 is adopted.

In addition, the periphery monitoring system 100 that adopts the image extraction process of FIG. 16 can further shorten the processing time required for the person detection process compared with the case where the process of FIG. 15 is adopted. This is because, when it determines that the partial normalized image does not contain a helmet image, it can start normalizing the next predetermined image portion without normalizing the remaining part of the current one.

In each of the image extraction processes of FIGS. 13 to 16, the extraction unit 31 uses the pre-stage image recognition process to find a helmet image (strictly speaking, an image that can be presumed to be a helmet) in a predetermined image portion of the captured image. The extraction unit 31 then extracts the identification processing target image corresponding to the helmet image it has found.

However, when the background image is complex, the extraction unit 31 may mistakenly recognize an image that is not a helmet as a helmet image. It may also fail to recognize an actual helmet as a helmet image and miss it. That is, depending on the content of the captured image, the reliability of the extraction result for the identification processing target images may be reduced. Here, the helmet image can be regarded as a person candidate image, so these are situations in which the reliability of the person detection result is reduced. Captured images that reduce the reliability of the extraction result include, for example, images captured in a nighttime, backlit, or shaded environment, images captured with a dirty camera lens, and images with flare, subject blur, or the like.

The controller 30 may therefore be configured to detect a state in which the reliability of the extraction result from the extraction unit 31 is low as a state unsuitable for person detection (hereinafter, "person detection unsuitable state"). This makes it possible to respond in various ways when a person detection unsuitable state is detected. For example, the operator of the shovel can be notified that a person detection unsuitable state has occurred. Alternatively, image features and the like at the time a person detection unsuitable state occurs can be recorded, so that data useful for improving the periphery monitoring system can be collected selectively. Alternatively, the operation of the periphery monitoring system 100 can be changed in response to detection of a person detection unsuitable state. Such changes include, for example, changing various parameters used in the pre-stage image recognition process and changing various parameters used in the post-stage image recognition process.

For example, the controller 30 may calculate an evaluation value for each predetermined image portion using a predetermined evaluation formula, and determine from the distribution of those evaluation values whether the captured image is in a person detection unsuitable state. Specifically, the controller 30 may determine whether a person detection unsuitable state has occurred based on the distribution of helmet degrees. The evaluation value here can also be understood as the degree of person-likeness of a helmet image, that is, of a person candidate image, or as a level indicating that degree. The evaluation values may be regarded as differing in magnitude.

The "helmet degree" is one example of an evaluation value; it represents how helmet-like a determination target image in a predetermined image portion of the captured image is. When the search target is a head rather than a helmet, a "head degree" representing head-likeness is used as another example of an evaluation value.

A "determination target image" is an image that has been subjected to the determination of whether it is a feature image (helmet image). A "helmet image" in each of the image extraction processes of FIGS. 13 to 16 is a determination target image whose helmet degree is greater than or equal to a predetermined acceptance determination value (for example, zero). Alternatively, helmet images may be limited to a predetermined number of top-ranked images when the determination target images whose helmet degree is greater than or equal to the acceptance determination value are sorted in descending order of helmet degree. Determination target images also include images whose helmet degree is less than the acceptance determination value (images other than helmet images). A predetermined helmet degree, selectable as a specification, is set as the acceptance determination value, and this value is compared with the helmet degree of each determination target image obtained by the extraction unit 31.

The acceptance determination value is a value used to decide whether a determination target image is adopted as a helmet image. For example, a determination target image is adopted as a helmet image if its helmet degree, as the evaluation value, is greater than or equal to the acceptance determination value. The acceptance determination value is set in advance by, for example, machine learning or statistical analysis. It may be set separately for each working environment of the shovel, such as a nighttime, backlit, or shaded environment. The working environment of the shovel may be entered using, for example, the input device 42 in the cabin 10.
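The two selection rules described above (thresholding on the acceptance determination value, optionally followed by keeping only the top-ranked images) can be sketched as follows. This is an illustrative sketch: the function name, the `(identifier, helmet_degree)` tuple layout, and the parameter names `acceptance_value` and `top_k` are assumptions made for the example and do not appear in the source. The default of zero follows the example value given in the text.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, float]   # assumption: (image identifier, helmet degree)


def select_helmet_images(
    candidates: List[Candidate],
    acceptance_value: float = 0.0,     # example acceptance determination value
    top_k: Optional[int] = None,       # optional cap on the number adopted
) -> List[Candidate]:
    """Return determination target images adopted as helmet images."""
    # Adopt only images whose helmet degree reaches the acceptance value.
    adopted = [c for c in candidates if c[1] >= acceptance_value]
    # Sort in descending order of helmet degree.
    adopted.sort(key=lambda c: c[1], reverse=True)
    # Optionally keep only the top-ranked images.
    return adopted if top_k is None else adopted[:top_k]
```

A per-environment acceptance value (nighttime, backlit, shaded) could be passed in by the caller, for instance from a lookup table keyed by the environment selected by the operator.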

The helmet degree is derived, for example, from image features that can be calculated at a lower computational cost than the HOG features used in the post-stage image recognition process. For example, the helmet degree is derived by feeding image features with fewer dimensions than the HOG features into a machine learning algorithm such as a random forest and performing multiple regression analysis. The helmet degree is derived, for example, as a real number between -1 and +1; the closer it is to +1, the more helmet-like the image, and the closer it is to -1, the less helmet-like. The same applies to evaluation values representing the likeness of other feature images, such as the "head degree" representing the head-likeness of a "head image".

Next, with reference to FIG. 17, the process by which the controller 30 determines whether the captured image is in a person detection unsuitable state based on the distribution of helmet degrees (hereinafter, "person detection suitability determination process") will be described. FIG. 17 is a flowchart showing the flow of one example of the person detection suitability determination process.

The controller 30 calculates the helmet degree of the determination target image in each predetermined image portion during the image extraction process and stores it in an internal memory or the like. The helmet degrees may be stored so that they can be sorted in a predetermined order (descending order), for example. The controller 30 then starts executing the person detection suitability determination process, for example, once it has stored the helmet degrees of the determination target images in a predetermined number of predetermined image portions, and thereafter repeats the process each time a new helmet degree is calculated.

The controller 30 executes the image extraction process and the person detection suitability determination process in parallel, because the helmet degrees calculated in the image extraction process are used in the person detection suitability determination process. However, the person detection suitability determination process may instead be executed after the image extraction process has completed. Any of the image extraction processes shown in FIGS. 13 to 16 may be adopted as the image extraction process, for example.

As shown in FIG. 17, the controller 30 first acquires the n-th highest helmet degree (step ST41). For example, the controller 30 refers to the helmet degrees stored in the internal memory and acquires the n-th highest helmet degree, that is, the helmet degree at a predetermined rank. Here, "n" is an integer of 1 or more that is set as appropriate; for example, a value equivalent to 2% of the total number of predetermined image portions (determination target images) is used. The helmet degrees of the determination target images are stored in the internal memory in descending order, for example.

The controller 30 then determines whether the n-th highest helmet degree is greater than or equal to a predetermined suitability determination value (step ST42). A predetermined helmet degree, selectable as a specification, is set as the suitability determination value, and the n-th highest helmet degree is compared against it.

The suitability determination value is a value used to determine whether the captured image is in a person detection unsuitable state or a person detection suitable state. Like the acceptance determination value, the suitability determination value is set in advance by, for example, machine learning or statistical analysis. It may be set separately for each working environment of the shovel, such as a nighttime, backlit, or shaded environment.

When it determines that the n-th highest helmet degree is greater than or equal to the suitability determination value (YES in step ST42), the controller 30 determines that the captured image is in a person detection unsuitable state (step ST43). If the suitability determination value is greater than or equal to the acceptance determination value, satisfying the condition "the n-th highest helmet degree is greater than or equal to the suitability determination value" means that there are n or more helmet images whose helmet degree is greater than or equal to the suitability determination value. In the example of FIG. 17, a captured image in which n or more such helmet images exist is defined as being in a person detection unsuitable state. The value "n" can be regarded as a criterion for the number of person candidate images sent from the extraction unit 31 to the identification unit 32. That is, it serves as the criterion for judging whether the number of person candidate images to be processed by the identification unit 32 is large or small.

When it determines that the captured image is in a person detection unsuitable state, the controller 30 may notify the operator of the shovel to that effect. For example, the controller 30 may output a control command to an in-vehicle display serving as the output device 50 to display a text message to that effect. Alternatively, the controller 30 may output a control command to an in-vehicle speaker serving as the output device 50 to output a voice message to that effect.

When it determines that the captured image is in a person detection unsuitable state, the controller 30 may also stop person detection by the periphery monitoring system. This prevents a person detection result based on an unreliable extraction result from being presented to the operator of the shovel.

When it determines that the captured image is in a person detection unsuitable state, the controller 30 may also record that captured image in NVRAM or the like, so that it can be used to improve the periphery monitoring system.

When it determines that the n-th highest helmet degree is less than the suitability determination value (NO in step ST42), the controller 30 determines that the captured image is in a state suitable for human detection (hereinafter, the "human detection suitable state") (step ST44). Provided that the suitability determination value is equal to or higher than the acceptance determination value, satisfying the condition "the n-th highest helmet degree is less than the suitability determination value" means that fewer than n helmet images having a helmet degree equal to or higher than the suitability determination value exist. In terms of the example of FIG. 17, this case does not constitute the human detection unsuitable state, because the number of human candidate images to be processed by the classifier is less than the reference "n".
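The decision in steps ST42 to ST44 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `is_unsuitable_for_detection` and the sample values are hypothetical, and the handling of images with fewer than n portions is an assumption, since the description does not address that case.

```python
from typing import Sequence


def is_unsuitable_for_detection(
    helmet_degrees: Sequence[float], n: int, suitability_threshold: float
) -> bool:
    """Return True when the n-th highest helmet degree is >= the threshold (ST42 YES)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(helmet_degrees) < n:
        # Assumption: with fewer than n portions there cannot be n high-scoring ones.
        return False
    ranked = sorted(helmet_degrees, reverse=True)  # descending order
    return ranked[n - 1] >= suitability_threshold


degrees = [0.9, 0.7, 0.6, 0.1, -0.3, -0.8]  # illustrative values only
print(is_unsuitable_for_detection(degrees, n=3, suitability_threshold=0.5))  # True
print(is_unsuitable_for_detection(degrees, n=4, suitability_threshold=0.5))  # False
```

Checking only the n-th ranked value is equivalent to counting how many helmet degrees reach the threshold and comparing that count with n.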

Next, the characteristics of the human detection suitable state and the human detection unsuitable state are described with reference to FIG. 18. FIG. 18 is a frequency distribution diagram (histogram) of helmet degrees, each a real value between -1 and +1, derived from every predetermined image portion in a captured image taken in a given environment. The frequency on the vertical axis can be regarded as the number of detections at each helmet degree. FIG. 18(A) shows the frequency distribution in the human detection suitable state, and FIG. 18(B) shows the frequency distribution in the human detection unsuitable state. For clarity, FIG. 18(A) and FIG. 18(B) omit the individual bins and show only the frequency curve. The area enclosed by the frequency curve and the horizontal axis represents the total number of predetermined image portions in the captured image.
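A histogram of this kind could be built as sketched below. The bin count and the function name `helmet_degree_histogram` are assumptions for illustration; the specification does not state how FIG. 18 was binned.

```python
from typing import List, Sequence


def helmet_degree_histogram(degrees: Sequence[float], num_bins: int = 10) -> List[int]:
    """Count helmet degrees in equal-width bins covering the range [-1, +1]."""
    counts = [0] * num_bins
    width = 2.0 / num_bins
    for d in degrees:
        if not -1.0 <= d <= 1.0:
            raise ValueError(f"helmet degree out of range: {d}")
        # The top edge (+1.0) is folded into the last bin.
        index = min(int((d + 1.0) / width), num_bins - 1)
        counts[index] += 1
    return counts


sample = [-1.0, -0.3, 0.0, 0.6, 1.0]  # illustrative values only
hist = helmet_degree_histogram(sample, num_bins=4)
print(hist)        # [1, 1, 1, 2]
print(sum(hist))   # 5, the total number of image portions
```

The sum of all bin counts equals the number of predetermined image portions, which corresponds to the area under the frequency curve mentioned in the text.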

In FIG. 18(A) and FIG. 18(B), the dash-dot line drawn at a helmet degree of +0.5 (first reference) indicates the position of the suitability determination value, and the dash-double-dot line drawn at a helmet degree of +0.2 (second reference) indicates the position of the acceptance determination value. Accordingly, a determination target image with a helmet degree of +0.2 or more is adopted as a helmet image, and a determination target image with a helmet degree of less than +0.2 is not. In the following, determination target images that were not adopted as helmet images are called non-helmet images. The first reference and the second reference may each be changed. In the histogram of FIG. 18(A), determination target images with a helmet degree of less than +0.2 are not extracted as human candidate images, whereas determination target images with a helmet degree of +0.2 or more can be said to have been extracted as human candidate images by the extraction unit 31. The extracted human candidate images are handled as identification processing target images to be processed by the identification unit. The extraction of identification processing target images by the extraction unit 31 may be carried out in multiple stages. The identification unit 32 then applies image recognition processing to all of the identification processing target images extracted by the extraction unit 31.
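The role of the acceptance determination value can be illustrated with a short sketch. The tuple layout, the function name `extract_candidates`, and the sample identifiers are hypothetical; only the +0.2 threshold comes from the FIG. 18 example, and the text notes that this value may be changed.

```python
from typing import List, Sequence, Tuple

ACCEPTANCE_THRESHOLD = 0.2  # second reference in the FIG. 18 example

ScoredPortion = Tuple[str, float]  # (portion identifier, helmet degree)


def extract_candidates(
    scored_portions: Sequence[ScoredPortion],
    acceptance_threshold: float = ACCEPTANCE_THRESHOLD,
) -> List[ScoredPortion]:
    """Keep the portions adopted as helmet images (helmet degree >= threshold)."""
    return [(pid, deg) for pid, deg in scored_portions if deg >= acceptance_threshold]


portions = [("p1", 0.6), ("p2", 0.2), ("p3", 0.19), ("p4", -0.3)]
print(extract_candidates(portions))  # [('p1', 0.6), ('p2', 0.2)]
```

Everything returned here would be passed on as identification processing target images; the rest are the non-helmet images.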

In FIG. 18(A), the n-th highest helmet degree is -0.3, which is lower than the suitability determination value (+0.5). That is, the condition "the n-th highest helmet degree is less than the suitability determination value" is satisfied, which means that fewer than n helmet images have a helmet degree higher than the suitability determination value. In terms of the reference "n", this is a human detection suitable state in which the identification unit 32 has few identification processing target images to process. It can also be said that the image is in the human detection suitable state when the determination target image with the n-th highest helmet degree is a non-helmet image. The controller 30 therefore determines that the captured image just acquired is in the human detection suitable state.

Thus, a captured image that yields a helmet degree distribution like that of FIG. 18(A) is characterized by a small number of helmet images found and a large number of non-helmet images. The identification unit 32 therefore has relatively few images to process.

In FIG. 18(B), the n-th highest helmet degree is +0.6, which is higher than the suitability determination value (+0.5). That is, the condition "the n-th highest helmet degree is equal to or higher than the suitability determination value" is satisfied, which means that n or more helmet images have a helmet degree higher than the suitability determination value. In terms of the reference "n", this is a human detection unsuitable state in which the identification unit 32 has many identification processing target images to process. It can also be said that the image is in the human detection unsuitable state when the determination target image with the n-th highest helmet degree is a helmet image. The controller 30 therefore determines that the captured image just acquired is in the human detection unsuitable state.

Thus, a captured image that yields a helmet degree distribution like that of FIG. 18(B) is characterized by a large number of helmet images found and a small number of non-helmet images. In other words, it contains many erroneously recognized helmet images. The identification unit 32 therefore has relatively many images to process.
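The relationship between the rank-based condition and the number of candidate images can be written out as follows. The notation is introduced here only for illustration and does not appear in the specification: s_(k) is the k-th highest helmet degree, T is the suitability determination value, and A is the acceptance determination value, with T >= A.

```latex
% Rank condition and count condition are equivalent:
\[
  s_{(n)} \ge T \iff \bigl|\{\, i : s_i \ge T \,\}\bigr| \ge n
\]
% Because T >= A, every portion that reaches T is also extracted as a candidate:
\[
  T \ge A \;\Longrightarrow\; \{\, i : s_i \ge T \,\} \subseteq \{\, i : s_i \ge A \,\},
  \qquad\text{hence}\qquad
  s_{(n)} \ge T \;\Longrightarrow\; \bigl|\{\, i : s_i \ge A \,\}\bigr| \ge n .
\]
% Values from the FIG. 18 example, with T = +0.5 and A = +0.2:
%   FIG. 18(A): s_{(n)} = -0.3 <  T  -> human detection suitable state
%   FIG. 18(B): s_{(n)} = +0.6 >= T  -> human detection unsuitable state
```

This is why the unsuitable state implies that at least n identification processing target images reach the identification unit 32.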

As described above, the controller 30 can determine whether a captured image is in the human detection unsuitable state or the human detection suitable state based solely on the captured image from the imaging device 40. Specifically, it can determine whether the captured image is unsuitable for human detection based on the number of identification processing target images extracted by the extraction unit 31. The human detection unit comprises a front stage (extraction unit 31) and a rear stage (identification unit 32), and the rear stage has the higher computational cost; with this feature, the determination provides an opportunity to reduce the load on the rear stage. It also makes it possible to judge whether the current scene is one in which the rear-stage processing will take a long time.
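The two-stage arrangement can be sketched as a gate placed between a cheap front stage and a costly rear stage. This is a hedged sketch, not the patented implementation: `detect_people`, `cheap_score`, and `expensive_is_person` are hypothetical names, and skipping the rear stage entirely is just one of the optional responses the description mentions.

```python
from typing import Callable, List, Optional, Sequence


def detect_people(
    portions: Sequence[str],
    cheap_score: Callable[[str], float],         # front stage: low-cost evaluation value
    expensive_is_person: Callable[[str], bool],  # rear stage: costly image recognition
    n: int,
    suitability_threshold: float = 0.5,
    acceptance_threshold: float = 0.2,
) -> Optional[List[str]]:
    """Return detected portions, or None when the image is judged unsuitable."""
    scored = [(p, cheap_score(p)) for p in portions]
    ranked = sorted((s for _, s in scored), reverse=True)
    if len(ranked) >= n and ranked[n - 1] >= suitability_threshold:
        return None  # unsuitable: the rear stage is not run at all
    candidates = [p for p, s in scored if s >= acceptance_threshold]
    return [p for p in candidates if expensive_is_person(p)]


# Toy stand-ins for the two stages (illustrative only).
scores = {"a": 0.9, "b": 0.3, "c": -0.5}
result = detect_people(list(scores), lambda p: scores[p], lambda p: p == "a", n=2)
print(result)  # ['a']
```

The gate uses only values already computed by the front stage, so the suitability check itself adds little cost.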

In addition, by notifying the shovel operator when it determines that the captured image is in the human detection unsuitable state, the periphery monitoring system 100 can caution the operator not to rely excessively on the current human detection result.

Alternatively, by stopping human detection when it determines that the captured image is in the human detection unsuitable state, the periphery monitoring system 100 can prevent the presentation of a human detection result based on an unreliable extraction result.

Alternatively, by recording the captured image in NVRAM or the like when it determines that the captured image is in the human detection unsuitable state, the periphery monitoring system 100 makes that image available for improving the periphery monitoring system.

Preferred embodiments of the present invention have been described in detail above. However, the present invention is not limited to these embodiments, and various modifications and substitutions can be made to them without departing from the scope of the present invention.

For example, in the embodiments described above, the periphery monitoring system 100 extracts identification processing target images using an image of a helmet, which represents a part of a person, as the feature image. It may instead extract identification processing target images using an image of a person's whole body as the feature image.

The evaluation value may also be used broadly as an index of whether a part of the captured image is an image of a person. The helmet degree serving as the evaluation value can also be described as a value representing human-likeness.

The embodiments described above assume that a person is detected using a captured image from the imaging device 40 mounted on the upper swing body 3 of the shovel, but the present invention is not limited to this configuration. It can also be applied to a configuration that uses a captured image from an imaging device attached to the main body of another work machine, such as a mobile crane, a fixed crane, a lifting magnet machine, or a forklift.

In the embodiments described above, three cameras are used to image the blind spot areas of the shovel, but one, two, or four or more cameras may be used instead.

In the embodiments described above, the human detection process is applied individually to each of the plurality of captured images, but it may instead be applied to a single composite image generated from the plurality of captured images.

DESCRIPTION OF SYMBOLS
1: Lower traveling body
2: Swing mechanism
3: Upper swing body
4: Boom
5: Arm
6: Bucket
7: Boom cylinder
8: Arm cylinder
9: Bucket cylinder
10: Cabin
30: Controller
31: Extraction unit
32: Identification unit
33: Tracking unit
34: Human detection unit
35: Control unit
40: Imaging device
40B: Rear camera
40L: Left-side camera
40R: Right-side camera
42: Input device
50: Output device
51: Machine control device
100: Periphery monitoring system
AP, AP1 to AP6: Head image position
BX: Box
HD: Head
HP: Virtual head position
HRg: Helmet image
M1, M2: Mask region
Pr, Pr1, Pr2, Pr10 to Pr12: Reference point
R1: Protruding region
R2: Vehicle body reflection region
RP: Representative position
TR, TR1, TR2, TR10 to TR12: Virtual plane region
TRg, TRg3, TRg4, TRg5: Identification processing target image region
TRgt, TRgt3, TRgt4, TRgt5: Normalized image

Claims

A periphery monitoring system for a work machine that detects a person present in the periphery of the work machine using a captured image from an imaging device attached to the work machine, the system comprising:
an extraction unit that extracts a plurality of predetermined image portions from the captured image as identification processing target images; and
an identification unit that identifies, by image recognition processing, whether an image included in each identification processing target image is an image of a person,
wherein, when the number of identification processing target images extracted by the extraction unit is greater than a predetermined reference, the captured image is determined to be in a state unsuitable for human detection.

The periphery monitoring system for a work machine according to claim 1, wherein evaluation values calculated for each of the plurality of predetermined image portions are arranged in descending order, and
when the evaluation value at a predetermined rank is equal to or higher than a predetermined suitability determination value, the captured image is determined to be in a state unsuitable for human detection.

The periphery monitoring system for a work machine according to claim 2, wherein the evaluation value is calculated based on an image feature amount of each predetermined image portion and represents the human-likeness of that predetermined image portion, and
the extraction unit extracts, as the identification processing target images, predetermined image portions whose evaluation value is equal to or higher than a predetermined acceptance determination value.

The periphery monitoring system for a work machine according to claim 3, wherein the suitability determination value is equal to or higher than the acceptance determination value.

The periphery monitoring system for a work machine according to claim 3 or 4, wherein the computational cost of calculating the image feature amount of each predetermined image portion is lower than the computational cost of calculating the image feature amount used in the image recognition processing executed by the identification unit.

The periphery monitoring system for a work machine according to any one of claims 1 to 5, wherein, when the captured image is determined to be in a state unsuitable for human detection, a notification to that effect is given.

The periphery monitoring system for a work machine according to any one of claims 1 to 6, wherein, when the captured image is determined to be in a state unsuitable for human detection, the captured image is recorded.