JP2018036848A - Object state estimation system, object state estimation device, object state estimation method and object state estimation program

Info

Publication number: JP2018036848A
Application number: JP2016169405A
Authority: JP
Inventors: ウィドドアリ; Widodo Ari; 浩史土井; Hiroshi Doi
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2018-03-08
Anticipated expiration: 2036-08-31
Also published as: JP6617085B2

Abstract

PROBLEM TO BE SOLVED: To provide an object detection system capable of automatically generating the learning data necessary for updating a dictionary.

SOLUTION: An object state estimation system 1 comprises: a camera 10 that captures images to generate captured images; an object detection section 22 that uses a person dictionary to detect a person, as a specific detection target object, in a captured image and to estimate the person's state (orientation); a movement information calculation section 23 that calculates movement information of a moving object in the captured image; a learning data generation section 24 that generates learning data by determining the state of the moving object on the basis of the movement information calculated by the movement information calculation section 23; and an additional learning section 25 that updates the person dictionary on the basis of the learning data.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an object situation estimation system, an object situation estimation apparatus, an object situation estimation method, and an object situation estimation program that detect a specific detection target object in an image captured by a camera and estimate the situation of that object.

Object situation estimation systems are known in which a camera is installed at an intersection or the like, a specific detection target object such as a person (pedestrian) is detected in an image captured by the camera (a captured image), and the situation of that object (for example, that the person is crossing the road) is estimated. Machine learning is also known to be a promising technique for detecting persons in captured images.

Japanese Patent Laying-Open No. 2015-90679

Cameras are installed at different positions from site to site, so the appearance of the persons to be detected in the captured image (shooting direction, size, and so on) differs from camera to camera. Therefore, when machine learning is used to detect persons, a dictionary suited to the installation position must be prepared for each camera.

However, preparing a machine learning dictionary for every camera is difficult in terms of both cost and man-hours. Machine learning methods that update a dictionary have already been proposed, but preparing the learning data needed for the update is still not easy.

The present invention has been made in view of the above problems, and its object is to provide an object situation estimation system, an object situation estimation apparatus, an object situation estimation method, and an object situation estimation program that can automatically generate the learning data necessary for updating a dictionary.

An object situation estimation system according to one aspect of the present invention includes: a camera that captures images to generate captured images; an object detection unit that uses a dictionary to detect a specific detection target object in the captured image and to estimate the situation of the detection target object; a movement information calculation unit that calculates movement information of a moving object in the captured image; a learning data generation unit that generates learning data by determining the situation of the moving object based on the movement information; and an additional learning unit that updates the dictionary based on the learning data.

With this configuration, a system that uses a dictionary to detect a specific detection target object in a captured image and to estimate the situation of that object can itself automatically generate, from the captured images, the learning data necessary for updating the dictionary.

In the above object situation estimation system, the situation of the detection target object and of the moving object may be the orientation of the detection target object and of the moving object.

With this configuration, the orientation information needed for the learning data can be determined from the movement information.

The above object situation estimation system may further include an object candidate region extraction unit that extracts an object candidate region from the captured image, and the object detection unit may detect the detection target object in a partial region of the captured image that includes the object candidate region.

With this configuration, the detection target object is detected only in a partial region that includes the object candidate region, which reduces the processing load of object detection and improves the processing speed.

The above object situation estimation system may further include an object candidate region extraction unit that extracts an object candidate region from the captured image, and the movement information calculation unit may calculate the movement information of the object candidate region as the movement information of the moving object.

With this configuration, the movement information is calculated by treating the object candidate region as a moving object, so learning data can be generated, as negative example data, even for moving objects that the object detection unit does not detect as detection target objects.

In the above object situation estimation system, the camera may capture images continuously to generate captured images of a plurality of frames, and the object candidate region extraction unit may extract the object candidate region based on a difference image of the captured images of the plurality of frames.

With this configuration, the processing load of extracting the object candidate region can be reduced and the processing speed improved.

In the above object situation estimation system, the learning data generation unit may include a determination unit that determines whether the moving object is the detection target object, and may generate learning data for a moving object determined to be the detection target object.

With this configuration, learning data can be generated, as positive example data, for those moving objects that correspond to the detection target object.

In the above object situation estimation system, the determination unit may determine whether the moving object is the detection target object based on the result of the detection.

With this configuration, whether a moving object is the detection target object can be determined using the detection result of the object detection unit.

An object situation estimation apparatus according to one aspect of the present invention is communicably connected to a camera that captures images to generate captured images, and acquires the captured images from the camera. The apparatus includes: an object detection unit that uses a dictionary to detect a specific detection target object in the captured image and to estimate the situation of the detection target object; a movement information calculation unit that calculates movement information of a moving object in the captured image; and a learning data generation unit that generates learning data by determining the situation of the moving object based on the movement information.

With this configuration, an apparatus that uses a dictionary to detect a specific detection target object in a captured image and to estimate the situation of that object can itself automatically generate, from the captured images, the learning data necessary for updating the dictionary.

An object situation estimation method according to one aspect of the present invention includes the steps of: capturing images to generate captured images; using a dictionary to detect a specific detection target object in the captured image and to estimate the situation of the detection target object; calculating movement information of a moving object in the captured image; generating learning data by determining the situation of the moving object based on the movement information; and updating the dictionary based on the learning data.

With this configuration, a specific detection target object is detected in a captured image using a dictionary and its situation is estimated, and in addition the learning data necessary for updating the dictionary can be generated automatically from the captured images.

An object situation estimation program according to one aspect of the present invention causes a computer of an object situation estimation apparatus, which is communicably connected to a camera that captures images to generate captured images and which acquires the captured images from the camera, to function as: an object detection unit that uses a dictionary to detect a specific detection target object in the captured image and to estimate the situation of the detection target object; a movement information calculation unit that calculates movement information of a moving object in the captured image; a learning data generation unit that generates learning data by determining the situation of the moving object based on the movement information; and an additional learning unit that updates the dictionary based on the learning data.

According to the present invention, a system that uses a dictionary to detect a specific detection target object in a captured image and to estimate the situation of that object can itself automatically generate, from the captured images, the learning data necessary for updating the dictionary.

Block diagram showing the configuration of the object situation estimation system according to the embodiment of the present invention
Diagram showing the process of generating a background difference image in the embodiment of the present invention
Diagram showing the process of generating an inter-frame difference image in the embodiment of the present invention
Diagram explaining the processing in the object detection unit of the embodiment of the present invention
Diagram showing the processing of the output determination unit of the embodiment of the present invention
Diagram showing the processing of the motion vector calculation unit of the embodiment of the present invention
Flowchart of the person presence determination process in the person presence determination unit of the embodiment of the present invention
Flowchart of the person orientation determination process in the person orientation determination unit of the embodiment of the present invention
Flowchart of the person similarity determination process in the person similarity determination unit of the embodiment of the present invention

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is one example of how the present invention may be implemented and does not limit the present invention to the specific configuration described. In implementing the present invention, a specific configuration suited to the embodiment may be adopted as appropriate.

FIG. 1 is a block diagram showing the configuration of the object situation estimation system according to the embodiment of the present invention. The object situation estimation system 1 consists of a camera 10 and an object situation estimation apparatus 20. The camera 10 is installed on a street; specifically, it is installed, for example, at a relatively high position facing an intersection. The camera 10 captures images continuously at a predetermined frame rate to generate captured images, and sequentially transmits the captured images to the object situation estimation apparatus 20 at a predetermined transmission rate.

The object situation estimation apparatus 20 acquires the captured images generated by the camera 10 and estimates the situation of a specific detection target object appearing in them (in this embodiment, a walking person, that is, a pedestrian). In this embodiment, the situation of a person estimated by the object situation estimation apparatus 20 is the person's orientation; based on this, it can be determined whether the person is waiting at a traffic light, is crossing the road, or has finished crossing the road.

The camera 10 and the object situation estimation apparatus 20 are communicably connected to each other so that the camera 10 can transmit captured images to the object situation estimation apparatus 20 and the apparatus can receive them. The connection may be wired or wireless. A communication network such as the Internet may also be interposed between the camera 10 and the object situation estimation apparatus 20; that is, the object situation estimation apparatus 20 may be located at a position geographically remote from the camera 10.

The object situation estimation apparatus 20 is a computer comprising a storage device, ROM, RAM, a CPU, and so on; the functions described below are realized by the CPU reading and executing an object situation estimation program stored in the storage device. The object situation estimation apparatus 20 includes an object candidate region extraction unit 21, an object detection unit 22, a movement information calculation unit 23, a learning data generation unit 24, an additional learning unit 25, a past image storage unit 31, a background image storage unit 32, a learning data storage unit 33, and a person dictionary storage unit 34, and uses them to execute the object situation estimation method.

In this embodiment, time is treated as discrete: the current time is denoted t, the time one step earlier is denoted t-1, and the time two steps earlier is denoted t-2. Since one step corresponds to one frame, the latest (current) frame is denoted t, the previous frame t-1, and the frame before that t-2.

A captured image transmitted from the camera 10 to the object situation estimation apparatus 20 is stored in the past image storage unit 31 and is also input to the object candidate region extraction unit 21. The past image storage unit 31 holds the captured images of a predetermined number of past frames. The object candidate region extraction unit 21 extracts from the captured image a region in which an object is considered to exist (hereinafter, an "object candidate region"). In particular, the object candidate region extraction unit 21 of this embodiment extracts a region containing a moving object as the object candidate region.

The object detection unit 22 detects a specific detection target object (a person) in the object candidate region extracted by the object candidate region extraction unit 21. When the captured image contains several persons, the object detection unit 22 detects each of them. The object detection unit 22 also estimates the situation of each detected person and outputs the presence or absence of a person (the person detection result) together with the estimated situation.

The movement information calculation unit 23 calculates movement information (specifically, the motion vector of each moving object) for moving objects, which are not limited to persons, based on the captured images read from the past image storage unit 31 and the object candidate region input from the object candidate region extraction unit 21. The movement information calculation unit 23 outputs the calculated movement information to the learning data generation unit 24.

The learning data generation unit 24 determines the orientation of each moving object based on the movement information input from the movement information calculation unit 23 and generates learning data for each orientation. The generated learning data is stored in the additional learning data storage unit 33 as additional learning data.
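As a rough illustration of how an orientation label might be derived from a motion vector, the following sketch quantizes the vector into the four orientations the ELM is later said to output. Everything specific here is an assumption rather than the patented procedure: the function name `orientation_from_motion`, the `min_speed` threshold, the image axes (x to the right, y downward), and the premise that a pedestrian faces the direction of travel and that downward motion in the image means facing the camera.

```python
import math

def orientation_from_motion(dx, dy, min_speed=1.0):
    """Map a motion vector (pixels per frame) to a coarse orientation label.

    Assumptions (not from the source): x grows to the right, y grows
    downward, and a person faces the way they move. Returns None when the
    motion is too small to give a reliable label.
    """
    if math.hypot(dx, dy) < min_speed:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    # Downward motion in the image is treated as moving toward the camera.
    return "forward" if dy > 0 else "backward"

labels = [orientation_from_motion(dx, dy)
          for dx, dy in [(5, 1), (-5, 1), (0, 3), (0, -3), (0.1, 0.1)]]
```

Each label could then be paired with the image patch of the moving object to form one learning sample; how the source pairs them is described in its later sections, not here.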

The additional learning unit 25 performs additional learning using the learning data stored in the additional learning data storage unit 33 and updates the person dictionary stored in the person dictionary storage unit 34, thereby generating a new person dictionary.

In the object situation estimation apparatus 20, the object candidate region extraction unit 21, the object detection unit 22, the past image storage unit 31, the background image storage unit 32, and the person dictionary storage unit 34 form the configuration for detecting a person in a captured image and estimating the person's situation. The object candidate region extraction unit 21, the movement information calculation unit 23, the learning data generation unit 24, the additional learning unit 25, the past image storage unit 31, and the additional learning data storage unit 33 form the configuration for performing additional learning and updating the person dictionary. Each function of the object situation estimation apparatus 20 is described in more detail below.

First, the configuration for detecting a person in a captured image and estimating the person's situation is described. The past image storage unit 31 stores the captured images of past frames (hereinafter also "past frame images") used to calculate the inter-frame difference image FDt. A past frame image is an image captured by the camera 10 at a past point in time; in this embodiment, the past image storage unit 31 stores the past frame images It-1 and It-2 of the two most recent steps (the two most recent frames). The past frame images stored in the past image storage unit 31 are updated each time a new captured image for the current time (hereinafter also the "current frame image") is received from the camera 10.

The background image storage unit 32 stores, as a background image BG, a captured image (past frame image) used to calculate the background difference image BDt. It reads out and stores a captured image that was received from the camera 10 and stored in the past image storage unit 31, and it updates the background image at a predetermined interval (for example, every hour). It is desirable to use as the background image a captured image that contains only stationary objects and no moving objects.

The object candidate region extraction unit 21 includes a background difference calculation unit 211 and an inter-frame difference calculation unit 212. The background difference calculation unit 211 generates the background difference image BDt from the current frame image It received from the camera 10 and the background image BG stored in the background image storage unit 32.

FIG. 2 is a diagram showing the process of generating the background difference image BDt. The background difference calculation unit 211 acquires the current frame image It from the camera 10, reads the background image BG from the background image storage unit 32, and, as shown in FIG. 2, calculates the background difference image BDt by removing the information of the background image BG from the current frame image It.

Because the background image BG contains information on the stationary objects in the current frame image It, that information is removed in the background difference image BDt, which is therefore an image of the moving objects extracted from the current frame image It. In the example of FIG. 2, the background difference image BDt is an image in which one automobile M1, temporarily stopped on the road, and two automobiles M2 and M3, traveling on the road, have been extracted from the current frame image It.
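The background difference described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the source's implementation: the images are taken to be 8-bit grayscale arrays, "removing the background information" is interpreted as a thresholded absolute difference, and the function name and the `threshold` value are hypothetical.

```python
import numpy as np

def background_difference(frame, background, threshold=30):
    """Return a boolean mask of pixels that differ from the background.

    frame, background: 2-D uint8 grayscale arrays of identical shape.
    threshold: hypothetical tuning value, not given in the source.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy scene: a flat background BG and a current frame with one 2x2 object.
bg = np.full((4, 4), 50, dtype=np.uint8)
cur = bg.copy()
cur[1:3, 1:3] = 200
bd_t = background_difference(cur, bg)
```

Because the comparison is against a rarely updated background, an object that has stopped (like automobile M1) still appears in the mask as long as it is absent from BG.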

The inter-frame difference calculation unit 212 calculates the inter-frame difference image FDt from the current frame image It received from the camera 10 and the past frame images It-1 and It-2 stored in the past image storage unit 31.

FIG. 3 is a diagram showing the process of generating the inter-frame difference image FDt. The inter-frame difference calculation unit 212 acquires the current frame image It from the camera 10 and reads the past frame images It-1 and It-2 from the past image storage unit 31. The past frame images It-1 and It-2 are the images captured one frame and two frames before the current time t, respectively. The inter-frame difference calculation unit 212 calculates the inter-frame difference image FDt from the current frame image It, the past frame image It-1, and the past frame image It-2.

As shown in FIG. 3, the inter-frame difference calculation unit 212 first calculates a difference image D1 between the current frame image It and the past frame image It-1, and a difference image D2 between the past frame image It-1 and the past frame image It-2. It then calculates the common part (AND) of the difference images D1 and D2 as the inter-frame difference image FDt. The inter-frame difference image FDt is thus difference information extracted from three consecutive frames, which allows the change between frames to be extracted stably.
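The three-frame differencing step can be sketched as follows. As before this is only an illustration: grayscale uint8 frames, a thresholded absolute difference for D1 and D2, and the names `frame_difference` and `threshold` are assumptions, while the AND of the two difference images follows the description above.

```python
import numpy as np

def frame_difference(i_t, i_t1, i_t2, threshold=30):
    """Return FDt = D1 AND D2 as a boolean mask.

    i_t, i_t1, i_t2: frames at times t, t-1, t-2 (2-D uint8 arrays).
    threshold: hypothetical tuning value, not given in the source.
    """
    a, b, c = (f.astype(np.int16) for f in (i_t, i_t1, i_t2))
    d1 = np.abs(a - b) > threshold   # change between t-1 and t
    d2 = np.abs(b - c) > threshold   # change between t-2 and t-1
    return d1 & d2

# One bright pixel moving one column to the right per frame.
i_t2 = np.array([[0, 200, 0, 0, 0, 0]], dtype=np.uint8)
i_t1 = np.array([[0, 0, 200, 0, 0, 0]], dtype=np.uint8)
i_t = np.array([[0, 0, 0, 200, 0, 0]], dtype=np.uint8)
fd_t = frame_difference(i_t, i_t1, i_t2)

# A stationary scene produces an empty mask.
still = frame_difference(i_t, i_t, i_t)
```

In the toy row, D1 marks columns 2 and 3 and D2 marks columns 1 and 2, so the AND keeps only column 2, the pixel that changed in both frame pairs. Requiring agreement between the two differences is what suppresses one-off changes.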

The inter-frame difference image FDt is an image of short-term features extracted from the current frame image It (for example, objects that are moving at that instant between frames). In this respect it differs from the background difference image BDt described above, which reflects long-term features of the current frame image It. For example, comparing the inter-frame difference image FDt of FIG. 3 with the background difference image BDt of FIG. 2, the automobile M1, which is temporarily stopped in the current frame image It, is extracted and visible in the background difference image BDt but is not extracted and not visible in the inter-frame difference image FDt. In both the background difference image BDt and the inter-frame difference image FDt, what is extracted is an object that is present in the current frame image but not in the past frame image, so the objects extracted by the object candidate region extraction unit 21 are also referred to below as "moving objects".

The inter-frame difference calculation unit 212 further updates the past image storage unit 31 by storing the current frame image It as the new past frame image It-1 and the previous past frame image It-1 as the new past frame image It-2.
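The shift of It into It-1 and of It-1 into It-2 behaves like a fixed-length buffer. A minimal sketch, assuming a hypothetical `PastImageStore` class (the source does not describe the storage unit's data structure), could use `collections.deque` with `maxlen`:

```python
from collections import deque

class PastImageStore:
    """Keeps the most recent `depth` frames; index 1 is t-1, index 2 is t-2."""

    def __init__(self, depth=2):
        self._frames = deque(maxlen=depth)

    def push(self, frame):
        # The newest frame goes to the front; the oldest one falls off the
        # back automatically once the buffer is full.
        self._frames.appendleft(frame)

    def get(self, steps_back):
        return self._frames[steps_back - 1]

store = PastImageStore(depth=2)
for name in ["frame0", "frame1", "frame2"]:
    store.push(name)
```

After the three pushes, `store.get(1)` returns the latest stored frame and `store.get(2)` the one before it. Raising `depth` would cover the variant in which more than two past frames are used.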

The past frame images that the inter-frame difference calculation unit 212 uses to calculate the inter-frame difference image FDt are not limited to the two most recent frames (past frame images It-1 and It-2) used in this embodiment. Three or more of the most recent frames may be used, or a plurality of past frame images spaced apart from one another by a predetermined number of frames may be used.

The object candidate region extraction unit 21 generates a frame enclosing the moving objects extracted from the background difference image BDt (automobiles M1 to M3 in the example of FIG. 2) and the moving objects extracted from the inter-frame difference image FDt (automobiles M2 and M3 in the example of FIG. 3), and outputs that frame as the object candidate region to the object detection unit 22 and the movement information calculation unit 23. The frame is a rectangle that contains all of the extracted moving objects, and its size (height and width) is determined as appropriate from the size of the extracted moving objects. As the object candidate region information, the object candidate region extraction unit 21 outputs the position (coordinates) in the captured image of a reference point of the frame (for example, its upper-left corner) and the height and width of the frame.
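The enclosing frame can be sketched as a bounding rectangle over the difference masks. This sketch assumes that BDt and FDt are boolean masks of the same shape and that a single rectangle is drawn around the union of both; the source does not spell out how the two images are combined, and the function name is hypothetical.

```python
import numpy as np

def candidate_region(bd_mask, fd_mask):
    """Return (x, y, width, height) of the rectangle enclosing all moving pixels.

    (x, y) is the upper-left corner. Returns None when neither mask
    contains a moving pixel.
    """
    moving = bd_mask | fd_mask
    ys, xs = np.nonzero(moving)
    if ys.size == 0:
        return None
    x0, y0 = int(xs.min()), int(ys.min())
    return x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1

bd = np.zeros((5, 6), dtype=bool)
fd = np.zeros((5, 6), dtype=bool)
bd[1:3, 2:4] = True   # pixels found by the background difference
fd[2:4, 3:5] = True   # pixels found by the inter-frame difference
region = candidate_region(bd, fd)
empty = candidate_region(np.zeros((5, 6), dtype=bool), np.zeros((5, 6), dtype=bool))
```

A per-object variant would first label connected components and compute one rectangle per component; that refinement is left out here.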

The object detection unit 22 detects a specific detection target object (a person in the present embodiment) based on the current frame image It and the object candidate area information, attaches information on the situation of that object (situation information), and outputs the result. The object detection unit 22 includes an ELM (Extreme Learning Machine) 221 and an output determination unit 222.

FIG. 4 is a diagram explaining the processing in the object detection unit 22. Within the current frame image It, the object detection unit 22 sets a rectangular detection target area DA that contains the object candidate area CA and is larger than it. The object detection unit 22 then slides a rectangular cut-out area CLA (m × n pixels), smaller than the detection target area DA, across the detection target area DA, and the ELM 221 performs person detection and situation estimation on the cut-out image CIa at each position of the cut-out area CLA.
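
The relationship between CA, DA, and CLA can be sketched as follows. The margin by which DA exceeds CA, the stride of the slide, and the reading of m as width and n as height are not given in the text and are assumptions here; both function names are hypothetical.

```python
def detection_area(ca, margin, image_size):
    """Expand the candidate area ca = (x, y, w, h) by `margin` pixels on every
    side, clipped to the image, to obtain the detection target area DA."""
    x, y, w, h = ca
    img_w, img_h = image_size
    left, top = max(0, x - margin), max(0, y - margin)
    right, bottom = min(img_w, x + w + margin), min(img_h, y + h + margin)
    return (left, top, right - left, bottom - top)


def sliding_windows(da, m, n, stride=1):
    """Yield (x, y, m, n) for every position of an m-wide, n-high cut-out
    area CLA that fits inside the detection target area da."""
    x0, y0, width, height = da
    for y in range(y0, y0 + height - n + 1, stride):
        for x in range(x0, x0 + width - m + 1, stride):
            yield (x, y, m, n)
```

Each yielded rectangle corresponds to one cut-out image CIa that would be handed to the detector.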

The ELM 221 takes each pixel of the cut-out image CIa as input to its input layer 2211 and, via the intermediate layer 2212, outputs the presence or absence of a person and the situation information from the output layer 2213. The inter-layer weights of the ELM 221 are stored in the person dictionary storage unit 34 as the person dictionary. The outputs of the ELM 221, that is, the presence or absence of a person and the situation information, are expressed as probabilities (0 to 1.0). As the situation information of a person, the ELM 221 of the present embodiment outputs a probability for each orientation of the person (facing forward, backward, right, or left). Because of the sliding described above, a plurality of cut-out images CIa are obtained for one object candidate area CA, and an ELM 221 output is obtained for each cut-out image CIa.
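
For orientation only, a single-hidden-layer forward pass of the kind an ELM uses might look like the sketch below. The sigmoid activation, the clipping of outputs to the range 0 to 1.0, and the parameter names `w_in`, `b_in`, and `w_out` are assumptions; the specification states only that the inter-layer weights form the person dictionary and that the outputs are probabilities.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def elm_forward(pixels, w_in, b_in, w_out):
    """Forward pass: input layer -> intermediate layer -> output layer.

    pixels : flattened cut-out image
    w_in   : one weight row per intermediate node (fixed in a typical ELM)
    b_in   : one bias per intermediate node
    w_out  : one weight row per output (learned output weights)
    """
    hidden = [sigmoid(sum(w * p for w, p in zip(row, pixels)) + b)
              for row, b in zip(w_in, b_in)]
    raw = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    # Assumption: clip raw outputs so they can be read as values in 0..1.0.
    return [min(1.0, max(0.0, r)) for r in raw]
```

With five rows in `w_out`, the outputs could be read as the person presence probability followed by the forward, backward, right, and left probabilities.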

The output determination unit 222 determines the presence or absence of a person and the person's situation based on the probabilities output from the ELM 221, and outputs the result. FIG. 5 is a diagram showing the processing of the output determination unit 222. The output determination unit 222 first determines whether the person presence probability is 0.5 or more (step S51). If it is less than 0.5 (NO in step S51), the unit determines that the moving object is not a person (step S52).

If the person presence probability is 0.5 or more (YES in step S51), the output determination unit 222 determines that the moving object is a person (step S53), and then selects the orientation with the highest probability among the forward, backward, rightward, and leftward probabilities (step S54). It then determines whether the probability of the selected orientation is 0.5 or more (step S55).

If the probability of the selected orientation is 0.5 or more (YES in step S55), the output determination unit 222 adopts that orientation as the orientation of the detected person (step S56). If it is less than 0.5 (NO in step S55), every orientation has a probability below 0.5, so the unit determines that a person is present but the orientation is unknown (step S57).
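
Steps S51 to S57 amount to two threshold checks on one ELM output. A minimal sketch follows; the function name `judge_output`, the dictionary keys, and the tuple return format are illustrative assumptions.

```python
ORIENTATIONS = ("forward", "backward", "right", "left")


def judge_output(person_prob, orientation_probs, threshold=0.5):
    """Return (is_person, orientation, orientation_prob) for one ELM output.

    orientation_probs maps each name in ORIENTATIONS to its probability.
    """
    if person_prob < threshold:                      # S51: NO
        return (False, None, None)                   # S52: not a person
    # S53: a person is present; S54: pick the most probable orientation.
    best = max(ORIENTATIONS, key=lambda o: orientation_probs[o])
    best_prob = orientation_probs[best]
    if best_prob >= threshold:                       # S55: YES
        return (True, best, best_prob)               # S56
    return (True, "unknown", best_prob)              # S57
```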

As described above, a plurality of ELM outputs are obtained for one object candidate area CA. The output determination unit 222 performs the above processing on every ELM output. If a person is found in any of the cut-out images CIa, it determines that a person is present in the object candidate area CA and estimates the person's orientation as the one with the highest probability among the orientations determined for the individual cut-out images. It then outputs the person detection result (indicating the presence or absence of a person) and the orientation estimation result (indicating the person's orientation as forward, backward, left, or right).
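
The aggregation over the cut-out images of one object candidate area can be sketched as below. The input format, one `(is_person, orientation, orientation_prob)` tuple per cut-out image, and the function name are assumptions made for illustration.

```python
def aggregate_candidate_area(judgements):
    """Combine per-cut-out judgements into one result for the candidate area.

    Returns (person_present, orientation); orientation is None when no person
    was found and "unknown" when no cut-out yielded a decided orientation.
    """
    persons = [j for j in judgements if j[0]]
    if not persons:
        return (False, None)
    decided = [j for j in persons if j[1] not in (None, "unknown")]
    if not decided:
        return (True, "unknown")
    # Among the decided orientations, take the one with the highest probability.
    best = max(decided, key=lambda j: j[2])
    return (True, best[1])
```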

Through the above processing, a person is detected in the captured image and the person's orientation is estimated, yielding a person detection result and an orientation estimation result. By performing this processing each time it acquires a captured image from the camera 10, the object situation estimation apparatus 20 can detect persons and estimate their orientations in real time for the images captured by the camera 10.

The following describes the configuration for updating the person dictionary through learning in order to improve the accuracy of the person detection and situation (orientation) estimation described above. In this configuration, learning data for updating the person dictionary is generated from the images captured by the camera 10. Each item of learning data consists of image data annotated with whether or not it shows a person and, if it does, the person's orientation (forward, backward, left, or right).

As shown in FIG. 1, the movement information calculation unit 23 includes a motion vector calculation unit 231. FIG. 6 is a diagram showing the processing of the motion vector calculation unit 231. The motion vector calculation unit 231 acquires, from the object candidate area extraction unit 21, the cut-out image CIbt obtained by cutting the object candidate area CAt out of the current frame image It, and reads the past frame image It-1 from the past image storage unit 31. The cut-out image CIb is an image of a moving object (not necessarily a person). The motion vector calculation unit 231 performs region matching by pattern matching while sliding the cut-out image CIbt across the past frame image It-1 (step S61).

The motion vector calculation unit 231 then calculates a motion vector (optical flow) V that points from the position of the region in the past frame image It-1 that the region matching identified as corresponding to the cut-out image CIbt, to the position of the object candidate area CAt (the position of the cut-out image CIbt) in the current frame image It (step S62). The length and direction of the motion vector V represent the amount and direction of movement of the object candidate area CAt in the current frame image It relative to the past frame image It-1.
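
Steps S61 and S62 can be illustrated with exhaustive block matching. The text says only "pattern matching"; the sum of absolute differences (SAD) used below is one common choice and is an assumption, as are the function names and the use of nested lists of grey levels as images.

```python
def match_position(template, frame):
    """S61: slide `template` over `frame` and return the (x, y) of the
    upper-left corner where the sum of absolute differences is smallest."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best_pos, best_cost = None, None
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            cost = sum(abs(template[j][i] - frame[y + j][x + i])
                       for j in range(th) for i in range(tw))
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (x, y), cost
    return best_pos


def motion_vector(cut_image, current_pos, past_frame):
    """S62: vector from the matched position in the past frame to the
    position of the cut-out image in the current frame, in pixels."""
    px, py = match_position(cut_image, past_frame)
    cx, cy = current_pos
    return (cx - px, cy - py)
```

The result is a displacement in pixels per frame interval; converting it to a speed such as km/h requires camera calibration, which is outside this sketch.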

The motion vector calculation unit 231 performs the above processing on every object candidate area extracted from the current frame image It by the object candidate area extraction unit 21, thereby calculating a motion vector for each object candidate area. The movement information calculation unit 23 outputs each motion vector V to the learning data generation unit 24 in association with its cut-out image CIbt.

The learning data generation unit 24 includes a person presence determination unit 241, a person orientation determination unit 242, and a person similarity determination unit 243. The learning data generation unit 24 acquires the determination result from the object detection unit 22, and acquires the cut-out image CIb and its motion vector V from the movement information calculation unit 23.

The person presence determination unit 241 determines whether the cut-out image CIb should be treated as positive example data or as negative example data. FIG. 7 is a flowchart of the person presence determination processing in the person presence determination unit 241. The unit first checks whether the object detection unit 22 determined that a person is present for the moving object in the cut-out image CIb (step S71).

If the object was determined not to be a person (NO in step S71), the person presence determination unit 241 determines whether the movement amount of the motion vector V is greater than a predetermined threshold set as the speed at which a person runs (for example, 10 km/h) (step S72). If the movement is faster than a person's running speed (YES in step S72), the moving object is clearly something other than a person, so the unit adds the cut-out image CIb as negative example data (step S73). Such negative example data is an image showing a moving object that is not a person (for example, a traveling vehicle).

If the movement amount is at or below a person's running speed (NO in step S72), the unit determines whether the movement amount is approximately 0 (within the error range of 0) (step S74); specifically, for example, whether it is 0.1 km/h or less. If the movement amount is not within the error range of 0 (NO in step S74), the person presence determination unit 241 treats the object as an unidentified moving object and adds no data (step S76). If the movement amount is within the error range of 0 (YES in step S74) and the stay time exceeds a predetermined threshold T [seconds] (YES in step S75), the unit determines that the object is a non-person object that moved into view and then stopped, and adds the cut-out image CIb to the additional learning data storage unit 33 as negative example data (step S73).

On the other hand, if the movement amount is within the error range of 0 (YES in step S74) but the stay time is equal to or less than the threshold T [seconds] (NO in step S75), the unit determines that the object is either an unidentified moving object that is not a person but moves at about a person's walking speed, or some kind of noise, and adds no data (step S76).

If, on the other hand, the object detection unit 22 determined that a person is present (YES in step S71), the person presence determination unit 241 determines whether the movement amount of the motion vector V is greater than the predetermined threshold set as the speed at which a person runs (for example, 10 km/h) (step S77). If the movement is faster than a person's running speed (YES in step S77), the unit determines that the person is not a pedestrian, for example a person riding a two-wheeled vehicle, and adds the cut-out image CIb to the additional learning data storage unit 33 as negative example data (step S78).

If the movement amount is at or below a person's running speed (NO in step S77) and is outside the error range of 0 (NO in step S79), the moving object determined to be a person is moving at about a person's walking speed, so the person presence determination unit 241 makes the cut-out image CIb a positive example data candidate (step S80). Positive example data is an image of a walking person (pedestrian).

On the other hand, if the movement amount is at or below a person's running speed (NO in step S77) and is within the error range of 0 (YES in step S79), the unit determines whether the stay time is longer than the predetermined threshold T [seconds] (step S81). If the stay time is longer than the threshold T (YES in step S81), the person presence determination unit 241 determines that the object is some fixed object other than a person, and adds the cut-out image CIb to the additional learning data storage unit 33 as negative example data (step S78).

If the movement amount is within the error range of 0 (YES in step S79) but the stay time is equal to or less than the threshold T (NO in step S81), the person presence determination unit 241 treats the object as an unidentified moving object and adds no data (step S82).
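
The flow of FIG. 7 (steps S71 to S82) can be condensed into one function, because the detector's verdict changes the outcome only in the walking-speed range. The sketch below assumes the motion vector has already been converted to km/h and that the stay time is tracked elsewhere; the function name and return labels are hypothetical, and the threshold T is left as a parameter because the text does not give a value.

```python
RUN_SPEED_KMH = 10.0   # example running-speed threshold from the text
ZERO_SPEED_KMH = 0.1   # example "approximately 0" bound from the text


def classify_sample(detected_as_person, speed_kmh, stay_seconds, stay_threshold_s):
    """Return "negative", "positive_candidate", or "skip" for one cut-out image."""
    if speed_kmh > RUN_SPEED_KMH:            # S72 / S77: YES
        return "negative"                    # S73 / S78: vehicle, cyclist, etc.
    if speed_kmh > ZERO_SPEED_KMH:           # S74 / S79: NO (walking-speed range)
        # S80 when the detector saw a person, S76 otherwise.
        return "positive_candidate" if detected_as_person else "skip"
    if stay_seconds > stay_threshold_s:      # S75 / S81: YES
        return "negative"                    # S73 / S78: stationary non-person object
    return "skip"                            # S76 / S82
```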

FIG. 8 is a flowchart of the person orientation determination processing in the person orientation determination unit 242. The person orientation determination unit 242 determines the orientation of the person in each cut-out image CIb that the person presence determination processing made a positive example data candidate. First, for each positive example data candidate, the unit classifies the direction of its motion vector (the movement direction) into one of four classes (forward, backward, right, or left) and attaches a label representing that orientation (step S91).

Next, the person orientation determination unit 242 compares the orientation detected by the object detection unit 22 (the person's orientation indicated by the situation information) with the direction of the motion vector calculated by the motion vector calculation unit 231, and determines whether they match (step S92). If they match (YES in step S92), the unit takes the matching orientation as the person's orientation and keeps the cut-out image CIb as a positive example data candidate (step S93).

If the orientation estimated by the object detection unit 22 (the person's orientation indicated by the situation information) does not match the direction of the motion vector calculated by the motion vector calculation unit 231 (NO in step S92), the unit replaces the orientation estimated by the object detection unit 22 with the direction of the motion vector (step S94) and then keeps the cut-out image CIb as a positive example data candidate (step S93). As a modification, when the two do not match, the cut-out image CIb may instead be excluded from the positive example data candidates so that no learning data is generated from it.
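
Steps S91 to S94 can be sketched as a quantisation of the motion vector followed by a comparison. The text does not say how image directions map onto the four orientation classes; the mapping below (image y axis pointing down, downward motion read as "forward", rightward motion in the image read as "right") is purely an assumption, and both function names are hypothetical.

```python
def direction_label(vx, vy):
    """S91: classify a nonzero motion vector into one of four classes.

    Assumed mapping: image coordinates with y pointing down; motion down the
    image is "forward", up is "backward", and x motion gives "right"/"left".
    """
    if abs(vx) >= abs(vy):
        return "right" if vx > 0 else "left"
    return "forward" if vy > 0 else "backward"


def reconcile(detected_orientation, vx, vy, drop_on_mismatch=False):
    """S92 to S94: return the orientation label to store with the sample,
    or None if the sample is excluded (the modification in the text)."""
    motion_label = direction_label(vx, vy)
    if detected_orientation == motion_label:   # S92: YES -> S93
        return motion_label
    if drop_on_mismatch:                       # modification: discard the sample
        return None
    return motion_label                        # S94: overwrite, then S93
```

Positive example data candidates always have a nonzero movement amount, so the zero vector does not need special handling here.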

Among the cut-out images CIb that the person orientation determination unit 242 made positive example data candidates, consolidating the cut-out images of the same person into one avoids the problem of overfitting. The person similarity determination unit 243 therefore determines the mutual similarity of the positive example data candidates and prevents cut-out images CIb of the same person from being stored more than once in the additional learning data storage unit 33.

FIG. 9 is a flowchart of the person similarity determination processing in the person similarity determination unit 243. For the cut-out image CIb1 of one positive example data candidate and the cut-out image CIb2 of another, the person similarity determination unit 243 calculates their similarity by normalized cross-correlation (NCC) matching and, if the similarity is high, deletes one of the two cut-out images CIb.

Specifically, as shown in FIG. 9, the person similarity determination unit 243 first calculates the similarity between the positive example data candidates CIb1 and CIb2 by NCC (step S101), and determines whether the calculated similarity is equal to or greater than a predetermined threshold (0.5 in the present embodiment) (step S102).

If the similarity is equal to or greater than the threshold (YES in step S102), the person similarity determination unit 243 deletes whichever positive example data candidate has the lower person presence probability (step S103). If the similarity is below the threshold (NO in step S102), the unit keeps both positive example data candidates (step S104).
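
Steps S101 to S104 can be sketched as follows. The sketch assumes the cut-out images have been resized to a common size beforehand, uses the plain (not zero-mean) form of NCC, and applies the pairwise rule greedily in descending order of person presence probability, which is one possible way to extend it to a whole set; none of these details are stated in the text, and the function names are hypothetical.

```python
import math


def ncc(a, b):
    """S101: normalized cross-correlation of two equally sized grey images."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    if len(fa) != len(fb):
        raise ValueError("images must have the same size")
    denom = math.sqrt(sum(p * p for p in fa) * sum(q * q for q in fb))
    if denom == 0:
        return 0.0
    return sum(p * q for p, q in zip(fa, fb)) / denom


def deduplicate(candidates, threshold=0.5):
    """S102 to S104: candidates is a list of (image, person_prob) pairs.
    Of any two similar candidates, only the more probable one is kept."""
    kept = []
    for image, prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(ncc(image, other) < threshold for other, _ in kept):
            kept.append((image, prob))
    return kept
```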

The person similarity determination unit 243 performs the above processing on all positive example data candidates and outputs the cut-out images of the remaining candidates, as data of walking persons and together with their orientation information, to the additional learning data storage unit 33 as positive example data. The additional learning data storage unit 33 stores each cut-out image output from the learning data generation unit 24 together with the person's orientation information. It also stores the negative example data output from the learning data generation unit 24.

The additional learning unit 25 includes a person dictionary generation unit 251. Once a certain amount of learning data (positive and negative example data) has accumulated in the additional learning data storage unit 33, the additional learning unit 25 reads that learning data from the additional learning data storage unit 33. The timing of additional learning may be determined by the amount of learning data generated (for example, additional learning may be performed each time 1000 items of learning data have been added), or the additional learning unit 25 may perform additional learning periodically at a predetermined interval (for example, every week). The person dictionary generation unit 251 executes additional learning using the additional learning data read from the additional learning data storage unit 33 and generates an updated person dictionary. In the present embodiment, the additional learning unit 25 performs machine learning as the learning that uses the additional learning data. When the person dictionary generation unit 251 has generated the updated person dictionary, the additional learning unit 25 uses it to update the person dictionary stored in the person dictionary storage unit 34.
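
The two timing options can be expressed as a simple trigger check. The text presents them as alternatives; combining them with a logical OR, as below, is an assumption, as are the function and parameter names. The default values are the examples given in the text.

```python
from datetime import datetime, timedelta


def should_retrain(new_samples, last_trained, now,
                   min_samples=1000, max_interval=timedelta(weeks=1)):
    """Return True when additional learning should run: either enough new
    learning data has accumulated or the periodic interval has elapsed."""
    return new_samples >= min_samples or (now - last_trained) >= max_interval
```

Setting `min_samples` very high leaves only the periodic trigger, and a very long `max_interval` leaves only the count-based one.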

As described above, the object situation estimation apparatus 20 of the present embodiment has a function of detecting the presence or absence of a person and estimating the person's situation (orientation) based on images captured by the camera 10, and also has a function of updating, based on images captured by the same camera 10, the person dictionary used for that detection and estimation. Because the captured images used to update the person dictionary are taken by the camera 10 under the same conditions (position and angle) as the captured images subject to object detection, the additional learning data is well suited to detecting the detection target object and estimating its situation in the images captured by that camera 10. Repeating the learning therefore improves the accuracy of detecting the detection target object and estimating its situation (orientation).

The object situation estimation system 1 thus performs additional learning after the camera 10 has been installed at the desired position. Formal operation, that is, the various kinds of control based on the results of detecting the detection target object and estimating its situation (orientation), may begin immediately after the camera 10 is installed. Alternatively, a fixed period after installation, lasting until additional learning has been performed a certain number of times, may be treated as a preparation period for updating the person dictionary, with formal operation beginning once that period ends. Either way, the person dictionary is updated as the system operates, and the accuracy of detecting the detection target object and estimating its situation improves.

As described above, according to the object situation estimation system 1 of the present embodiment, even for a captured image in which the object detection unit 22 cannot detect the specific detection target object or cannot estimate its situation, learning data can be generated by taking an object in the image that appears to be the specific detection target object and estimating its situation from its movement information. Performing additional learning with this learning data can therefore be expected to eventually enable the system to detect the specific detection target object and estimate its situation even in captured images for which that was previously not possible.

Note that the learning data generation function of the object situation estimation apparatus 20 need not run in real time. A certain number of frames of captured images may be stored, and the function may be executed on those images at a time later than when they were captured.

In the object situation estimation apparatus 20 described above, because the specific detection target object is a person, which is a moving object, object candidate areas are extracted using difference images before the machine-learning-based detection and estimation in the object detection unit 22. If the specific detection target object is not limited to moving objects, however, the object candidate area extraction unit 21 may be omitted and the specific detection target object may be detected from the entire captured image. In that case, the motion vector calculation unit 231 uses the detection result for the detection target object and the situation estimation result from the object detection unit 22.

In the object situation estimation system 1 described above, the orientation of the detection target object is estimated as its situation, and the learning data generation unit 24 determines the orientation in the learning data based on the direction of the motion vector. However, the situation of the detection target object is not limited to its orientation. For example, the situation may be a movement state, namely whether the person who is the detection target object is running, walking, or standing still. In that case, the situation attached to the learning data (running, walking, or standing still) can be determined from the magnitude (speed) of the motion vector calculated by the motion vector calculation unit 231.
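
For this variation, labelling by speed could look like the sketch below. The text gives no thresholds for the movement state, so both boundaries are required parameters; the function name and the label strings are hypothetical.

```python
def movement_state(speed_kmh, run_threshold_kmh, still_threshold_kmh):
    """Label a sample by the magnitude of its motion vector (as a speed)."""
    if speed_kmh > run_threshold_kmh:
        return "running"
    if speed_kmh > still_threshold_kmh:
        return "walking"
    return "standing still"
```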

In a system that uses a dictionary to detect a specific detection target object in a captured image and estimate the situation of that object, the present invention can automatically generate, from the captured images themselves, the learning data needed to update the dictionary. It is useful as an object situation estimation system or the like that detects a specific detection target object in an image captured by a camera and estimates its situation.

DESCRIPTION OF SYMBOLS
1 Object situation estimation system
10 Camera
20 Object situation estimation apparatus
21 Object candidate area extraction unit
211 Background difference calculation unit
212 Inter-frame difference calculation unit
22 Object detection unit
221 ELM
222 Output determination unit
23 Movement information calculation unit
231 Motion vector calculation unit
24 Learning data generation unit
241 Person presence determination unit
242 Person orientation determination unit
243 Person similarity determination unit
25 Additional learning unit
251 Person dictionary generation unit
31 Past image storage unit
32 Background image storage unit
33 Additional learning data storage unit
34 Person dictionary storage unit

Claims

A camera that shoots and generates a shot image;
Using a dictionary, an object detection unit that detects a specific detection target object from the captured image and estimates the state of the detection target object;
A movement information calculation unit for calculating movement information of a moving object in the captured image;
A learning data generation unit that generates learning data by determining the status of the moving object based on the movement information;
An additional learning unit that updates the dictionary based on the learning data;
An object situation estimation system.

The object state estimation system according to claim 1, wherein the states of the detection target object and the moving object are directions of the detection target object and the moving object.

An object candidate area extracting unit that extracts an object candidate area from the captured image;
The object detection unit detects the detection target object from a partial region including the object candidate region in the captured image;
The object situation estimation system according to claim 2.

An object candidate area extracting unit that extracts an object candidate area from the captured image;
The movement information calculation unit calculates movement information of the object candidate area as movement information of the moving object.
The object situation estimation system according to claim 2.

The object situation estimation system according to claim 3, wherein the camera shoots continuously to generate a plurality of frames of the captured image, and
the object candidate area extraction unit extracts the object candidate area based on a difference image of the captured images of the plurality of frames.
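
Extraction of a candidate area from a difference image can be sketched as follows. This is a minimal Python example under stated assumptions: frames are equally sized 2-D lists of grayscale values, only two frames are compared, all changed pixels are merged into a single bounding box, and the function name `candidate_area` and the `threshold` value are hypothetical.

```python
def candidate_area(prev_frame, curr_frame, threshold=30):
    """Return the bounding box (top, left, bottom, right) of changed pixels.

    Frames are equally sized 2-D lists of grayscale intensities.
    Returns None when no pixel differs by more than the threshold.
    """
    rows, cols = [], []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            # A pixel belongs to the difference image when its intensity
            # changed by more than the threshold between the two frames.
            if abs(c - p) > threshold:
                rows.append(y)
                cols.append(x)
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))
```

A practical system would likely separate several moving objects into distinct areas; the single bounding box here only keeps the example short.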

The object situation estimation system according to any one of claims 1 to 5, wherein the learning data generation unit includes a determination unit that determines whether or not the moving object is the detection target object, and generates the learning data for the moving object determined to be the detection target object.

The object situation estimation system according to claim 6, wherein the determination unit determines whether or not the moving object is the detection target object based on the detection result of the object detection unit.
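
One way such a determination could work is to keep only the moving objects that the detection step scored as likely detection targets. The Python sketch below assumes a hypothetical per-object `person_score` in the range 0 to 1 and a hypothetical `person_threshold`; neither the score nor the threshold is specified in the claims.

```python
def select_learning_targets(moving_objects, person_threshold=0.5):
    """Keep only moving objects that the detector scored as likely persons.

    Each moving object is a dict carrying a hypothetical "person_score"
    in [0, 1] produced by the object detection step.
    """
    return [
        obj for obj in moving_objects
        if obj["person_score"] >= person_threshold
    ]
```

Only the objects returned by this filter would then be labeled and added to the learning data.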

An object situation estimation device that is communicably connected to a camera that captures and generates a captured image, and that acquires the captured image from the camera, the device comprising:
an object detection unit that, using a dictionary, detects a specific detection target object from the captured image and estimates the situation of the detection target object;
a movement information calculation unit that calculates movement information of a moving object in the captured image;
a learning data generation unit that generates learning data by determining the situation of the moving object based on the movement information; and
an additional learning unit that updates the dictionary based on the learning data.

An object situation estimation method comprising:
capturing and generating a captured image;
detecting, using a dictionary, a specific detection target object from the captured image and estimating the situation of the detection target object;
calculating movement information of a moving object in the captured image;
generating learning data by determining the situation of the moving object based on the movement information; and
updating the dictionary based on the learning data.

An object situation estimation program that causes a computer of an object situation estimation device, which is communicably connected to a camera that captures and generates a captured image and which acquires the captured image from the camera, to function as:
an object detection unit that, using a dictionary, detects a specific detection target object from the captured image and estimates the situation of the detection target object;
a movement information calculation unit that calculates movement information of a moving object in the captured image;
a learning data generation unit that generates learning data by determining the situation of the moving object based on the movement information; and
an additional learning unit that updates the dictionary based on the learning data.