JP7167850B2

JP7167850B2 - Apparatus for generating trained model set for person determination, generation method, generation program, person region determination apparatus, determination method, and determination program

Info

Publication number: JP7167850B2
Application number: JP2019095945A
Authority: JP
Inventors: 卓也小倉
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2019-05-22
Filing date: 2019-05-22
Publication date: 2022-11-09
Anticipated expiration: 2039-05-22
Also published as: JP2020190931A

Description

The present invention relates to a generation apparatus, a generation method, and a generation program for a trained model set for person determination, and to a person region determination apparatus, a determination method, and a determination program.

Techniques are known for analyzing captured images obtained from an in-vehicle camera to detect pedestrians in the traveling direction. Because a vehicle equipped with an in-vehicle camera is expected to travel in a wide variety of environments, it has been difficult to raise pedestrian detection accuracy. Techniques have therefore been studied that also use detection results from sonar or radar, select the identification dictionary that best matches those detection results, and analyze the captured image with it (see, for example, Patent Document 1).

Japanese Unexamined Patent Application Publication No. 2016-31564 (JP 2016-31564 A)

It is desirable to determine the presence or absence of a person from an input image without relying on the output of other sensors such as sonar. One way to meet this demand is a determination apparatus that uses a trained model to determine whether a person appears in an input captured image. When such a determination apparatus is mounted on a vehicle, however, the detection accuracy still fails to reach the desired level because the driving environment varies greatly, as described above.

The present invention has been made to solve this problem. Its object is to provide a generation apparatus for trained models for person determination that can determine with high probability whether a person appears in an input image even when the imaging environment changes, as well as a person region determination apparatus and the like equipped with those trained models.

A generation apparatus for a trained model set for person determination according to a first aspect of the present invention includes: a first generation unit that generates a first trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which first person image data captured in a first target area and showing a person is given as correct answers and first non-person image data showing no person is given as incorrect answers; and a second generation unit that generates a second trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which second person image data captured in a second target area different from the first target area and showing a person is given as correct answers and second non-person image data showing no person is given as incorrect answers.

A generation method for a trained model set for person determination according to a second aspect of the present invention includes: a first generation step of generating a first trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which first person image data captured in a first target area and showing a person is given as correct answers and first non-person image data showing no person is given as incorrect answers; and a second generation step of generating a second trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which second person image data captured in a second target area different from the first target area and showing a person is given as correct answers and second non-person image data showing no person is given as incorrect answers.

A generation program for a trained model set for person determination according to a third aspect of the present invention causes a computer to execute: a first generation step of generating a first trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which first person image data captured in a first target area and showing a person is given as correct answers and first non-person image data showing no person is given as incorrect answers; and a second generation step of generating a second trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which second person image data captured in a second target area different from the first target area and showing a person is given as correct answers and second non-person image data showing no person is given as incorrect answers.

A person region determination apparatus according to a fourth aspect of the present invention includes: a storage unit that stores a trained model set for person determination, the set including a first trained model that outputs as a score the likelihood that a person appears in input image data and was generated by supervised learning in which first person image data captured in a first target area and showing a person is given as correct answers and first non-person image data showing no person is given as incorrect answers, and a second trained model that outputs as a score the likelihood that a person appears in input image data and was generated by supervised learning in which second person image data captured in a second target area different from the first target area and showing a person is given as correct answers and second non-person image data showing no person is given as incorrect answers; an acquisition unit that acquires captured image data output by a camera mounted on a moving body; a judgment unit that judges in which of the predetermined target areas the moving body is located; a reading unit that reads the first trained model from the storage unit when the judgment unit judges that the moving body is located in the first target area, and reads the second trained model from the storage unit when the judgment unit judges that the moving body is located in the second target area; and a determination unit that has a score calculated for an image of the captured image data acquired by the acquisition unit using the trained model read by the reading unit, and determines a specific image region whose score exceeds a predetermined threshold to be a person region.

A person region determination method according to a fifth aspect of the present invention determines a person region by using a storage unit that stores a trained model set for person determination, the set including a first trained model that outputs as a score the likelihood that a person appears in input image data and was generated by supervised learning in which first person image data captured in a first target area and showing a person is given as correct answers and first non-person image data showing no person is given as incorrect answers, and a second trained model that outputs as a score the likelihood that a person appears in input image data and was generated by supervised learning in which second person image data captured in a second target area different from the first target area and showing a person is given as correct answers and second non-person image data showing no person is given as incorrect answers. The method includes: an acquisition step of acquiring captured image data output by a camera mounted on a moving body; a judgment step of judging in which of the predetermined target areas the moving body is located; a reading step of reading the first trained model from the storage unit when the judgment step judges that the moving body is located in the first target area, and reading the second trained model from the storage unit when the judgment step judges that the moving body is located in the second target area; and a determination step of having a score calculated for an image of the captured image data acquired in the acquisition step using the trained model read in the reading step, and determining a specific image region whose score exceeds a predetermined threshold to be a person region.

A person region determination program according to a sixth aspect of the present invention determines a person region by using a storage unit that stores a trained model set for person determination, the set including a first trained model that outputs as a score the likelihood that a person appears in input image data and was generated by supervised learning in which first person image data captured in a first target area and showing a person is given as correct answers and first non-person image data showing no person is given as incorrect answers, and a second trained model that outputs as a score the likelihood that a person appears in input image data and was generated by supervised learning in which second person image data captured in a second target area different from the first target area and showing a person is given as correct answers and second non-person image data showing no person is given as incorrect answers. The program causes a computer to execute: an acquisition step of acquiring captured image data output by a camera mounted on a moving body; a judgment step of judging in which of the predetermined target areas the moving body is located; a reading step of reading the first trained model from the storage unit when the judgment step judges that the moving body is located in the first target area, and reading the second trained model from the storage unit when the judgment step judges that the moving body is located in the second target area; and a determination step of having a score calculated for an image of the captured image data acquired in the acquisition step using the trained model read in the reading step, and determining a specific image region whose score exceeds a predetermined threshold to be a person region.
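The reading and determination steps shared by the fourth to sixth aspects can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary keys "first" and "second", the box layout, the function names, and the default threshold of 0.5 are all assumptions made for illustration, and a "model" is simply any callable that returns a score between 0 and 1.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a model maps an image patch to a score in [0, 1];
# a box is (left, top, width, height) in pixels.
Model = Callable[[object], float]
Box = Tuple[int, int, int, int]


def read_model(model_set: Dict[str, Model], target_area: str) -> Model:
    """Reading step: pick the trained model matching the judged target area."""
    return model_set[target_area]


def determine_person_regions(
    model: Model,
    patches: List[Tuple[Box, object]],
    threshold: float = 0.5,  # assumed value; the source only says "predetermined"
) -> List[Box]:
    """Determination step: keep the boxes whose score exceeds the threshold."""
    return [box for box, patch in patches if model(patch) > threshold]
```

In use, the area judged for the moving body selects the model first, and only that model scores the image regions; a region is reported as a person region only when its score is strictly greater than the threshold.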

The present invention can provide a generation apparatus for trained models for person determination that can determine with high probability whether a person appears in an input image even when the imaging environment changes, as well as a person region determination apparatus and the like equipped with those trained models.

FIG. 1 is a diagram showing how person images and non-person images for learning are created from images captured in an urban area.
FIG. 2 is a diagram showing how person images and non-person images for learning are created from images captured in a non-urban area.
FIG. 3 is a block diagram showing the configuration of a generation apparatus that generates a trained model set.
FIG. 4 is a flow diagram showing a processing procedure for generating a trained model set.
FIG. 5 is a schematic diagram showing the traveling direction as observed from the cabin of a vehicle equipped with the determination apparatus.
FIG. 6 is an explanatory diagram illustrating a method of cutting out a determination image from a captured image.
FIG. 7 is an explanatory diagram illustrating a method of determining whether a determination image is a person region.
FIG. 8 is a block diagram showing the configuration of a determination apparatus that determines a person region from an input image.
FIG. 9 is a flow diagram showing a processing procedure for determining a person region.

The present invention is described below through embodiments, but the invention according to the claims is not limited to the following embodiments. Moreover, not all of the configurations described in the embodiments are essential as means for solving the problem.

Person determination apparatuses that use a trained model to determine whether a person appears in an input captured image have recently become known. However, when an image captured from a traveling vehicle is used as the input image, such an apparatus often gives an erroneous determination result. On investigating the cause, the inventor concluded that the main factors are that persons and background objects appear mixed together in the image and that the background objects are highly varied. In particular, the inventor found that detection accuracy meeting a given standard cannot be obtained even when, to cope with this variety, machine learning is performed on many images showing persons against all kinds of background objects.

On further investigation, the inventor noticed that the region in which the input captured image was taken strongly affects the determination result. That is, although the objects that appear as background have characteristics specific to each region, training on images captured in all regions as teacher data without considering those characteristics does not yield a trained model with high detection accuracy. The inventor therefore arrived at a method of dividing regions based on the characteristics of their background objects and generating a trained model for each region, using the images captured in that region as teacher images. Moreover, by teaching not only images showing a person as correct images but also images that show only region-specific background objects and no person as incorrect images, the inventor succeeded in improving the person determination accuracy of each trained model. A specific description follows.

The trained model set for person determination according to the present embodiment described below is intended to be incorporated into a person region determination apparatus that determines the region in which a person appears from an image captured by a camera unit mounted on a vehicle. Specifically, the person region determination apparatus according to the present embodiment is installed as part of a driving support system or a drive recorder.

A person region determination apparatus installed in a vehicle that may travel in various environments determines person regions mainly from captured images whose background is an outdoor environment. The most pronounced difference in outdoor backgrounds is between the environment of urban areas and that of other, non-urban areas. In the present embodiment, therefore, urban areas are defined as the first target area and non-urban areas as the second target area. Specifically, the areas in which the vehicle may travel are divided into urban areas and non-urban areas based on criteria such as population density and building density. In preparation for generating the training image data sets, a sufficient number of images captured in urban areas and in non-urban areas are collected, and teacher data for learning is created from them.
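As an illustration of dividing travel areas by such criteria, the following Python sketch labels an area as urban or non-urban. The source names population density and building density as example criteria but gives no values or combination rule, so the two cut-off constants, the `AreaStats` type, and the choice to treat an area as urban when either criterion is met are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaStats:
    """Hypothetical per-area statistics used to divide the travel areas."""
    population_per_km2: float
    buildings_per_km2: float


# Hypothetical cut-offs; the source does not specify any values.
URBAN_POPULATION_PER_KM2 = 4000.0
URBAN_BUILDINGS_PER_KM2 = 1000.0


def classify_area(stats: AreaStats) -> str:
    """Return "urban" (first target area) or "non_urban" (second target area)."""
    is_urban = (
        stats.population_per_km2 >= URBAN_POPULATION_PER_KM2
        or stats.buildings_per_km2 >= URBAN_BUILDINGS_PER_KM2
    )
    return "urban" if is_urban else "non_urban"
```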

FIG. 1 is a diagram showing how person images and non-person images are created as teacher data for learning from images captured in an urban area. As illustrated, a large number of images captured in an urban area are first prepared. Person regions and non-person regions are then cut out of the images at a preset aspect ratio. Because people in outdoor environments are generally standing, the aspect ratio is preferably set so that the height is greater than the width. Specific examples are height:width = 2:1, 4:3, and 5:2; the present embodiment adopts an aspect ratio of 2:1, as indicated by the dotted frames.

A region in which a person appears (a person region) is cut out of an image captured in an urban area at an aspect ratio of 2:1 and enlarged or reduced to a specified size to generate the image data of a person image. The person image data is given "correct" tag information. When cutting out a region in which a person appears, the size of the cut-out frame is adjusted so that the whole body of the target person fits inside it. It is preferable that objects that make up the urban area, typically man-made objects, appear around the person. A single cut-out region may contain more than one person; in that case, the size of the cut-out frame is preferably adjusted so that the whole bodies of all of those persons fit inside it. A person riding a bicycle, a person sitting on a bench, and the like may also be cut out as person regions.

Similarly, a region in which no person appears (a non-person region) is cut out of an image captured in an urban area at an aspect ratio of 2:1 and enlarged or reduced to the specified size to generate the image data of a non-person image. The non-person image data is given "incorrect" tag information. The non-person regions to be cut out are chosen from the objects that make up the urban area, typically man-made objects, whose outline has an aspect ratio of roughly 2:1. In the illustrated example these are a pedestrian traffic signal and a building; utility poles, windows, signs, billboards, and the like may also be targets. Background objects with an outline of this aspect ratio are often misrecognized as persons in urban areas, so in the present embodiment the image data of non-person images is actively learned as incorrect image data.
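The cut-out step described above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the (left, top, width, height) box layout, and the choice to center the frame on the subject are assumptions, and the sketch does not clamp the frame to the image borders or perform the actual pixel resampling.

```python
from typing import Dict, Tuple

# Hypothetical box layout: (left, top, width, height) in pixels.
Box = Tuple[float, float, float, float]


def fit_crop_frame(subject: Box, ratio: float = 2.0) -> Box:
    """Smallest frame with height:width == ratio that contains `subject`.

    The frame is centered on the subject (an assumption) and may extend
    past the image borders; clamping is left out of this sketch.
    """
    left, top, width, height = subject
    if height >= ratio * width:
        new_h, new_w = height, height / ratio   # subject is tall: widen the frame
    else:
        new_w, new_h = width, width * ratio     # subject is wide: heighten the frame
    cx, cy = left + width / 2, top + height / 2
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


def scale_to_height(frame: Box, target_height: float) -> float:
    """Enlargement (> 1) or reduction (< 1) factor to reach the specified size."""
    return target_height / frame[3]


def tag_sample(frame: Box, shows_person: bool) -> Dict[str, object]:
    """Attach the "correct" / "incorrect" tag information to a cut-out frame."""
    return {"frame": frame, "tag": "correct" if shows_person else "incorrect"}
```

For example, a subject box of 20 by 60 pixels is widened to a 30 by 60 frame, whereas a 40 by 50 box (say, two people side by side) is heightened to 40 by 80 so that the 2:1 ratio holds in both cases.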

FIG. 2 is a diagram showing how person images and non-person images are created as teacher data for learning from images captured in a non-urban area. As illustrated, a large number of images captured in a non-urban area are first prepared. Person regions and non-person regions are then cut out at the same aspect ratio as that adopted for the urban-area images.

A region in which a person appears (a person region) is cut out of an image captured in a non-urban area at an aspect ratio of 2:1 and enlarged or reduced to the specified size to generate the image data of a person image. The person image data is given "correct" tag information. As when generating person image data for urban areas, the size of the cut-out frame is adjusted so that the whole body of the target person fits inside it. It is preferable that objects that make up the non-urban area, typically natural objects, appear around the person. A single cut-out region may contain more than one person.

Similarly, a region in which no person appears (a non-person region) is cut out of an image captured in a non-urban area at an aspect ratio of 2:1 and enlarged or reduced to the specified size to generate the image data of a non-person image. The non-person image data is given "incorrect" tag information. The non-person regions to be cut out are chosen from the objects that make up the non-urban area, typically natural objects, whose outline has an aspect ratio of roughly 2:1. In the illustrated example these are trees; weeds, fields, and the like may also be targets. However, when the trained model set generated as in the present embodiment is incorporated into a person region determination apparatus installed as part of a drive recorder, man-made objects observed while traveling on roads in non-urban areas may also be cut out as non-person regions. For example, railroad crossing alarms and traffic signs may be targets. Background objects with an outline of this aspect ratio are often misrecognized as persons in non-urban areas, so in the present embodiment the image data of non-person images is actively learned as incorrect image data.

Next, a generation apparatus 100 is described that generates the trained model set for person determination using, as teacher data, the person image data and non-person image data for urban areas and the person image data and non-person image data for non-urban areas prepared as described above. FIG. 3 is a block diagram showing the configuration of the generation apparatus 100 that generates the trained model set. The generation apparatus 100 is a machine learning apparatus, and a general-purpose computer can be used for it. Processing is faster if GPGPU (General-Purpose computing on Graphics Processing Units) or a large-scale PC cluster is used.

The generation apparatus 100 mainly includes an image acquisition unit 110, an operation input unit 120, a calculation unit 130, and an output unit 140. The image acquisition unit 110 takes in the person image data and non-person image data for urban areas and the person image data and non-person image data for non-urban areas prepared as described above. If the person image data and non-person image data are sent over a network, the image acquisition unit 110 is, for example, a LAN interface.

The operation input unit 120 is an interface that accepts input operations from the user, such as a touch panel superimposed on a monitor or a keyboard. When image data acquired by the image acquisition unit 110 has not been given "correct" or "incorrect" tag information, the user can operate the operation input unit 120 while checking the image to attach "correct" or "incorrect" information to that image data. Tag information that is wrong can also be corrected.

The calculation unit 130 is a processor that controls the generation apparatus 100 as a whole and performs various calculations. A first generation unit 131, a function execution unit implemented by the calculation unit 130, performs supervised learning on the image data acquired by the image acquisition unit 110, treating the person image data cut out of images captured in urban areas as correct answers and the corresponding non-person image data as incorrect answers. Through this machine learning it generates a first model, a trained model for urban areas that outputs as a score the likelihood that a person appears in input image data. A second generation unit 132, also a function execution unit implemented by the calculation unit 130, performs supervised learning on the image data acquired by the image acquisition unit 110, treating the person image data cut out of images captured in non-urban areas as correct answers and the corresponding non-person image data as incorrect answers.

Through this machine learning, the second generation unit 132 generates a second model, a trained model for non-urban areas that outputs as a score the likelihood that a person appears in input image data. In the present embodiment the score is output as a value from 0 to 1, and a value closer to 1 indicates a higher likelihood that a person appears. Accordingly, the first generation unit 131 and the second generation unit 132 repeat learning that adjusts the edge weights, for example by error backpropagation, so as to reduce the error between the output value and 1 when person image data is input and the error between the output value and 0 when non-person image data is input.
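The weight adjustment described above can be illustrated with a deliberately small Python sketch. It is not the model used in the source, whose architecture is not specified: a single sigmoid unit stands in for the network, the squared error is an assumed choice of error measure, and the feature vectors, learning rate, and function names are hypothetical. For one unit, backpropagation reduces to the gradient step shown in `train_step`.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Score in (0, 1); closer to 1 means a person is more likely shown."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_step(
    weights: Sequence[float],
    bias: float,
    features: Sequence[float],
    target: float,            # 1.0 for person image data, 0.0 for non-person
    learning_rate: float = 0.5,  # assumed value
) -> Tuple[List[float], float]:
    """One gradient step on the squared error 0.5 * (score - target) ** 2."""
    y = score(weights, bias, features)
    delta = (y - target) * y * (1.0 - y)  # derivative of the error w.r.t. the pre-activation
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, features)]
    return new_weights, bias - learning_rate * delta
```

Repeating `train_step` alternately on person samples with target 1 and non-person samples with target 0 moves the two scores toward 1 and 0 respectively, which is the behavior the paragraph above describes.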

The output unit 140 outputs the first model generated by the first generation unit 131 and the second model generated by the second generation unit 132 to an external device as a trained model set. If the external device is connected via a network, the output unit 140 is, for example, a LAN interface. If the image acquisition unit 110 uses the same kind of interface, a single interface may be shared by the image acquisition unit 110 and the output unit 140. The external device is, for example, a storage device for storing the output first and second models, or a system server for installing the output first and second models in the determination apparatus described later.

The generation apparatus 100 may also be configured to perform the preparation described with reference to FIGS. 1 and 2. In that case, the entire process up to generation of the trained model set can be carried out on one apparatus. Specifically, when the image acquisition unit 110 takes in the image data of images captured in urban areas and in non-urban areas, the user operates the operation input unit 120 to perform the cut-out work described above while viewing the images on a monitor (not shown). The calculation unit 130 enlarges or reduces each cut-out region to the specified size and generates the person image data and non-person image data for each area. The user also operates the operation input unit 120 to attach "correct" or "incorrect" tag information to each piece of image data. The person image data and non-person image data prepared in this way are temporarily stored in a storage unit (not shown) and supplied in sequence to the first generation unit 131 and the second generation unit 132.

FIG. 4 is a flowchart showing the processing procedure for generating a trained model set. The procedure is described starting from the state in which the image acquisition unit 110 has acquired person image data and non-person image data tagged as "correct" or "incorrect".

In step S101, the first generation unit 131 selectively takes in the person image data for the urban area and performs learning. Specifically, it adjusts the edge weights of the first model so that the model outputs a value close to 1 when person image data for the urban area is input as teacher data. In step S102, the first generation unit 131 selectively takes in the non-person image data for the urban area and performs learning. Specifically, it adjusts the edge weights of the first model so that the model outputs a value close to 0 when non-person image data for the urban area is input as teacher data. Steps S101 and S102 may be performed in reverse order; alternatively, the first generation unit 131 may recognize whether each piece of urban-area image data taken in is person image data or non-person image data and process it accordingly. When learning of the first model is complete, the first generation unit 131 outputs the first model from the output unit 140 in step S103.

In step S104, the second generation unit 132 selectively takes in the person image data for the non-urban area and performs learning. Specifically, it adjusts the edge weights of the second model so that the model outputs a value close to 1 when person image data for the non-urban area is input as teacher data. In step S105, the second generation unit 132 selectively takes in the non-person image data for the non-urban area and performs learning. Specifically, it adjusts the edge weights of the second model so that the model outputs a value close to 0 when non-person image data for the non-urban area is input as teacher data. Steps S104 and S105 may be performed in reverse order; alternatively, the second generation unit 132 may recognize whether each piece of non-urban-area image data taken in is person image data or non-person image data and process it accordingly. When learning of the second model is complete, the second generation unit 132 outputs the second model from the output unit 140 in step S106.

When output of the first model and the second model is complete, the series of processes ends. Note that the processing by the first generation unit 131 in steps S101 to S103 and the processing by the second generation unit 132 in steps S104 to S106 may be performed in the opposite order or in parallel.
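The learning in steps S101 to S106 can be illustrated with a minimal sketch. It assumes each image has already been reduced to a short feature vector and replaces the neural network with a single logistic unit trained by gradient descent; the description does not specify the network or the optimizer, so `train_model`, `predict`, `generate_model_set`, and the toy data are all hypothetical. Person and non-person samples are mixed within each pass, which corresponds to the variant in which the image data type is recognized and handled accordingly.

```python
import math
from typing import Dict, List, Tuple

# (feature vector, label): label 1 = "correct" (person), 0 = "incorrect" (non-person)
Sample = Tuple[List[float], int]
Model = Tuple[List[float], float]  # (weights, bias) of one logistic unit (assumed stand-in)


def predict(weights: List[float], bias: float, features: List[float]) -> float:
    """Return a score in (0, 1); values near 1 mean 'person'."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def train_model(samples: List[Sample], epochs: int = 500, lr: float = 0.5) -> Model:
    """Adjust weights so person samples score near 1 and non-person samples near 0."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def generate_model_set(data_by_area: Dict[str, List[Sample]]) -> Dict[str, Model]:
    """Train one model per target area, using only that area's samples."""
    return {area: train_model(samples) for area, samples in data_by_area.items()}


# Toy data (hypothetical): two features per image.
data = {
    "urban": [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    "non_urban": [([0.0, 1.0], 1), ([1.0, 0.0], 0)],
}
model_set = generate_model_set(data)
```

Because each model sees only the samples of its own area, the two entries of `model_set` are independent and could equally be trained in parallel, as the description allows.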

Next, the determination device 300, in which the trained model set generated by the generation device 100 is installed, will be described. The determination device 300 is an example of a person region determination device. In this embodiment, the determination device 300 that determines person regions is provided as part of a driving support system device or a drive recorder. FIG. 5 is a schematic diagram showing the view in the traveling direction from inside a vehicle equipped with the determination device 300. Occupants are omitted from FIG. 5.

The determination device 300 includes a camera unit 320, which captures the scene ahead through the windshield. The determination device 300 is linked with a navigation system 500, and the image captured by the camera unit 320 is displayed as a captured image 512 on a display panel 510 of the navigation system 500.

While the vehicle is in a drivable state, the determination device 300 causes the camera unit 320 to capture images and transmits the captured images to the navigation system 500. The determination device 300 also analyzes each captured image to determine whether a person appears in it and, if so, transmits the position information of that region to the navigation system 500.

On receiving the position information of a person region, the navigation system 500 superimposes a person frame 513, drawn by CG, on the person region of the captured image 512 and displays a warning display 514. The warning display 514 is shown prominently using text or animation, for example "Caution!" as illustrated. The warning may be accompanied by sound. By noticing such a warning, the driver can drive more safely.

Next, the processing from the captured image 512 captured by the camera unit 320 to the determination of regions in which a person appears will be described. FIG. 6 is an explanatory diagram illustrating how a determination image CF is cut out from the captured image 512. The description assumes that the vehicle is traveling in a non-urban area.

On acquiring the captured image 512, the determination device 300 generates a frame window FW with a preset aspect ratio. This aspect ratio is equal to that of the person images and non-person images learned by the first and second models described above. Frame windows FW of several different sizes are prepared to account for the sizes at which a person may appear in the captured image, and the following processing is executed for the frame window FW of each size.

The determination device 300 scans the generated frame window FW across the image area of the captured image 512, shifting it by a predetermined width at a time. At each scanned position, it performs contour extraction on the image region enclosed by the frame window FW and judges whether an object that fits the frame window FW is present. Specifically, for example, it judges that a fitting object is present when there is a closed contour whose area is 50% or more of the area of the frame window FW.
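The scan described above can be sketched as follows. The sketch enumerates frame-window positions for several window heights at a fixed aspect ratio and applies the 50% area criterion to a contour area that is assumed to come from a separate contour-extraction step (not implemented here). The names `frame_windows` and `fits_window`, the stride, and the sizes are illustrative assumptions, not values from the description.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in captured-image pixels


def frame_windows(image_w: int, image_h: int, heights: List[int],
                  aspect_w_over_h: float, stride: int) -> Iterator[Box]:
    """Yield every frame-window position for each prepared window size."""
    for h in heights:
        w = max(1, round(h * aspect_w_over_h))  # width follows from the fixed aspect ratio
        for top in range(0, image_h - h + 1, stride):
            for left in range(0, image_w - w + 1, stride):
                yield (left, top, w, h)


def fits_window(contour_area: float, box: Box, min_ratio: float = 0.5) -> bool:
    """True when a closed contour covers at least min_ratio of the window area."""
    _, _, w, h = box
    return contour_area >= min_ratio * w * h


# Hypothetical 8x8 image, one window size (height 4, width 2), stride 2.
windows = list(frame_windows(8, 8, heights=[4], aspect_w_over_h=0.5, stride=2))
```

Only windows for which `fits_window` returns true would go on to be cut out as determination images, which keeps the number of model evaluations per frame small.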

When the determination device 300 judges that an object fitting the frame window FW is present, it cuts out the region enclosed by that frame window FW as a determination image CF. Each cut-out determination image CF is shaped into the format that the first and second models accept as input image data and is saved in order as determination image data CF_1, CF_2, CF_3, and so on. Each piece of determination image data is associated with cut-out coordinate information indicating which region of the captured image 512 was cut out.
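A minimal sketch of the cut-out and shaping step, under the assumption that an image is a list of pixel rows and that shaping means nearest-neighbour resizing to a fixed input size. The description only says the image is shaped into the format the models accept, so the resizing method, the `DeterminationImage` type, and `crop_and_resize` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # rows of grey-level pixels (assumed representation)
Box = Tuple[int, int, int, int]  # (left, top, width, height)


@dataclass
class DeterminationImage:
    pixels: Image  # shaped to the model's input size
    box: Box       # cut-out coordinate information in the captured image


def crop_and_resize(image: Image, box: Box, out_w: int, out_h: int) -> DeterminationImage:
    """Cut out the frame-window region and resize it by nearest-neighbour sampling."""
    left, top, w, h = box
    pixels = [
        [image[top + (y * h) // out_h][left + (x * w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
    return DeterminationImage(pixels, box)


# Hypothetical 4x4 captured image whose pixel value encodes its position.
captured = [[row * 4 + col for col in range(4)] for row in range(4)]
cf_1 = crop_and_resize(captured, (2, 1, 2, 2), out_w=2, out_h=2)
```

Keeping `box` alongside the pixels is what later lets a person frame be drawn at the right place in the captured image.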

FIG. 7 is an explanatory diagram illustrating how it is determined whether a determination image CF cut out from the captured image 512 is a person region. The determination device 300 inputs the determination image CF_m shown in the upper part into the second model, which is the trained model for non-urban areas. Suppose that a score of 0.0127 is output as a result. Because this score is not greater than the threshold for determining a person (for example, 0.75), the determination device 300 determines that the region of the determination image CF_m is not a person region.

Similarly, the determination device 300 inputs the determination image CF_n shown in the lower part into the second model. Suppose that a score of 0.8752 is output as a result. Because this score exceeds the threshold, the determination device 300 determines that the region of the determination image CF_n is a person region. When the determination device 300 transmits the captured image 512 to the navigation system 500 together with the cut-out coordinate information of each determination image CF determined to be a person region, the navigation system 500 can display the captured image 512 with the person frame 513 superimposed, as described above. If the captured image 512 contains multiple person regions, a person frame 513 is superimposed on each of them. Because the camera unit 320 generates captured images at, for example, 30 fps, the determination device 300 executes the above processing on each captured image as it is generated.
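The threshold comparison for CF_m and CF_n can be written out as a short sketch. The scores and the 0.75 threshold are the example values given above; the cut-out coordinates and the function name `person_regions` are made up for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # cut-out coordinates: (left, top, width, height)


def person_regions(scored: List[Tuple[float, Box]], threshold: float = 0.75) -> List[Box]:
    """Return the cut-out coordinates whose score strictly exceeds the threshold."""
    return [box for score, box in scored if score > threshold]


# (score, cut-out coordinates); the coordinates are hypothetical.
scored = [
    (0.0127, (40, 60, 32, 64)),   # CF_m: not a person region
    (0.8752, (200, 80, 32, 64)),  # CF_n: person region
]
regions = person_regions(scored)
```

A score exactly equal to the threshold is treated as not a person, matching the wording that the score must exceed the threshold.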

The example of FIGS. 5 to 7 assumes a vehicle traveling in a non-urban area. When the vehicle travels in an urban area, the trained model to be applied is switched to the first model, which is the trained model for urban areas. The rest of the processing is the same. By preparing trained models for the urban area and the non-urban area in this way and selecting between them according to the current environment, whether a person appears in the input image can be determined with high probability in each environment.

Next, the configuration of the determination device 300, which determines person regions from an input image, will be described. FIG. 8 is a block diagram showing the configuration of the determination device 300. The determination device 300 mainly includes a calculation unit 310, a camera unit 320, an image processing unit 330, a GPS unit 340, an output unit 350, and a storage unit 360. The calculation unit 310 is a processor that controls the determination device 300 as a whole and performs various calculations. The camera unit 320 includes a lens and an image sensor, captures the surrounding environment, generates an imaging signal, and outputs it to the image processing unit 330. The image processing unit 330 receives the imaging signal from the camera unit 320 and generates captured image data in a prescribed format. The image processing unit 330 also executes the cut-out processing, shaping processing, and so on described above on the captured image data to generate determination image data. The camera unit 320 may instead be an external device connected to the determination device 300 by wire or wirelessly. In that case, the determination device 300 need only include an image acquisition unit serving as an interface that receives image signals from the camera unit 320.

The GPS unit 340 receives GPS signals, converts the current position into latitude and longitude information, and passes it to the calculation unit 310. The GPS unit 340 may instead be an external device connected to the determination device 300 by wire or wirelessly. In that case, the determination device 300 need only include an information acquisition unit serving as an interface that receives the latitude and longitude information of the current position from the GPS unit 340. As described above, the output unit 350 outputs the captured image data, captured by the camera unit 320 and generated by the image processing unit 330, together with the cut-out coordinate information of each determination image CF determined to be a person region, to an external device (in this embodiment, the navigation system 500).

The storage unit 360 is, for example, an SSD (Solid State Drive). In addition to a control program for controlling the determination device 300 and a determination program for performing the person region determination calculation, it stores various parameter values, functions, lookup tables, and the like used for control and calculation. In particular, it stores the first model 361 and the second model 362 generated by the generation device 100, and map information 363 covering the range in which the determination device 300 is expected to be used. The map of the map information 363 describes the urban areas and non-urban areas defined as described above.

The area determination unit 311, a functional unit implemented by the calculation unit 310, receives latitude and longitude information from the GPS unit 340, refers to the map information 363 to judge whether the current location is in an urban area or a non-urban area, and passes the result to the reading unit 312. The reading unit 312 reads the first model 361 from the storage unit 360 if the result indicates an urban area, and reads the second model 362 from the storage unit 360 if it indicates a non-urban area.
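The area judgment and model reading can be sketched as below. The description does not say how the map information 363 encodes urban areas, so this sketch assumes a list of latitude/longitude bounding boxes; the coordinates, the dictionary standing in for the storage unit, and the names `determine_area` and `read_model` are all hypothetical.

```python
from typing import Dict, List, Tuple

# (south, west, north, east) in degrees; an assumed encoding of an urban area
BBox = Tuple[float, float, float, float]


def determine_area(lat: float, lon: float, urban_areas: List[BBox]) -> str:
    """Return 'urban' if the position falls inside any urban bounding box."""
    inside = any(s <= lat <= n and w <= lon <= e for (s, w, n, e) in urban_areas)
    return "urban" if inside else "non_urban"


def read_model(area: str, storage: Dict[str, str]) -> str:
    """Pick the first model for urban areas and the second model otherwise."""
    return storage["first_model"] if area == "urban" else storage["second_model"]


# Hypothetical map information and stored models (placeholders, not real models).
urban_areas = [(35.0, 139.0, 36.0, 140.0)]
storage = {"first_model": "model_361", "second_model": "model_362"}
selected = read_model(determine_area(35.5, 139.5, urban_areas), storage)
```

A real map would more likely use polygons than rectangles, but the selection logic that follows the judgment would be unchanged.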

The determination unit 313 sequentially inputs the determination image data, generated from the captured image data by the image processing unit 330, into the trained model read by the reading unit 312 and has it calculate a score. If the calculated score exceeds a preset threshold, the captured image region corresponding to that determination image data is determined to be a person region.

FIG. 9 is a flowchart showing the processing procedure for determining person regions. The procedure is described starting from the point at which the vehicle becomes drivable. In step S201, the determination device 300 acquires captured image data. Specifically, the image processing unit 330 receives an imaging signal from the camera unit 320 and generates captured image data. In step S202, the area determination unit 311 receives latitude and longitude information from the GPS unit 340, refers to the map information 363 in the storage unit 360, and judges whether the current location is in an urban area or a non-urban area.

In step S203, the reading unit 312 reads the first model 361 from the storage unit 360 if the result of step S202 indicates an urban area, and reads the second model 362 from the storage unit 360 if it indicates a non-urban area. In step S204, the calculation unit 310, in cooperation with the image processing unit 330, cuts out a determination image CF from the image of the captured image data acquired in step S201. The image processing unit 330 generates determination image data from the cut-out determination image CF. In step S205, the calculation unit 310 judges whether scanning has been completed for all of the prepared frame windows FW, that is, whether the cut-out processing is complete. If not, the process returns to step S204; if so, it proceeds to step S206.

In step S206, the determination unit 313 sequentially inputs each piece of generated determination image data into the trained model read in step S203 and has it calculate a score. If the calculated score exceeds a preset threshold, the captured image region corresponding to that determination image data is determined to be a person region. The threshold applied here may be set separately as a first threshold for scores output by the first model 361 and a second threshold for scores output by the second model 362. That is, the threshold may differ depending on whether the first model 361 or the second model 362 is used. In that case, the first threshold is preferably greater than the second threshold. Because the artificial objects that can appear as background in urban areas are highly varied, the first model tends to output a relatively high score even when the determination image contains no person. Setting the first threshold higher than the second threshold therefore allows person regions in urban areas to be determined more accurately.
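The per-model thresholds can be illustrated with a small sketch. The description only states that the first threshold is preferably greater than the second, so the values 0.80 and 0.70 and the name `is_person` are assumptions chosen for illustration.

```python
# Assumed threshold values; the only constraint taken from the text is first > second.
THRESHOLDS = {
    "urban": 0.80,      # first threshold, applied to scores from the first model
    "non_urban": 0.70,  # second threshold, applied to scores from the second model
}


def is_person(score: float, area: str) -> bool:
    """Apply the threshold that belongs to the model used for this area."""
    return score > THRESHOLDS[area]


# The same score can lead to different decisions depending on the model in use.
same_score = 0.75
decisions = {area: is_person(same_score, area) for area in THRESHOLDS}
```

With these values a score of 0.75 is accepted in a non-urban area but rejected in an urban area, which is the intended effect of raising the first threshold.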

In step S207, the calculation unit 310 outputs the captured image data acquired in step S201 and the cut-out coordinate information of each determination image determined to be a person region in step S206 from the output unit 350 to the external device. In step S208, the calculation unit 310 checks whether an end instruction has been issued. The end instruction is generated, for example, when the vehicle stops or when the power button of the determination device 300 is turned off. If there is no end instruction, the process returns to step S201. If there is an end instruction, the series of processes ends.
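The loop of steps S201 to S208 can be sketched end to end as follows. Every collaborator (area judgment, model, cut-out, end check) is passed in as a callable so the sketch stays self-contained; these callables, the stub data, and the name `run_determination` are hypothetical stand-ins for the units described above, not the device's actual interfaces.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]


def run_determination(
    frames: Iterable[Any],
    positions: Iterable[Any],
    determine_area: Callable[[Any], str],
    models: Dict[str, Callable[[Any], float]],
    cut_out: Callable[[Any], List[Tuple[Any, Box]]],
    threshold: float,
    stop_requested: Callable[[], bool],
) -> List[Tuple[Any, List[Box]]]:
    """Process frames until an end instruction; return (frame, person boxes) pairs."""
    outputs = []
    for frame, position in zip(frames, positions):       # S201: acquire captured image
        area = determine_area(position)                   # S202: urban or non-urban
        model = models[area]                              # S203: read matching model
        candidates = cut_out(frame)                       # S204-S205: all frame windows
        boxes = [box for pixels, box in candidates
                 if model(pixels) > threshold]            # S206: score vs threshold
        outputs.append((frame, boxes))                    # S207: output image + coords
        if stop_requested():                              # S208: end instruction?
            break
    return outputs


# Stubs for illustration only.
calls = {"n": 0}


def stop_after_two() -> bool:
    calls["n"] += 1
    return calls["n"] >= 2


result = run_determination(
    frames=["f0", "f1", "f2"],
    positions=["city", "country", "city"],
    determine_area=lambda p: "urban" if p == "city" else "non_urban",
    models={"urban": lambda px: 0.9, "non_urban": lambda px: 0.1},
    cut_out=lambda f: [(f + "_crop", (0, 0, 2, 4))],
    threshold=0.75,
    stop_requested=stop_after_two,
)
```

In this run the third frame is never processed because the stubbed end instruction arrives after the second frame.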

The embodiment described above assumes visible images obtained by capturing visible light, but infrared images obtained by capturing infrared light may be used instead. Infrared images often have fewer pixels and are coarser than visible images, and far-infrared imaging records the heat emitted by the subject. As a result, in urban areas, which contain many heat sources, objects with an aspect ratio close to that of a person are easily misrecognized as persons. The above technique is therefore particularly effective when determining person regions from infrared images (including far-infrared images).

In the embodiment described above, the aspect ratio of the determination images input to the first model 361 is the same as that of the determination images input to the second model 362, but the two may differ. If a target area contains background objects that are prone to misrecognition, the aspect ratio may be set individually to match the outline of those background objects.

The embodiment described above assumes a determination device 300 mounted on a vehicle, but the use of a determination device that determines person regions is not limited to this. For a vehicle-mounted device, it is effective to divide the target areas into urban and non-urban areas with roads in mind; for other uses, the target areas are preferably divided according to the use. Examples include dividing an urban area into residential and commercial areas or into park and non-park areas, and dividing a non-urban area into residential and suburban areas. The number of divisions is not limited to two and may be three or more. In that case, a trained model is generated for each division of the target areas.

The embodiment described above assumes that the determination device 300 is provided as part of a driving support system device or a drive recorder, but a remotely located server connected to the vehicle by communication may perform the functions of the determination device. In that case, the server receives captured image data captured by a camera unit installed in the vehicle, determines person regions, and returns the coordinate information of the person regions to the vehicle's driving support system device or the like.

100 generation device, 110 image acquisition unit, 120 operation input unit, 130 calculation unit, 131 first generation unit, 132 second generation unit, 140 output unit, 300 determination device, 310 calculation unit, 311 area determination unit, 312 reading unit, 313 determination unit, 320 camera unit, 330 image processing unit, 340 GPS unit, 350 output unit, 360 storage unit, 361 first model, 362 second model, 363 map information, 500 navigation system, 510 display panel, 512 captured image, 513 person frame, 514 warning display, 901 person

Claims

A generation device for a trained model set for person determination, comprising:
a first generation unit that generates a first trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which first person image data that is captured in a first target area and shows a person is given as a correct answer, and an object that constitutes the first target area and is erroneously recognized as a person in the first target area is given as an incorrect answer in the form of first non-person image data that does not show a person; and
a second generation unit that generates a second trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which second person image data that is captured in a second target area different from the first target area and shows a person is given as a correct answer, and an object that constitutes the second target area and is erroneously recognized as a person in the second target area is given as an incorrect answer in the form of second non-person image data that does not show a person.

2. The generation device according to claim 1, wherein:
the first person image data and the first non-person image data are created on the basis of image regions cut out, at a preset aspect ratio in which the vertical length is greater than the horizontal width, from a first image captured in the first target area defined as an urban area;
the first non-person image data is image data obtained by photographing an artificial object;
the second person image data and the second non-person image data are created on the basis of image regions cut out at the aspect ratio from a second image captured in the second target area defined as an area other than an urban area; and
the second non-person image data is image data obtained by photographing a natural object.

A method for generating a trained model set for person determination, comprising:
a first generation step of generating a first trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which first person image data that is captured in a first target area and shows a person is given as a correct answer, and an object that constitutes the first target area and is erroneously recognized as a person in the first target area is given as an incorrect answer in the form of first non-person image data that does not show a person; and
a second generation step of generating a second trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which second person image data that is captured in a second target area different from the first target area and shows a person is given as a correct answer, and an object that constitutes the second target area and is erroneously recognized as a person in the second target area is given as an incorrect answer in the form of second non-person image data that does not show a person.

A program for generating a trained model set for person determination, the program causing a computer to execute:
a first generation step of generating a first trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which first person image data that is captured in a first target area and shows a person is given as a correct answer, and an object that constitutes the first target area and is erroneously recognized as a person in the first target area is given as an incorrect answer in the form of first non-person image data that does not show a person; and
a second generation step of generating a second trained model, which outputs as a score the likelihood that a person appears in input image data, by performing supervised learning in which second person image data that is captured in a second target area different from the first target area and shows a person is given as a correct answer, and an object that constitutes the second target area and is erroneously recognized as a person in the second target area is given as an incorrect answer in the form of second non-person image data that does not show a person.

A person region determination device comprising:
a storage unit that stores a trained model set for person determination, the trained model set including a first trained model that outputs as a score the likelihood that a person appears in input image data and that was generated by supervised learning in which first person image data that is captured in a first target area and shows a person is given as a correct answer, and an object that constitutes the first target area and is erroneously recognized as a person in the first target area is given as an incorrect answer in the form of first non-person image data that does not show a person, and a second trained model that outputs as a score the likelihood that a person appears in input image data and that was generated by supervised learning in which second person image data that is captured in a second target area different from the first target area and shows a person is given as a correct answer, and an object that constitutes the second target area and is erroneously recognized as a person in the second target area is given as an incorrect answer in the form of second non-person image data that does not show a person;
an acquisition unit that acquires captured image data output by a camera mounted on a moving object;
a judgment unit that judges in which of predetermined target areas the moving object is located;
a reading unit that reads the first trained model from the storage unit when the judgment unit judges that the moving object is located in the first target area, and reads the second trained model from the storage unit when the judgment unit judges that the moving object is located in the second target area; and
a determination unit that calculates the score for the image of the captured image data acquired by the acquisition unit using the trained model read by the reading unit, and determines a specific image region for which the score exceeds a predetermined threshold to be a person region.

6. The person region determination device according to claim 5, wherein:
the first target area is an area defined as an urban area, and the first non-person image data is image data obtained by photographing an artificial object;
the second target area is an area defined as an area other than an urban area, and the second non-person image data is image data obtained by photographing a natural object;
the threshold includes a first threshold used for the score output by the first trained model and a second threshold used for the score output by the second trained model; and
the first threshold is greater than the second threshold.

A person region determination method for determining a person region using a storage unit that stores a trained model set for person determination, the trained model set including: a first trained model that outputs a score indicating the probability that a person appears in input image data, generated by supervised learning in which first person image data, captured in a first target area and showing a person, is given as correct answers, and first non-person image data, showing no person but showing constituent objects that are erroneously recognized as a person in the first target area, is given as incorrect answers; and a second trained model that outputs a score indicating the probability that a person appears in input image data, generated by supervised learning in which second person image data, captured in a second target area different from the first target area and showing a person, is given as correct answers, and second non-person image data, showing no person but showing constituent objects that are erroneously recognized as a person in the second target area, is given as incorrect answers, the method comprising:
an acquisition step of acquiring captured image data output by a camera mounted on a moving object;
a determining step of determining in which of the predetermined target areas the moving object is located;
a reading step of reading the first trained model from the storage unit when it is determined in the determining step that the moving object is located in the first target area, and reading the second trained model from the storage unit when it is determined in the determining step that the moving object is located in the second target area; and
a person region determination step of calculating the score for the image of the captured image data acquired in the acquisition step, using the trained model read in the reading step, and determining a specific image region in which the score exceeds a predetermined threshold to be a person region.
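
The reading and determination steps of this method can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function `determine_person_regions`, the `Region` tuple layout, the area labels `"first"` and `"second"`, and the stub scoring functions are all hypothetical. The acquired image and the already-determined target area are simply passed in as arguments.

```python
from typing import Callable, Dict, List, Tuple

Region = Tuple[int, int, int, int]           # hypothetical (x, y, width, height)
ScoreFn = Callable[[object, Region], float]  # stand-in for a trained model


def determine_person_regions(
    image: object,
    area: str,
    storage: Dict[str, ScoreFn],
    candidates: List[Region],
    threshold: float,
) -> List[Region]:
    """Read the model for the current area and keep regions scoring above threshold."""
    model = storage[area]  # reading step: select the model by target area
    # determination step: keep only regions whose score exceeds the threshold
    return [r for r in candidates if model(image, r) > threshold]


# Stub models standing in for the first and second trained models (assumed behavior).
storage = {
    "first": lambda image, region: 0.9 if region[0] == 0 else 0.2,
    "second": lambda image, region: 0.3,
}
candidates = [(0, 0, 32, 64), (40, 0, 32, 64)]
print(determine_person_regions(None, "first", storage, candidates, 0.5))
# [(0, 0, 32, 64)]
```

Keeping the models in a dictionary keyed by target area means the scoring code is identical for both areas; only the model that is read changes.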

A person region determination program for determining a person region using a storage unit that stores a trained model set for person determination, the trained model set including: a first trained model that outputs a score indicating the probability that a person appears in input image data, generated by supervised learning in which first person image data, captured in a first target area and showing a person, is given as correct answers, and first non-person image data, showing no person but showing constituent objects that are erroneously recognized as a person in the first target area, is given as incorrect answers; and a second trained model that outputs a score indicating the probability that a person appears in input image data, generated by supervised learning in which second person image data, captured in a second target area different from the first target area and showing a person, is given as correct answers, and second non-person image data, showing no person but showing constituent objects that are erroneously recognized as a person in the second target area, is given as incorrect answers, the program causing a computer to execute:
an acquisition step of acquiring captured image data output by a camera mounted on a moving object;
a determining step of determining in which of the predetermined target areas the moving object is located;
a reading step of reading the first trained model from the storage unit when it is determined in the determining step that the moving object is located in the first target area, and reading the second trained model from the storage unit when it is determined in the determining step that the moving object is located in the second target area; and
a person region determination step of calculating the score for the image of the captured image data acquired in the acquisition step, using the trained model read in the reading step, and determining a specific image region in which the score exceeds a predetermined threshold to be a person region.