JP2021064025A - Position estimation device, position learning device and program

Info

Publication number: JP2021064025A
Application number: JP2019186619A
Authority: JP
Inventors: 俊枝三須; Toshie Misu; 秀樹三ツ峰; Hideki Mitsumine
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2021-04-22
Anticipated expiration: 2039-10-10
Also published as: JP7373352B2

Abstract

To estimate the position of an object or the like in an image even when the object or the like is completely concealed.

SOLUTION: A person position measurement unit 10 of a position estimation device 1-1 measures the position of each person from an image, and a person density map calculation unit 11 calculates a person density map from the position of each person. A person posture measurement unit 12 measures the posture of each person from the image, and a gaze degree calculation unit 13 calculates, from the position and posture of each person, the gaze degree of the persons for each position (coordinates) in a predetermined coordinate system. A neural network unit 14-1 takes the person density map, which is the person density for each region, and the gaze degree for each position as input data, performs a neural network calculation using learned coefficients, and obtains a position estimate of the play equipment as output data.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a position estimation device, a position learning device, and a program for estimating the position of an object or the like in an input image.

Conventionally, as a technique for estimating the position of an object, it is known to attach a tag such as a beacon or an RFID (Radio Frequency IDentifier) tag, or a reflector (a retroreflective material or the like), to the object and to sense radio waves, infrared rays, visible light, or the like from the object passively or actively (see, for example, Patent Document 1).

Patent Document 1 discloses a technique for tracking a subject by combining a process of detecting, from a non-visible image, a non-visible-light marker attached to the subject with a process of detecting the subject from a visible image.

There is also a technique that estimates the position of an object by extracting image features from an input image and determining whether the image features satisfy a predetermined condition. For example, this technique extracts image features called Haar-like features from the input image and classifies them by a boosting method, thereby extracting a person's face region (see, for example, Non-Patent Document 1). This technique is used for adjusting the exposure and focus position of digital cameras, among other purposes.

Patent Document 1: JP-A-2017-204757

Non-Patent Document 1: P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, Kauai, HI, USA, 2001, pp.I-I

With an object detection technique based on image features, such as that of Non-Patent Document 1, it is difficult to detect the position of an object when the object is concealed by another object or lies in an unlit shadow region.

The technique of Patent Document 1 performs active sensing with non-visible light, passive sensing with visible light, and online learning while the active sensing is operating normally. The synergy of these makes it possible to detect the position of an object robustly against changes in the object's concealment state and in the lighting conditions.

However, when the object is completely hidden, so that the non-visible-light marker cannot be observed in the non-visible image and the image features of the subject cannot be extracted from the visible image either, it is difficult to detect the position of the object.

The present invention has been made to solve the above problem. Its object is to provide a position estimation device, a position learning device, and a program that, when estimating the position of an object or the like in an image, can estimate that position even when the object or the like is completely concealed.

To solve the above problem, the position estimation device of claim 1 is a position estimation device that estimates the position of a predetermined object in an input image, comprising: a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system; a person density map calculation unit that calculates, from the position of each person measured by the person position measurement unit, a person density for each region in the predetermined coordinate system and generates the person density for each region as a person density map; and a neural network unit that takes the person density map generated by the person density map calculation unit as input data, performs a neural network calculation using preset learned coefficients, and obtains, as output data, a position estimate indicating an estimated value of the position of the predetermined object.

According to the position estimation device of claim 1, even when the predetermined object is not clearly visible in the input image, for example because of interference with persons, the location of the predetermined object can be estimated with high accuracy based on the spatial distribution of the persons.

The position estimation device of claim 2 is the position estimation device according to claim 1, further comprising: a person posture measurement unit that detects the persons based on the image features of the input image and measures the posture of each person; and a gaze degree calculation unit that calculates, from the posture of each person measured by the person posture measurement unit, the degree to which all the persons gaze at each coordinate position in the predetermined coordinate system, as a gaze degree for each position, wherein the neural network unit takes as the input data the person density map generated by the person density map calculation unit and the gaze degree for each position calculated by the gaze degree calculation unit, performs the neural network calculation using the learned coefficients, and obtains the position estimate as the output data.

According to the position estimation device of claim 2, even when the predetermined object is not clearly visible in the input image, for example because of interference with persons, the location of the predetermined object can be estimated with high accuracy based on the spatial distribution of the persons and on the regions the persons are attending to, as obtained from the posture of each person.

The position learning device of claim 3 is a position learning device that receives, as learning data, an input image and a true position value indicating the true value of the position of a predetermined object corresponding to the input image, and obtains coefficients of a neural network based on the learning data, comprising: a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system; a person density map calculation unit that calculates, from the position of each person measured by the person position measurement unit, a person density for each region in the predetermined coordinate system and generates the person density for each region as a person density map; a neural network unit that takes the person density map generated by the person density map calculation unit as input data, performs the neural network calculation using the coefficients, and obtains, as output data, a position estimate indicating an estimated value of the position of the predetermined object; an error calculation unit that calculates the error between the true position value corresponding to the input image and the position estimate obtained by the neural network unit; and a coefficient update unit that updates the coefficients of the neural network based on the error calculated by the error calculation unit.

According to the position learning device of claim 3, the learned coefficients used in the position estimation device of claim 1 can be obtained using, as learning data, images and the true position values of the predetermined object corresponding to those images.

The position learning device of claim 4 is the position learning device according to claim 3, further comprising: a person posture measurement unit that detects the persons based on the image features of the input image and measures the posture of each person; and a gaze degree calculation unit that calculates, from the posture of each person measured by the person posture measurement unit, the degree to which all the persons gaze at each coordinate position in the predetermined coordinate system, as a gaze degree for each position, wherein the neural network unit takes as input data the person density map generated by the person density map calculation unit and the gaze degree for each position calculated by the gaze degree calculation unit, performs the neural network calculation using the coefficients, and obtains the position estimate as output data.

According to the position learning device of claim 4, the learned coefficients used in the position estimation device of claim 2 can be obtained using, as learning data, images and the true position values of the predetermined object corresponding to those images.

Further, the program of claim 5 causes a computer to function as the position estimation device according to claim 1 or 2.

Further, the program of claim 6 causes a computer to function as the position learning device according to claim 3 or 4.

As described above, according to the present invention, when estimating the position of an object or the like in an image, the position can be estimated even when the object or the like is completely concealed.

A block diagram showing the configuration of the position estimation device of Example 1.
A flowchart showing the processing of the position estimation device of Example 1.
A diagram illustrating the neural network unit of Example 1.
A block diagram showing the configuration of the position learning device of Example 1.
A flowchart showing the processing of the position learning device of Example 1.
A block diagram showing the configuration of the position estimation device of Example 2.
A flowchart showing the processing of the position estimation device of Example 2.
A diagram illustrating the neural network unit of Example 2.
A block diagram showing the configuration of the position learning device of Example 2.
A flowchart showing the processing of the position learning device of Example 2.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. When estimating the position of play equipment used in a game or the like in an image (including a soccer ball, a tennis ball, an ice hockey puck, a badminton shuttlecock, a curling stone, and the like), the position estimation device of Example 1 described below measures the position of each person in the image, calculates the person density for each region, measures the posture of each person in the image, calculates the gaze degree for each position (the gaze degree of all persons toward that position), performs a neural network calculation with the person density for each region and the gaze degree for each position as input data, and obtains an estimated value of the position of the play equipment (the position estimate of the play equipment) as output data.

The position estimation device of Example 2 measures the position of each person in the image, calculates the person density for each region, performs a neural network calculation with the person density for each region as input data, and obtains the position estimate of the play equipment as output data.

The position learning devices of Examples 1 and 2 use images and the true values of the play equipment position corresponding to those images (the true position values of the play equipment) as learning data to obtain the optimal coefficients (connection weights and bias values) used in the neural network of the corresponding position estimation device.

This makes it possible to estimate the position of the play equipment even when it is completely concealed, for example when it is hidden behind a person or lies in an unlit shadow region.

[Example 1]
First, Example 1 will be described. As noted above, Example 1 calculates the person density for each region and the gaze degree for each position from an image, and estimates the position of the play equipment using a neural network that takes these data as input data and produces the position estimate of the play equipment as output data.

(Position estimation device / Example 1)
FIG. 1 is a block diagram showing the configuration of the position estimation device of Example 1, and FIG. 2 is a flowchart showing its processing. The position estimation device 1-1 includes a person position measurement unit 10, a person density map calculation unit 11, a person posture measurement unit 12, a gaze degree calculation unit 13, a neural network unit 14-1, and a memory 15-1.

The position estimation device 1-1 receives an image, processes the input image, performs the neural network calculation, and outputs the position estimate of the play equipment. The image is, for example, an image of a ball game such as soccer, hockey, badminton, or curling, played on a court laid out on the ground by persons who are players using play equipment such as a ball, puck, shuttlecock, or stone.

The person position measurement unit 10 receives an image (step S201), extracts persons based on the image features of the input image, and measures the position (coordinates) of each person (step S202). It then outputs the position of each person to the person density map calculation unit 11 and the gaze degree calculation unit 13.

There may be one person or several. The position of a person may be the coordinates at which the person is located in the coordinate system of the input image (hereinafter, the "image coordinate system"), or coordinates in a coordinate system set in the space where the person is located (hereinafter, the "world coordinate system").

The person position measurement unit 10 may be, for example, a receiver of a global navigation satellite system (GNSS: Global Navigation Satellite System; for example, GPS (Global Positioning System)). Alternatively, the person position measurement unit 10 may extract the persons who are the subjects from the image using a technique such as face image recognition and estimate the position of each person based on the position and orientation information of the camera that captured the image (the positioning may use stereo images, or may use a monocular image together with a constraint, for example that the feet of the person are in contact with a predetermined plane). Positioning may also be performed by attaching a wireless tag (RFID or the like) to each person.

The person density map calculation unit 11 receives the position of each person from the person position measurement unit 10 and, based on these positions, calculates the population density (person density) for each local region in the image coordinate system or the world coordinate system to generate a person density map (step S203). It then outputs the person density map (the person density for each region) to the neural network unit 14-1.

For example, the person density map calculation unit 11 may divide the region where persons can be present (for example, a playing field such as a soccer field or tennis court) into a grid of rows and columns and compute the person density for each region by counting, from the positions of the persons, the number of persons in each grid cell. When there are L grid cells in total, it may generate an L-dimensional vector whose elements are the person densities of the cells and use this vector as the person density map.

The grid cells may be arranged so that they do not overlap, or so that adjacent cells overlap one another, for example by placing 1 m square cells at intervals of 0.5 m.
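The grid-based density calculation described above can be sketched as follows. This is a minimal illustration only, assuming two-dimensional positions on a rectangular field; the function name `person_density_map` and the parameters `field_size`, `cell`, and `stride` are hypothetical and do not appear in the source. Setting `stride` smaller than `cell` gives the overlapping arrangement.

```python
from typing import List, Tuple


def person_density_map(
    positions: List[Tuple[float, float]],
    field_size: Tuple[float, float],
    cell: float = 1.0,
    stride: float = 1.0,
) -> List[float]:
    """Return an L-dimensional vector of person densities, one per grid cell.

    Cells are squares of side `cell` placed every `stride` along each axis
    (hypothetical parameters). Each cell is half-open: [x0, x0 + cell).
    """
    width, height = field_size
    # Number of cell origins along each axis such that cells stay inside the field.
    nx = int((width - cell) // stride) + 1
    ny = int((height - cell) // stride) + 1
    density: List[float] = []
    for iy in range(ny):
        for ix in range(nx):
            x0, y0 = ix * stride, iy * stride
            count = sum(
                1
                for (x, y) in positions
                if x0 <= x < x0 + cell and y0 <= y < y0 + cell
            )
            # Persons per unit area of the cell.
            density.append(count / (cell * cell))
    return density


# Non-overlapping 1 m cells on a 2 m x 2 m field (L = 4).
grid = person_density_map([(0.5, 0.5), (0.6, 0.4), (1.5, 1.5)], (2.0, 2.0))

# Overlapping cells: 1 m squares placed every 0.5 m (L = 9).
overlapping = person_density_map([(0.75, 0.75)], (2.0, 2.0), cell=1.0, stride=0.5)
```

With overlapping cells a single person contributes to several elements of the vector, which smooths the map at the cost of a longer input vector.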

The person posture measurement unit 12 receives the image, extracts the head of each person based on the image features of the input image, and measures the posture (face orientation) of each person from the head (step S204). It then outputs the posture of each person to the gaze degree calculation unit 13. The posture of a person may be measured in the image coordinate system or in the world coordinate system.

The person posture measurement unit 12 may use, for example, a technique that estimates face orientation by integrating a classification result based on the color histogram of the image with a classification result based on a feature other than the color histogram (for example, a gradient histogram). Such face orientation estimation techniques are known; see, for example, JP-A-2018-22416.

To identify the position of the person whose posture is to be measured, the person posture measurement unit 12 may also receive the position of each person from the person position measurement unit 10.

The gaze degree calculation unit 13 receives the position of each person from the person position measurement unit 10 and the posture of each person from the person posture measurement unit 12. Based on the position and posture of each person, it calculates the degree to which all persons gaze at each preset coordinate position in the image coordinate system or the world coordinate system (the gaze degree), that is, the gaze degree for each position (coordinates) (step S205). The gaze degree calculation unit 13 outputs the gaze degree for each position to the neural network unit 14-1. The gaze degree for each position is, for each position in the image coordinate system or the world coordinate system, the sum of the gaze degrees of the individual persons toward that position.

Let X_k be the position of each person, P_k the posture of each person, Y a position in the image coordinate system or the world coordinate system, and G(Y) the gaze degree at position Y. Here k is an index that distinguishes the persons and is an integer from 1 to K, where K is the maximum value of k.

Here, the position X_k of each person is a three-dimensional position vector in world coordinates. The posture P_k of each person is a direction vector fixed to a specific part of the person (for example, a face orientation vector fixed to the face, or a line-of-sight vector fixed to the eyeball) and is a unit vector. The position Y is a position vector in world coordinates.

In this case, the gaze degree calculation unit 13 calculates the gaze degree G(Y) for each position Y by, for example, the following equation using a function f.
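The equation itself is not reproduced in this text. Since the gaze degree for each position is described as the sum, over all persons, of each person's gaze degree toward that position, and since f takes the vector (Y - X_k) and the posture P_k as arguments, a form consistent with those definitions would be the following. This is a reconstruction, not the original notation, and the number (1) is assumed from its position in the text.

```latex
G(Y) = \sum_{k=1}^{K} f\left(Y - X_k,\; P_k\right) \tag{1}
```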

It is desirable that the function f take a larger (or at least not smaller) scalar value the more closely its two arguments, the vector (Y - X_k) and the posture P_k of the person, point in the same direction. It is also desirable that f take a larger (or at least not smaller) scalar value the shorter the vector (Y - X_k) is.

For example, the function f may compute the inner product of the vectors (the cosine (cos θ) of the angle θ between the two vectors), as in the following equation, where the vector Q_k = Y - X_k.
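As a rough illustration of this cosine-based choice, the sketch below computes f as the cosine of the angle between Q_k = Y - X_k and P_k, and sums it over all persons to obtain the gaze degree at Y. The function names `cosine_gaze` and `gaze_degree` are hypothetical, and returning 0 when a vector has zero length is an assumption made here, not something the source specifies.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_gaze(q: Vector, p: Vector) -> float:
    """f(Q_k, P_k): cosine of the angle between Q_k = Y - X_k and posture P_k."""
    dot = sum(qi * pi for qi, pi in zip(q, p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    if norm_q == 0.0 or norm_p == 0.0:
        # Assumption: no contribution when the direction is undefined.
        return 0.0
    return dot / (norm_q * norm_p)


def gaze_degree(y: Vector, positions: List[Vector], postures: List[Vector]) -> float:
    """G(Y): sum over all persons k of f(Y - X_k, P_k)."""
    total = 0.0
    for x_k, p_k in zip(positions, postures):
        q_k = [yi - xi for yi, xi in zip(y, x_k)]
        total += cosine_gaze(q_k, p_k)
    return total


# Two persons facing each other along the x axis.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
postures = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
g_between = gaze_degree((1.0, 0.0, 0.0), positions, postures)
g_behind = gaze_degree((-1.0, 0.0, 0.0), positions, postures)
```

A point midway between the two persons receives the maximum contribution from both, while a point behind the first person receives a negative contribution from that person. A pure cosine does not depend on the length of Q_k, so it does not by itself give nearer positions a larger value.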

Alternatively, the function f may be computed as in the following equation.

The neural network unit 14-1 reads the learned coefficients (connection weights and bias values) from the memory 15-1. It also receives the person density map (the person density for each region) from the person density map calculation unit 11 and the gaze degree for each position from the gaze degree calculation unit 13.

Taking the person density map and the gaze degree for each position as input data, the neural network unit 14-1 performs the neural network calculation using the learned coefficients and obtains the position estimate of the play equipment (a vector indicating the position of the play equipment) as output data (step S206). The neural network unit 14-1 then outputs the position estimate of the play equipment (step S207).

Any type of network can be used as the neural network unit 14-1, for example a multilayer perceptron, a convolutional neural network, a recurrent neural network, a Hopfield network, or a combination of these.

When the positions of the persons output by the person position measurement unit 10 are in the image coordinate system, the position estimate of the play equipment output by the neural network unit 14-1 is typically also in the image coordinate system, although another coordinate system such as the world coordinate system may be used. Conversely, when the positions of the persons output by the person position measurement unit 10 are in the world coordinate system, the position estimate of the play equipment output by the neural network unit 14-1 is typically also in the world coordinate system, although another coordinate system such as the image coordinate system may be used.

Let L (an integer of 2 or more) be the number of regions in the person density map, which gives the person density for each region, and let M (an integer of 0 or more) be the number of positions for which the gaze degree is given. Since M = 0 corresponds to Example 2 described later, M is taken here to be an integer of 1 or more.

For example, when the neural network unit 14-1 uses a multilayer perceptron, it feeds the L person densities (one per region) and the M gaze degrees (one per position) as input data to the N elements (N = L + M) of the input layer of the multilayer perceptron.

FIG. 3 is a diagram illustrating the neural network unit 14-1 using a multilayer perceptron in Example 1, for the case L = 4, M = 4, N = 8. This neural network unit 14-1 is configured as a three-layer perceptron with 8 elements in the input layer, 3 elements in the intermediate layer, and 2 elements in the output layer.

For the four regions, the neural network unit 14-1 inputs the person densities (variables) x_1 to x_4 to four elements of the input layer (the 1st to 4th elements). For the four positions, it inputs the gaze degrees (variables) x_5 to x_8 to four elements of the input layer (the 5th to 8th elements).

The neural network unit 14-1 outputs the variables x_1 to x_8 unchanged from the 1st to 8th elements of the input layer and, at the 9th to 11th elements of the intermediate layer (the q-th element), computes the output variables x_9 to x_11 (x_q) from the variables x_1 to x_8 and the other quantities by the following equation.
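The equation, referred to later in the text as equation (4), is not reproduced here. Given the quantities defined for it (the set C_q of connected elements, the connection weights w_{p,q}, the bias b_q, and the activation function φ_q), the standard perceptron form consistent with those definitions would be the following. This is a reconstruction under that assumption, not the original notation.

```latex
x_q = \varphi_q\!\left( b_q + \sum_{p \in C_q} w_{p,q}\, x_p \right) \tag{4}
```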

ここで、Ｃ_qは、ｑ番目の素子の入力に接続される素子の番号の集合を示し、ｗ_p,qは、ｐ番目の素子の出力とｑ番目の素子の入力との間の結合重みを示す。また、ｂ_qは、ｑ番目の素子の入力として与えるバイアス値を示し、φ_qは、ｑ番目の素子の活性化関数を示す。 Here, C_q denotes the set of numbers of the elements connected to the input of the q-th element, and w_{p,q} denotes the coupling weight between the output of the p-th element and the input of the q-th element. Further, b_q denotes the bias value given as an input to the q-th element, and φ_q denotes the activation function of the q-th element.

ニューラルネットワーク部１４−１は、中間層の９〜１１番目の素子から変数ｘ₉〜ｘ₁₁を出力し、出力層の１２，１３番目の素子（ｑ番目の素子）において、前記式（４）にて、変数ｘ₉〜ｘ₁₁等を用いて出力値である遊具の位置推定値（Ｘ座標、Ｙ座標）（ｘ_q）を演算する。ニューラルネットワーク部１４−１は、出力層の１２，１３番目の素子から、遊具の位置推定値（Ｘ座標、Ｙ座標）を出力する。 The neural network unit 14-1 outputs the variables x₉ to x₁₁ from the 9th to 11th elements of the intermediate layer, and each of the 12th and 13th elements of the output layer (the q-th element) computes its output value (x_q), a coordinate of the position estimated value (X coordinate, Y coordinate) of the playset, from the variables x₉ to x₁₁ and the coefficients using equation (4). The neural network unit 14-1 outputs the position estimated value (X coordinate, Y coordinate) of the playset from the 12th and 13th elements of the output layer.

図３の例では、Ｃ₉＝Ｃ₁₀＝Ｃ₁₁＝｛１，２，３，４，５，６，７，８｝，Ｃ₁₂＝Ｃ₁₃＝｛９，１０，１１｝であり、活性化関数φ_qは、任意のものを使用することができる。例えば、活性化関数φ_qとして、ReLU関数（Rectified Linear Unit：半波整流関数：φ_q(ｘ)＝max｛0,ｘ｝）、シグモイド関数、双曲線正接関数（φ_q(ｘ)＝tanhｘ）を用いることができる。活性化関数φ_qは、素子毎に異なる関数を混在させてもよいし、全ての素子に同じ関数を用いてもよい。 In the example of FIG. 3, C₉ = C₁₀ = C₁₁ = {1, 2, 3, 4, 5, 6, 7, 8} and C₁₂ = C₁₃ = {9, 10, 11}, and any function can be used as the activation function φ_q. For example, the ReLU function (Rectified Linear Unit; half-wave rectification function: φ_q(x) = max{0, x}), the sigmoid function, or the hyperbolic tangent function (φ_q(x) = tanh x) can be used as φ_q. Different activation functions may be mixed from element to element, or the same function may be used for all elements.
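The element-wise calculation of equation (4) for the FIG. 3 network can be sketched as below, keeping the element numbering (1 to 13) and the connection sets C_q of the text. The coefficient values are placeholders rather than learned values, the mix of activation functions is only one of the permitted choices, and the identity function on the output elements is an assumption made here so that the outputs can take arbitrary coordinate values.

```python
import math


def relu(v):
    """Half-wave rectification: max{0, v}."""
    return max(0.0, v)


def identity(v):
    return v


# Connection sets of FIG. 3: C9 = C10 = C11 = {1..8}, C12 = C13 = {9, 10, 11}.
C = {q: list(range(1, 9)) for q in (9, 10, 11)}
C.update({q: [9, 10, 11] for q in (12, 13)})


def forward(inputs, w, b, phi):
    """Evaluate x_q = phi_q(sum over p in C_q of w[p, q] * x_p + b_q)."""
    x = {p: v for p, v in enumerate(inputs, start=1)}  # elements 1..8 pass through
    for q in sorted(C):  # intermediate elements first, then output elements
        s = sum(w[(p, q)] * x[p] for p in C[q]) + b[q]
        x[q] = phi[q](s)
    return x[12], x[13]  # estimated (X, Y)


# Placeholder coefficients (hypothetical, not learned values).
w = {(p, q): 0.1 for q in C for p in C[q]}
b = {q: 0.0 for q in C}
# Activation functions may differ per element; this mix is illustrative.
phi = {9: relu, 10: math.tanh, 11: relu, 12: identity, 13: identity}

estimate = forward([1.0] * 8, w, b, phi)
```

Storing the weights in a dictionary keyed by (p, q) mirrors the notation w_{p,q}; a matrix form would be more usual in practice but hides the connection sets.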

図１に戻って、メモリ１５−１には、例えば、後述する位置学習装置２−１によりニューラルネットワーク部１４−１を学習することにより求めた最適な学習済み係数（結合重みｗ_p,q及びバイアス値ｂ_q）が格納されている。 Returning to FIG. 1, the memory 15-1 stores, for example, the optimum learned coefficients (coupling weights w_{p,q} and bias values b_q) obtained by training the neural network unit 14-1 with the position learning device 2-1 described later.

メモリ１５−１に格納された学習済み係数は、ニューラルネットワーク部１４−１により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The learned coefficient stored in the memory 15-1 is read out by the neural network unit 14-1, and is used for the calculation from the element of the input layer of the neural network to the element of the output layer via the element of the intermediate layer.

学習済み係数は、必要に応じて外部からメモリ１５−１に書き込まれるようにしてもよいし、読み取り専用のメモリ１５−１に予め求めておいたデータとして設定しておくようにしてもよい。 The learned coefficient may be written to the memory 15-1 from the outside as needed, or may be set as data obtained in advance in the read-only memory 15-1.
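One way to hand learned coefficients from outside to the memory 15-1 is to serialize them to text. The sketch below uses JSON purely as an assumption; the text does not specify any storage format, and the function names `dump_coefficients` and `load_coefficients` are hypothetical.

```python
import json


def dump_coefficients(w, b):
    """Serialize weights w[(p, q)] and biases b[q] to a JSON string."""
    payload = {
        "w": [[p, q, value] for (p, q), value in sorted(w.items())],
        "b": [[q, value] for q, value in sorted(b.items())],
    }
    return json.dumps(payload)


def load_coefficients(text):
    """Restore the (w, b) dictionaries written by dump_coefficients."""
    payload = json.loads(text)
    w = {(p, q): value for p, q, value in payload["w"]}
    b = {q: value for q, value in payload["b"]}
    return w, b


# Round trip with two illustrative coefficients.
stored = dump_coefficients({(9, 12): 0.25, (1, 9): -0.5}, {9: 0.0, 12: 1.5})
w_loaded, b_loaded = load_coefficients(stored)
```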

以上のように、実施例１の位置推定装置１−１によれば、人物位置計測部１０は、画像から人物毎の位置を計測し、人物密度マップ演算部１１は、人物毎の位置から人物密度マップを演算する。また、人物姿勢計測部１２は、画像から人物毎の姿勢を計測し、注視度演算部１３は、人物毎の位置及び人物毎の姿勢から、所定の座標系における位置毎の注視度を演算する。 As described above, in the position estimation device 1-1 of the first embodiment, the person position measurement unit 10 measures the position of each person from the image, and the person density map calculation unit 11 calculates the person density map from the position of each person. Further, the person posture measuring unit 12 measures the posture of each person from the image, and the gaze calculation unit 13 calculates the gaze degree for each position in a predetermined coordinate system from the position and posture of each person.

ニューラルネットワーク部１４−１は、領域毎の人物密度である人物密度マップ及び位置毎の注視度を入力データとして、学習済み係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-1 takes the person density map, which gives the person density for each region, and the gaze degree for each position as input data, performs the neural network calculation using the learned coefficients, and obtains the position estimated value of the play equipment as output data.

これにより、人物の空間的な分布と、各人物の姿勢から得られた位置毎の注視度とを用いて、遊具の存在する場所を、ニューラルネットワークの演算により精度高く推定することができる。つまり、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 As a result, the location of the playset can be estimated with high accuracy by the neural network calculation, using the spatial distribution of the persons and the gaze degree for each position obtained from the posture of each person. That is, even when the playset is completely concealed by a person or the like and is not clearly visible, its position can be estimated.

（位置学習装置／実施例１） (Position learning device / Example 1)
図４は、実施例１の位置学習装置の構成を示すブロック図であり、図５は、その処理を示すフローチャートである。この位置学習装置２−１は、人物位置計測部１０、人物密度マップ演算部１１、人物姿勢計測部１２、注視度演算部１３、ニューラルネットワーク部１４−１、メモリ２０−１、誤差演算部２１及び係数更新部２２−１を備えている。 FIG. 4 is a block diagram showing the configuration of the position learning device of the first embodiment, and FIG. 5 is a flowchart showing its processing. The position learning device 2-1 includes a person position measurement unit 10, a person density map calculation unit 11, a person posture measuring unit 12, a gaze calculation unit 13, a neural network unit 14-1, a memory 20-1, an error calculation unit 21, and a coefficient updating unit 22-1.

位置学習装置２−１は、画像、及び当該画像に対応する真の遊具の位置である遊具の位置真値を対にして学習データとして１回以上入力する（ステップＳ５０１）。そして、位置学習装置２−１は、例えば誤差逆伝播法によりニューラルネットワークを学習し、最適な学習済み係数（結合重みｗ_p,q及びバイアス値ｂ_q）を求める。 The position learning device 2-1 receives, one or more times, learning data consisting of an image paired with the true position value of the play equipment, that is, the actual position of the play equipment corresponding to that image (step S501). The position learning device 2-1 then trains the neural network by, for example, the error backpropagation method, and obtains the optimum learned coefficients (coupling weights w_{p,q} and bias values b_q).

人物位置計測部１０、人物密度マップ演算部１１、人物姿勢計測部１２、注視度演算部１３及びニューラルネットワーク部１４−１は、図１に示した構成部と同様であるため、説明を省略する。また、図５に示すステップＳ５０２〜Ｓ５０６は、図２に示したステップＳ２０２〜Ｓ２０６と同様であるから、説明を省略する。 Since the person position measurement unit 10, the person density map calculation unit 11, the person posture measuring unit 12, the gaze calculation unit 13, and the neural network unit 14-1 are the same as the components shown in FIG. 1, their description is omitted. Likewise, since steps S502 to S506 shown in FIG. 5 are the same as steps S202 to S206 shown in FIG. 2, their description is omitted.

ここで、ニューラルネットワーク部１４−１は、メモリ２０−１から係数更新部２２−１により更新され格納された係数（結合重みｗ_p,q及びバイアス値ｂ_q）を読み出し、係数を用いて、入力層の素子から中間層の素子を介して出力層の素子へと、ニューラルネットワークの演算を実行する。そして、ニューラルネットワーク部１４−１は、遊具の位置推定値を誤差演算部２１に出力する。 Here, the neural network unit 14-1 reads from the memory 20-1 the coefficients (coupling weights w_{p,q} and bias values b_q) updated and stored by the coefficient updating unit 22-1, and uses them to execute the neural network calculation from the elements of the input layer through the elements of the intermediate layer to the elements of the output layer. Then, the neural network unit 14-1 outputs the position estimated value of the playset to the error calculation unit 21.

メモリ２０−１は、書き換え可能なメモリ（例えばＲＡＭ（Random Access Memory））であり、ニューラルネットワーク部１４−１にて用いる係数が一時的に格納される。 The memory 20-1 is a rewritable memory (for example, RAM (Random Access Memory)), and the coefficient used by the neural network unit 14-1 is temporarily stored.

メモリ２０−１に格納された係数は、ニューラルネットワーク部１４−１により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The coefficient stored in the memory 20-1 is read out by the neural network unit 14-1, and is used for the calculation from the element of the input layer of the neural network to the element of the output layer via the element of the intermediate layer.

位置学習装置２−１の動作開始時点において、メモリ２０−１には、係数の初期値が格納されている。初期値は、予め設定された数値列であってもよいし、乱数（疑似乱数を含む）であってもよい。 At the time when the operation of the position learning device 2-1 starts, the initial value of the coefficient is stored in the memory 20-1. The initial value may be a preset numerical string or a random number (including a pseudo-random number).
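The two initialization options named above, a preset numerical sequence or pseudo-random numbers, can be sketched as follows. The function name `initial_coefficients`, the uniform range of plus or minus 0.5, and the seed are assumptions for illustration; the text does not prescribe a distribution.

```python
import random


def initial_coefficients(connections, preset=None, seed=None, scale=0.5):
    """Return initial (w, b) for the given connection sets C_q.

    If `preset` is given, it is copied as-is (preset numerical sequence).
    Otherwise pseudo-random values in [-scale, scale] are drawn.
    """
    if preset is not None:
        w, b = preset
        return dict(w), dict(b)
    rng = random.Random(seed)
    w = {(p, q): rng.uniform(-scale, scale)
         for q, sources in connections.items() for p in sources}
    b = {q: rng.uniform(-scale, scale) for q in connections}
    return w, b


# Connection sets of FIG. 3 (elements 9-11 see 1-8; elements 12-13 see 9-11).
fig3 = {9: range(1, 9), 10: range(1, 9), 11: range(1, 9),
        12: (9, 10, 11), 13: (9, 10, 11)}
w0, b0 = initial_coefficients(fig3, seed=42)
```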

誤差演算部２１は、当該位置学習装置２−１が入力する画像と対をなす遊具の位置真値を入力すると共に、ニューラルネットワーク部１４−１から遊具の位置推定値を入力する。そして、誤差演算部２１は、遊具の位置真値と位置推定値との間の誤差を演算し（ステップＳ５０７）、誤差を係数更新部２２−１に出力する。誤差は、例えば、遊具の位置推定値のベクトルと遊具の位置真値のベクトルとの間の差分べクトルである。 The error calculation unit 21 inputs the true position value of the playset paired with the image input by the position learning device 2-1 and also inputs the position estimation value of the playset from the neural network unit 14-1. Then, the error calculation unit 21 calculates an error between the true position value and the estimated position value of the playset (step S507), and outputs the error to the coefficient update unit 22-1. The error is, for example, the difference vector between the vector of the estimated position of the playset and the vector of the true position of the playset.
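The difference vector computed in step S507 can be sketched as below. The sign convention (estimate minus true value) and the scalar squared error are assumptions added for illustration; the text only says that the error is, for example, the difference vector between the two position vectors.

```python
def error_vector(estimate, truth):
    """Difference vector between estimated and true (X, Y) positions."""
    return tuple(e - t for e, t in zip(estimate, truth))


def squared_error(err):
    """Squared length of the difference vector (a common scalar loss)."""
    return sum(d * d for d in err)


# Illustrative positions only.
err = error_vector((3.0, 4.0), (1.0, 1.0))
loss = squared_error(err)
```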

係数更新部２２−１は、メモリ２０−１から係数を読み出すと共に、誤差演算部２１から誤差を入力する。そして、係数更新部２２−１は、誤差に基づいて、ニューラルネットワーク部１４−１が使用する係数を更新し（ステップＳ５０８）、更新後の係数を、次時点にニューラルネットワーク部１４−１が使用する係数としてメモリ２０−１に格納する。係数更新部２２−１は、例えば誤差伝播法により、誤差に基づいて係数を変更する。 The coefficient updating unit 22-1 reads the coefficient from the memory 20-1 and inputs an error from the error calculation unit 21. Then, the coefficient updating unit 22-1 updates the coefficient used by the neural network unit 14-1 based on the error (step S508), and the updated coefficient is used by the neural network unit 14-1 at the next time point. It is stored in the memory 20-1 as a coefficient to be used. The coefficient updating unit 22-1 changes the coefficient based on the error, for example, by the error propagation method.

係数更新部２２−１は、当該位置学習装置２−１が画像及び遊具の位置真値の学習データを入力する毎に、係数を更新することで最適化し、更新した係数をメモリ２０−１に格納する。 Each time the position learning device 2-1 receives learning data consisting of an image and the true position value of the play equipment, the coefficient updating unit 22-1 optimizes the coefficients by updating them and stores the updated coefficients in the memory 20-1.

位置学習装置２−１は、学習が完了していない場合（ステップＳ５０９：Ｎ）、ステップＳ５０１へ移行して、新たな画像及び遊具の位置真値の学習データを入力し、ステップＳ５０２〜Ｓ５０８の処理を繰り返す。 When the learning is not completed (step S509: N), the position learning device 2-1 proceeds to step S501, inputs a new image and learning data of the position true value of the play device, and steps S502 to S508. Repeat the process.

一方、位置学習装置２−１は、学習が完了した場合（ステップＳ５０９：Ｙ）、メモリ２０−１から係数を読み出し、学習済み係数として外部へ出力する（ステップＳ５１０）。位置学習装置２−１から出力された学習済み係数は、図１に示した位置推定装置１−１のメモリ１５−１に格納される。 On the other hand, when the learning is completed (step S509: Y), the position learning device 2-1 reads the coefficient from the memory 20-1 and outputs it to the outside as a learned coefficient (step S510). The learned coefficient output from the position learning device 2-1 is stored in the memory 15-1 of the position estimation device 1-1 shown in FIG.
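The loop of steps S501 to S510 can be sketched as a small stochastic-gradient training routine. This is a sketch under stated assumptions, not the patent's implementation: the image-processing steps S502 to S505 are skipped and each sample is taken to be an already computed input vector; tanh intermediate elements, identity output elements, the loss 0.5 times the squared error, the learning rate, and a fixed epoch count as the completion test of S509 are all choices made here. All function names are hypothetical.

```python
import math
import random


def _forward(coeffs, x):
    """Return (intermediate outputs, output-layer values) for input x."""
    w1, b1, w2, b2 = coeffs
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    return h, y


def train(samples, n_in, n_hidden=3, n_out=2, rate=0.1, epochs=500, seed=0):
    rng = random.Random(seed)
    # Initial coefficients: pseudo-random weights, zero biases (assumption).
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    for _ in range(epochs):              # S509: stop after a fixed epoch count
        for x, truth in samples:         # S501: next (input, true position) pair
            h, y = _forward((w1, b1, w2, b2), x)          # S506
            err = [yi - ti for yi, ti in zip(y, truth)]   # S507
            # S508: backpropagate the error through the output weights
            # (computed before w2 is modified).
            grad_h = [sum(err[k] * w2[k][j] for k in range(n_out)) * (1.0 - h[j] ** 2)
                      for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden):
                    w2[k][j] -= rate * err[k] * h[j]
                b2[k] -= rate * err[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= rate * grad_h[j] * x[i]
                b1[j] -= rate * grad_h[j]
    return w1, b1, w2, b2                # S510: learned coefficients


def mean_squared_error(coeffs, samples):
    total = 0.0
    for x, truth in samples:
        _, y = _forward(coeffs, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, truth))
    return total / len(samples)
```

Calling `train(samples, n_in=8)` with a list of `(input_vector, (X, Y))` pairs returns coefficients that could then be written to the memory 15-1 of the estimation device.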

以上のように、実施例１の位置学習装置２−１によれば、画像及びこれに対応する遊具の位置真値を学習データとして入力し、位置推定装置１−１と同様に、人物位置計測部１０は人物毎の位置を計測し、人物密度マップ演算部１１は人物密度マップを演算する。また、人物姿勢計測部１２は人物毎の姿勢を計測し、注視度演算部１３は位置毎の注視度を演算する。 As described above, in the position learning device 2-1 of the first embodiment, an image and the corresponding true position value of the play equipment are input as learning data, and, as in the position estimation device 1-1, the person position measurement unit 10 measures the position of each person and the person density map calculation unit 11 calculates the person density map. Further, the person posture measuring unit 12 measures the posture of each person, and the gaze calculation unit 13 calculates the gaze degree for each position.

ニューラルネットワーク部１４−１は、領域毎の人物密度である人物密度マップ及び位置毎の注視度を入力データとして、メモリ２０−１に格納された係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-1 takes the person density map, which gives the person density for each region, and the gaze degree for each position as input data, performs the neural network calculation using the coefficients stored in the memory 20-1, and obtains the position estimated value of the playset as output data.

誤差演算部２１は、遊具の位置真値と位置推定値との間の誤差を演算する。係数更新部２２−１は、例えば誤差伝播法により、誤差に基づいて、ニューラルネットワーク部１４−１が使用する係数を更新し、更新後の係数を、次時点にニューラルネットワーク部１４−１が使用する係数としてメモリ２０−１に格納する。 The error calculation unit 21 calculates an error between the true position value and the estimated position value of the playset. The coefficient update unit 22-1 updates the coefficient used by the neural network unit 14-1 based on the error, for example, by the error propagation method, and the updated coefficient is used by the neural network unit 14-1 at the next time point. It is stored in the memory 20-1 as a coefficient to be used.

位置学習装置２−１は、係数を更新する学習処理を繰り返して学習が完了した場合、メモリ２０−１から係数を読み出し、学習済み係数として外部へ出力する。 When the learning process of updating the coefficient is repeated and the learning is completed, the position learning device 2-1 reads the coefficient from the memory 20-1 and outputs the learned coefficient to the outside.

このようにして得られた学習済み係数は、図１に示した位置推定装置１−１のメモリ１５−１に格納され、ニューラルネットワーク部１４−１が遊具の位置推定値を求める際に用いられる。 The learned coefficients thus obtained are stored in the memory 15-1 of the position estimation device 1-1 shown in FIG. 1 and are used when the neural network unit 14-1 obtains the position estimated value of the playset.

これにより、位置推定装置１−１が画像内の遊具の位置を推定する際に、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 As a result, when the position estimation device 1-1 estimates the position of the playset in the image, the position is estimated even if the playset is completely concealed due to interference with a person or the like and cannot be clearly seen. It becomes possible to do.

〔実施例２〕 [Example 2]
次に、実施例２について説明する。前述のとおり、実施例２は、画像から領域毎の人物密度を演算し、領域毎の人物密度を入力データとし、遊具の位置推定値を出力データとしたニューラルネットワークを用いて、遊具の位置を推定するものである。 Next, the second embodiment will be described. As described above, the second embodiment calculates the person density for each region from the image and estimates the position of the playset using a neural network that takes the person density for each region as input data and gives the position estimated value of the playset as output data.

（位置推定装置／実施例２） (Position estimation device / Example 2)
図６は、実施例２の位置推定装置の構成を示すブロック図であり、図７は、その処理を示すフローチャートである。この位置推定装置１−２は、人物位置計測部１０、人物密度マップ演算部１１、ニューラルネットワーク部１４−２及びメモリ１５−２を備えている。 FIG. 6 is a block diagram showing the configuration of the position estimation device of the second embodiment, and FIG. 7 is a flowchart showing its processing. The position estimation device 1-2 includes a person position measurement unit 10, a person density map calculation unit 11, a neural network unit 14-2, and a memory 15-2.

図１に示した実施例１の位置推定装置１−１とこの位置推定装置１−２とを比較すると、両位置推定装置１−１，１−２は、人物位置計測部１０及び人物密度マップ演算部１１を備えている点で共通する。 Comparing the position estimation device 1-1 of the first embodiment shown in FIG. 1 with this position estimation device 1-2, the two devices are alike in that both include the person position measurement unit 10 and the person density map calculation unit 11.

一方、位置推定装置１−２は、位置推定装置１−１の人物姿勢計測部１２及び注視度演算部１３を備えておらず、位置推定装置１−１のニューラルネットワーク部１４−１及びメモリ１５−１とは異なるニューラルネットワーク部１４−２及びメモリ１５−２を備えている点で、位置推定装置１−１と相違する。 On the other hand, the position estimation device 1-2 differs from the position estimation device 1-1 in that it does not include the person posture measuring unit 12 and the gaze calculation unit 13, and in that it includes a neural network unit 14-2 and a memory 15-2 that are different from the neural network unit 14-1 and the memory 15-1 of the position estimation device 1-1.

人物位置計測部１０及び人物密度マップ演算部１１は、図１に示した構成部と同様であるため、説明を省略する。また、図７に示すステップＳ７０１〜Ｓ７０３は、図２に示したステップＳ２０１〜Ｓ２０３と同様であるため、説明を省略する。ここで、人物密度マップ演算部１１は、人物密度マップ（領域毎の人物密度）をニューラルネットワーク部１４−２に出力する。 Since the person position measurement unit 10 and the person density map calculation unit 11 are the same as the constituent units shown in FIG. 1, the description thereof will be omitted. Further, since steps S701 to S703 shown in FIG. 7 are the same as steps S201 to S203 shown in FIG. 2, the description thereof will be omitted. Here, the person density map calculation unit 11 outputs the person density map (person density for each area) to the neural network unit 14-2.

ニューラルネットワーク部１４−２は、メモリ１５−２から学習済み係数を読み出す。また、ニューラルネットワーク部１４−２は、人物密度マップ演算部１１から人物密度マップを入力する。 The neural network unit 14-2 reads the learned coefficient from the memory 15-2. Further, the neural network unit 14-2 inputs the person density map from the person density map calculation unit 11.

ニューラルネットワーク部１４−２は、領域毎の人物密度である人物密度マップを入力データとして、学習済み係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める（ステップＳ７０４）。そして、ニューラルネットワーク部１４−２は、遊具の位置推定値を出力する（ステップＳ７０５）。 The neural network unit 14-2 takes the person density map, which gives the person density for each region, as input data, performs the neural network calculation using the learned coefficients, and obtains the position estimated value of the play equipment as output data (step S704). Then, the neural network unit 14-2 outputs the position estimated value of the play equipment (step S705).

ニューラルネットワーク部１４−２としては、図１に示したニューラルネットワーク部１４−１と同様に、例えば多層パーセプトロン等を使用することができる。 As the neural network unit 14-2, for example, a multi-layer perceptron or the like can be used in the same manner as the neural network unit 14-1 shown in FIG.

人物位置計測部１０が出力する人物毎の位置の座標系が画像座標系である場合、ニューラルネットワーク部１４−２が出力する遊具の位置推定値の座標系も、画像座標系とすることが典型的であるが、世界座標系等の他の座標系とするようにしてもよい。逆に、人物位置計測部１０が出力する人物毎の位置の座標系が世界座標系である場合、ニューラルネットワーク部１４−２が出力する遊具の位置推定値の座標系も、世界座標系とすることが典型的であるが、画像座標系等の他の座標系とするようにしてもよい。 When the coordinate system of the position of each person output by the person position measuring unit 10 is the image coordinate system, the coordinate system of the position estimated value of the play equipment output by the neural network unit 14-2 is also typically the image coordinate system. However, it may be set to another coordinate system such as the world coordinate system. On the contrary, when the coordinate system of the position of each person output by the person position measuring unit 10 is the world coordinate system, the coordinate system of the position estimated value of the play equipment output by the neural network unit 14-2 is also the world coordinate system. This is typical, but other coordinate systems such as an image coordinate system may be used.

図８は、実施例２において、多層パーセプトロンを用いたニューラルネットワーク部１４−２を説明する図である。このニューラルネットワーク部１４−２は、３層パーセプトロンを用いて構成され、入力層の素子数は４、中間層の素子数は３、出力層の素子数は２である。 FIG. 8 is a diagram illustrating a neural network unit 14-2 using a multi-layer perceptron in the second embodiment. The neural network unit 14-2 is configured by using a three-layer perceptron, and has four elements in the input layer, three elements in the intermediate layer, and two elements in the output layer.

ニューラルネットワーク部１４−２は、４つの領域のそれぞれについて、人物密度（変数）ｘ₁〜ｘ₄を入力層の４個の素子（１〜４番目の素子）に入力する。 The neural network unit 14-2 inputs the person densities (variables) x₁ to x₄ of the four regions to the four elements (the 1st to 4th elements) of the input layer.

ニューラルネットワーク部１４−２は、入力層の１〜４番目の素子から変数ｘ₁〜ｘ₄をそのまま出力し、中間層の５〜７番目の素子（ｑ番目の素子）において、前記式（４）にて、変数ｘ₁〜ｘ₄等を用いて出力値である変数ｘ₅〜ｘ₇（ｘ_q）を演算する。 The neural network unit 14-2 outputs the variables x₁ to x₄ unchanged from the 1st to 4th elements of the input layer, and each of the 5th to 7th elements of the intermediate layer (the q-th element) computes its output value, one of the variables x₅ to x₇ (x_q), from the variables x₁ to x₄ and the coefficients using equation (4).

ニューラルネットワーク部１４−２は、中間層の５〜７番目の素子から変数ｘ₅〜ｘ₇を出力し、出力層の８，９番目の素子（ｑ番目の素子）において、前記式（４）にて、変数ｘ₅〜ｘ₇等を用いて出力値である遊具の位置推定値（Ｘ座標、Ｙ座標）（ｘ_q）を演算する。ニューラルネットワーク部１４−２は、出力層の８，９番目の素子から、遊具の位置推定値（Ｘ座標、Ｙ座標）を出力する。図８の例では、Ｃ₅＝Ｃ₆＝Ｃ₇＝｛１，２，３，４｝，Ｃ₈＝Ｃ₉＝｛５，６，７｝である。 The neural network unit 14-2 outputs the variables x₅ to x₇ from the 5th to 7th elements of the intermediate layer, and each of the 8th and 9th elements of the output layer (the q-th element) computes its output value (x_q), a coordinate of the position estimated value (X coordinate, Y coordinate) of the playset, from the variables x₅ to x₇ and the coefficients using equation (4). The neural network unit 14-2 outputs the position estimated value (X coordinate, Y coordinate) of the playset from the 8th and 9th elements of the output layer. In the example of FIG. 8, C₅ = C₆ = C₇ = {1, 2, 3, 4} and C₈ = C₉ = {5, 6, 7}.
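The connection sets of FIG. 3 and FIG. 8 follow the same pattern: every element is connected to all elements of the preceding layer, and elements are numbered consecutively from 1. Under that assumption (full connection between adjacent layers), the sets C_q can be derived from the layer sizes alone; the function name `connection_sets` is hypothetical.

```python
def connection_sets(layer_sizes):
    """Map each non-input element number q to the list C_q of its sources.

    Elements are numbered from 1, layer by layer, and each element is
    connected to every element of the immediately preceding layer.
    """
    sets = {}
    first = 1  # number of the first element in the preceding layer
    for prev_size, size in zip(layer_sizes, layer_sizes[1:]):
        prev_ids = list(range(first, first + prev_size))
        start = first + prev_size  # first element number of the current layer
        for q in range(start, start + size):
            sets[q] = list(prev_ids)
        first = start
    return sets


fig8 = connection_sets([4, 3, 2])   # second embodiment (FIG. 8)
fig3 = connection_sets([8, 3, 2])   # first embodiment (FIG. 3)
```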

図６に戻って、メモリ１５−２には、例えば、後述する位置学習装置２−２によりニューラルネットワーク部１４−２を学習することにより求めた最適な学習済み係数が格納されている。 Returning to FIG. 6, the memory 15-2 stores, for example, the optimum learned coefficient obtained by learning the neural network unit 14-2 by the position learning device 2-2 described later.

メモリ１５−２に格納された学習済み係数は、ニューラルネットワーク部１４−２により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The learned coefficient stored in the memory 15-2 is read out by the neural network unit 14-2 and used for the calculation from the element of the input layer of the neural network to the element of the output layer via the element of the intermediate layer.

学習済み係数は、必要に応じて外部からメモリ１５−２に書き込まれるようにしてもよいし、読み取り専用のメモリ１５−２に予め求めておいたデータとして設定しておくようにしてもよい。 The learned coefficient may be written to the memory 15-2 from the outside as needed, or may be set as data previously obtained in the read-only memory 15-2.

以上のように、実施例２の位置推定装置１−２によれば、人物位置計測部１０は、画像から人物毎の位置を計測し、人物密度マップ演算部１１は、人物毎の位置から人物密度マップを演算する。 As described above, according to the position estimation device 1-2 of the second embodiment, the person position measurement unit 10 measures the position of each person from the image, and the person density map calculation unit 11 measures the position of each person from the position of each person. Calculate the density map.

ニューラルネットワーク部１４−２は、領域毎の人物密度である人物密度マップを入力データとして、学習済み係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-2 uses the person density map, which is the person density for each area, as input data, performs a neural network calculation using the learned coefficients, and obtains the position estimated value of the play equipment as output data.

これにより、人物の空間的な分布を用いて、遊具の存在する場所をニューラルネットワークの演算により精度高く推定することができる。つまり、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 As a result, the location of the playset can be estimated with high accuracy by the neural network calculation using the spatial distribution of the persons. That is, even when the playset is completely concealed by a person or the like and is not clearly visible, its position can be estimated.

（位置学習装置／実施例２） (Position learning device / Example 2)
図９は、実施例２の位置学習装置の構成を示すブロック図であり、図１０は、その処理を示すフローチャートである。この位置学習装置２−２は、人物位置計測部１０、人物密度マップ演算部１１、ニューラルネットワーク部１４−２、メモリ２０−２、誤差演算部２１及び係数更新部２２−２を備えている。 FIG. 9 is a block diagram showing the configuration of the position learning device of the second embodiment, and FIG. 10 is a flowchart showing its processing. The position learning device 2-2 includes a person position measurement unit 10, a person density map calculation unit 11, a neural network unit 14-2, a memory 20-2, an error calculation unit 21, and a coefficient updating unit 22-2.

図４に示した実施例１の位置学習装置２−１とこの位置学習装置２−２とを比較すると、両位置学習装置２−１，２−２は、人物位置計測部１０、人物密度マップ演算部１１及び誤差演算部２１を備えている点で共通する。 Comparing the position learning device 2-1 of the first embodiment shown in FIG. 4 with this position learning device 2-2, the two devices are alike in that both include the person position measurement unit 10, the person density map calculation unit 11, and the error calculation unit 21.

一方、位置学習装置２−２は、位置学習装置２−１の人物姿勢計測部１２及び注視度演算部１３を備えておらず、位置学習装置２−１のニューラルネットワーク部１４−１、メモリ２０−１及び係数更新部２２−１とは異なるニューラルネットワーク部１４−２、メモリ２０−２及び係数更新部２２−２を備えている点で、位置学習装置２−１と相違する。 On the other hand, the position learning device 2-2 differs from the position learning device 2-1 in that it does not include the person posture measuring unit 12 and the gaze calculation unit 13, and in that it includes a neural network unit 14-2, a memory 20-2, and a coefficient updating unit 22-2 that are different from the neural network unit 14-1, the memory 20-1, and the coefficient updating unit 22-1 of the position learning device 2-1.

人物位置計測部１０、人物密度マップ演算部１１及び誤差演算部２１は、図４に示した構成部と同様であるため、説明を省略する。また、図１０に示すステップＳ１００１〜Ｓ１００５は、図５に示したステップＳ５０１〜Ｓ５０３、図７に示したステップＳ７０４及び図５に示したステップＳ５０７と同様であるため、説明を省略する。ここで、人物密度マップ演算部１１は、人物密度マップをニューラルネットワーク部１４−２に出力し、誤差演算部２１は、誤差を係数更新部２２−２に出力する。 Since the person position measurement unit 10, the person density map calculation unit 11, and the error calculation unit 21 are the same as the configuration units shown in FIG. 4, description thereof will be omitted. Further, since steps S1001 to S1005 shown in FIG. 10 are the same as steps S501 to S503 shown in FIG. 5, steps S704 shown in FIG. 7, and step S507 shown in FIG. 5, the description thereof will be omitted. Here, the person density map calculation unit 11 outputs the person density map to the neural network unit 14-2, and the error calculation unit 21 outputs the error to the coefficient update unit 22-2.

ここで、ニューラルネットワーク部１４−２は、メモリ２０−２から係数更新部２２−２により更新され格納された係数を読み出し、係数を用いて、入力層の素子から中間層の素子を介して出力層の素子へと、ニューラルネットワークの演算を実行する。そして、ニューラルネットワーク部１４−２は、遊具の位置推定値を誤差演算部２１に出力する。 Here, the neural network unit 14-2 reads from the memory 20-2 the coefficients updated and stored by the coefficient updating unit 22-2, and uses them to execute the neural network calculation from the elements of the input layer through the elements of the intermediate layer to the elements of the output layer. Then, the neural network unit 14-2 outputs the position estimated value of the playset to the error calculation unit 21.

メモリ２０−２は、図４に示したメモリ２０−１と同様に、書き換え可能なメモリであり、ニューラルネットワーク部１４−２にて用いる係数が一時的に格納される。 The memory 20-2 is a rewritable memory like the memory 20-1 shown in FIG. 4, and the coefficient used by the neural network unit 14-2 is temporarily stored.

メモリ２０−２に格納された係数は、ニューラルネットワーク部１４−２により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The coefficients stored in the memory 20-2 are read out by the neural network unit 14-2 and used for the calculation from the element of the input layer of the neural network to the element of the output layer via the element of the intermediate layer.

位置学習装置２−２の動作開始時点において、メモリ２０−２には、係数の初期値が格納されている。初期値は、予め設定された数値列であってもよいし、乱数（疑似乱数を含む）であってもよい。 At the start of operation of the position learning device 2-2, the initial value of the coefficient is stored in the memory 20-2. The initial value may be a preset numerical string or a random number (including a pseudo-random number).

係数更新部２２−２は、メモリ２０−２から係数を読み出すと共に、誤差演算部２１から誤差を入力する。そして、係数更新部２２−２は、誤差に基づいて、ニューラルネットワーク部１４−２が使用する係数を更新し（ステップＳ１００６）、更新後の係数を、次時点にニューラルネットワーク部１４−２が使用する係数としてメモリ２０−２に格納する。係数更新部２２−２は、例えば誤差伝播法により、誤差に基づいて係数を変更する。 The coefficient updating unit 22-2 reads the coefficient from the memory 20-2 and inputs an error from the error calculation unit 21. Then, the coefficient updating unit 22-2 updates the coefficient used by the neural network unit 14-2 based on the error (step S1006), and the updated coefficient is used by the neural network unit 14-2 at the next time point. It is stored in the memory 20-2 as a coefficient to be used. The coefficient updating unit 22-2 changes the coefficient based on the error, for example, by the error propagation method.

係数更新部２２−２は、当該位置学習装置２−２が画像及び遊具の位置真値の学習データを入力する毎に、係数を更新することで最適化し、更新した係数をメモリ２０−２に格納する。 Each time the position learning device 2-2 receives learning data consisting of an image and the true position value of the play equipment, the coefficient updating unit 22-2 optimizes the coefficients by updating them and stores the updated coefficients in the memory 20-2.

位置学習装置２−２は、学習が完了していない場合（ステップＳ１００７：Ｎ）、ステップＳ１００１へ移行して、新たな画像及び遊具の位置真値の学習データを入力し、ステップＳ１００２〜Ｓ１００６の処理を繰り返す。 When the learning is not completed (step S1007: N), the position learning device 2-2 proceeds to step S1001, inputs a new image and learning data of the position true value of the play device, and steps S1002 to S1006. Repeat the process.

一方、位置学習装置２−２は、学習が完了した場合（ステップＳ１００７：Ｙ）、メモリ２０−２から係数を読み出し、学習済み係数として外部へ出力する（ステップＳ１００８）。位置学習装置２−２から出力された学習済み係数は、図６に示した位置推定装置１−２のメモリ１５−２に格納される。 On the other hand, when the learning is completed (step S1007: Y), the position learning device 2-2 reads the coefficient from the memory 20-2 and outputs it to the outside as a learned coefficient (step S1008). The learned coefficient output from the position learning device 2-2 is stored in the memory 15-2 of the position estimation device 1-2 shown in FIG.

以上のように、実施例２の位置学習装置２−２によれば、画像及びこれに対応する遊具の位置真値を学習データとして入力し、位置推定装置１−２と同様に、人物位置計測部１０は人物毎の位置を計測し、人物密度マップ演算部１１は人物密度マップを演算する。 As described above, in the position learning device 2-2 of the second embodiment, an image and the corresponding true position value of the play equipment are input as learning data, and, as in the position estimation device 1-2, the person position measurement unit 10 measures the position of each person and the person density map calculation unit 11 calculates the person density map.

ニューラルネットワーク部１４−２は、領域毎の人物密度である人物密度マップを入力データとして、メモリ２０−２に格納された係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-2 takes the person density map, which gives the person density for each region, as input data, performs the neural network calculation using the coefficients stored in the memory 20-2, and obtains the position estimated value of the playset as output data.

誤差演算部２１は、遊具の位置真値と位置推定値との間の誤差を演算する。係数更新部２２−２は、例えば誤差伝播法により、誤差に基づいて、ニューラルネットワーク部１４−２が使用する係数を更新し、更新後の係数を、次時点にニューラルネットワーク部１４−２が使用する係数としてメモリ２０−２に格納する。 The error calculation unit 21 calculates an error between the true position value and the estimated position value of the playset. The coefficient update unit 22-2 updates the coefficient used by the neural network unit 14-2 based on the error, for example, by the error propagation method, and the updated coefficient is used by the neural network unit 14-2 at the next time point. It is stored in the memory 20-2 as a coefficient to be used.

位置学習装置２−２は、係数を更新する学習処理を繰り返して学習が完了した場合、メモリ２０−２から係数を読み出し、学習済み係数として外部へ出力する。 When the learning process of updating the coefficient is repeated and the learning is completed, the position learning device 2-2 reads the coefficient from the memory 20-2 and outputs the learned coefficient to the outside.

このようにして得られた学習済み係数は、図６に示した位置推定装置１−２のメモリ１５−２に格納され、ニューラルネットワーク部１４−２が遊具の位置推定値を求める際に用いられる。 The learned coefficients thus obtained are stored in the memory 15-2 of the position estimation device 1-2 shown in FIG. 6 and are used when the neural network unit 14-2 obtains the position estimated value of the playset.

これにより、位置推定装置１−２が画像内の遊具の位置を推定する際に、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 As a result, when the position estimation device 1-2 estimates the position of the playset in the image, the position is estimated even if the playset is completely concealed due to interference with a person or the like and cannot be clearly seen. It becomes possible to do.

以上、実施例１，２を挙げて本発明を説明したが、本発明は前記実施例１，２に限定されるものではなく、その技術思想を逸脱しない範囲で種々変形可能である。前記実施例１，２では、位置推定装置１−１，１−２は画像内の遊具の位置を推定し、位置学習装置２−１，２−２は画像及び遊具の位置を学習データとして学習処理を行うようにした。 Although the present invention has been described above with reference to the first and second embodiments, the present invention is not limited to them and can be variously modified without departing from its technical idea. In the first and second embodiments, the position estimation devices 1-1 and 1-2 estimate the position of the playset in the image, and the position learning devices 2-1 and 2-2 perform the learning process using images and playset positions as learning data.

これに対し、本発明は、遊具の位置を推定し、遊具の位置を学習データとすることに限定されるものではなく、遊具以外の物体の位置を推定し、物体の位置を学習データとするようにしてもよい。また、本発明は、物体以外の物、例えば人物の位置を推定し、人物の位置を学習データとするようにしてもよい。 However, the present invention is not limited to estimating the position of play equipment and using that position as learning data; the position of an object other than play equipment may be estimated and used as learning data. Further, the present invention may estimate the position of something other than an object, for example a person, and use the position of the person as learning data.

In addition, the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, and the gaze degree calculation unit 13 of Embodiment 1, as well as the person position measurement unit 10 and the person density map calculation unit 11 of Embodiment 2, obtain the position and other quantities of each person for all persons participating in the ball game.

Alternatively, the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, and the gaze degree calculation unit 13 may handle the persons of each team participating in the ball game separately and obtain the position and other quantities of each person on a per-team basis. In this case, the neural network unit 14-1 performs the neural network calculation using the learned coefficients, with the person density map for each team and the gaze degree for each team and each position as input data. Likewise, the neural network unit 14-2 performs the neural network calculation using the learned coefficients, with the person density map for each team as input data.
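The per-team variation can be illustrated by building one density map per team and treating the maps as separate input channels. The sketch below makes several assumptions that the text does not fix: the person density of an area is taken to be the raw count of persons in a grid cell, the field is a rectangle of size `field_w` by `field_h`, and the function names and the `(team, x, y)` record layout are hypothetical.

```python
def density_map(positions, field_w, field_h, cols, rows):
    """Count persons per grid cell (assumed definition of person density per area)."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        if not (0 <= x <= field_w and 0 <= y <= field_h):
            continue  # ignore positions measured outside the field
        # Map field coordinates to a cell index; clamp the far edge into the last cell.
        c = min(int(x / field_w * cols), cols - 1)
        r = min(int(y / field_h * rows), rows - 1)
        grid[r][c] += 1
    return grid

def per_team_density_maps(players, field_w, field_h, cols, rows):
    """players: iterable of (team, x, y). Returns {team: density map}."""
    teams = sorted({team for team, _, _ in players})
    return {
        team: density_map(
            [(x, y) for t, x, y in players if t == team],
            field_w, field_h, cols, rows,
        )
        for team in teams
    }

# Hypothetical measurements: two persons of team "A", one of team "B".
players = [("A", 1.0, 1.0), ("A", 9.0, 1.0), ("B", 1.0, 4.0)]
maps = per_team_density_maps(players, field_w=10.0, field_h=5.0, cols=2, rows=1)
# maps["A"] == [[1, 1]] and maps["B"] == [[1, 0]]
```

Keeping the teams in separate maps, rather than summing them into one, lets the network distinguish where each team is concentrated, which a single combined map cannot express.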

In addition, the neural network unit 14-1 of Embodiment 1 performs its processing with the person density map and the gaze degree for each position as input data, and the neural network unit 14-2 of Embodiment 2 performs its processing with the person density map as input data.

Alternatively, the neural network units 14-1 and 14-2 may perform their processing with, as input data, the position of each person measured by the person position measurement unit 10, or quantities such as the movement speed and movement direction of each person measured by other components. In this case, a person speed measurement unit detects persons from time-series images and measures the movement speed of each person, and a person direction measurement unit detects persons from time-series images and measures the movement direction of each person.
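As one possible way to obtain such quantities, the movement speed and movement direction of each person can be derived from that person's positions in two consecutive frames. The sketch below assumes that detections have already been associated into per-person tracks and that the frame interval `dt` is known; the function names and the track layout are hypothetical, and the text does not specify how the measurement units compute these values.

```python
import math

def speed_and_direction(track, dt):
    """track: (x, y) positions of one person in consecutive frames; dt: frame interval in seconds."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    dx, dy = x1 - x0, y1 - y0
    speed = math.hypot(dx, dy) / dt  # distance moved per second
    direction = math.atan2(dy, dx)   # radians, counter-clockwise from the +x axis
    return speed, direction

def motion_features(tracks, dt):
    """tracks: {person_id: [(x, y), ...]}. Persons with fewer than two positions are skipped."""
    return {
        pid: speed_and_direction(track, dt)
        for pid, track in tracks.items()
        if len(track) >= 2
    }

# Hypothetical tracks: "p1" moved from (0, 0) to (3, 4) in 0.5 s; "p2" has one detection only.
tracks = {"p1": [(0.0, 0.0), (3.0, 4.0)], "p2": [(1.0, 1.0)]}
features = motion_features(tracks, dt=0.5)
# features["p1"][0] == 10.0
```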

An ordinary computer can be used as the hardware configuration of the position estimation devices 1-1 and 1-2 and the position learning devices 2-1 and 2-2 according to Embodiments 1 and 2. Each of the position estimation devices 1-1 and 1-2 and the position learning devices 2-1 and 2-2 is configured by a computer equipped with a CPU, a volatile storage medium such as RAM, a non-volatile storage medium such as ROM, an interface, and the like.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, the gaze degree calculation unit 13, the neural network unit 14-1, and the memory 15-1 provided in the position estimation device 1-1 are each realized by causing the CPU to execute a program describing these functions.

Likewise, the functions of the person position measurement unit 10, the person density map calculation unit 11, the neural network unit 14-2, and the memory 15-2 provided in the position estimation device 1-2 are each realized by causing the CPU to execute a program describing these functions.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, the gaze degree calculation unit 13, the neural network unit 14-1, the memory 20-1, the error calculation unit 21, and the coefficient update unit 22-1 provided in the position learning device 2-1 are also each realized by causing the CPU to execute a program describing these functions.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the neural network unit 14-2, the memory 20-2, the error calculation unit 21, and the coefficient update unit 22-2 provided in the position learning device 2-2 are also each realized by causing the CPU to execute a program describing these functions.

These programs are stored in the storage medium described above and are read out and executed by the CPU. They can also be stored and distributed on a storage medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory, and can be transmitted and received via a network.

1-1, 1-2 Position estimation device
2-1, 2-2 Position learning device
10 Person position measurement unit
11 Person density map calculation unit
12 Person posture measurement unit
13 Gaze degree calculation unit
14-1, 14-2 Neural network unit
15-1, 15-2, 20-1, 20-2 Memory
21 Error calculation unit
22-1, 22-2 Coefficient update unit

Claims

A position estimation device that estimates the position of a predetermined object in an input image, comprising:
a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system;
a person density map calculation unit that calculates the person density for each area in the predetermined coordinate system from the position of each person measured by the person position measurement unit and generates the person density for each area as a person density map; and
a neural network unit that performs a neural network calculation using preset learned coefficients, with the person density map generated by the person density map calculation unit as input data, and obtains, as output data, a position estimation value indicating an estimated value of the position of the predetermined object.

The position estimation device according to claim 1, further comprising:
a person posture measurement unit that detects the persons based on the image features of the input image and measures the posture of each person; and
a gaze degree calculation unit that calculates, as a gaze degree for each position, the degree to which all the persons gaze at each coordinate position in the predetermined coordinate system, from the posture of each person measured by the person posture measurement unit,
wherein the neural network unit performs the neural network calculation using the learned coefficients, with the person density map generated by the person density map calculation unit and the gaze degree for each position calculated by the gaze degree calculation unit as input data, and obtains the position estimation value as the output data.

A position learning device that receives, as learning data, an input image and a position true value indicating the true value of the position of a predetermined object corresponding to the input image, and obtains coefficients of a neural network based on the learning data, comprising:
a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system;
a person density map calculation unit that calculates the person density for each area in the predetermined coordinate system from the position of each person measured by the person position measurement unit and generates the person density for each area as a person density map;
a neural network unit that performs the neural network calculation using the coefficients, with the person density map generated by the person density map calculation unit as input data, and obtains, as output data, a position estimation value indicating an estimated value of the position of the predetermined object;
an error calculation unit that calculates the error between the position true value corresponding to the input image and the position estimation value obtained by the neural network unit; and
a coefficient update unit that updates the coefficients of the neural network based on the error calculated by the error calculation unit.

The position learning device according to claim 3, further comprising:
a person posture measurement unit that detects the persons based on the image features of the input image and measures the posture of each person; and
a gaze degree calculation unit that calculates, as a gaze degree for each position, the degree to which all the persons gaze at each coordinate position in the predetermined coordinate system, from the posture of each person measured by the person posture measurement unit,
wherein the neural network unit performs the neural network calculation using the coefficients, with the person density map generated by the person density map calculation unit and the gaze degree for each position calculated by the gaze degree calculation unit as input data, and obtains the position estimation value as the output data.

A program for causing a computer to function as the position estimation device according to claim 1 or 2.

A program for causing a computer to function as the position learning device according to claim 3 or 4.