JP7373352B2

JP7373352B2 - Location estimation device, location learning device and program

Info

Publication number: JP7373352B2
Application number: JP2019186619A
Authority: JP
Inventors: 俊枝三須; 秀樹三ツ峰
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2023-11-02
Anticipated expiration: 2039-10-10
Also published as: JP2021064025A

Description

The present invention relates to a position estimation device, a position learning device, and a program for estimating the position of an object or the like in an input image.

Conventionally, techniques for estimating the position of an object are known in which a tag such as a beacon or an RFID (Radio Frequency IDentifier) tag, or a reflector (such as a retroreflective material), is attached to the object, and radio waves, infrared light, visible light, or the like from the object are sensed passively or actively (see, for example, Patent Document 1).

Patent Document 1 discloses a technique for tracking a subject by combining a process of detecting an invisible-light marker attached to the subject in a non-visible image with a process of detecting the subject in a visible image.

There is also a technique for estimating the position of an object by extracting image features from an input image and determining whether the image features satisfy a predetermined condition. For example, this technique extracts image features called Haar-like features from the input image and classifies them by boosting to extract a person's face region (see, for example, Non-Patent Document 1). This technique is used for adjusting the exposure and focus position of digital cameras, among other things.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-204757 (JP2017-204757A)

Non-Patent Document 1: P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 2001, pp.I-I

With object detection techniques based on image features, such as that of Non-Patent Document 1, it is difficult to detect the position of an object when the object is occluded by another object or lies in an unlit shadow region.

The technique of Patent Document 1 performs active sensing using invisible light, passive sensing using visible light, and online learning while the active sensing is operating normally. The synergy of these makes it possible to detect the position of an object robustly against changes in its occlusion state and illumination state.

However, when the object is completely hidden, so that the invisible-light marker cannot be observed in the non-visible image and the image features of the subject cannot be extracted from the visible image either, detecting the position of the object becomes difficult.

The present invention has been made to solve the above problem, and its object is to provide a position estimation device, a position learning device, and a program that can estimate the position of an object or the like in an image even when the object or the like is completely occluded.

To solve the above problem, the position estimation device of claim 1 is a position estimation device that estimates the position of a predetermined object in an input image, comprising: a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system; a person density map calculation unit that calculates a person density for each region in the predetermined coordinate system from the positions of the persons measured by the person position measurement unit, and generates the person densities of the regions as a person density map; and a neural network unit that takes the person density map generated by the person density map calculation unit as input data, performs a neural network computation using preset learned coefficients, and obtains, as output data, a position estimate indicating an estimated value of the position of the predetermined object, wherein the person density map calculation unit divides the area in which persons can exist into grid cells and calculates the person density of each region by counting, from the positions of the persons, the number of persons present in each grid cell, and the grid cells are arranged so that adjacent grid cells overlap each other.

According to the position estimation device of claim 1, even when the predetermined object is not clearly visible, for example because of interference with persons in the input image, the location of the predetermined object can be estimated with high accuracy based on the spatial distribution of the persons.

The position estimation device of claim 2 is a position estimation device that estimates the position of a predetermined object in an input image, comprising: a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system; a person density map calculation unit that calculates a person density for each region in the predetermined coordinate system from the positions of the persons measured by the person position measurement unit, and generates the person densities of the regions as a person density map; a person posture measurement unit that detects the persons based on image features of the input image and measures the posture of each person; a gaze degree calculation unit that calculates, from the postures of the persons measured by the person posture measurement unit, the degree to which all of the persons gaze at each coordinate position in the predetermined coordinate system, as a gaze degree for each position; and a neural network unit that takes the person density map generated by the person density map calculation unit and the per-position gaze degrees calculated by the gaze degree calculation unit as input data, performs a neural network computation using preset learned coefficients, and obtains, as output data, a position estimate indicating an estimated value of the position of the predetermined object.

According to the position estimation device of claim 2, even when the predetermined object is not clearly visible, for example because of interference with persons in the input image, the location of the predetermined object can be estimated with high accuracy based on the spatial distribution of the persons and on the region the persons are paying attention to, as obtained from the posture of each person.

The position learning device of claim 3 is a position learning device that receives, as learning data, an input image and a true position value indicating the true value of the position of a predetermined object corresponding to the input image, and obtains coefficients of a neural network based on the learning data, comprising: a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system; a person density map calculation unit that calculates a person density for each region in the predetermined coordinate system from the positions of the persons measured by the person position measurement unit, and generates the person densities of the regions as a person density map; a neural network unit that takes the person density map generated by the person density map calculation unit as input data, performs a computation of the neural network using the coefficients, and obtains, as output data, a position estimate indicating an estimated value of the position of the predetermined object; an error calculation unit that calculates the error between the true position value corresponding to the input image and the position estimate obtained by the neural network unit; and a coefficient update unit that updates the coefficients of the neural network based on the error calculated by the error calculation unit, wherein the person density map calculation unit divides the area in which persons can exist into grid cells and calculates the person density of each region by counting, from the positions of the persons, the number of persons present in each grid cell, and the grid cells are arranged so that adjacent grid cells overlap each other.

According to the position learning device of claim 3, the learned coefficients used in the position estimation device of claim 1 can be obtained by using an image and the true position value of the predetermined object corresponding to that image as learning data.

The position learning device of claim 4 is a position learning device that receives, as learning data, an input image and a true position value indicating the true value of the position of a predetermined object corresponding to the input image, and obtains coefficients of a neural network based on the learning data, comprising: a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system; a person density map calculation unit that calculates a person density for each region in the predetermined coordinate system from the positions of the persons measured by the person position measurement unit, and generates the person densities of the regions as a person density map; a person posture measurement unit that detects the persons based on image features of the input image and measures the posture of each person; a gaze degree calculation unit that calculates, from the postures of the persons measured by the person posture measurement unit, the degree to which all of the persons gaze at each coordinate position in the predetermined coordinate system, as a gaze degree for each position; a neural network unit that takes the person density map generated by the person density map calculation unit and the per-position gaze degrees calculated by the gaze degree calculation unit as input data, performs a computation of the neural network using the coefficients, and obtains, as output data, a position estimate indicating an estimated value of the position of the predetermined object; an error calculation unit that calculates the error between the true position value corresponding to the input image and the position estimate obtained by the neural network unit; and a coefficient update unit that updates the coefficients of the neural network based on the error calculated by the error calculation unit.

According to the position learning device of claim 4, the learned coefficients used in the position estimation device of claim 2 can be obtained by using an image and the true position value of the predetermined object corresponding to that image as learning data.

Furthermore, the program of claim 5 causes a computer to function as the position estimation device according to claim 1 or 2.

Furthermore, the program of claim 6 causes a computer to function as the position learning device according to claim 3 or 4.

As described above, according to the present invention, when estimating the position of an object or the like in an image, the position can be estimated even when the object or the like is completely occluded.

FIG. 1 is a block diagram showing the configuration of the position estimation device of Example 1.
FIG. 2 is a flowchart showing the processing of the position estimation device of Example 1.
FIG. 3 is a diagram illustrating the neural network unit of Example 1.
FIG. 4 is a block diagram showing the configuration of the position learning device of Example 1.
FIG. 5 is a flowchart showing the processing of the position learning device of Example 1.
FIG. 6 is a block diagram showing the configuration of the position estimation device of Example 2.
FIG. 7 is a flowchart showing the processing of the position estimation device of Example 2.
FIG. 8 is a diagram illustrating the neural network unit of Example 2.
FIG. 9 is a block diagram showing the configuration of the position learning device of Example 2.
FIG. 10 is a flowchart showing the processing of the position learning device of Example 2.

Embodiments of the present invention are described in detail below with reference to the drawings. The position estimation device of Example 1 described below estimates the position of play equipment used in a sport or the like in an image (including a soccer ball, tennis ball, ice hockey puck, badminton shuttlecock, curling stone, and so on). To do so, it measures the position of each person in the image, calculates the person density for each region, measures the posture of each person in the image, calculates the gaze degree for each position (the degree to which all persons gaze at that position), performs a neural network computation with the per-region person densities and the per-position gaze degrees as input data, and obtains an estimate of the position of the play equipment (the play equipment position estimate) as output data.

The position estimation device of Example 2 measures the position of each person in the image, calculates the person density for each region, performs a neural network computation with the per-region person densities as input data, and obtains the play equipment position estimate as output data.

The position learning devices of Examples 1 and 2 use an image and the true value of the position of the play equipment corresponding to that image (the true position value of the play equipment) as learning data to obtain the optimal coefficients (connection weights and bias values) for the neural network provided in the corresponding position estimation device.

This makes it possible to estimate the position of the play equipment even when it is completely occluded, for example when it is hidden behind a person or lies in an unlit shadow region.

[Example 1]
First, Example 1 will be described. As mentioned above, Example 1 calculates the person density for each region and the gaze degree for each position from an image, and estimates the position of the play equipment using a neural network that takes these data as input data and produces the play equipment position estimate as output data.

(Position estimation device / Example 1)
FIG. 1 is a block diagram showing the configuration of the position estimation device of Example 1, and FIG. 2 is a flowchart showing its processing. This position estimation device 1-1 includes a person position measurement unit 10, a person density map calculation unit 11, a person posture measurement unit 12, a gaze degree calculation unit 13, a neural network unit 14-1, and a memory 15-1.

The position estimation device 1-1 receives an image, processes this input image, performs the neural network computation, and outputs the play equipment position estimate. The image is, for example, an image of a ball game such as soccer, hockey, badminton, or curling, played within a court laid out on the ground by persons who are players, using play equipment such as a ball, puck, shuttlecock, or stone.

The person position measurement unit 10 receives an image (step S201), extracts persons based on the image features of this input image, and measures the position (coordinates) of each person (step S202). The person position measurement unit 10 then outputs the position of each person to the person density map calculation unit 11 and the gaze degree calculation unit 13.

There may be one person or multiple persons. The position of a person may be the coordinates at which the person exists in the coordinate system of the input image (hereinafter, the "image coordinate system"), or coordinates in a coordinate system set in the space in which the person exists (hereinafter, the "world coordinate system").

The person position measurement unit 10 may be, for example, a receiver of a global navigation satellite system (GNSS: Global Navigation Satellite System; for example, GPS (Global Positioning System)). Alternatively, the person position measurement unit 10 may extract the persons who are subjects from the image using a technique such as face image recognition and estimate each person's position based on the position and orientation information of the camera that captured the image (this may be positioning from stereo images, or positioning from a monocular image and a constraint, for example that the feet of the person who is the subject are in contact with a predetermined plane). Positioning may also be performed by attaching a wireless tag (RFID or the like) to each person.
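The monocular option with a ground-contact constraint can be sketched as a ray-plane intersection. This is only an illustration under stated assumptions: the function name `foot_on_ground`, the horizontal plane z = `ground_z`, and the availability of a world-space viewing ray through the person's feet (derived elsewhere from the camera position and orientation) are all hypothetical and not taken from the patent.

```python
import numpy as np

def foot_on_ground(camera_center, ray_dir, ground_z=0.0):
    """Intersect a viewing ray with the horizontal plane z = ground_z.

    camera_center: camera position in world coordinates (assumed known).
    ray_dir: world-space direction of the ray through the person's feet
             (assumed to be derived from the camera pose and the pixel).
    """
    c = np.asarray(camera_center, dtype=float)
    d = np.asarray(ray_dir, dtype=float)
    if abs(d[2]) < 1e-12:
        raise ValueError("ray is parallel to the ground plane")
    t = (ground_z - c[2]) / d[2]      # ray parameter at the plane
    if t <= 0:
        raise ValueError("ground plane is behind the camera")
    return c + t * d                  # world position of the feet
```

For a camera 10 units above the ground looking down along (1, 0, -1), the ray meets the ground 10 units ahead of the camera.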

The person density map calculation unit 11 receives the position of each person from the person position measurement unit 10 and, based on these positions, calculates the person density for each local region in the image coordinate system or the world coordinate system to generate a person density map (step S203). The person density map calculation unit 11 then outputs the person density map (the person density for each region) to the neural network unit 14-1.

For example, the person density map calculation unit 11 may divide the area in which persons can exist (for example, a playing field such as a soccer field or tennis court) into a grid of rows and columns and calculate the person density for each region by counting, from the position of each person, the number of persons present in each grid cell. When there are L grid cells in total, the person density map calculation unit 11 may generate an L-dimensional vector whose elements are the person densities of the grid cells and use it as the person density map.

The grid cells may be arranged so that they do not overlap, or so that adjacent cells overlap each other, for example by placing 1 m square sub-regions at 0.5 m intervals in each direction.
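The grid counting described above can be sketched as follows. This is a minimal illustration, assuming 2-D positions on a rectangular field; the function name `person_density_map`, its parameters, and the division by cell area are illustrative choices rather than details from the patent. A stride smaller than the cell size gives the overlapping arrangement, and a stride equal to the cell size gives the non-overlapping one.

```python
import numpy as np

def person_density_map(positions, x_range, y_range, cell=1.0, stride=0.5):
    """Count persons in square cells of side `cell` placed every `stride` units.

    Returns an L-dimensional vector (grid flattened row by row), where L is
    the total number of cells. With stride < cell, adjacent cells overlap.
    """
    positions = np.asarray(positions, dtype=float).reshape(-1, 2)
    # Lower-left corners of the cells along each axis.
    xs = np.arange(x_range[0], x_range[1] - cell + 1e-9, stride)
    ys = np.arange(y_range[0], y_range[1] - cell + 1e-9, stride)
    counts = np.zeros((len(ys), len(xs)))
    for j, y0 in enumerate(ys):
        for i, x0 in enumerate(xs):
            inside = (
                (positions[:, 0] >= x0) & (positions[:, 0] < x0 + cell)
                & (positions[:, 1] >= y0) & (positions[:, 1] < y0 + cell)
            )
            counts[j, i] = inside.sum()
    # Persons per unit area; with cell = 1.0 this equals the raw count.
    return counts.ravel() / (cell * cell)
```

With overlapping cells, one person contributes to several elements of the vector, which smooths the map at the cost of a larger L.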

The person posture measurement unit 12 receives an image, extracts the heads of persons based on the image features of this input image, and measures the posture (face orientation) of each person from the head (step S204). The person posture measurement unit 12 then outputs the posture of each person to the gaze degree calculation unit 13. The posture of a person may be measured in the image coordinate system or in the world coordinate system.

The person posture measurement unit 12 may, for example, use a technique that estimates the face orientation by integrating a classification result based on the color histogram of the image with a classification result based on a feature other than the color histogram (for example, a gradient histogram). This face orientation estimation technique is known; see, for example, Japanese Unexamined Patent Application Publication No. 2018-22416.

The person posture measurement unit 12 may also receive the position of each person from the person position measurement unit 10 in order to locate the persons whose posture is to be measured.

The gaze degree calculation unit 13 receives the position of each person from the person position measurement unit 10 and the posture of each person from the person posture measurement unit 12. Based on these positions and postures, the gaze degree calculation unit 13 calculates the degree to which all persons gaze at each preset coordinate position in the image coordinate system or the world coordinate system, that is, the gaze degree for each position (coordinates) (step S205). The gaze degree calculation unit 13 outputs the gaze degree for each position to the neural network unit 14-1. The gaze degree for a position in the image coordinate system or the world coordinate system is the sum of the gaze degrees of the individual persons for that position.

Let X_k be the position of each person, P_k the posture of each person, Y a position in the image coordinate system or the world coordinate system, and G(Y) the gaze degree at position Y. k is an index for distinguishing persons and is an integer from 1 to K inclusive, where K is the maximum value of k.

Here, the position X_k of each person is a three-dimensional position vector in world coordinates. The posture P_k of each person is a direction vector fixed to a specific part of the person (for example, a face-direction vector fixed to the face, or a line-of-sight vector fixed to the eyeball) and is a unit vector. The position Y is a position vector in world coordinates.

In this case, the gaze degree calculation unit 13 calculates the gaze degree G(Y) for each position Y using, for example, the following equation with a function f.
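The equation itself is not reproduced in this text. From the surrounding definitions (the gaze degree at a position is the sum of each person's gaze degree for that position, and f takes the vector Y − X_k and the posture P_k as arguments), it presumably has the following form; this is a reconstruction, not the patent's own typesetting.

```latex
G(Y) = \sum_{k=1}^{K} f\left(Y - X_k,\; P_k\right)
```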

For its arguments, the vector (Y − X_k) and the posture P_k of each person, the function f desirably takes a larger (or no smaller) scalar value the more closely these two vectors point in the same direction. The function f also desirably takes a larger (or no smaller) scalar value the shorter the vector (Y − X_k) is.

For example, the function f may compute the inner product of the vectors (the cosine (cos θ) of the angle θ between the two vectors), as in the following equation, where the vector Q_k = Y − X_k.
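A cosine-based gaze degree can be sketched as below. The sketch assumes that G(Y) sums f over all persons and that f(Q_k, P_k) = (Q_k · P_k) / |Q_k|, which equals cos θ because P_k is a unit vector; the function name `gaze_degree` and the choice to give zero contribution to a person located exactly at Y are illustrative assumptions.

```python
import numpy as np

def gaze_degree(Y, X, P):
    """G(Y): sum over persons k of cos(angle between Q_k = Y - X_k and P_k).

    Y: position to evaluate, shape (3,).
    X: person positions, shape (K, 3).
    P: unit posture (face or gaze direction) vectors, shape (K, 3).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    Q = Y - X                                  # Q_k for every person
    norms = np.linalg.norm(Q, axis=1)
    cos = np.zeros(len(X))
    ok = norms > 0                             # skip persons located exactly at Y
    cos[ok] = np.einsum("kd,kd->k", Q[ok], P[ok]) / norms[ok]
    return float(cos.sum())
```

Two persons facing each other along the x-axis both look directly at the midpoint between them, so the gaze degree there is 2; it drops as the evaluated position moves off their lines of sight.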

Alternatively, the function f may be computed as in the following equation.

The neural network unit 14-1 reads the learned coefficients (connection weights and bias values) from the memory 15-1. The neural network unit 14-1 also receives the person density map (the person density for each region) from the person density map calculation unit 11 and the gaze degree for each position from the gaze degree calculation unit 13.

The neural network unit 14-1 takes the person density map and the per-position gaze degrees as input data, performs the neural network computation using the learned coefficients, and obtains the play equipment position estimate (a vector indicating the position of the play equipment) as output data (step S206). The neural network unit 14-1 then outputs the play equipment position estimate (step S207).

Any neural network can be used as the neural network unit 14-1, for example a multilayer perceptron, a convolutional neural network, a recurrent neural network, a Hopfield network, or a combination of these.

人物位置計測部１０が出力する人物毎の位置の座標系が画像座標系である場合、ニューラルネットワーク部１４－１が出力する遊具の位置推定値の座標系も、画像座標系とすることが典型的であるが、世界座標系等の他の座標系とするようにしてもよい。逆に、人物位置計測部１０が出力する人物毎の位置の座標系が世界座標系である場合、ニューラルネットワーク部１４－１が出力する遊具の位置推定値の座標系も、世界座標系とすることが典型的であるが、画像座標系等の他の座標系とするようにしてもよい。 When the coordinate system of the position of each person outputted by the person position measurement unit 10 is an image coordinate system, the coordinate system of the estimated position of the play equipment outputted by the neural network unit 14-1 is typically also set to the image coordinate system. However, other coordinate systems such as the world coordinate system may be used. Conversely, if the coordinate system of the position of each person outputted by the person position measurement section 10 is the world coordinate system, the coordinate system of the estimated position of the play equipment outputted by the neural network section 14-1 is also set to the world coordinate system. Although this is typical, other coordinate systems such as an image coordinate system may be used.

領域毎の人物密度である人物密度マップにおける領域の数をＬ（Ｌは２以上の整数）とし、位置毎の注視度における位置の数をＭ（Ｍは０以上の整数）とする。尚、Ｍ＝０は、後述する実施例２の場合に相当するため、ここでは、Ｍは１以上の整数とする。 Let the number of regions in the person density map, which is the person density for each region, be L (L is an integer greater than or equal to 2), and the number of positions in the degree of gaze for each location be M (M is an integer greater than or equal to 0). Note that since M=0 corresponds to the case of Example 2, which will be described later, M is an integer of 1 or more here.

例えば、ニューラルネットワーク部１４－１が多層パーセプトロンを用いる場合、ニューラルネットワーク部１４－１は、Ｌ個の領域毎の人物密度及びＭ個の位置毎の注視度を入力データとして、多層パーセプトロンの入力層におけるＮ個（Ｎ＝Ｌ＋Ｍ）の素子に入力する。 For example, when the neural network unit 14-1 uses a multilayer perceptron, it takes the person density for each of the L regions and the degree of gaze for each of the M positions as input data and feeds them to the N elements (N = L + M) of the input layer of the multilayer perceptron.

図３は、実施例１において、多層パーセプトロンを用いたニューラルネットワーク部１４－１を説明する図であり、Ｌ＝４，Ｍ＝４，Ｎ＝８の場合を示している。このニューラルネットワーク部１４－１は、３層パーセプトロンを用いて構成され、入力層の素子数は８、中間層の素子数は３、出力層の素子数は２である。 FIG. 3 is a diagram illustrating the neural network unit 14-1 using a multilayer perceptron in the first embodiment, and shows a case where L=4, M=4, and N=8. This neural network section 14-1 is configured using a three-layer perceptron, and has eight elements in the input layer, three elements in the intermediate layer, and two elements in the output layer.

ニューラルネットワーク部１４－１は、４つの領域のそれぞれについて、人物密度（変数）ｘ₁～ｘ₄を入力層の４個の素子（１～４番目の素子）に入力する。また、ニューラルネットワーク部１４－１は、４つの位置のそれぞれについて、注視度（変数）ｘ₅～ｘ₈を入力層の４個の素子（５～８番目の素子）に入力する。 The neural network unit 14-1 inputs the person densities (variables) x ₁ to x ₄ to four elements (first to fourth elements) of the input layer for each of the four regions. Further, the neural network unit 14-1 inputs the degrees of gaze (variables) x ₅ to x ₈ to four elements (fifth to eighth elements) of the input layer for each of the four positions.
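The input arrangement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_input_vector` and the sample values are hypothetical, and the only facts taken from the text are that L per-region densities come first, M per-position gaze degrees follow, and N = L + M.

```python
def build_input_vector(densities, gaze_degrees):
    """Concatenate L per-region person densities and M per-position gaze
    degrees into the N = L + M values fed to the input layer."""
    if len(densities) < 2:
        raise ValueError("L must be 2 or more")
    if len(gaze_degrees) < 1:
        raise ValueError("M must be 1 or more in Example 1")
    # Elements 1..L carry densities, elements L+1..L+M carry gaze degrees.
    return [float(v) for v in densities] + [float(v) for v in gaze_degrees]


# Hypothetical values for the L = 4, M = 4 case of FIG. 3.
x = build_input_vector([0.2, 0.0, 0.5, 0.3], [0.1, 0.7, 0.1, 0.1])
print(len(x))  # N = 8
```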

ニューラルネットワーク部１４－１は、入力層の１～８番目の素子から変数ｘ₁～ｘ₈をそのまま出力し、中間層の９～１１番目の素子（ｑ番目の素子）において、以下の式にて、変数ｘ₁～ｘ₈等を用いて出力値である変数ｘ₉～ｘ₁₁（ｘ_q）を演算する。

The neural network unit 14-1 outputs the variables x₁ to x₈ unchanged from the 1st to 8th elements of the input layer. Each of the 9th to 11th elements of the intermediate layer (the q-th element) then calculates its output value, one of the variables x₉ to x₁₁ (x_q), from the variables x₁ to x₈ and its coefficients using the following formula.

$$x_q = \varphi_q\Bigl(\sum_{p \in C_q} w_{p,q}\, x_p + b_q\Bigr) \tag{4}$$

ここで、Ｃ_qは、ｑ番目の素子の入力に接続される素子の番号の集合を示し、ｗ_p,qは、ｐ番目の素子の出力とｑ番目の素子の入力との間の結合重みを示す。また、ｂ_qは、ｑ番目の素子の入力として与えるバイアス値を示し、φ_qは、ｑ番目の素子の活性化関数を示す。 Here, C_q denotes the set of numbers of the elements connected to the input of the q-th element, and w_p,q denotes the coupling weight between the output of the p-th element and the input of the q-th element. Further, b_q denotes the bias value given as an input to the q-th element, and φ_q denotes the activation function of the q-th element.
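The symbols defined here imply that each element computes x_q = φ_q(Σ_{p∈C_q} w_p,q · x_p + b_q). Under that reading, a minimal sketch of the calculation for the FIG. 3 layout (inputs 1 to 8, intermediate elements 9 to 11, output elements 12 and 13) could look as follows. The function name `forward`, the dictionary layout, and all numeric values are assumptions made for illustration.

```python
import math


def forward(x_in, conn, w, b, phi):
    """Evaluate x_q = phi_q(sum over p in C_q of w[p, q] * x_p + b_q).

    x_in: {element number: value} for the input-layer elements
    conn: {q: list of p} giving C_q, listed in evaluation order
    w:    {(p, q): coupling weight}
    b:    {q: bias value}
    phi:  {q: activation function}
    """
    x = dict(x_in)
    for q, sources in conn.items():
        s = sum(w[(p, q)] * x[p] for p in sources) + b[q]
        x[q] = phi[q](s)
    return x


# Hypothetical inputs and coefficients; real ones come from learning.
inputs = {i + 1: v for i, v in
          enumerate([0.2, 0.0, 0.5, 0.3, 0.1, 0.7, 0.1, 0.1])}
conn = {q: list(range(1, 9)) for q in (9, 10, 11)}   # C_9 = C_10 = C_11
conn.update({q: [9, 10, 11] for q in (12, 13)})      # C_12 = C_13
w = {(p, q): 0.1 for q, ps in conn.items() for p in ps}
b = {q: 0.0 for q in conn}
phi = {9: math.tanh, 10: math.tanh, 11: math.tanh,
       12: lambda s: s, 13: lambda s: s}             # linear output (assumed)

out = forward(inputs, conn, w, b, phi)
estimate = (out[12], out[13])  # (X coordinate, Y coordinate)
```

Storing the connectivity as C_q sets rather than as dense matrices mirrors the notation of the text and lets the same function evaluate networks that are not fully connected.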

ニューラルネットワーク部１４－１は、中間層の９～１１番目の素子から変数ｘ₉～ｘ₁₁を出力し、出力層の１２，１３番目の素子（ｑ番目の素子）において、前記式（４）にて、変数ｘ₉～ｘ₁₁等を用いて出力値である遊具の位置推定値（Ｘ座標、Ｙ座標）（ｘ_q）を演算する。ニューラルネットワーク部１４－１は、出力層の１２，１３番目の素子から、遊具の位置推定値（Ｘ座標、Ｙ座標）を出力する。 The neural network unit 14-1 outputs the variables x₉ to x₁₁ from the 9th to 11th elements of the intermediate layer. Each of the 12th and 13th elements of the output layer (the q-th element) then calculates its output value x_q, a component of the estimated position of the play equipment (X coordinate, Y coordinate), from the variables x₉ to x₁₁ and its coefficients using formula (4). The neural network unit 14-1 outputs the estimated position of the play equipment (X coordinate, Y coordinate) from the 12th and 13th elements of the output layer.

図３の例では、Ｃ₉＝Ｃ₁₀＝Ｃ₁₁＝｛１，２，３，４，５，６，７，８｝，Ｃ₁₂＝Ｃ₁₃＝｛９，１０，１１｝であり、活性化関数φ_qは、任意のものを使用することができる。例えば、活性化関数φ_qとして、ReLU関数（Rectified Linear Unit：半波整流関数：φ_q(ｘ)＝max｛0,ｘ｝）、シグモイド関数、双曲線正接関数（φ_q(ｘ)＝tanhｘ）を用いることができる。活性化関数φ_qは、素子毎に異なる関数を混在させてもよいし、全ての素子に同じ関数を用いてもよい。 In the example of FIG. 3, C₉ = C₁₀ = C₁₁ = {1, 2, 3, 4, 5, 6, 7, 8} and C₁₂ = C₁₃ = {9, 10, 11}. Any activation function φ_q can be used. For example, the ReLU function (Rectified Linear Unit, a half-wave rectification function: φ_q(x) = max{0, x}), the sigmoid function, or the hyperbolic tangent function (φ_q(x) = tanh x) can be used as φ_q. Different activation functions may be mixed from element to element, or the same function may be used for all elements.
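The three activation functions named above can be written down directly. The sketch below is illustrative only: the sigmoid is given in its standard logistic form, which the text does not spell out, and the per-element assignment `phi` is a hypothetical example of mixing functions across elements.

```python
import math


def relu(x):
    """ReLU (half-wave rectification): max{0, x}."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow
    for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


tanh = math.tanh  # hyperbolic tangent

# Hypothetical mixed assignment for the intermediate elements of FIG. 3.
phi = {9: relu, 10: sigmoid, 11: tanh}
```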

図１に戻って、メモリ１５－１には、例えば、後述する位置学習装置２－１によりニューラルネットワーク部１４－１を学習することにより求めた最適な学習済み係数（結合重みｗ_p,q及びバイアス値ｂ_q）が格納されている。 Returning to FIG. 1, the memory 15-1 stores, for example, the optimal learned coefficients (coupling weights w_p,q and bias values b_q) obtained by training the neural network unit 14-1 with the position learning device 2-1 described later.

メモリ１５－１に格納された学習済み係数は、ニューラルネットワーク部１４－１により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The learned coefficients stored in the memory 15-1 are read out by the neural network section 14-1 and used for calculations from the input layer elements of the neural network to the output layer elements via the intermediate layer elements.

学習済み係数は、必要に応じて外部からメモリ１５－１に書き込まれるようにしてもよいし、読み取り専用のメモリ１５－１に予め求めておいたデータとして設定しておくようにしてもよい。 The learned coefficients may be written into the memory 15-1 from the outside as necessary, or may be set as predetermined data in the read-only memory 15-1.
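One way to move learned coefficients from outside into the memory is to serialize them. The source does not specify any storage format, so the JSON layout and the function names `dump_coefficients` and `load_coefficients` below are purely assumptions for illustration.

```python
import json


def dump_coefficients(w, b):
    """Serialize weights {(p, q): value} and biases {q: value} to JSON text."""
    return json.dumps({
        "weights": [[p, q, v] for (p, q), v in sorted(w.items())],
        "biases": [[q, v] for q, v in sorted(b.items())],
    })


def load_coefficients(text):
    """Restore the dictionaries written by dump_coefficients."""
    data = json.loads(text)
    w = {(int(p), int(q)): float(v) for p, q, v in data["weights"]}
    b = {int(q): float(v) for q, v in data["biases"]}
    return w, b


# Hypothetical coefficients for two connections and two elements.
w = {(1, 9): -0.5, (9, 12): 0.25}
b = {9: 0.0, 12: 1.5}
restored_w, restored_b = load_coefficients(dump_coefficients(w, b))
```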

以上のように、実施例１の位置推定装置１－１によれば、人物位置計測部１０は、画像から人物毎の位置を計測し、人物密度マップ演算部１１は、人物毎の位置から人物密度マップを演算する。また、人物姿勢計測部１２は、画像から人物毎の姿勢を計測し、注視度演算部１３は、人物毎の位置及び人物毎の姿勢から、所定の座標系における位置毎の注視度を演算する。 As described above, in the position estimation device 1-1 of the first embodiment, the person position measurement unit 10 measures the position of each person from the image, and the person density map calculation unit 11 calculates the person density map from the position of each person. Further, the person posture measurement unit 12 measures the posture of each person from the image, and the gaze degree calculation unit 13 calculates the degree of gaze for each position in a predetermined coordinate system from the position and posture of each person.

ニューラルネットワーク部１４－１は、領域毎の人物密度である人物密度マップ及び位置毎の注視度を入力データとして、学習済み係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-1 takes the person density map, which is the person density for each region, and the degree of gaze for each position as input data, performs the neural network calculation using the learned coefficients, and obtains the estimated position of the play equipment as output data.

これにより、人物の空間的な分布と、各人物の姿勢から得られた位置毎の注視度とを用いて、遊具の存在する場所を、ニューラルネットワークの演算により精度高く推定することができる。つまり、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 Thereby, using the spatial distribution of people and the degree of gaze for each position obtained from the posture of each person, the location where the play equipment is present can be estimated with high accuracy by calculation of the neural network. In other words, even if the play equipment is completely hidden and cannot be clearly seen due to interference with a person, it is possible to estimate its position.

（位置学習装置／実施例１） (Position learning device/Example 1)
図４は、実施例１の位置学習装置の構成を示すブロック図であり、図５は、その処理を示すフローチャートである。この位置学習装置２－１は、人物位置計測部１０、人物密度マップ演算部１１、人物姿勢計測部１２、注視度演算部１３、ニューラルネットワーク部１４－１、メモリ２０－１、誤差演算部２１及び係数更新部２２－１を備えている。 FIG. 4 is a block diagram showing the configuration of the position learning device according to the first embodiment, and FIG. 5 is a flowchart showing its processing. This position learning device 2-1 includes a person position measurement section 10, a person density map calculation section 11, a person posture measurement section 12, a gaze degree calculation section 13, a neural network section 14-1, a memory 20-1, an error calculation section 21 and a coefficient updating section 22-1.

位置学習装置２－１は、画像、及び当該画像に対応する真の遊具の位置である遊具の位置真値を対にして学習データとして１回以上入力する（ステップＳ５０１）。そして、位置学習装置２－１は、例えば誤差逆伝播法によりニューラルネットワークを学習し、最適な学習済み係数（結合重みｗ_p,q及びバイアス値ｂ_q）を求める。 The position learning device 2-1 inputs the image and the true position value of the play equipment, which is the true position of the play equipment corresponding to the image, as a pair at least once as learning data (step S501). Then, the position learning device 2-1 learns the neural network using, for example, an error backpropagation method, and obtains optimal learned coefficients (coupling weights w _p,q and bias values b _q ).

人物位置計測部１０、人物密度マップ演算部１１、人物姿勢計測部１２、注視度演算部１３及びニューラルネットワーク部１４－１は、図１に示した構成部と同様であるため、説明を省略する。また、図５に示すステップＳ５０２～Ｓ５０６は、図２に示したステップＳ２０２～Ｓ２０６と同様であるから、説明を省略する。 The person position measurement section 10, the person density map calculation section 11, the person posture measurement section 12, the gaze degree calculation section 13, and the neural network section 14-1 are the same as the components shown in FIG. 1, so their explanation will be omitted. . Further, steps S502 to S506 shown in FIG. 5 are the same as steps S202 to S206 shown in FIG. 2, so the explanation will be omitted.

ここで、ニューラルネットワーク部１４－１は、メモリ２０－１から係数更新部２２－１により更新され格納された係数（結合重みｗ_p,q及びバイアス値ｂ_q）を読み出し、係数を用いて、入力層の素子から中間層の素子を介して出力層の素子へと、ニューラルネットワークの演算を実行する。そして、ニューラルネットワーク部１４－１は、遊具の位置推定値を誤差演算部２１に出力する。 Here, the neural network unit 14-1 reads from the memory 20-1 the coefficients (coupling weights w_p,q and bias values b_q) that were updated and stored by the coefficient update unit 22-1, and uses them to execute the neural network calculation from the input layer elements through the intermediate layer elements to the output layer elements. The neural network unit 14-1 then outputs the estimated position of the play equipment to the error calculation unit 21.

メモリ２０－１は、書き換え可能なメモリ（例えばＲＡＭ（Random Access Memory））であり、ニューラルネットワーク部１４－１にて用いる係数が一時的に格納される。 The memory 20-1 is a rewritable memory (for example, RAM (Random Access Memory)), and temporarily stores coefficients used by the neural network unit 14-1.

メモリ２０－１に格納された係数は、ニューラルネットワーク部１４－１により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The coefficients stored in the memory 20-1 are read out by the neural network unit 14-1 and used for calculations from the input layer elements of the neural network to the output layer elements via the intermediate layer elements.

位置学習装置２－１の動作開始時点において、メモリ２０－１には、係数の初期値が格納されている。初期値は、予め設定された数値列であってもよいし、乱数（疑似乱数を含む）であってもよい。 At the start of operation of the position learning device 2-1, initial values of coefficients are stored in the memory 20-1. The initial value may be a preset numerical string or may be a random number (including pseudo-random numbers).
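A minimal sketch of the pseudo-random initialization option follows, using the FIG. 3 connectivity. The uniform range, the zero biases, the fixed seed, and the name `init_coefficients` are all assumptions; the text only says that the initial values may be a preset sequence or (pseudo-)random numbers.

```python
import random


def init_coefficients(conn, rng=None, scale=0.1):
    """Return initial weights {(p, q): value} and biases {q: value}.

    conn: {q: list of p} giving the connection set C_q of each element.
    Weights are drawn uniformly from [-scale, scale]; biases start at 0.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the run reproducible
    w = {(p, q): rng.uniform(-scale, scale)
         for q, ps in conn.items() for p in ps}
    b = {q: 0.0 for q in conn}
    return w, b


# Connectivity of FIG. 3: elements 9-11 read inputs 1-8, 12-13 read 9-11.
conn = {q: list(range(1, 9)) for q in (9, 10, 11)}
conn.update({q: [9, 10, 11] for q in (12, 13)})
w0, b0 = init_coefficients(conn)
```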

誤差演算部２１は、当該位置学習装置２－１が入力する画像と対をなす遊具の位置真値を入力すると共に、ニューラルネットワーク部１４－１から遊具の位置推定値を入力する。そして、誤差演算部２１は、遊具の位置真値と位置推定値との間の誤差を演算し（ステップＳ５０７）、誤差を係数更新部２２－１に出力する。誤差は、例えば、遊具の位置推定値のベクトルと遊具の位置真値のベクトルとの間の差分べクトルである。 The error calculation unit 21 inputs the true position value of the play equipment that is paired with the image input by the position learning device 2-1, and also inputs the estimated position value of the play equipment from the neural network unit 14-1. Then, the error calculation unit 21 calculates the error between the true position value and the estimated position value of the play equipment (step S507), and outputs the error to the coefficient update unit 22-1. The error is, for example, a vector of differences between a vector of estimated position values of the play equipment and a vector of true position values of the play equipment.
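The difference vector mentioned above is straightforward to compute. The sketch below uses hypothetical function names and coordinates; the squared length is added only as a convenient scalar summary and is not stated in the source.

```python
def position_error(estimate, truth):
    """Difference vector: estimated position minus true position."""
    return tuple(e - t for e, t in zip(estimate, truth))


def squared_error(err):
    """Squared Euclidean length of the difference vector."""
    return sum(d * d for d in err)


# Hypothetical (X, Y) coordinates.
err = position_error((3.0, 4.0), (1.0, 1.0))
print(err)                 # (2.0, 3.0)
print(squared_error(err))  # 13.0
```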

係数更新部２２－１は、メモリ２０－１から係数を読み出すと共に、誤差演算部２１から誤差を入力する。そして、係数更新部２２－１は、誤差に基づいて、ニューラルネットワーク部１４－１が使用する係数を更新し（ステップＳ５０８）、更新後の係数を、次時点にニューラルネットワーク部１４－１が使用する係数としてメモリ２０－１に格納する。係数更新部２２－１は、例えば誤差伝播法により、誤差に基づいて係数を変更する。 The coefficient update unit 22-1 reads the coefficients from the memory 20-1 and receives the error from the error calculation unit 21. The coefficient update unit 22-1 then updates the coefficients used by the neural network unit 14-1 based on the error (step S508) and stores the updated coefficients in the memory 20-1 as the coefficients that the neural network unit 14-1 will use at the next time point. The coefficient update unit 22-1 changes the coefficients based on the error by, for example, the error backpropagation method.

係数更新部２２－１は、当該位置学習装置２－１が画像及び遊具の位置真値の学習データを入力する毎に、係数を更新することで最適化し、更新した係数をメモリ２０－１に格納する。 Each time the position learning device 2-1 receives learning data consisting of an image and the true position value of the play equipment, the coefficient update unit 22-1 optimizes the coefficients by updating them and stores the updated coefficients in the memory 20-1.
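As an illustration of one per-sample coefficient update, the sketch below applies a plain gradient-descent step derived by backpropagation to an 8-3-2 perceptron. The tanh intermediate layer, linear output layer, half squared-error loss, learning rate, and every numeric value are assumptions; the source only states that the coefficients are changed based on the error.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Three-layer perceptron: tanh intermediate layer, linear output layer."""
    h = [math.tanh(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = [sum(W2[k][j] * h[j] for j in range(len(h))) + b2[k]
         for k in range(len(b2))]
    return h, y


def update(x, truth, W1, b1, W2, b2, lr=0.05):
    """One in-place gradient step on 0.5 * ||y - truth||^2.

    Returns the error vector measured before the step.
    """
    h, y = forward(x, W1, b1, W2, b2)
    err = [y[k] - truth[k] for k in range(len(y))]
    # Propagate the error back to the intermediate layer before W2 changes.
    delta_h = [(1.0 - h[j] ** 2) *
               sum(W2[k][j] * err[k] for k in range(len(err)))
               for j in range(len(h))]
    for k in range(len(err)):
        for j in range(len(h)):
            W2[k][j] -= lr * err[k] * h[j]
        b2[k] -= lr * err[k]
    for j in range(len(h)):
        for i in range(len(x)):
            W1[j][i] -= lr * delta_h[j] * x[i]
        b1[j] -= lr * delta_h[j]
    return err


def loss(err):
    return 0.5 * sum(e * e for e in err)


# Hypothetical sample and initial coefficients.
x = [0.2, 0.0, 0.5, 0.3, 0.1, 0.7, 0.1, 0.1]
truth = [0.4, 0.6]
W1 = [[0.05 * (j + 1)] * 8 for j in range(3)]
b1 = [0.0] * 3
W2 = [[0.1, -0.1, 0.2], [0.2, 0.1, -0.1]]
b2 = [0.0, 0.0]

before = loss(update(x, truth, W1, b1, W2, b2))
for _ in range(200):
    err = update(x, truth, W1, b1, W2, b2)
after = loss(err)
```

Repeating the step on a single sample drives the loss toward zero; with real learning data the same step would be applied to a different image and true position each time.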

位置学習装置２－１は、学習が完了していない場合（ステップＳ５０９：Ｎ）、ステップＳ５０１へ移行して、新たな画像及び遊具の位置真値の学習データを入力し、ステップＳ５０２～Ｓ５０８の処理を繰り返す。 If the learning is not completed (step S509: N), the position learning device 2-1 returns to step S501, receives new learning data consisting of an image and the true position value of the play equipment, and repeats the processing of steps S502 to S508.

一方、位置学習装置２－１は、学習が完了した場合（ステップＳ５０９：Ｙ）、メモリ２０－１から係数を読み出し、学習済み係数として外部へ出力する（ステップＳ５１０）。位置学習装置２－１から出力された学習済み係数は、図１に示した位置推定装置１－１のメモリ１５－１に格納される。 On the other hand, when the learning is completed (step S509: Y), the position learning device 2-1 reads the coefficients from the memory 20-1 and outputs them to the outside as learned coefficients (step S510). The learned coefficients output from the position learning device 2-1 are stored in the memory 15-1 of the position estimating device 1-1 shown in FIG.
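The control flow of steps S501 to S510 can be sketched as a loop. The source does not state how completion of learning is judged, so the criterion used here (squared error below a tolerance, or an iteration cap) is an assumption, as are the names `train` and `bias_only_update`. The toy bias-only model stands in for the full pipeline of units 10 to 14-1 so that the example stays self-contained.

```python
def train(samples, coeffs, update_fn, max_iters=1000, tol=1e-6):
    """Repeat input (S501), estimation and update (S502-S508), and the
    completion check (S509); return the coefficients (S510)."""
    it = 0
    for it in range(1, max_iters + 1):
        features, truth = samples[(it - 1) % len(samples)]  # S501
        sq_err = update_fn(coeffs, features, truth)         # S502-S508
        if sq_err < tol:                                    # S509: Y
            break
    return coeffs, it                                       # S510


def bias_only_update(coeffs, features, truth, lr=0.5):
    """Toy stand-in model: the estimate is just coeffs['b'].

    Returns the squared error measured before the update.
    """
    err = [coeffs["b"][k] - truth[k] for k in range(len(truth))]
    for k in range(len(truth)):
        coeffs["b"][k] -= lr * err[k]
    return sum(e * e for e in err)


# One hypothetical learning sample: unused features, true position (0.4, 0.6).
samples = [([0.0], (0.4, 0.6))]
coeffs, iters = train(samples, {"b": [0.0, 0.0]}, bias_only_update)
```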

以上のように、実施例１の位置学習装置２－１によれば、画像及びこれに対応する遊具の位置真値を学習データとして入力し、位置推定装置１－１と同様に、人物位置計測部１０は人物毎の位置を計測し、人物密度マップ演算部１１は人物密度マップを演算する。また、人物姿勢計測部１２は人物毎の姿勢を計測し、注視度演算部１３は位置毎の注視度を演算する。 As described above, the position learning device 2-1 of the first embodiment receives an image and the corresponding true position value of the play equipment as learning data. As in the position estimation device 1-1, the person position measurement unit 10 measures the position of each person, and the person density map calculation unit 11 calculates the person density map. Further, the person posture measurement unit 12 measures the posture of each person, and the gaze degree calculation unit 13 calculates the degree of gaze for each position.

ニューラルネットワーク部１４－１は、領域毎の人物密度である人物密度マップ及び位置毎の注視度を入力データとして、メモリ２０－１に格納された係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-1 takes the person density map, which is the person density for each region, and the degree of gaze for each position as input data, performs the neural network calculation using the coefficients stored in the memory 20-1, and obtains the estimated position of the play equipment as output data.

誤差演算部２１は、遊具の位置真値と位置推定値との間の誤差を演算する。係数更新部２２－１は、例えば誤差伝播法により、誤差に基づいて、ニューラルネットワーク部１４－１が使用する係数を更新し、更新後の係数を、次時点にニューラルネットワーク部１４－１が使用する係数としてメモリ２０－１に格納する。 The error calculation unit 21 calculates the error between the true position value and the estimated position value of the play equipment. The coefficient update unit 22-1 updates the coefficients used by the neural network unit 14-1 based on the error by, for example, the error backpropagation method, and stores the updated coefficients in the memory 20-1 as the coefficients that the neural network unit 14-1 will use at the next time point.

位置学習装置２－１は、係数を更新する学習処理を繰り返して学習が完了した場合、メモリ２０－１から係数を読み出し、学習済み係数として外部へ出力する。 When the position learning device 2-1 repeats the learning process of updating the coefficients and learning is completed, the position learning device 2-1 reads the coefficients from the memory 20-1 and outputs them to the outside as learned coefficients.

このようにして得られた学習済み係数は、図１に示した位置推定装置１－１のメモリ１５－１に格納され、ニューラルネットワーク部１４－１が遊具の位置推定値を求める際に用いられる。 The learned coefficients obtained in this way are stored in the memory 15-1 of the position estimating device 1-1 shown in FIG. 1, and are used when the neural network unit 14-1 calculates the estimated position of the play equipment. .

これにより、位置推定装置１－１が画像内の遊具の位置を推定する際に、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 As a result, when the position estimation device 1-1 estimates the position of the play equipment in the image, it can estimate the position even if the play equipment is completely hidden due to interference with a person and cannot be clearly seen. It becomes possible to do so.

〔実施例２〕 [Example 2]
次に、実施例２について説明する。前述のとおり、実施例２は、画像から領域毎の人物密度を演算し、領域毎の人物密度を入力データとし、遊具の位置推定値を出力データとしたニューラルネットワークを用いて、遊具の位置を推定するものである。 Next, Example 2 will be described. As mentioned above, Example 2 calculates the person density for each region from the image and estimates the position of the play equipment using a neural network that takes the person density for each region as input data and produces the estimated position of the play equipment as output data.

（位置推定装置／実施例２） (Position estimation device/Example 2)
図６は、実施例２の位置推定装置の構成を示すブロック図であり、図７は、その処理を示すフローチャートである。この位置推定装置１－２は、人物位置計測部１０、人物密度マップ演算部１１、ニューラルネットワーク部１４－２及びメモリ１５－２を備えている。 FIG. 6 is a block diagram showing the configuration of a position estimating device according to the second embodiment, and FIG. 7 is a flowchart showing its processing. This position estimation device 1-2 includes a person position measurement section 10, a person density map calculation section 11, a neural network section 14-2, and a memory 15-2.

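図１に示した実施例１の位置推定装置１－１とこの位置推定装置１－２とを比較すると、両位置推定装置１－１，１－２は、人物位置計測部１０及び人物密度マップ演算部１１を備えている点で共通する。 Comparing the position estimation device 1-1 of the first embodiment shown in FIG. 1 with this position estimation device 1-2, the two devices are alike in that both include the person position measurement unit 10 and the person density map calculation unit 11.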

一方、位置推定装置１－２は、位置推定装置１－１の人物姿勢計測部１２及び注視度演算部１３を備えておらず、位置推定装置１－１のニューラルネットワーク部１４－１及びメモリ１５－１とは異なるニューラルネットワーク部１４－２及びメモリ１５－２を備えている点で、位置推定装置１－１と相違する。 On the other hand, the position estimation device 1-2 differs from the position estimation device 1-1 in that it does not include the person posture measurement unit 12 and the gaze degree calculation unit 13, and in that it includes a neural network unit 14-2 and a memory 15-2 that differ from the neural network unit 14-1 and the memory 15-1 of the position estimation device 1-1.

人物位置計測部１０及び人物密度マップ演算部１１は、図１に示した構成部と同様であるため、説明を省略する。また、図７に示すステップＳ７０１～Ｓ７０３は、図２に示したステップＳ２０１～Ｓ２０３と同様であるため、説明を省略する。ここで、人物密度マップ演算部１１は、人物密度マップ（領域毎の人物密度）をニューラルネットワーク部１４－２に出力する。 The person position measurement unit 10 and the person density map calculation unit 11 are the same as the components shown in FIG. 1, and therefore their description will be omitted. Furthermore, steps S701 to S703 shown in FIG. 7 are the same as steps S201 to S203 shown in FIG. 2, so the explanation will be omitted. Here, the person density map calculation unit 11 outputs a person density map (person density for each region) to the neural network unit 14-2.

ニューラルネットワーク部１４－２は、メモリ１５－２から学習済み係数を読み出す。また、ニューラルネットワーク部１４－２は、人物密度マップ演算部１１から人物密度マップを入力する。 Neural network unit 14-2 reads learned coefficients from memory 15-2. Further, the neural network unit 14-2 receives the person density map from the person density map calculation unit 11.

ニューラルネットワーク部１４－２は、領域毎の人物密度である人物密度マップを入力データとして、学習済み係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める（ステップＳ７０４）。そして、ニューラルネットワーク部１４－２は、遊具の位置推定値を出力する（ステップＳ７０５）。 The neural network unit 14-2 uses the human density map, which is the human density for each region, as input data, performs neural network calculations using learned coefficients, and obtains the estimated position of the play equipment as output data (step S704). . Then, the neural network unit 14-2 outputs the estimated position of the play equipment (step S705).

ニューラルネットワーク部１４－２としては、図１に示したニューラルネットワーク部１４－１と同様に、例えば多層パーセプトロン等を使用することができる。 As the neural network unit 14-2, for example, a multilayer perceptron or the like can be used, similar to the neural network unit 14-1 shown in FIG.

人物位置計測部１０が出力する人物毎の位置の座標系が画像座標系である場合、ニューラルネットワーク部１４－２が出力する遊具の位置推定値の座標系も、画像座標系とすることが典型的であるが、世界座標系等の他の座標系とするようにしてもよい。逆に、人物位置計測部１０が出力する人物毎の位置の座標系が世界座標系である場合、ニューラルネットワーク部１４－２が出力する遊具の位置推定値の座標系も、世界座標系とすることが典型的であるが、画像座標系等の他の座標系とするようにしてもよい。 When the coordinate system of the position of each person outputted by the person position measurement unit 10 is an image coordinate system, the coordinate system of the estimated position of the play equipment outputted by the neural network unit 14-2 is typically also set to the image coordinate system. However, other coordinate systems such as the world coordinate system may be used. Conversely, if the coordinate system of the position of each person outputted by the person position measurement section 10 is the world coordinate system, the coordinate system of the estimated position of the play equipment outputted by the neural network section 14-2 is also set to the world coordinate system. Although this is typical, other coordinate systems such as an image coordinate system may be used.

図８は、実施例２において、多層パーセプトロンを用いたニューラルネットワーク部１４－２を説明する図である。このニューラルネットワーク部１４－２は、３層パーセプトロンを用いて構成され、入力層の素子数は４、中間層の素子数は３、出力層の素子数は２である。 FIG. 8 is a diagram illustrating the neural network unit 14-2 using a multilayer perceptron in the second embodiment. This neural network unit 14-2 is configured using a three-layer perceptron, and the number of elements in the input layer is four, the number of elements in the intermediate layer is three, and the number of elements in the output layer is two.

ニューラルネットワーク部１４－２は、４つの領域のそれぞれについて、人物密度（変数）ｘ₁～ｘ₄を入力層の４個の素子（１～４番目の素子）に入力する。 The neural network unit 14-2 inputs the person densities (variables) x ₁ to x ₄ to four elements (first to fourth elements) of the input layer for each of the four regions.

ニューラルネットワーク部１４－２は、入力層の１～４番目の素子から変数ｘ₁～ｘ₄をそのまま出力し、中間層の５～７番目の素子（ｑ番目の素子）において、前記式（４）にて、変数ｘ₁～ｘ₄等を用いて出力値である変数ｘ₅～ｘ₇（ｘ_q）を演算する。 The neural network unit 14-2 outputs the variables x₁ to x₄ unchanged from the 1st to 4th elements of the input layer. Each of the 5th to 7th elements of the intermediate layer (the q-th element) then calculates its output value, one of the variables x₅ to x₇ (x_q), from the variables x₁ to x₄ and its coefficients using formula (4).

ニューラルネットワーク部１４－２は、中間層の５～７番目の素子から変数ｘ₅～ｘ₇を出力し、出力層の８，９番目の素子（ｑ番目の素子）において、前記式（４）にて、変数ｘ₅～ｘ₇等を用いて出力値である遊具の位置推定値（Ｘ座標、Ｙ座標）（ｘ_q）を演算する。ニューラルネットワーク部１４－２は、出力層の８，９番目の素子から、遊具の位置推定値（Ｘ座標、Ｙ座標）を出力する。図８の例では、Ｃ₅＝Ｃ₆＝Ｃ₇＝｛１，２，３，４｝，Ｃ₈＝Ｃ₉＝｛５，６，７｝である。 The neural network unit 14-2 outputs the variables x₅ to x₇ from the 5th to 7th elements of the intermediate layer. Each of the 8th and 9th elements of the output layer (the q-th element) then calculates its output value x_q, a component of the estimated position of the play equipment (X coordinate, Y coordinate), from the variables x₅ to x₇ and its coefficients using formula (4). The neural network unit 14-2 outputs the estimated position of the play equipment (X coordinate, Y coordinate) from the 8th and 9th elements of the output layer. In the example of FIG. 8, C₅ = C₆ = C₇ = {1, 2, 3, 4} and C₈ = C₉ = {5, 6, 7}.
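As a worked count (an illustration, not a figure stated in the source), assume that every element q of the intermediate and output layers carries one coupling weight per member of C_q plus one bias value b_q. The number of learned coefficients in the two example networks is then:

$$
\underbrace{3\,(8+1)}_{\text{intermediate}} + \underbrace{2\,(3+1)}_{\text{output}} = 35 \quad (\text{FIG. 3}), \qquad
\underbrace{3\,(4+1)}_{\text{intermediate}} + \underbrace{2\,(3+1)}_{\text{output}} = 23 \quad (\text{FIG. 8}).
$$

Under this assumption, dropping the four gaze-degree inputs removes 3 × 4 = 12 coupling weights and leaves the output layer unchanged.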

図６に戻って、メモリ１５－２には、例えば、後述する位置学習装置２－２によりニューラルネットワーク部１４－２を学習することにより求めた最適な学習済み係数が格納されている。 Returning to FIG. 6, the memory 15-2 stores, for example, optimal learned coefficients obtained by learning the neural network section 14-2 using a position learning device 2-2, which will be described later.

メモリ１５－２に格納された学習済み係数は、ニューラルネットワーク部１４－２により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The learned coefficients stored in the memory 15-2 are read out by the neural network unit 14-2 and used for calculations from the input layer elements of the neural network to the output layer elements via the intermediate layer elements.

学習済み係数は、必要に応じて外部からメモリ１５－２に書き込まれるようにしてもよいし、読み取り専用のメモリ１５－２に予め求めておいたデータとして設定しておくようにしてもよい。 The learned coefficients may be written into the memory 15-2 from the outside as necessary, or may be set as data obtained in advance in the read-only memory 15-2.

以上のように、実施例２の位置推定装置１－２によれば、人物位置計測部１０は、画像から人物毎の位置を計測し、人物密度マップ演算部１１は、人物毎の位置から人物密度マップを演算する。 As described above, in the position estimation device 1-2 of the second embodiment, the person position measurement unit 10 measures the position of each person from the image, and the person density map calculation unit 11 calculates the person density map from the position of each person.

ニューラルネットワーク部１４－２は、領域毎の人物密度である人物密度マップを入力データとして、学習済み係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-2 uses the human density map, which is the human density for each region, as input data, performs neural network calculations using learned coefficients, and obtains the estimated position of the play equipment as output data.

これにより、人物の空間的な分布を用いて、遊具の存在する場所をニューラルネットワークの演算により精度高く推定することができる。つまり、人物との干渉等により遊具が完全に隠蔽され、明確に見えていない場合であっても、その位置を推定することが可能となる。 Thereby, using the spatial distribution of people, the location where the play equipment is present can be estimated with high accuracy through neural network calculations. In other words, even if the play equipment is completely hidden and cannot be clearly seen due to interference with a person, it is possible to estimate its position.

（位置学習装置／実施例２） (Position learning device/Example 2)
図９は、実施例２の位置学習装置の構成を示すブロック図であり、図１０は、その処理を示すフローチャートである。この位置学習装置２－２は、人物位置計測部１０、人物密度マップ演算部１１、ニューラルネットワーク部１４－２、メモリ２０－２、誤差演算部２１及び係数更新部２２－２を備えている。 FIG. 9 is a block diagram showing the configuration of a position learning device according to the second embodiment, and FIG. 10 is a flowchart showing its processing. This position learning device 2-2 includes a person position measurement section 10, a person density map calculation section 11, a neural network section 14-2, a memory 20-2, an error calculation section 21, and a coefficient update section 22-2.

図４に示した実施例１の位置学習装置２－１とこの位置学習装置２－２とを比較すると、両位置学習装置２－１，２－２は、人物位置計測部１０、人物密度マップ演算部１１及び誤差演算部２１を備えている点で共通する。 Comparing the position learning device 2-1 of the first embodiment shown in FIG. 4 with this position learning device 2-2, the two devices are alike in that both include the person position measurement unit 10, the person density map calculation unit 11, and the error calculation unit 21.

一方、位置学習装置２－２は、位置学習装置２－１の人物姿勢計測部１２及び注視度演算部１３を備えておらず、位置学習装置２－１のニューラルネットワーク部１４－１、メモリ２０－１及び係数更新部２２－１とは異なるニューラルネットワーク部１４－２、メモリ２０－２及び係数更新部２２－２を備えている点で、位置学習装置２－１と相違する。 On the other hand, the position learning device 2-2 differs from the position learning device 2-1 in that it does not include the person posture measurement unit 12 and the gaze degree calculation unit 13, and in that it includes a neural network unit 14-2, a memory 20-2, and a coefficient update unit 22-2 that differ from the neural network unit 14-1, the memory 20-1, and the coefficient update unit 22-1 of the position learning device 2-1.

人物位置計測部１０、人物密度マップ演算部１１及び誤差演算部２１は、図４に示した構成部と同様であるため、説明を省略する。また、図１０に示すステップＳ１００１～Ｓ１００５は、図５に示したステップＳ５０１～Ｓ５０３、図７に示したステップＳ７０４及び図５に示したステップＳ５０７と同様であるため、説明を省略する。ここで、人物密度マップ演算部１１は、人物密度マップをニューラルネットワーク部１４－２に出力し、誤差演算部２１は、誤差を係数更新部２２－２に出力する。 The person position measurement unit 10, the person density map calculation unit 11, and the error calculation unit 21 are the same as the components shown in FIG. 4, and therefore their description will be omitted. Furthermore, steps S1001 to S1005 shown in FIG. 10 are the same as steps S501 to S503 shown in FIG. 5, step S704 shown in FIG. 7, and step S507 shown in FIG. 5, and therefore the description thereof will be omitted. Here, the person density map calculation unit 11 outputs the person density map to the neural network unit 14-2, and the error calculation unit 21 outputs the error to the coefficient update unit 22-2.

ここで、ニューラルネットワーク部１４－２は、メモリ２０－２から係数更新部２２－２により更新され格納された係数を読み出し、係数を用いて、入力層の素子から中間層の素子を介して出力層の素子へと、ニューラルネットワークの演算を実行する。そして、ニューラルネットワーク部１４－２は、遊具の位置推定値を誤差演算部２１に出力する。 Here, the neural network unit 14-2 reads from the memory 20-2 the coefficients that were updated and stored by the coefficient update unit 22-2, and uses them to execute the neural network calculation from the input layer elements through the intermediate layer elements to the output layer elements. The neural network unit 14-2 then outputs the estimated position of the play equipment to the error calculation unit 21.

メモリ２０－２は、図４に示したメモリ２０－１と同様に、書き換え可能なメモリであり、ニューラルネットワーク部１４－２にて用いる係数が一時的に格納される。 The memory 20-2 is a rewritable memory similar to the memory 20-1 shown in FIG. 4, and temporarily stores coefficients used in the neural network unit 14-2.

メモリ２０－２に格納された係数は、ニューラルネットワーク部１４－２により読み出され、ニューラルネットワークの入力層の素子から中間層の素子を介して出力層の素子への演算に用いられる。 The coefficients stored in the memory 20-2 are read out by the neural network unit 14-2 and used for calculations from the input layer elements of the neural network to the output layer elements via the intermediate layer elements.

位置学習装置２－２の動作開始時点において、メモリ２０－２には、係数の初期値が格納されている。初期値は、予め設定された数値列であってもよいし、乱数（疑似乱数を含む）であってもよい。 At the start of the operation of the position learning device 2-2, initial values of coefficients are stored in the memory 20-2. The initial value may be a preset numerical string or may be a random number (including pseudo-random numbers).

係数更新部２２－２は、メモリ２０－２から係数を読み出すと共に、誤差演算部２１から誤差を入力する。そして、係数更新部２２－２は、誤差に基づいて、ニューラルネットワーク部１４－２が使用する係数を更新し（ステップＳ１００６）、更新後の係数を、次時点にニューラルネットワーク部１４－２が使用する係数としてメモリ２０－２に格納する。係数更新部２２－２は、例えば誤差伝播法により、誤差に基づいて係数を変更する。 The coefficient update unit 22-2 reads the coefficients from the memory 20-2 and receives the error from the error calculation unit 21. The coefficient update unit 22-2 then updates the coefficients used by the neural network unit 14-2 based on the error (step S1006) and stores the updated coefficients in the memory 20-2 as the coefficients that the neural network unit 14-2 will use at the next time point. The coefficient update unit 22-2 changes the coefficients based on the error by, for example, the error backpropagation method.

係数更新部２２－２は、当該位置学習装置２－２が画像及び遊具の位置真値の学習データを入力する毎に、係数を更新することで最適化し、更新した係数をメモリ２０－２に格納する。 Each time the position learning device 2-2 receives learning data consisting of an image and the true position value of the play equipment, the coefficient update unit 22-2 optimizes the coefficients by updating them and stores the updated coefficients in the memory 20-2.

位置学習装置２－２は、学習が完了していない場合（ステップＳ１００７：Ｎ）、ステップＳ１００１へ移行して、新たな画像及び遊具の位置真値の学習データを入力し、ステップＳ１００２～Ｓ１００６の処理を繰り返す。 If the learning is not completed (step S1007: N), the position learning device 2-2 returns to step S1001, receives new learning data consisting of an image and the true position value of the play equipment, and repeats the processing of steps S1002 to S1006.

一方、位置学習装置２－２は、学習が完了した場合（ステップＳ１００７：Ｙ）、メモリ２０－２から係数を読み出し、学習済み係数として外部へ出力する（ステップＳ１００８）。位置学習装置２－２から出力された学習済み係数は、図６に示した位置推定装置１－２のメモリ１５－２に格納される。 On the other hand, when the learning is completed (step S1007: Y), the position learning device 2-2 reads the coefficients from the memory 20-2 and outputs them to the outside as learned coefficients (step S1008). The learned coefficients output from the position learning device 2-2 are stored in the memory 15-2 of the position estimating device 1-2 shown in FIG.

以上のように、実施例２の位置学習装置２－２によれば、画像及びこれに対応する遊具の位置真値を学習データとして入力し、位置推定装置１－２と同様に、人物位置計測部１０は人物毎の位置を計測し、人物密度マップ演算部１１は人物密度マップを演算する。 As described above, the position learning device 2-2 of the second embodiment receives an image and the corresponding true position value of the play equipment as learning data. As in the position estimation device 1-2, the person position measurement unit 10 measures the position of each person, and the person density map calculation unit 11 calculates the person density map.

ニューラルネットワーク部１４－２は、領域毎の人物密度である人物密度マップを入力データとして、メモリ２０－２に格納された係数を用いたニューラルネットワークの演算を行い、遊具の位置推定値を出力データとして求める。 The neural network unit 14-2 takes the person density map, which is the person density for each region, as input data, performs the neural network calculation using the coefficients stored in the memory 20-2, and obtains the estimated position of the play equipment as output data.

誤差演算部２１は、遊具の位置真値と位置推定値との間の誤差を演算する。係数更新部２２－２は、例えば誤差伝播法により、誤差に基づいて、ニューラルネットワーク部１４－２が使用する係数を更新し、更新後の係数を、次時点にニューラルネットワーク部１４－２が使用する係数としてメモリ２０－２に格納する。 The error calculation unit 21 calculates the error between the true position value and the estimated position value of the play equipment. The coefficient update unit 22-2 updates the coefficients used by the neural network unit 14-2 based on the error by, for example, the error backpropagation method, and stores the updated coefficients in the memory 20-2 as the coefficients that the neural network unit 14-2 will use at the next time point.

The position learning device 2-2 repeats the learning process of updating the coefficients and, when learning is complete, reads the coefficients from the memory 20-2 and outputs them externally as learned coefficients.
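The learning cycle described above (estimate, compute the error against the true value, update the coefficients, repeat until learning is complete) can be sketched as follows. This is only an illustration under simplifying assumptions: the network is reduced to a single linear layer, so error backpropagation collapses to one gradient step per sample, and the names `forward`, `train_step`, `learn`, the learning rate, and the stopping threshold are hypothetical rather than taken from the embodiment.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def forward(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Estimated position: one linear layer applied to the flattened density map."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def train_step(weights: Matrix, bias: Vector, x: Vector, target: Vector, lr: float) -> float:
    """One update: compute the error, then adjust the coefficients in place."""
    estimate = forward(weights, bias, x)
    error = [e - t for e, t in zip(estimate, target)]  # estimate minus true value
    for k, err in enumerate(error):
        for i, xi in enumerate(x):
            weights[k][i] -= lr * err * xi  # gradient of the squared error
        bias[k] -= lr * err
    return sum(e * e for e in error) / 2.0


def learn(
    samples: List[Tuple[Vector, Vector]],
    n_in: int,
    n_out: int,
    lr: float = 0.1,
    tol: float = 1e-6,
    max_epochs: int = 5000,
) -> Tuple[Matrix, Vector]:
    """Repeat the update until the summed error is small, then return the coefficients."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(max_epochs):
        total = sum(train_step(weights, bias, x, t, lr) for x, t in samples)
        if total < tol:  # assumed completion criterion
            break
    return weights, bias
```

The returned `weights` and `bias` play the role of the learned coefficients that are handed to the estimation side. A practical network would have several layers and a nonlinearity, in which case the same error would be propagated backward through each layer.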

The learned coefficients obtained in this way are stored in the memory 15-2 of the position estimation device 1-2 shown in FIG. 6 and are used when the neural network unit 14-2 obtains the estimated position value of the play equipment.

As a result, when the position estimation device 1-2 estimates the position of the play equipment in an image, it can estimate that position even when the play equipment is completely hidden, for example through interference with a person, and is not clearly visible.

The present invention has been described above with reference to Examples 1 and 2, but it is not limited to these examples and can be modified in various ways without departing from its technical concept. In Examples 1 and 2, the position estimation devices 1-1 and 1-2 estimate the position of the play equipment in an image, and the position learning devices 2-1 and 2-2 perform the learning process using images and the positions of the play equipment as learning data.

However, the present invention is not limited to estimating the position of play equipment and using the position of the play equipment as learning data. It may estimate the position of an object other than play equipment and use the position of that object as learning data. The present invention may also estimate the position of something other than an object, for example a person, and use the position of the person as learning data.

In addition, the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, and the gaze degree calculation unit 13 of Example 1, and the person position measurement unit 10 and the person density map calculation unit 11 of Example 2, obtain the position and other values of each person for all persons participating in the ball game.

Alternatively, the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, and the gaze degree calculation unit 13 may handle the persons of each team participating in the ball game separately and obtain the position and other values of each person team by team. In this case, the neural network unit 14-1 takes the person density map for each team and the gaze degree for each team and each position as input data and performs the neural network calculation using the learned coefficients. Likewise, the neural network unit 14-2 takes the person density map for each team as input data and performs the neural network calculation using the learned coefficients.
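One way to picture the team-by-team variant is to build one density map per team and concatenate them into a single input vector. The sketch below is an assumption-laden illustration: the helper names `count_grid` and `team_input_vector`, the team labels, and the simple non-overlapping grid are hypothetical, and the persons are assumed to lie inside the field and to have already been assigned to a team.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def count_grid(
    positions: List[Tuple[float, float]],
    field_w: float,
    field_h: float,
    cols: int,
    rows: int,
) -> List[int]:
    """Flattened head count per grid (simple non-overlapping grid for brevity)."""
    cells = [0] * (cols * rows)
    for x, y in positions:
        c = min(int(x / field_w * cols), cols - 1)  # clamp the far edge into the last column
        r = min(int(y / field_h * rows), rows - 1)
        cells[r * cols + c] += 1
    return cells


def team_input_vector(
    persons: List[Tuple[str, float, float]],
    teams: List[str],
    field_w: float,
    field_h: float,
    cols: int,
    rows: int,
) -> List[float]:
    """Concatenate one density map per team, in the order given by `teams`."""
    by_team: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for team, x, y in persons:
        by_team[team].append((x, y))
    vector: List[float] = []
    for team in teams:
        vector.extend(float(n) for n in count_grid(by_team[team], field_w, field_h, cols, rows))
    return vector
```

Keeping the team order fixed matters, because each position in the vector must always correspond to the same team and grid for the learned coefficients to remain meaningful.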

In addition, the neural network unit 14-1 of Example 1 performs its processing with the person density map and the gaze degree for each position as input data, and the neural network unit 14-2 of Example 2 performs its processing with the person density map as input data.

Alternatively, the neural network units 14-1 and 14-2 may perform their processing with the positions of persons measured by the person position measurement unit 10, or the movement speed, movement direction, and the like of persons measured by other components, as input data. In this case, a person speed measurement unit detects persons from time-series images and measures the movement speed of each person, and a person direction measurement unit detects persons from time-series images and measures the movement direction of each person.
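Given a person's positions in consecutive frames, movement speed and direction can be derived by finite differences. The following is a minimal sketch assuming that each person has already been tracked across frames and that positions are expressed in the predetermined coordinate system; the function name `speed_and_direction`, the frame interval `dt`, and the representation of direction as an angle in radians are hypothetical.

```python
import math
from typing import List, Tuple


def speed_and_direction(
    track: List[Tuple[float, float]],
    dt: float,
) -> List[Tuple[float, float]]:
    """Return (speed, direction in radians) for each pair of consecutive positions."""
    result = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) / dt  # distance moved per unit time
        direction = math.atan2(dy, dx)   # angle measured from the x axis
        result.append((speed, direction))
    return result
```

For a stationary person the direction is not meaningful; `math.atan2(0.0, 0.0)` returns 0.0, so a real system would probably mask or otherwise flag that case before feeding it to the network.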

Note that an ordinary computer can be used as the hardware configuration of the position estimation devices 1-1 and 1-2 and the position learning devices 2-1 and 2-2 according to Examples 1 and 2. The position estimation devices 1-1 and 1-2 and the position learning devices 2-1 and 2-2 are each configured by a computer equipped with a CPU, a volatile storage medium such as RAM, a non-volatile storage medium such as ROM, an interface, and the like.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, the gaze degree calculation unit 13, the neural network unit 14-1, and the memory 15-1 provided in the position estimation device 1-1 are each realized by having the CPU execute a program that describes these functions.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the neural network unit 14-2, and the memory 15-2 provided in the position estimation device 1-2 are likewise each realized by having the CPU execute a program that describes these functions.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the person posture measurement unit 12, the gaze degree calculation unit 13, the neural network unit 14-1, the memory 20-1, the error calculation unit 21, and the coefficient update unit 22-1 provided in the position learning device 2-1 are likewise each realized by having the CPU execute a program that describes these functions.

The functions of the person position measurement unit 10, the person density map calculation unit 11, the neural network unit 14-2, the memory 20-2, the error calculation unit 21, and the coefficient update unit 22-2 provided in the position learning device 2-2 are likewise each realized by having the CPU execute a program that describes these functions.

These programs are stored in the storage medium described above and are read and executed by the CPU. They can also be stored and distributed on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical disks (CD-ROMs, DVDs, etc.), and semiconductor memories, and can be transmitted and received over a network.

1-1, 1-2 Position estimation device
2-1, 2-2 Position learning device
10 Person position measurement unit
11 Person density map calculation unit
12 Person posture measurement unit
13 Gaze degree calculation unit
14-1, 14-2 Neural network unit
15-1, 15-2, 20-1, 20-2 Memory
21 Error calculation unit
22-1, 22-2 Coefficient update unit

Claims

A position estimation device that estimates the position of a predetermined object in an input image, the device comprising:
a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system;
a person density map calculation unit that calculates a person density for each area in the predetermined coordinate system from the position of each person measured by the person position measurement unit, and generates the person density for each area as a person density map; and
a neural network unit that, using the person density map generated by the person density map calculation unit as input data, performs a neural network calculation using preset learned coefficients and obtains, as output data, an estimated position value indicating an estimated value of the position of the predetermined object,
wherein the person density map calculation unit calculates the person density for each area by dividing the area in which persons can exist into grids and calculating, from the position of each person, the number of persons present in each grid, and the grids are arranged so that each grid overlaps the grids adjacent to it.

A position estimation device that estimates the position of a predetermined object in an input image, the device comprising:
a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system;
a person density map calculation unit that calculates a person density for each area in the predetermined coordinate system from the position of each person measured by the person position measurement unit, and generates the person density for each area as a person density map;
a person posture measurement unit that detects the persons based on image features of the input image and measures the posture of each person;
a gaze degree calculation unit that calculates, from the posture of each person measured by the person posture measurement unit, the degree to which all of the persons gaze at each coordinate position in the predetermined coordinate system, as a gaze degree for each position; and
a neural network unit that, using as input data the person density map generated by the person density map calculation unit and the gaze degree for each position calculated by the gaze degree calculation unit, performs a neural network calculation using preset learned coefficients and obtains, as output data, an estimated position value indicating an estimated value of the position of the predetermined object.

A position learning device that receives, as learning data, an input image and a true position value indicating the true value of the position of a predetermined object corresponding to the input image, and calculates coefficients of a neural network based on the learning data, the device comprising:
a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system;
a person density map calculation unit that calculates a person density for each area in the predetermined coordinate system from the position of each person measured by the person position measurement unit, and generates the person density for each area as a person density map;
a neural network unit that, using the person density map generated by the person density map calculation unit as input data, performs the neural network calculation using the coefficients and obtains, as output data, an estimated position value indicating an estimated value of the position of the predetermined object;
an error calculation unit that calculates the error between the true position value corresponding to the input image and the estimated position value obtained by the neural network unit; and
a coefficient update unit that updates the coefficients of the neural network based on the error calculated by the error calculation unit,
wherein the person density map calculation unit calculates the person density for each area by dividing the area in which persons can exist into grids and calculating, from the position of each person, the number of persons present in each grid, and the grids are arranged so that each grid overlaps the grids adjacent to it.

A position learning device that receives, as learning data, an input image and a true position value indicating the true value of the position of a predetermined object corresponding to the input image, and calculates coefficients of a neural network based on the learning data, the device comprising:
a person position measurement unit that detects persons based on image features of the input image and measures the position of each person in a predetermined coordinate system;
a person density map calculation unit that calculates a person density for each area in the predetermined coordinate system from the position of each person measured by the person position measurement unit, and generates the person density for each area as a person density map;
a person posture measurement unit that detects the persons based on image features of the input image and measures the posture of each person;
a gaze degree calculation unit that calculates, from the posture of each person measured by the person posture measurement unit, the degree to which all of the persons gaze at each coordinate position in the predetermined coordinate system, as a gaze degree for each position;
a neural network unit that, using as input data the person density map generated by the person density map calculation unit and the gaze degree for each position calculated by the gaze degree calculation unit, performs the neural network calculation using the coefficients and obtains, as output data, an estimated position value indicating an estimated value of the position of the predetermined object;
an error calculation unit that calculates the error between the true position value corresponding to the input image and the estimated position value obtained by the neural network unit; and
a coefficient update unit that updates the coefficients of the neural network based on the error calculated by the error calculation unit.

A program for causing a computer to function as the position estimating device according to claim 1 or 2.

A program for causing a computer to function as the position learning device according to claim 3 or 4.