JP7152651B2 - Program, information processing device, and information processing method

Info

Publication number: JP7152651B2
Application number: JP2018092913A
Authority: JP
Inventors: 聡田辺
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-05-14
Filing date: 2018-05-14
Publication date: 2022-10-13
Anticipated expiration: 2038-05-14
Also published as: JP2019200456A

Description

The present invention relates to a program, an information processing device, and an information processing method.

Techniques for estimating the line-of-sight direction of a person have long been known. For example, the line of sight of a person is estimated from an image of the person captured by a surveillance camera installed in an apparel shop or a supermarket. This allows, for example, a manager to grasp which products the crowd is paying attention to, which may make it possible to formulate a sales strategy. As another example, in city surveillance for security purposes, the line-of-sight direction may be estimated from images of people captured by surveillance cameras installed near signs and signage, and the result may be used to investigate the effect of installing the signs or to understand crowd behavior.

One technique for estimating the line-of-sight direction is as follows. Based on the image frame captured at the current time by an imaging means, the three-dimensional position of the eyeball center of a specific person is estimated using a three-dimensional eyeball model, the position of the person's iris is detected, and the line-of-sight direction is estimated from the eyeball center and the iris position.

This technique is said to relax restrictions on face orientation and to allow the line-of-sight direction of a subject at any position within the observation range to be estimated and tracked in real time with a relatively small number of cameras.

Japanese Unexamined Patent Application Publication No. 2012-216180 (JP 2012-216180 A)

The above technique for estimating the line-of-sight direction from the eyeball center and the iris position requires, for example, that the person's face be captured by a plurality of imaging means. Therefore, for an image of a person whose face is hidden, the technique may be unable to estimate the person's line-of-sight direction.

Accordingly, one aspect of the disclosure is to provide a program, an information processing device, and an information processing method capable of estimating a line-of-sight direction even from an image of a person whose face is hidden.

One aspect of the disclosure is a program that causes a computer to execute a process of: estimating, for input image data, position information of the parts of a person included in an image by using correct data on the parts of a person; and, when position information of a facial part among the parts could not be estimated, estimating the line-of-sight direction of the person included in the image based on position information of the other parts that could be estimated.

According to one aspect of the disclosure, the line-of-sight direction can be estimated even from an image of a person whose face is hidden.

FIG. 1 is a diagram showing a configuration example of an information processing system.
FIG. 2 is a flowchart showing an operation example.
FIG. 3 is a diagram showing an example of part numbers.
FIG. 4 is a diagram showing an example of an image.
FIG. 5 is a flowchart showing an example of posture estimation processing.
FIG. 6(A) is a diagram showing a configuration example of a posture estimation unit, FIG. 6(B) is a diagram showing an example of image data, and FIG. 6(C) is a diagram showing an example of the probability distribution of each part.
FIG. 7(A) is a diagram showing an example of the probability distribution of the right hand, FIG. 7(B) is a diagram showing an example of the probability distribution of the right elbow, and FIG. 7(C) is a diagram showing an example of the probability distribution of the degree of connection between the right hand and the right elbow.
FIG. 8 is a flowchart showing an example of attention level calculation processing.
FIG. 9(A) is a diagram showing an example of parts, and FIG. 9(B) is a diagram showing an example of a direction vector.
FIG. 10(A) is a diagram showing an example of parts, FIG. 10(B) is a diagram showing an example of vectors, and FIG. 10(C) is a diagram showing an example of a direction vector.
FIG. 11 is a flowchart showing an example of counting processing.
FIG. 12 is a diagram showing an example of three-dimensional position coordinates.
FIG. 13 is a diagram showing a configuration example of a posture estimation unit.
FIG. 14 is a flowchart showing an operation example.
FIG. 15(A) is a diagram showing an example of same-person identification processing, and FIG. 15(B) is a diagram showing an example of similarity calculation processing.
FIG. 16 is a flowchart showing an operation example of attention level calculation processing.
FIG. 17 is a diagram showing a configuration example of an information processing system.
FIG. 18 is a flowchart showing an operation example.
FIG. 19 is a flowchart showing an example of attention level change detection processing.
FIG. 20 is a diagram showing an example of a time series of attention level vectors.
FIG. 21(A) and FIG. 21(C) are diagrams showing examples of images, and FIG. 21(B) and FIG. 21(D) are diagrams showing examples of attention level vectors.
FIG. 22 is a diagram showing a hardware configuration example of an information processing device.

Embodiments for carrying out the present invention are described below. The following examples do not limit the disclosed technology. The embodiments may be combined as appropriate to the extent that their processing contents do not conflict.

[First Embodiment]
<Configuration example of information processing system>
FIG. 1 is a diagram showing a configuration example of an information processing system 10.

The information processing system 10 includes an information processing device 100 and an imaging device 200. The information processing device 100 receives image data of an image captured by the imaging device 200 and estimates, from the input image data, the line-of-sight direction of a person appearing in the image. In the first embodiment, the information processing device 100 can estimate the line-of-sight direction from the image data even when the face is hidden in the image.

The information processing device 100 includes a posture estimation unit 110, an attention level calculation unit 120, a spatial information storage unit 130, and an attention level storage unit 140.

The posture estimation unit 110 receives the image data output from the imaging device 200 and, based on the input image data, estimates the posture p_i of each person i (i = 1, 2, ..., I) included in the image. The posture p_i includes position information of each part of person i, such as the "nose", "left eye", and "right eye". Details of the posture p_i are described in the operation example.

In the first embodiment, the posture estimation unit 110 generates the posture p_i from the input image data by using, for example, model data on the parts of a person (also called correct data or teacher data; these three terms may be used interchangeably below). The processing by which the posture estimation unit 110 estimates the posture p_i is described in the operation example. The posture estimation unit 110 outputs the estimated posture p_i to the attention level calculation unit 120.

The attention level calculation unit 120 estimates the line-of-sight direction of person i included in the image by using the position information included in the posture p_i. Specifically, when the posture estimation unit 110 could not estimate the position information of a facial part among the parts of person i being estimated, the attention level calculation unit 120 estimates the line-of-sight direction of person i based on the position information of the non-facial parts that could be estimated.

For example, parts such as the face of person i may not appear in the image captured by the imaging device 200 because of an obstacle, the line-of-sight direction, or the like. The posture p_i estimated by the posture estimation unit 110 may therefore lack position information for facial parts (for example, the eyes or the nose). The attention level calculation unit 120 calculates the position information of the facial parts that the posture estimation unit 110 could not estimate from the position information of the other parts that it could estimate, and estimates the line-of-sight direction of person i based on this position information.

The attention level calculation unit 120 estimates the line-of-sight direction of person i by, for example, calculating a direction vector q_i of person i. In the first embodiment, the direction vector q_i may be referred to as the attention level. Below, the terms attention level, direction vector q_i, and line-of-sight direction may be used interchangeably. Details of the attention level calculation processing are described in the operation example. The attention level calculation unit 120 stores the calculated attention level in the attention level storage unit 140.

The attention level calculation unit 120 also reads the position information of each of the objects 300-1 and 300-2 from the spatial information storage unit 130. Based on the calculated direction vector q_i and the position information of the objects 300-1 and 300-2, the attention level calculation unit 120 counts the number of persons i directing their line of sight toward each of the objects 300-1 and 300-2, and stores the count values in the attention level storage unit 140. Details of the counting processing are also described in the operation example.

The spatial information storage unit 130 is, for example, a memory and stores the position information of each of the objects 300-1 and 300-2. The position information may, for example, represent the installation point of each of the objects 300-1 and 300-2 and its surrounding area in two-dimensional coordinates (x, y).

The attention level storage unit 140 is, for example, a memory and stores the attention level calculated by the attention level calculation unit 120.

The imaging device 200, for example, photographs one or more persons and outputs the captured image to the information processing device 100 as image data. In the example of FIG. 1, the imaging device 200 captures an image that includes the objects 300-1 and 300-2 and a plurality of persons directing their line of sight toward the objects 300-1 and 300-2.

Although FIG. 1 shows an example in which the imaging device 200 is arranged outside the information processing device 100, the imaging device 200 may instead be provided inside the information processing device 100, for example as an imaging unit. FIG. 1 also shows a single imaging device 200, but there may be more than one. An example with a plurality of imaging devices is described in the second embodiment.

<Operation example>
FIG. 2 is a flowchart showing an operation example of the information processing system 10.

When the imaging device 200 and the information processing device 100 start processing (S10), the imaging device 200 photographs a crowd (S11). For example, as shown in FIG. 1, the imaging device 200 captures an image showing a plurality of persons (hereinafter sometimes referred to as a "crowd") and outputs the captured image data to the information processing device 100.

Next, based on the image data, the information processing device 100 estimates the posture p_i of each person i appearing in the image captured by the imaging device 200 (S12). The posture p_i is expressed, for example, by the following equation (1).

p_i = (x_1^i y_1^i v_1^i ... x_J^i y_J^i v_J^i)   (1)

In equation (1), x_j^i is the x-coordinate of part j of person i in the image, and y_j^i is the y-coordinate of part j of person i in the image. v_j^i is a parameter that takes the value "1" when part j of person i is visible in (or appears in, or is included in) the image and "0" when it is not visible.

FIG. 3 is a diagram showing an example of the relationship between part numbers and parts. As shown in FIG. 3, a part number is assigned in advance to each part j. In the example of FIG. 3, part j = 1 represents the "nose" and part j = 6 represents the "neck". FIG. 3 is only an example, and other numbers may be assigned.

FIG. 4 is a diagram showing an example of a captured image. In the example of FIG. 4, the right hand (j = 12) of person i = 1 appears in the image, but the left elbow (j = 9) does not because of an obstacle. For i = 1, the posture can therefore be p_i = (... x_9^1 y_9^1 0 ... x_12^1 y_12^1 1 ...). The posture estimation unit 110 estimates such a posture p_i from the input image data, for example. Details of the posture estimation processing (S12) are described below. In the following, the terms posture p_i and posture vector p_i may be used interchangeably.
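As a non-limiting illustration, the posture vector described above, one (x, y, v) triplet per part, could be held in memory as in the following sketch. The names `build_pose` and `visibility`, the dictionary input, and the use of 0.0 as a placeholder coordinate for parts that are not visible are assumptions made for this sketch and do not appear in the description.

```python
def build_pose(detections, num_parts):
    """Return the flat posture vector (x_1, y_1, v_1, ..., x_J, y_J, v_J).

    detections maps a 1-based part number j to an (x, y) position.
    Parts missing from the mapping are treated as not visible (v = 0);
    their coordinates are filled with 0.0, which is only a placeholder.
    """
    pose = []
    for j in range(1, num_parts + 1):
        if j in detections:
            x, y = detections[j]
            pose.extend([float(x), float(y), 1])
        else:
            pose.extend([0.0, 0.0, 0])
    return pose


def visibility(pose, j):
    """Return v_j for the 1-based part number j."""
    return pose[3 * (j - 1) + 2]


# Situation of FIG. 4: the right hand (j = 12) is detected,
# the left elbow (j = 9) is hidden by an obstacle.
pose_1 = build_pose({12: (40, 55)}, num_parts=12)
```

Here `visibility(pose_1, 12)` is 1 and `visibility(pose_1, 9)` is 0, matching the 1 and 0 entries of the example posture for FIG. 4. The pixel position (40, 55) and the total of 12 parts are arbitrary values chosen for the example.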

<Posture estimation processing>
FIG. 5 is a flowchart showing an example of the posture estimation processing (S12), and FIG. 6(A) is a diagram showing a configuration example of the posture estimation unit 110. FIG. 6(A) is described together with each step of FIG. 5. As shown in FIG. 6(A), the posture estimation unit 110 includes a CNN (convolutional neural network) processing unit 111, a candidate point calculation unit 112, and a grouping processing unit 113.

As shown in FIG. 5, when the posture estimation unit 110 starts the posture estimation processing (S120), it calculates the probability distribution (or heat map) of each part j (j = 1, 2, ..., J) (S121). The CNN processing unit 111 calculates the probability distribution φ(X, W) by, for example, a known method using a convolutional neural network (hereinafter sometimes referred to as a "CNN"). W represents, for example, the parameters used in the CNN processing. For example, the CNN processing unit 111 performs the following processing.

Specifically, the CNN processing unit 111 applies filtering (or convolution) to certain image data using correct data for the right hand. The CNN processing unit 111 then applies pooling to the filtered data, extracting a representative value (or maximum value) for each block containing a plurality of images, and repeats the filtering and pooling to generate correct data for the probability distribution of the right hand (j = 12). Next, the CNN processing unit 111 receives RGB (red, green, blue) image data X as shown in FIG. 6(B). Using the correct data for the probability distribution of the right hand (j = 12), the CNN processing unit 111 repeats the filtering and pooling on the input image data X to obtain the probability distribution φ(X, W) of the right hand.

FIG. 6(C) is a diagram showing an example of the probability distribution φ(X, W) of the right hand. The probability distribution φ(X, W) is expressed, for example, as a numerical value (or probability value) from "0" to "1" for each pixel (or for each block containing a plurality of pixels).

In the following, the processing that repeats the filtering and pooling may be referred to as, for example, CNN processing.
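One filtering step followed by one pooling step can be sketched as follows. This is a simplified single-channel illustration in plain Python, assuming a fixed kernel and non-overlapping pooling blocks; the function names `convolve2d` and `max_pool` are hypothetical, and an actual CNN would learn its kernels (the parameters W) and stack many such layers.

```python
def convolve2d(image, kernel):
    """Filtering: slide the kernel over the image (valid region only).

    As in most CNN layers, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature, size=2):
    """Pooling: keep the maximum of each non-overlapping size x size block."""
    rows = len(feature) // size
    cols = len(feature[0]) // size
    return [
        [
            max(feature[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
kernel = [[1, 0],
          [0, 1]]
filtered = convolve2d(image, kernel)  # 3 x 3 feature map
pooled = max_pool(filtered, size=2)   # 1 x 1 after pooling
```

Repeating these two steps shrinks the map while keeping the strongest responses, which is the sense in which the text speaks of repeating the filtering and pooling.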

Similarly, the CNN processing unit 111 uses correct data for the right elbow (j = 10) to obtain, by CNN processing, correct data for the probability distribution of the right elbow. The CNN processing unit 111 then uses this correct data to obtain the probability distribution φ(X, W) of the right elbow from the input image data X by CNN processing.

The CNN processing unit 111 calculates not only the probability distribution φ(X, W) of each part but also a probability distribution φ(X, W) representing the degree of connection between parts. From the correct data of each part used in the CNN processing, the CNN processing unit 111 can obtain correct data in which the parts are connected. The CNN processing unit 111 performs CNN processing on certain image data using this connected correct data to obtain a probability distribution of the connected correct data, and then performs CNN processing on the input image X using that probability distribution to calculate the probability distribution φ(X, W) representing the degree of connection between parts.

In this way, the CNN processing unit 111 calculates the probability distribution φ(X, W) of each part j by applying convolution and other processing to the image data X using the correct data of each part j.

In S121, the CNN processing unit 111 performs, for example, the following processing. The CNN processing unit 111 reads the correct data of each part j stored in an internal memory, performs CNN processing on certain image data using this correct data, calculates correct data for the probability distribution of each part j, and stores it in the internal memory. The CNN processing unit 111 may calculate this correct data before the processing of S121 and store it in the internal memory in advance. The CNN processing unit 111 then performs CNN processing on the input image data X using the correct data for the probability distribution of each part j read from the internal memory, thereby obtaining the probability distribution φ(X, W) of each part.

The CNN processing unit 111 outputs the part number of each part j for which correct data was used, together with the probability distribution φ(X, W) of that part j, to the candidate point calculation unit 112.

The above is an example of calculating the probability distribution φ(X, W) of each part j. One method using such a convolutional neural network is disclosed in, for example, Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", In CVPR 2017. The CNN processing unit 111 may use this method to calculate the probability distribution φ(X, W) of each part j. Besides a convolutional neural network, a known method using template matching may also be used to calculate the probability distribution φ(X, W) of each part. Template matching is, for example, a method of calculating the probability distribution φ(X, W) of each part of the input image by comparison with model data of each part.

Returning to FIG. 5, the posture estimation unit 110 next obtains candidate points for the part positions (S122). For example, as shown in FIG. 6(A), the candidate point calculation unit 112 calculates candidate points based on the probability distribution φ(X, W) of each part output from the CNN processing unit 111.

FIG. 7(A) is a diagram showing an example of searching for right-hand candidate points A1 and A2 in the probability distribution φ(X, W) of the right hand (j = 12). The candidate point calculation unit 112 searches the probability distribution φ(X, W) of the right hand for position coordinates A1 and A2 at which the probability of a block is a local maximum. These position coordinates A1 and A2 are the candidate points for the right hand. Likewise, as shown in FIG. 7(B), the candidate point calculation unit 112 searches the probability distribution φ(X, W) of the right elbow for position coordinates B1 and B2 at which the probability is a local maximum. The position coordinates B1 and B2 are the candidate points for the right elbow. The candidate point calculation unit 112 outputs the part number of each part and its candidate points to the grouping processing unit 113.

When a local maximum is equal to or greater than a threshold, the posture estimation unit 110 treats that local maximum as a candidate point for the part; when the local maximum is smaller than the threshold, it determines that the part does not appear in the input image. In the former case, the posture estimation unit 110 sets v_j^i = 1 (a visible part); in the latter case, it sets v_j^i = 0 (a part that is not visible). The posture estimation unit 110 performs the subsequent processing in the former case and ends the posture estimation processing (S12) at this point in the latter case.
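The candidate point search and the threshold test can be sketched as follows. The sketch assumes the probability distribution is given as a 2D list, defines a local maximum as a cell strictly greater than its up to eight neighbours, and uses a threshold of 0.5; the function name `find_candidates`, the neighbourhood definition, and the threshold value are assumptions made for illustration, since the description does not specify them.

```python
def find_candidates(heatmap, threshold=0.5):
    """Return (row, col) positions that are local maxima >= threshold.

    A cell counts as a local maximum when it is strictly greater than
    every existing neighbour in its 3 x 3 neighbourhood, so flat
    plateaus yield no candidate in this simplified version.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    candidates = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue  # below threshold: not treated as a candidate
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                candidates.append((r, c))
    return candidates


right_hand_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.3, 0.8],
    [0.0, 0.0, 0.1, 0.3, 0.2],
]
points = find_candidates(right_hand_map)  # two peaks, like A1 and A2
visible = 1 if points else 0              # corresponds to v_j^i
```

With this map, the two peaks at (1, 1) and (2, 4) are returned. An empty result corresponds to the case in which the part is judged not to appear in the image.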

Returning to FIG. 5, the posture estimation unit 110 next groups the candidate points for the part positions (S123) and identifies the parts of each person i. This grouping is performed, for example, in the grouping processing unit 113 of FIG. 6(A). The grouping processing unit 113 groups the candidate points by, for example, comparing the distances between the candidate points of the parts j. Specifically, the grouping processing unit 113 performs the processing using, for example, the probability distribution φ(X, W) indicating the degree of connection between parts, obtained by CNN processing in the CNN processing unit 111.

FIG. 7(C) is a diagram showing an example of the probability distribution φ(X, W) of the degree of connection between the right hand and the right elbow. The candidate point calculation unit 112 has obtained the right-hand candidate points A1 and A2 (FIG. 7(A)) and the right-elbow candidate points B1 and B2 (FIG. 7(B)). For these candidate points A1, A2, B1, and B2, the grouping processing unit 113 determines, based on the probability distribution of the degree of connection between the right hand and the right elbow, whether A1 and B1 are parts belonging to the same person or A1 and B2 are parts belonging to the same person. For example, the grouping processing unit 113 calculates the degree of connection of each combination using the following equations.

In equations (2) and (3), y_1 represents, for example, the set on the probability distribution indicating the degree of connection between the right hand and the right elbow of person i = 1, as shown in FIG. 7(C). The degree of connection between candidate points A1 and B1 is, for example, the value of the line integral over the set y_1 along the line segment connecting A1 and B1. Likewise, the degree of connection between candidate points A1 and B2 is, for example, the value of the line integral over the set y_1 along the line segment connecting A1 and B2. The grouping processing unit 113 selects the larger of the values of equations (2) and (3), for example the candidate points A1 and B1, and groups the selected candidate points A1 and B1. Similarly, the grouping processing unit 113 acquires the probability distribution indicating the degree of connection between the right hand and the right elbow of person i = 2 from the CNN processing unit 111 via the candidate point calculation unit 112, evaluates equations (2) and (3) with y_1 replaced by y_2, and selects and groups the pair with the larger value. In this case, the grouping processing unit 113 groups A2 and B2.
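Based only on the description of the line integrals given here, equations (2) and (3) may take a form such as the following. The symbol E(·,·) for the degree of connection and the notation y_1(s) for the value of the connection distribution at a point s on the segment are assumptions introduced for this sketch.

```latex
E(A_1, B_1) = \int_{\overline{A_1 B_1}} y_1(s)\, ds \qquad (2)

E(A_1, B_2) = \int_{\overline{A_1 B_2}} y_1(s)\, ds \qquad (3)
```

Under this reading, the pair with the larger value of E is the pair that is grouped.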

In this way, the CNN processing unit 111 calculates the probability distribution indicating the degree of connection between parts, and the grouping processing unit 113 calculates the line integral for this probability distribution and each combination of candidate points calculated by the candidate point calculation unit 112. The grouping processing unit 113 then groups the combination of candidate points with the largest result. From the grouped candidate points, the grouping processing unit 113 can identify each part of person i.
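The grouping by line integral can be sketched numerically as follows. The sketch assumes the connection distribution is a 2D list indexed as `field[row][col]`, approximates the line integral by sampling the segment at evenly spaced points with nearest-cell lookup, and pairs a candidate with the partner giving the largest value. The names `line_integral` and `best_partner` and the sampling scheme are assumptions made for illustration, not the method of the description.

```python
import math


def line_integral(field, start, end, samples=20):
    """Approximate the line integral of field along the segment start-end.

    Points are (row, col). The segment is sampled at the midpoints of
    `samples` equal sub-intervals, and each sample reads the nearest cell.
    """
    (r0, c0), (r1, c1) = start, end
    length = math.hypot(r1 - r0, c1 - c0)
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) / samples
        r = int(round(r0 + t * (r1 - r0)))
        c = int(round(c0 + t * (c1 - c0)))
        total += field[r][c]
    return total / samples * length


def best_partner(field, point, partners):
    """Return the partner whose connecting segment scores highest."""
    return max(partners, key=lambda p: line_integral(field, point, p))


# Toy connection map: high values only along the top row,
# as if a forearm ran from A1 = (0, 0) to B1 = (0, 4).
connection_map = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
A1, B1, B2 = (0, 0), (0, 4), (4, 0)
partner = best_partner(connection_map, A1, [B1, B2])
```

In this toy map the segment from A1 to B1 lies entirely on high values, so B1 is chosen and A1 and B1 would be grouped as parts of the same person.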

ＣＮＮ処理部１１１は、Ｓ１２３において、例えば、以下の処理を行う。すなわち、ＣＮＮ処理部１１１は、内部メモリから式（２）と式（３）を読み出し、各部位のつながり度合いを示す確率分布の集合を式（２）と式（３）に代入し、各線分の線積分の数値を得る。そして、ＣＮＮ処理部１１１は、最も大きい数値となっている候補点の組み合わせを１つのまとめることで、グループ化する。 The CNN processing unit 111 performs, for example, the following processing in S123. The CNN processing unit 111 reads equations (2) and (3) from the internal memory, substitutes the set of probability distributions indicating the degree of connection between parts into equations (2) and (3), and obtains the value of the line integral for each line segment. The CNN processing unit 111 then groups the candidate points by combining the pair with the largest value into one group.

図５に戻り、次に、姿勢推定部１１０は、グループ化した人物ｉごとにその姿勢ｐ_ｉを取得する（Ｓ１２４）。例えば、図６（Ａ）のグルーピング処理部１１３は、グループ化した各部位の候補点（又は位置座標）を、式（１）に示された姿勢ｐ_ｉの各要素に代入することで、人物ｉの姿勢ｐ_ｉを得る。 Returning to FIG. 5, the posture estimation unit 110 next acquires the posture p_i of each grouped person i (S124). For example, the grouping processing unit 113 in FIG. 6(A) obtains the posture p_i of person i by substituting the grouped candidate points (or position coordinates) of each part into the corresponding elements of the posture p_i shown in equation (1).

以上が姿勢推定処理（Ｓ１２）である。 The posture estimation processing (S12) has been described above.

図２に戻り、次に、情報処理装置１００は、群衆の人物ｉごとに注目度を算出する（Ｓ１３）。以下、注目度算出処理の例について説明する。 Returning to FIG. 2, next, the information processing apparatus 100 calculates the degree of attention for each person i in the crowd (S13). An example of attention degree calculation processing will be described below.

＜注目度算出処理＞ <Attention level calculation processing>
図８は、注目度算出処理の例を表すフローチャートである。例えば、注目度算出部１２０で行われる処理である。 FIG. 8 is a flowchart showing an example of attention level calculation processing. It is performed, for example, by the attention level calculation unit 120.

注目度算出部１２０は、注目度算出処理を開始すると（Ｓ１３０）、姿勢ｐ_ｉを利用して、顔の部位、首、左肩、及び右肩は視えているか否かを判別する（Ｓ１３１）。例えば、注目度算出部１２０は、以下の式を利用して判定する。 When the attention level calculation processing starts (S130), the attention level calculation unit 120 uses the posture p_i to determine whether the face parts, the neck, the left shoulder, and the right shoulder are visible (S131). For example, the attention level calculation unit 120 makes this determination using the following equation.

注目度算出部１２０は、ｖ^ｉが「１」のとき、顔の部位（鼻、左目、右目、左耳、右耳）、首、左肩、及び右肩が全て視えていると判定し、ｖ^ｉが「０」のとき、顔の部位、首、左肩、又は右肩のいずれかが視えていないと判定する。例えば、注目度算出部１２０は、内部メモリから式（４）を読み出して、姿勢推定部１１０から出力された姿勢ｐ_ｉからｖ_１ ^ｉ～ｖ_８ ^ｉを抽出し、式（４）に代入することで判定する。 When v^i is "1", the attention level calculation unit 120 determines that the face parts (nose, left eye, right eye, left ear, right ear), neck, left shoulder, and right shoulder are all visible; when v^i is "0", it determines that at least one of the face parts, neck, left shoulder, or right shoulder is not visible. For example, the attention level calculation unit 120 reads equation (4) from the internal memory, extracts v_1^i to v_8^i from the posture p_i output from the posture estimation unit 110, and substitutes them into equation (4) to make the determination.
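Equation (4) is not reproduced in this text. One form consistent with the description is that v^i equals 1 only when all eight flags v_1^i to v_8^i are 1, which is what the sketch below assumes. The part ordering (nose, left eye, right eye, left ear, right ear, neck, left shoulder, right shoulder) and the function names are illustrative assumptions.

```python
# Assumed ordering of the eight parts checked in S131 (hypothetical labels).
PARTS = ("nose", "left_eye", "right_eye", "left_ear", "right_ear",
         "neck", "left_shoulder", "right_shoulder")


def all_visible(flags):
    """Return 1 when every visibility flag is 1, else 0 (assumed form of v^i)."""
    return 1 if all(flag == 1 for flag in flags) else 0


def invisible_parts(flags):
    """List the names of the parts whose flag is 0; these would be interpolated in S132."""
    return [name for name, flag in zip(PARTS, flags) if flag == 0]
```

A result of 0 corresponds to the NO branch of S131, where the missing parts are filled in by interpolation.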

注目度算出部１２０は、顔の部位、首、左肩、又は右肩のいずれかが視えていないと判定したとき（Ｓ１３１でＮＯ）、視えていない部位の位置情報を補間により算出する（Ｓ１３２）。注目度算出部１２０は、例えば、以下の式を用いて、人物ｉにおいて視えていない部位ｋの位置ベクトル（又は位置情報。以下では、位置ベクトルと位置情報とを区別しないで用いる場合がある。）ａ _ｋ ^ｉ＝（ｘ _ｋ ^ｉｙ _ｋ ^ｉ） ^Ｔを算出する。 When the attention level calculation unit 120 determines that any of the face parts, neck, left shoulder, or right shoulder is not visible (NO in S131), it calculates the position information of the invisible part by interpolation (S132). For example, the attention level calculation unit 120 uses the following equation to calculate the position vector (or position information; hereinafter the two terms may be used interchangeably) a_k^i = (x_k^i y_k^i)^T of a part k of person i that is not visible.

式（５）において、ａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉは人物ｉの視えている部位（ｋ１，ｋ２，…，ｋＭ）（Ｍは、０＜Ｍ≦ｊを満たす整数）の位置ベクトル、Ａ _ｋ ^Ｍは２行２Ｍ列の行列、ｂ_ｋ ^Ｍは２行１列の縦ベクトルをそれぞれ表す。 In equation (5), a_k1^i, a_k2^i, ..., a_kM^i are the position vectors of the visible parts (k1, k2, ..., kM) of person i (M is an integer satisfying 0 < M ≤ j), A_k^M is a matrix of 2 rows and 2M columns, and b_k^M is a column vector of 2 rows and 1 column.

式（５）は、例えば、視えていない部位ｋの位置ベクトルａ_ｋ ^ｉは、視ている部位の位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉとオフセットｂ_ｋ ^Ｍとを用いて算出されることを表している。 Equation (5) expresses, for example, that the position vector a_k^i of the invisible part k is calculated from the position vectors a_k1^i, a_k2^i, ..., a_kM^i of the visible parts and the offset b_k^M.
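Equation (5) is not reproduced in this text. From the stated dimensions (a 2-row, 2M-column matrix and a 2-row offset), one natural reading is an affine map applied to the stacked coordinates of the visible parts, a_k^i = A_k^M (a_k1^i; ...; a_kM^i) + b_k^M. The sketch below assumes that form; the function name and the example matrix are hypothetical.

```python
def interpolate_part(A, b, visible_points):
    """Estimate an invisible part from visible ones with an affine map.

    A: 2 x 2M matrix as a list of two rows, b: length-2 offset,
    visible_points: list of M (x, y) position vectors.
    """
    stacked = [coord for point in visible_points for coord in point]  # length 2M
    if any(len(row) != len(stacked) for row in A):
        raise ValueError("A must have 2M columns")
    return tuple(
        sum(A[r][c] * stacked[c] for c in range(len(stacked))) + b[r]
        for r in range(2)
    )


# Illustrative values only: place the neck at the shoulder midpoint, shifted up by 1.
A_example = [[0.5, 0.0, 0.5, 0.0],
             [0.0, 0.5, 0.0, 0.5]]
b_example = (0.0, -1.0)
neck = interpolate_part(A_example, b_example, [(2.0, 4.0), (6.0, 4.0)])
```

In the patent, A_k^M and b_k^M are not hand-picked like this but are obtained from equation (6).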

式（５）の行列Ａ _ｋ ^Ｍと縦ベクトルｂ_ｋ ^Ｍは、姿勢ｓの集合Ｐを用いて、以下の式を解くことで求めることができる。 The matrix A_k^M and the column vector b_k^M in equation (5) can be obtained by solving the following equation using a set P of postures s.

式（６）において、ｓ _ｋ ^ｉ＝（ｘ _ｋ ^ｉｙ _ｋ ^ｉ） ^Ｔは、姿勢ｓの部位番号ｋの部位の位置ベクトルを表す。また、姿勢ｓの集合Ｐは、例えば、３Ｄ－ＣＧソフトウェアなどで作成した人体模型をモデルデータとした場合の各部位の位置ベクトルの集合である。 In Equation (6), s _k ⁱ =(x _k ⁱ y _k ⁱ ) ^T represents the position vector of the part with the part number k of the posture s. A set P of postures s is, for example, a set of position vectors of each part when a human body model created by 3D-CG software or the like is used as model data.

式（６）は、例えば、このような人体模型として作成された人物の部位番号ｋの位置ベクトルｓ_ｋ ^ｉと、姿勢推定部１１０で推定された、視えている部位の位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉとの誤差が最小となるＡ _ｋ ^Ｍとｂ_ｋ ^Ｍとを表している。式（６）は、例えば、勾配降下法など、公知の手法により解くことが可能である。 Equation (6) expresses, for example, the A_k^M and b_k^M that minimize the error between the position vector s_k^i of the part with part number k of a person created as such a human body model and the position vectors a_k1^i, a_k2^i, ..., a_kM^i of the visible parts estimated by the posture estimation unit 110. Equation (6) can be solved by a known method such as gradient descent.
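Equation (6) is not reproduced in this text. As a rough sketch of the gradient descent approach mentioned above, the following assumes the objective is the mean squared error between the target part position and an affine function of the stacked visible-part coordinates, taken over the model postures. The function name `fit_affine`, the learning rate, and the step count are assumptions for illustration.

```python
def fit_affine(samples, dim_in, lr=0.1, steps=2000):
    """Fit A (2 x dim_in) and b (length 2) by plain gradient descent.

    samples: list of (stacked_visible_coords, target_xy) pairs taken from
    the model postures; dim_in equals 2M.
    """
    A = [[0.0] * dim_in for _ in range(2)]
    b = [0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        grad_A = [[0.0] * dim_in for _ in range(2)]
        grad_b = [0.0, 0.0]
        for x, target in samples:
            for r in range(2):
                # Residual of the affine prediction for output row r.
                err = sum(A[r][c] * x[c] for c in range(dim_in)) + b[r] - target[r]
                grad_b[r] += 2.0 * err / n
                for c in range(dim_in):
                    grad_A[r][c] += 2.0 * err * x[c] / n
        for r in range(2):
            b[r] -= lr * grad_b[r]
            for c in range(dim_in):
                A[r][c] -= lr * grad_A[r][c]
    return A, b
```

Because the objective is quadratic in A and b, a closed-form least-squares solution would also work; gradient descent is shown only because the text names it as one known method.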

このように注目度算出部１２０は、例えば、顔の部位の位置情報を姿勢推定部１１０で推定することができなかったとき、姿勢推定部１１０で推定することができた他の部位の位置情報を利用して、顔の部位の位置情報を算出している。 In this way, when the posture estimation unit 110 could not estimate the position information of a face part, for example, the attention level calculation unit 120 calculates the position information of that face part using the position information of other parts that the posture estimation unit 110 was able to estimate.

具体的には、注目度算出部１２０は、Ｓ１３２において、例えば、以下の処理を行う。すなわち、注目度算出部１２０は、姿勢推定部１１０から受け取った姿勢ｐ_ｉから、視えている部位の位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉを抽出する。そして、注目度算出部１２０は、内部メモリに記憶された式（５）を読み出して、式（５）に代入することで、姿勢ｐ_ｉの中で視えていない部位ｋの位置ベクトルａ_ｋ ^ｉを算出する。その際、注目度算出部１２０は、内部メモリから式（６）を読み出して、視えている部位の位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉを式（６）に代入して演算を行うことで、Ａ _ｋ ^Ｍとｂ_ｋ ^Ｍを取得して、式（５）に代入する。 Specifically, the attention level calculation unit 120 performs, for example, the following processing in S132. The attention level calculation unit 120 extracts the position vectors a_k1^i, a_k2^i, ..., a_kM^i of the visible parts from the posture p_i received from the posture estimation unit 110. It then reads equation (5) stored in the internal memory and substitutes these vectors into it to calculate the position vector a_k^i of the part k that is not visible in the posture p_i. In doing so, the attention level calculation unit 120 reads equation (6) from the internal memory, substitutes the position vectors a_k1^i, a_k2^i, ..., a_kM^i of the visible parts into equation (6), performs the calculation to obtain A_k^M and b_k^M, and substitutes them into equation (5).

なお、首、左肩、右肩、及び鼻を、顔の部位に含めてもよい。この場合、注目度算出部１２０は、これらの部位を含めた顔の部位について、その部位が視えているか否かをＳ１３１において判定してもよい。 Note that the neck, left shoulder, right shoulder, and nose may be included in the facial region. In this case, the attention degree calculation unit 120 may determine in S131 whether or not the part of the face including these parts is visible.

次に、注目度算出部１２０は、人物ｉの向きベクトルｑ_ｉを算出する（Ｓ１３３）。例えば、注目度算出部１２０は、以下の式を用いて、向きベクトルｑ_ｉを算出する。 Next, the attention level calculation unit 120 calculates the direction vector q_i of person i (S133). For example, the attention level calculation unit 120 calculates the direction vector q_i using the following equation.

式（７）において、Ｗは２行２Ｊ列の行列、ｗ_０は２行１列の縦ベクトルをそれぞれ表す。また、ａ_ｊ ^ｉは、人物ｉの部位ｊの位置ベクトルを表す。行列Ｗと縦ベクトルｗ_０は、式（６）と同様に、以下の式を解くことにより求められる。 In equation (7), W is a matrix of 2 rows and 2J columns, and w_0 is a column vector of 2 rows and 1 column. Also, a_j^i represents the position vector of part j of person i. As with equation (6), the matrix W and the column vector w_0 are obtained by solving the following equation.

式（８）において、ｓ_ｋ＝（ｘ_ｋｙ_ｋ）^Ｔは、姿勢ｓの部位番号ｋの部位の位置ベクトルを表し、姿勢ｓの集合Ｐは、例えば、上述した場合と同様に、３Ｄ－ＣＧソフトウェアなどで作成された人体模型のおける各部位の位置ベクトルの集合である。また、ｑ_ｓは、例えば、姿勢ｓが持つ向きベクトルであり、顔の部位（左目、右目、左耳、右耳、鼻、首）の最小２乗平面Ｓ_ｆａｃｅに直交するベクトルとして定義される。図９（Ａ）は人物画像の例を表し、図９（Ｂ）は向きベクトルｑ_ｓの例を表す図である。 In equation (8), s_k = (x_k y_k)^T represents the position vector of the part with part number k in posture s, and the set P of postures s is, as in the case described above, a set of position vectors of the parts of a human body model created with, for example, 3D-CG software. Also, q_s is, for example, the direction vector of posture s, defined as the vector orthogonal to the least-squares plane S_face of the face parts (left eye, right eye, left ear, right ear, nose, neck). FIG. 9(A) shows an example of a person image, and FIG. 9(B) shows an example of the direction vector q_s.

例えば、注目度算出部１２０は、Ｓ１３３において、以下の処理を行う。すなわち、注目度算出部１２０は、内部メモリに記憶された式（７）を読み出して、姿勢ｐ_ｉから抽出した、視えている部位の位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉと、補間により算出した、視えていない部位ｋの位置ベクトルａ_ｋ ^ｉとを、式（７）に代入する。そして、注目度算出部１２０は、人物ｉの向きベクトルｑ_ｉを算出する。その際、注目度算出部１２０は、内部メモリから、式（８）、向きベクトルｑ_ｓ、及び各部位番号ｋの部位の位置ベクトルｓ_ｋを読み出して、式（８）に代入することで、Ｗとｗ_０とを取得する。この場合、予め計算されたＷとｗ_０とが内部メモリに記憶され、注目度算出部１２０は、処理の際に内部メモリからＷとｗ_０とを読み出して、式（７）に代入するようにしてもよい。 For example, the attention level calculation unit 120 performs the following processing in S133. The attention level calculation unit 120 reads equation (7) stored in the internal memory and substitutes into it the position vectors a_k1^i, a_k2^i, ..., a_kM^i of the visible parts extracted from the posture p_i and the position vector a_k^i of the invisible part k calculated by interpolation. The attention level calculation unit 120 thereby calculates the direction vector q_i of person i. In doing so, it reads equation (8), the direction vectors q_s, and the position vectors s_k of the parts with each part number k from the internal memory and substitutes them into equation (8) to obtain W and w_0. Alternatively, W and w_0 calculated in advance may be stored in the internal memory, and the attention level calculation unit 120 may read W and w_0 from the internal memory during processing and substitute them into equation (7).

図８に戻り、注目度算出部１２０は、人物ｉの向きベクトルｑ_ｉを算出すると、注目度算出処理を終了する（Ｓ１３４）。 Returning to FIG. 8, after calculating the direction vector q_i of person i, the attention level calculation unit 120 ends the attention level calculation processing (S134).

一方、注目度算出部１２０は、顔の部位、首、左肩、及び右肩のいずれも視えていると判定したとき（Ｓ１３１でＹＥＳ）、人物ｉの向きベクトルｑ_ｉを算出し（Ｓ１３３）、注目度算出処理を終了する（Ｓ１３４）。この場合、注目度算出部１２０は、補間処理（Ｓ１３２）を行うことなく、視ている部位ｊの位置ベクトルａ_１ ^ｉ，ａ_２ ^ｉ，…，ａ_ｊ ^ｉを式（７）に代入することで、向きベクトルｑ_ｉを算出する。 On the other hand, when the attention level calculation unit 120 determines that the face parts, neck, left shoulder, and right shoulder are all visible (YES in S131), it calculates the direction vector q_i of person i (S133) and ends the attention level calculation processing (S134). In this case, the attention level calculation unit 120 calculates the direction vector q_i by substituting the position vectors a_1^i, a_2^i, ..., a_j^i of the visible parts j into equation (7), without performing the interpolation processing (S132).

上述した例は、人物ｉの向きベクトルｑ_ｉの算出する（Ｓ１３３）例として、式（７）と式（８）を用いた例について説明した。例えば、注目度算出部１２０は、式（７）と式（８）に代えて、以下の式を用いて、人物ｉの向きベクトルｑ_ｉを算出してもよい。 The example above used equations (7) and (8) to calculate the direction vector q_i of person i (S133). Alternatively, the attention level calculation unit 120 may calculate the direction vector q_i of person i using the following equation instead of equations (7) and (8).

式（９）に示すように、注目度算出部１２０は、鼻（ｊ＝１）、首（ｊ＝６）、左肩（ｊ＝７）、右肩（ｊ＝８）の各部位の位置ベクトルａ_ｋ ^ｉのｘ軸方向の位置座標（ｘ_１ ^ｉ，ｘ_６ ^ｉ，ｘ_７ ^ｉ，ｘ_８ ^ｉ）を用いて、向きベクトルｑ_ｉを算出する。なお、式（９）において、ｗ_１，ｗ_２はパラメータであり、例えば、ｗ_１＝１．０、ｗ_２＝０．５である。図１０（Ａ）から図１０（Ｃ）は、ｗ_１＝１．０、ｗ_２＝０．５の場合の各座標の関係例を表す図である。 As shown in Equation (9), the attention level calculation unit 120 calculates position vectors of the nose (j=1), neck (j=6), left shoulder (j=7), and right shoulder (j=8). A direction vector q _i is calculated using the position coordinates (x ₁ ⁱ , x ₆ ⁱ , x ₇ ⁱ , x ₈ ⁱ ) of a _k ⁱ in the x-axis direction. In equation (9), w ₁ and w ₂ are parameters, for example w ₁ =1.0 and w ₂ =0.5. FIGS. 10(A) to 10(C) are diagrams showing examples of the relationship between coordinates when w ₁ =1.0 and w ₂ =0.5.

注目度算出部１２０は、例えば、以下の処理を行う。すなわち、注目度算出部１２０は、内部メモリに記憶した式（９）を読み出す。そして、注目度算出部１２０は、姿勢ｐ_ｉから抽出した、又は、補間により算出した、各部位（ｊ＝１，６，７，８）の位置ベクトルａ_ｋ１ ^ｉのｘ座標を式（９）に代入することで、人物ｉの向きベクトルｑ_ｉを算出する。 The attention degree calculation unit 120 performs, for example, the following processing. That is, the attention level calculation unit 120 reads out the formula (9) stored in the internal memory. Then, attention level calculation section 120 calculates the x-coordinate of position vector a _k1 ⁱ of each part (j=1, 6, 7, 8) extracted from posture p _i or calculated by interpolation using equation (9). , the orientation vector q _i of the person i is calculated.

以上が注目度算出処理（Ｓ１３）の例である。上述した例は、人物ｉの向きベクトルｑ_ｉを算出する例について説明した。例えば、注目度算出部１２０は、姿勢推定部１１０から、他の人物（ｉ＋１）の姿勢ｐ_ｉ＋１を受け取ったときは、この人物（ｉ＋１）に対する注目度算出処理（Ｓ１３）を行い、姿勢ｐ_ｉ＋１を算出する。このようにして、注目度算出部１２０は、画像に写っている全ての人物ｉの向きベクトルｑ_ｉを算出する。 The above is an example of the attention level calculation processing (S13). The example above described calculating the direction vector q_i of person i. For example, when the attention level calculation unit 120 receives the posture p_i+1 of another person (i+1) from the posture estimation unit 110, it performs the attention level calculation processing (S13) for that person (i+1) and calculates the posture p_i+1. In this way, the attention level calculation unit 120 calculates the direction vectors q_i of all persons i appearing in the image.

図２に戻り、次に、情報処理装置１００は、算出した人物ｉの向きベクトルｑ_ｉを注目度記憶部１４０に記憶する（Ｓ１５）。 Returning to FIG. 2, the information processing apparatus 100 next stores the calculated direction vector q_i of person i in the attention level storage unit 140 (S15).

次に、情報処理装置１００は、終了するか否かを判定する（Ｓ１６）。例えば、情報処理装置１００を操作するユーザが終了ボタンを操作したか否か、或いは、終了コマンドを入力したか否かにより判定する。 Next, the information processing apparatus 100 determines whether or not to end (S16). For example, it is determined whether or not the user operating the information processing apparatus 100 has operated the end button or whether or not the end command has been input.

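情報処理装置１００は、終了するときは（Ｓ１６でＹＥＳ）、一連の処理を終了させ（Ｓ１７）、終了しないときは、Ｓ１１へ移行して、上述した処理を繰り返す（Ｓ１１からＳ１５）。例えば、情報処理装置１００は、他の画像（又は次の画像フレーム）に対して、画像に写っている人物ｉの向きベクトルｑ_ｉを算出するときは、終了することなく（Ｓ１６でＮＯ）、上述した処理を繰り返せばよい。 When ending (YES in S16), the information processing apparatus 100 ends the series of processes (S17); when not ending, it returns to S11 and repeats the processing described above (S11 to S15). For example, when calculating the direction vector q_i of a person i appearing in another image (or the next image frame), the information processing apparatus 100 need only repeat the processing described above without ending (NO in S16).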

なお、情報処理装置１００は、向きベクトルｑ_ｉを注目度記憶部１４０に記憶した後、対象物３００－１，３００－２に視線を向けている人物ｉの数をカウント処理を行ってもよい。 After storing the direction vector q_i in the attention level storage unit 140, the information processing apparatus 100 may perform count processing that counts the number of persons i whose line of sight is directed at the objects 300-1 and 300-2.

図１１は、カウント処理の例を表すフローチャートである。例えば、注目度算出部１２０で行われる。 FIG. 11 is a flowchart showing an example of counting processing. For example, it is performed by the attention degree calculation unit 120 .

注目度算出部１２０は、処理を開始すると（Ｓ１４０）、向きベクトルｑ_ｉが対象物ｍと交差するか否かを判定する（Ｓ１４１）。例えば、注目度算出部１２０は、算出した向きベクトルｑ_ｉを、ｎ（ｎ＞０）倍し、ｎ倍した向きベクトルｑ_ｉが、対象物３００－１，３００－２の設置点の位置座標と交差するか否か、或いは設置点の位置座標の周囲の一定範囲内の領域で交差するか否かにより判定する。この場合、注目度算出部１２０は、例えば、向きベクトルｑ_ｉを表す二次方程式に、設置点の位置座標を代入しても二次方程式としての解が得られるか否かにより判定してもよい。或いは、注目度算出部１２０は、向きベクトルｑ_ｉを表す二次方程式と一定範囲内の領域を表す一次方程式とで解が得られるか否かにより判定してもよい。注目度算出部１２０は、例えば、空間情報記憶部１３０に記憶された対象物３００－１，３００－２の設定点を表す位置情報などを利用して、このような計算を行う。 When the processing starts (S140), the attention level calculation unit 120 determines whether the direction vector q_i intersects an object m (S141). For example, the attention level calculation unit 120 multiplies the calculated direction vector q_i by n (n>0) and determines whether the scaled vector intersects the position coordinates of the installation point of the object 300-1 or 300-2, or intersects a region within a certain range around those coordinates. In this case, the attention level calculation unit 120 may, for example, make the determination according to whether the quadratic equation representing the direction vector q_i still has a solution when the position coordinates of the installation point are substituted into it. Alternatively, it may make the determination according to whether the quadratic equation representing the direction vector q_i and a linear equation representing the region within the certain range have a common solution. The attention level calculation unit 120 performs such calculations using, for example, the position information representing the installation points of the objects 300-1 and 300-2 stored in the spatial information storage unit 130.

注目度算出部１２０は、向きベクトルｑ_ｉが対象物ｍと交差すると判定したとき（Ｓ１４１でＹＥＳ）、対象ｍに対するカウント値を増加させる（Ｓ１４２）。例えば、注目度算出部１２０は、ｎ倍した向きベクトルｑ_ｉが、対象物３００－１の設置点の位置座標や、その周囲の領域で交差するとき、対象物３００－１のカウント値をインクリメントする。 When the attention level calculation unit 120 determines that the direction vector q _i intersects the object m (YES in S141), it increases the count value for the object m (S142). For example, when the direction vector q _i multiplied by n intersects the positional coordinates of the installation point of the object 300-1 or the surrounding area, the attention level calculation unit 120 increments the count value of the object 300-1. do.

次に、注目度算出部１２０は、終了判定を行い（Ｓ１４３）、カウント処理を終了させるときは（Ｓ１４３でＹＥＳ）、終了し（Ｓ１４４）、終了させないときは（Ｓ１４３でＮＯ）、人物ｉをインクリメントし（Ｓ１４５）、次の人物ｉに対して、どの対象物ｍに着目しているかを判定する（Ｓ１４１，Ｓ１４２）。 Next, the attention level calculation unit 120 performs an end determination (S143). When ending the count processing (YES in S143), it ends (S144); when not ending (NO in S143), it increments the person index i (S145) and determines which object m the next person i is looking at (S141, S142).

一方、注目度算出部１２０は、向きベクトルｑ_ｉが対象物ｍと交差しないと判定したとき（Ｓ１４１でＮｏ）、対象物ｍに対するカウント値を増加させることなく、終了判定を行う（Ｓ１４３）。 On the other hand, when the attention level calculation unit 120 determines that the direction vector q_i does not intersect the object m (NO in S141), it performs the end determination without increasing the count value for the object m (S143).
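The count loop of S141 to S145 can be sketched as below. This is only an illustration under stated assumptions: each person is given as a 2-D position plus direction vector q_i (the text does not say which point the vector starts from), the "region within a certain range" is taken to be a circle of a chosen radius around the installation point, and the test is a closest-approach check on the ray rather than the equation-based test mentioned in the text. All names are hypothetical.

```python
def looks_at(origin, direction, target, radius):
    """True if the ray origin + n * direction (n > 0) passes within radius of target."""
    ox, oy = origin
    dx, dy = direction
    tx, ty = target
    dd = dx * dx + dy * dy
    if dd == 0:
        return False
    # Scale factor n at which the ray is closest to the target.
    n = ((tx - ox) * dx + (ty - oy) * dy) / dd
    if n <= 0:
        return False  # the target lies behind the person
    cx, cy = ox + n * dx, oy + n * dy
    return (cx - tx) ** 2 + (cy - ty) ** 2 <= radius ** 2


def count_attention(people, objects, radius=1.0):
    """Count, per object, how many direction vectors point at it."""
    counts = {name: 0 for name in objects}
    for origin, direction in people:            # S145: move on to the next person
        for name, target in objects.items():
            if looks_at(origin, direction, target, radius):  # S141
                counts[name] += 1                            # S142
    return counts
```

The resulting dictionary corresponds to the per-object count values that are stored and shown to the user.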

例えば、注目度算出部１２０は、カウント処理を終了したとき（Ｓ１４４）、例えば、内部メモリなどに記憶した各対象物ｍに対するカウント値を、注目度記憶部１４０に記憶する。注目度算出部１２０は、表示装置へカウント値を出力することで、ユーザに対して、どの対象物３００－１，３００－２に群衆が着目しているかを、通知することができる。 For example, when the counting process ends (S144), the attention level calculation unit 120 stores the count value for each object m stored in the internal memory or the like in the attention level storage unit 140, for example. By outputting the count value to the display device, the attention degree calculation unit 120 can notify the user of which objects 300-1 and 300-2 the crowd is paying attention to.

例えば、街中などで監視カメラを用いて群衆の画像が撮影された場合を考える。この場合、撮影された画像には、障害物や監視カメラの設置場所などによって、顔が隠れた人物が含まれる場合がある。このような場合、顔の部位の位置情報が得られない場合がある。 For example, consider a case where a surveillance camera is used to capture an image of a crowd in a city. In this case, the photographed image may include a person whose face is hidden due to obstacles, installation locations of surveillance cameras, or the like. In such a case, it may not be possible to obtain the positional information of the parts of the face.

本第１の実施の形態の情報処理装置１００は、顔の部位の位置情報が得られないときであっても、位置情報が得られた他の部位の位置情報を利用して、補間処理（例えば図８のＳ１３２）により、顔の部位の位置情報を算出する。そして、情報処理装置１００は、顔の位置情報と他の部位の位置情報とを用いて、向きベクトルｑ_ｉを算出する。従って、本情報処理装置１００は、顔が隠れた人物の画像であっても、その人物の視線方向を推定することが可能である。 Even when position information for a face part cannot be obtained, the information processing apparatus 100 of the first embodiment calculates the position information of that face part by interpolation (for example, S132 in FIG. 8) using the position information of other parts that was obtained. The information processing apparatus 100 then calculates the direction vector q_i using the position information of the face and of the other parts. The information processing apparatus 100 can therefore estimate the line-of-sight direction of a person even from an image in which that person's face is hidden.

[第２の実施の形態] [Second embodiment]
第１の実施の形態では、姿勢ｐ_ｉなどは、２次元空間上のベクトルとして表現される例について説明した。本第２の実施の形態では、姿勢ｐ_ｉなどが、３次元空間上のベクトルとして表現される例について説明する。このような３次元空間上のベクトルは、例えば、複数台のカメラ（又は撮像装置２００）を用いて、計算が可能となる。 In the first embodiment, an example was described in which the posture p_i and the like are expressed as vectors in a two-dimensional space. In the second embodiment, an example is described in which the posture p_i and the like are expressed as vectors in a three-dimensional space. Such three-dimensional vectors can be calculated using, for example, multiple cameras (or imaging devices 200).

図１２は、２台のカメラの座標系と３次元空間上の位置座標の例を表す図である。図１２に示す２台のカメラは、例えば、撮像装置２００が２台あることを表している。 FIG. 12 is a diagram showing an example of coordinate systems of two cameras and position coordinates in a three-dimensional space. Two cameras shown in FIG. 12 represent, for example, that there are two imaging devices 200 .

図１２において、Ｏは第１のカメラの原点、Ｏ’は第２のカメラの原点をそれぞれ表す。また、ｔは第１のカメラから第２のカメラへ向かう並進ベクトル、（Ｘ，Ｙ，Ｚ）は第１のカメラの座標系から見た部位の３次元位置座標をそれぞれ表す。さらに、Ｒは、第１のカメラから見た第２のカメラの回転角を表す回転行列、ｆは第１のカメラの焦点距離（原点Ｏから第１のカメラの画像座標系の原点までの距離）、ｆ’は第２のカメラの焦点距離（原点Ｏ’から第２のカメラの画像座標系の原点までの距離）をそれぞれ表す。さらに、（ｘ，ｙ）は、第１のカメラの画像内（又は第１のカメラの画像座標系）における部位の２次元位置座標、（ｘ’，ｙ’）は、第２のカメラの画像内（又は第２のカメラの画像座標系）における部位の２次元位置座標をそれぞれ表す。 In FIG. 12, O represents the origin of the first camera, and O' represents the origin of the second camera. Also, t represents a translation vector from the first camera to the second camera, and (X, Y, Z) represents the three-dimensional position coordinates of the part viewed from the coordinate system of the first camera. Furthermore, R is a rotation matrix representing the rotation angle of the second camera as seen from the first camera, f is the focal length of the first camera (the distance from the origin O to the origin of the image coordinate system of the first camera ) and f′ represent the focal length of the second camera (the distance from the origin O′ to the origin of the image coordinate system of the second camera). Furthermore, (x, y) is the two-dimensional position coordinates of the part in the image of the first camera (or the image coordinate system of the first camera), and (x', y') is the image of the second camera. (or the image coordinate system of the second camera).

図１３は、第２の実施の形態における姿勢推定部１１０の構成例を表す図である。 FIG. 13 is a diagram showing a configuration example of posture estimation section 110 in the second embodiment.

図１３に示すように、姿勢推定部１１０は、第１及び第２のＣＮＮ処理部１１１－１，１１１－２、第１及び第２の候補点算出部１１２－１，１１２－２、第１及び第２のグルーピング処理部１１３－１，１１３－２を備える。また、姿勢推定部１１０は、同一人物特定処理部１１４、カメラ行列計算部１１５、及び３次元位置計算部１１６を備える。 As shown in FIG. 13, posture estimation section 110 includes first and second CNN processing sections 111-1 and 111-2, first and second candidate point calculation sections 112-1 and 112-2, first and second grouping processing units 113-1 and 113-2. In addition, posture estimation section 110 includes same person identification processing section 114 , camera matrix calculation section 115 , and three-dimensional position calculation section 116 .

第１及び第２のＣＮＮ処理部１１１－１，１１１－２は、第１及び第２のカメラから出力された画像データに対して、ＣＮＮ処理などを施して、各部位に対する確率分布φ（Ｘ，Ｗ）を出力する。第１及び第２のＣＮＮ処理部１１１－１，１１１－２の各々は、各カメラからの画像データに対して、例えば、第１の実施の形態と同様のＣＮＮ処理を施すことで、各カメラで撮像された画像の各部位に対する確率分布φ（Ｘ，Ｗ）を出力する。 The first and second CNN processing units 111-1 and 111-2 perform CNN processing and the like on the image data output from the first and second cameras, and obtain a probability distribution φ(X , W). Each of the first and second CNN processing units 111-1 and 111-2 performs, for example, the same CNN processing as in the first embodiment on the image data from each camera, so that each camera output the probability distribution φ(X, W) for each part of the image captured by .

第１及び第２の候補点算出部１１２－１，１１２－２は、第１及び第２のＣＮＮ処理部１１１－１，１１１－２から出力された確率分布φ（Ｘ，Ｗ）に基づいて、各部位の候補点をそれぞれ算出する。第１及び第２の候補点算出部１１２－１，１１２－２の各々は、例えば、第１の実施の形態と同様に、確率分布φ（Ｘ，Ｗ）から極大点を探索するなどにより、候補点を算出する。 The first and second candidate point calculation units 112-1 and 112-2 are based on the probability distribution φ(X, W) output from the first and second CNN processing units 111-1 and 111-2. , and the candidate points for each part are calculated. Each of the first and second candidate point calculation units 112-1 and 112-2, for example, similarly to the first embodiment, searches for a local maximum point from the probability distribution φ(X, W), Calculate candidate points.
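The local-maximum search mentioned above can be illustrated as follows. This is a minimal sketch, assuming the probability distribution for one part is a 2-D grid `heatmap[y][x]`, that a candidate must be strictly greater than its 8 neighbours, and that weak responses are discarded with a threshold. The threshold value and the function name are assumptions, not part of the patent.

```python
def local_maxima(heatmap, threshold=0.5):
    """Return (x, y) cells that exceed threshold and all of their 8 neighbours."""
    height, width = len(heatmap), len(heatmap[0])
    points = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(value > other for other in neighbours):
                points.append((x, y))
    return points
```

Running this once per camera and per part gives the candidate points that the first and second grouping processing units then group.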

第１及び第２のグルーピング処理部１１３－１，１１３－２は、第１及び第２の候補点算出部１１２－１，１１２－２から出力された候補点に対して、それぞれグルーピングを行う。第１及び第２のグルーピング処理部１１３－１，１１３－２の各々は、例えば、第１の実施の形態と同様に、各候補点の距離に基づいて、グルーピングを行う。 The first and second grouping processors 113-1 and 113-2 respectively group the candidate points output from the first and second candidate point calculators 112-1 and 112-2. Each of the first and second grouping processing units 113-1 and 113-2 performs grouping based on the distance of each candidate point, for example, as in the first embodiment.

同一人物特定処理部１１４は、第１及び第２のグルーピング処理部１１３－１，１１３－２から出力された、グループ化された候補点が同一人物の候補点であるか否かを、類似度を用いて特定する。同一人物特定処理部１１４は、類似度の高い候補点の組み合わせを同一人物の候補点であると判定して、その候補点を出力する。詳細は動作例で説明する。 The same person identification processing unit 114 determines whether or not the grouped candidate points output from the first and second grouping processing units 113-1 and 113-2 are candidate points of the same person. Identify using The same person identification processing unit 114 determines that a combination of candidate points with a high degree of similarity is candidate points of the same person, and outputs the candidate points. Details will be explained in an operation example.

カメラ行列計算部１１５は、カメラ行列Ｐ，Ｐ’を計算する。カメラ行列Ｐは、例えば、図１２に示すように、第１のカメラの画像座標系を３次元位置の座標系へ変換する行列を表す。また、カメラ行列Ｐ’は、例えば、第２のカメラの画像座標系を３次元位置の座標系を変換する行列を表す。カメラ行列計算部１１５は、同一人物特定処理部１１４から出力された各候補点と、計算したカメラ行列Ｐ，Ｐ’とを３次元位置計算部１１６へ出力する。カメラ行列Ｐ，Ｐ’の計算例は動作例で説明する。 The camera matrix calculator 115 calculates camera matrices P and P'. The camera matrix P represents a matrix for transforming the image coordinate system of the first camera into the coordinate system of the three-dimensional position, as shown in FIG. 12, for example. Also, the camera matrix P' represents, for example, a matrix for transforming the image coordinate system of the second camera into the coordinate system of the three-dimensional position. The camera matrix calculation unit 115 outputs each candidate point output from the same person identification processing unit 114 and the calculated camera matrices P and P′ to the three-dimensional position calculation unit 116 . A calculation example of the camera matrices P and P' will be described in an operation example.

３次元位置計算部１１６は、例えば、カメラ行列Ｐ，Ｐ’を用いて、グループ化された各部位の候補点（２次元の位置座標）を３次元位置座標へ変換して、３次元位置ベクトルを含む姿勢ｐ_ｉを出力する。詳細は動作例で説明する。 The three-dimensional position calculation unit 116 uses, for example, the camera matrices P and P' to convert the grouped candidate points (two-dimensional position coordinates) of each part into three-dimensional position coordinates, and outputs a posture p_i containing three-dimensional position vectors. Details are described in the operation example.

図１４は第２の実施の形態における動作例を表すフローチャートである。例えば、情報処理装置１００は、図２に示すフローチャートに代えて、図１４に示すフローチャートにより処理を行う。 FIG. 14 is a flow chart showing an operation example in the second embodiment. For example, the information processing apparatus 100 performs processing according to the flowchart shown in FIG. 14 instead of the flowchart shown in FIG.

情報処理装置１００は、処理を開始すると（Ｓ２０）、第１のカメラで群衆を撮影し（Ｓ２１）、第２のカメラでも同じ群衆を撮影する（Ｓ２３）。例えば、撮像装置２００が２台あり、一方の撮像装置２００が第１のカメラ、他方の撮像装置２００が第２のカメラとして、各々群衆を撮影する。 When the information processing apparatus 100 starts processing (S20), the first camera captures an image of the crowd (S21), and the second camera captures an image of the same crowd (S23). For example, there are two imaging devices 200, one imaging device 200 serves as a first camera, and the other imaging device 200 serves as a second camera, each of which photographs a crowd.

次に、情報処理装置１００は、第１のカメラで撮影された画像に含まれる各人物の姿勢を推定し（Ｓ２２）、第２のカメラで撮影された画像に含まれる各人物の姿勢も推定する（Ｓ２４）。例えば、第１のＣＮＮ処理部１１１－１、第１の候補点算出部１１２－１、及び第１のグルーピング処理部１１３－１において、第１のカメラで撮影された画像に含まれる各人物の姿勢を推定する。また、例えば、第２のＣＮＮ処理部１１１－２、第２の候補点算出部１１２－２、及び第２のグルーピング処理部１１３－２において、第２のカメラで撮影された画像に含まれる各人物の姿勢を推定する。 Next, the information processing apparatus 100 estimates the posture of each person included in the image captured by the first camera (S22), and also estimates the posture of each person included in the image captured by the second camera. (S24). For example, in the first CNN processing unit 111-1, the first candidate point calculation unit 112-1, and the first grouping processing unit 113-1, each person included in the image captured by the first camera Estimate pose. Further, for example, in the second CNN processing unit 111-2, the second candidate point calculation unit 112-2, and the second grouping processing unit 113-2, each Estimate the pose of a person.

次に、情報処理装置１００は、２つのカメラで撮影された画像に対して、同一人物特定処理を行う（Ｓ２５）。 Next, the information processing apparatus 100 performs same person identification processing on the images captured by the two cameras (S25).

図１５（Ａ）は同一人物特定処理の例を表すフローチャートである。例えば、同一人物特定処理部１１４において行われる。 FIG. 15A is a flowchart showing an example of same person identification processing. For example, it is performed in the same person identification processing unit 114 .

同一人物特定処理部１１４は、同一人物特定処理を開始すると（Ｓ２５０）、第１のカメラで撮影した人物の画像をトリミング（又は切り抜き、或いは切り取り）し（Ｓ２５１）、第２のカメラで撮影された人物の画像をトリミングする（Ｓ２５２）。例えば、同一人物特定処理部１１４は、以下の処理を行う。 When starting the same person identification processing (S250), the same person identification processing unit 114 trims (or cuts out or cuts) the image of the person captured by the first camera (S251), and the image of the person captured by the second camera is trimmed (or cut out) (S251). Then, the image of the person who has been photographed is trimmed (S252). For example, the same person identification processing unit 114 performs the following processing.

すなわち、同一人物特定処理部１１４は、第１及び第２のグルーピング処理部１１３－１，１１３－２から、グループ化された候補点を入力する。そのため、同一人物特定処理部１１４は、候補点に基づいて、グループ化された候補点全体の周囲にある画像の画素値が一定の範囲内にある画素値を、第１及び第２の画像の画像データから抽出することで、人物の画像をトリミングする。例えば、人物の画像の各画素の画素値は、第１及び第２のＣＮＮ処理部１１１－１，１１１－２、第１及び第２の候補点算出部１１２－１，１１２－２，及び第１及び第２のグルーピング処理部１１３－１，１１３－２を介して、同一人物特定処理部１１４へ入力される。 That is, the same person identification processing unit 114 receives grouped candidate points from the first and second grouping processing units 113-1 and 113-2. Therefore, based on the candidate points, the same person identification processing unit 114 assigns the pixel values of the images surrounding the entire grouped candidate points within a certain range to the pixel values of the first and second images. Crop the image of the person by extracting from the image data. For example, the pixel value of each pixel in the image of a person is obtained by the first and second CNN processing units 111-1 and 111-2, the first and second candidate point calculation units 112-1 and 112-2, and the It is input to the same person identification processing unit 114 via the first and second grouping processing units 113-1 and 113-2.

次に、同一人物特定処理部１１４は、類似度計算処理を行う（Ｓ２５３）。 Next, the same person identification processing unit 114 performs similarity calculation processing (S253).

図１５（Ｂ）は類似度計算処理の例を表すフローチャートである。 FIG. 15B is a flowchart showing an example of similarity calculation processing.

同一人物特定処理部１１４は、類似度計算処理を開始すると（Ｓ２５３０）、第１及び第２のカメラで撮影された人物の部位の画像をトリミングする（Ｓ２５３１，Ｓ２５３３）。この場合も、例えば、同一人物特定処理部１１４は、候補点ごとに、候補点の周囲にある画像の画素値が一定の範囲内にある画素値を抽出することで、人物の部位の画像をトリミングする。同一人物特定処理部１１４は、例えば、第１及び第２のカメラで撮影された画像ごとに、このような部位の画像をトリミングする。 When starting the similarity calculation process (S2530), the same person identification processing unit 114 trims the images of the parts of the person captured by the first and second cameras (S2531, S2533). Also in this case, for example, the same person identification processing unit 114 extracts the pixel values of the image around the candidate point within a certain range for each candidate point, thereby extracting the image of the part of the person. trim. The same person identification processing unit 114 trims the image of such a region, for example, for each image captured by the first and second cameras.

次に、同一人物特定処理部１１４は、各々トリミングした部位の画像に対して、カラーヒストグラムを計算する（Ｓ２５３２，Ｓ２５３４）。例えば、同一人物特定処理部１１４は、以下の処理を行う。 Next, the same person identification processing unit 114 calculates a color histogram for each trimmed part image (S2532, S2534). For example, the same person identification processing unit 114 performs the following processing.

That is, the same person identification processing unit 114 groups the pixels of each part image into predetermined cells (for example, 8×8 pixels) and, for each cell, obtains the number of occurrences of each RGB pixel value (or gradation value). A known method may be used for this; for example, the color histogram may be calculated by computing the CSS (Color Self-Similarity) feature, a local feature that uses color information. The unit calculates a color histogram for each of the images captured by the first and second cameras.

次に、同一人物特定処理部１１４は、カラーヒストグラム（Ｓ２５３２，Ｓ２５３４）を用いて平均２乗誤差を計算し、類似度を計算する（Ｓ２５３５）。例えば、同一人物特定処理部１１４は、以下の処理を行う。 Next, the same person identification processing unit 114 calculates the mean square error using the color histograms (S2532, S2534) and calculates the degree of similarity (S2535). For example, the same person identification processing unit 114 performs the following processing.

That is, the same person identification processing unit 114 calculates the mean square error between the color histogram (S2532) of a part image captured by the first camera and the color histogram (S2534) of the image of the same part captured by the second camera. Each color histogram gives, for the part image from one camera, the number of occurrences of pixel values in each predetermined cell. The unit therefore squares the difference between the two occurrence counts and averages the squared differences over the whole part. It takes the reciprocal of this average as the similarity, and calculates such a similarity for each part.
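The histogram and similarity steps (S2532 to S2535) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the bin count of 16, and the small constant `eps` that guards against division by zero are assumptions, and the two part images are assumed to have been resized to the same shape beforehand.

```python
import numpy as np

def cell_color_histogram(patch, cell=8, bins=16):
    """Count RGB value occurrences per cell of an H x W x 3 uint8 patch.

    The 8x8 cell size follows the text; the bin count is an assumption.
    """
    height, width, _ = patch.shape
    hists = []
    for y in range(0, height - cell + 1, cell):
        for x in range(0, width - cell + 1, cell):
            block = patch[y:y + cell, x:x + cell]
            for channel in range(3):
                counts, _edges = np.histogram(
                    block[..., channel], bins=bins, range=(0, 256))
                hists.append(counts)
    return np.concatenate(hists).astype(float)

def similarity(patch_a, patch_b, eps=1e-9):
    """Reciprocal of the mean square error between two cell histograms.

    eps is an assumption added so that identical patches do not divide by zero.
    """
    hist_a = cell_color_histogram(patch_a)
    hist_b = cell_color_histogram(patch_b)
    mse = np.mean((hist_a - hist_b) ** 2)
    return 1.0 / (mse + eps)
```

A larger return value means the two part images have more similar color distributions.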

同一人物特定処理部１１４は、類似度を計算すると、類似度計算処理を終了する（Ｓ２５３６）。 After calculating the degree of similarity, the same person identification processing unit 114 ends the degree of similarity calculation processing (S2536).

図１５（Ａ）に戻り、次に、同一人物特定処理部１１４は、類似度の高い組み合わせを探索する（Ｓ２５４）。例えば、同一人物特定処理部１１４は、部位ごとに計算した複数の類似度が、すべて類似度閾値以上のとき、第１のカメラで撮影した人物と第２のカメラで撮影した人物が同一人物であると判定し、そうでないときは、同一人物ではないと判定する。このような判定は、一例であって、同一人物特定処理部１１４は、類似度が類似度閾値以上となっている部位の個数が、個数閾値以上のとき、同一人物であると判定し、そうでないときは同一人物ではないと判定してもよい。例えば、類似度が高いほど、同一人物由来の部位画像である確率は高くなる。 Returning to FIG. 15A, next, the same person identification processing unit 114 searches for a combination with a high degree of similarity (S254). For example, the same person identification processing unit 114 determines that the person photographed by the first camera and the person photographed by the second camera are the same person when all of the plurality of similarities calculated for each part are equal to or greater than the similarity threshold. If not, it is determined that they are not the same person. Such a determination is an example, and the same person identification processing unit 114 determines that the person is the same person when the number of parts whose similarity is equal to or higher than the similarity threshold is equal to or higher than the number threshold. If not, it may be determined that they are not the same person. For example, the higher the degree of similarity, the higher the probability that the partial images are derived from the same person.
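The two decision rules described for S254 can be sketched as a single function. The name `is_same_person` and its parameters are hypothetical; the rule that an empty list yields "not the same person" is an assumption the text does not address.

```python
def is_same_person(similarities, sim_threshold, count_threshold=None):
    """Decide whether two detections belong to the same person.

    similarities: one similarity value per body part.
    With count_threshold=None, every part must reach sim_threshold.
    Otherwise, at least count_threshold parts must reach it.
    """
    if not similarities:
        return False  # assumption: no evidence means no match
    passed = sum(1 for s in similarities if s >= sim_threshold)
    if count_threshold is None:
        return passed == len(similarities)
    return passed >= count_threshold
```

The first rule is stricter; the second tolerates a few parts that match poorly, for example because of occlusion in one camera view.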

そして、同一人物特定処理部１１４は、同一人物特定処理を終了する（Ｓ２５５）。 Then, the same person identification processing unit 114 ends the same person identification processing (S255).

図１４に戻り、次に、情報処理装置１００は、カメラ行列Ｐ，Ｐ’を計算する（Ｓ２６）。ここで、カメラ行列Ｐ，Ｐ’について説明する。 Returning to FIG. 14, the information processing apparatus 100 then calculates the camera matrices P and P' (S26). Here, the camera matrices P and P' will be explained.

As shown in FIG. 12, first consider describing the three-dimensional position coordinates (X, Y, Z) of a part, viewed from the coordinate system of the first camera, in terms of the two-dimensional position coordinates (x, y) of the part in the image of the first camera, the two-dimensional position coordinates (x', y') of the part in the image of the second camera, the translation vector t, and the rotation matrix R.

First, according to the perspective projection model, the three-dimensional position coordinates (X, Y, Z) are described by the following equation (9-1) or equation (9-2), using the two-dimensional position coordinates (x, y) and the focal length f of the first camera.

Similarly, the three-dimensional position coordinates (X', Y', Z') of the part viewed from the coordinate system of the second camera are described by the following equation (9-3) or equation (9-4). Here, k and k' are nonzero real numbers.

On the other hand, as shown in FIG. 12, the relationship between the coordinate system (X', Y', Z') of the second camera and the coordinate system (X, Y, Z) of the first camera can be described by the following equation (9-5), using the translation vector t and the rotation matrix R.

Solving equations (9-1), (9-2), (9-3), (9-4), and (9-5) simultaneously shows that the relationship among the three-dimensional position coordinates (X, Y, Z), the two-dimensional position coordinates (x, y), and the two-dimensional position coordinates (x', y') is described by the following four equations, using the translation vector t and the rotation matrix R.

Here, P and P' are matrices called camera matrices. It follows that, if the two-dimensional position coordinates (x, y) and (x', y'), the translation vector t, and the rotation matrix R are known, the three-dimensional position coordinates (X, Y, Z) can be calculated by solving equations (9-6), (9-7), (9-8), and (9-9) for them.

以上から、カメラ行列計算部１１５は、例えば、以下の式を用いて、カメラ行列Ｐ，Ｐ’を計算する。 Based on the above, the camera matrix calculation unit 115 calculates camera matrices P and P' using, for example, the following equations.

In equations (10) and (11), f_0 is a parameter for adjusting the scale; for example, f_0 = 1.

For example, the camera matrix calculation unit 115 reads equations (10) and (11) stored in the internal memory and substitutes into them the focal lengths f and f' of the first and second cameras, the rotation matrix R, and the vector t directed from the origin O of the first camera to the origin O' of the second camera. The focal lengths f and f', the rotation matrix R, and the vector t are, for example, also stored in the internal memory, so the unit may read these values from the internal memory and substitute them into equations (10) and (11).

次に、情報処理装置１００は、各部位の３次元位置（Ｘ，Ｙ，Ｚ）を計算する（Ｓ２７）。例えば、３次元位置計算部１１６は、以下の式の連立方程式を解くことで、（Ｘ，Ｙ，Ｚ）を計算する。 Next, the information processing device 100 calculates the three-dimensional position (X, Y, Z) of each part (S27). For example, the three-dimensional position calculation unit 116 calculates (X, Y, Z) by solving the following simultaneous equations.

For example, the three-dimensional position calculation unit 116 performs the following processing. It reads equations (12) to (15) stored in the internal memory and substitutes into them the components of the camera matrices P and P' calculated in S26 and the position coordinates (x, y) and (x', y') of the part. It then solves equations (12) to (15) simultaneously to obtain the three-dimensional position coordinates (X, Y, Z) of the part. In this case, the unit obtains the posture p_i, expressed as a three-dimensional vector, by substituting the calculated three-dimensional position coordinates of the part into equation (1).
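The camera-matrix and triangulation steps of S26 and S27 can be sketched as follows. Equations (10) to (15) are not reproduced in this text, so the sketch assumes a common two-view convention: P = diag(f, f, f_0)[I | 0] and P' = diag(f', f', f_0)[R^T | -R^T t], with image coordinates equal to f_0 times the ratio of the projected homogeneous components. The function names and the least-squares solve are illustrative choices, not taken from the patent.

```python
import numpy as np

def camera_matrices(f, f_prime, R, t, f0=1.0):
    """Build 3x4 camera matrices under the assumed convention.

    Camera 1 sits at the origin; camera 2 sits at t with orientation R.
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    P = np.diag([f, f, f0]) @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_prime = np.diag([f_prime, f_prime, f0]) @ np.hstack([R.T, -R.T @ t])
    return P, P_prime

def project(P, X, f0=1.0):
    """Project a 3D point X to 2D image coordinates with camera matrix P."""
    h = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return f0 * h[:2] / h[2]

def triangulate(P, P_prime, xy, xy_prime, f0=1.0):
    """Recover (X, Y, Z) from two views by solving four linear equations."""
    rows = []
    for M, (u, v) in ((P, xy), (P_prime, xy_prime)):
        rows.append(u * M[2] - f0 * M[0])  # from u = f0 * (M0 . Xh) / (M2 . Xh)
        rows.append(v * M[2] - f0 * M[1])  # from v = f0 * (M1 . Xh) / (M2 . Xh)
    A = np.array(rows)
    # Four equations, three unknowns: solve in the least-squares sense.
    X, *_ = np.linalg.lstsq(A[:, :3], -A[:, 3], rcond=None)
    return X
```

With noise-free correspondences the four equations are consistent and the least-squares solution is exact; with detection noise it returns the best linear fit.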

Thereafter, as in the first embodiment, the information processing apparatus 100 uses the posture p_i to perform the processing of S13 to S17, calculates the orientation vector q_i, and ends the series of processing (S28). In S13 to S17, the information processing apparatus 100 performs the processing using, for example, the three-dimensional position coordinates, and obtains the orientation vector q_i expressed as a three-dimensional position vector.

As described above, in the second embodiment, the posture p_i and the orientation vector q_i can be expressed as three-dimensional position vectors, so a more accurate orientation vector q_i can be obtained than in the first embodiment, which uses two-dimensional position vectors.

[Third Embodiment]
In the first embodiment, the attention level calculation processing (for example, S130 in FIG. 8) was described as focusing on certain parts and interpolating the parts that are not visible. However, when, for example, the number of unseen parts k of a person i is not at or below the threshold, the position vector a_k^i = (x_k^i, y_k^i)^T of such a part k may not be calculated accurately even if it is obtained by interpolation.

Based on the empirical rule that the posture of a person i tends to resemble the posture of an adjacent person t, the information processing apparatus 100 of the third embodiment calculates the position vector a_k^i = (x_k^i, y_k^i)^T of the person i using not only the posture p_i of the person i but also the posture p_t of the person t.

FIG. 16 is a flowchart showing an example of the attention level calculation processing in the third embodiment. It is assumed that the crowd has been photographed by the imaging device 200 (S11 in FIG. 2) and that the posture estimation unit 110 has obtained the posture p_i of the person i and the posture p_t of the person t through the posture estimation processing (S12 in FIG. 2).

When it starts the attention level calculation processing (S130), the attention level calculation unit 120 determines whether the number of unseen parts (in particular, facial parts) of a person i is equal to or less than a threshold (S135). For example, it determines whether the number of entries with v_j^i = 0 in the posture p_i output from the posture estimation unit 110 is equal to or greater than a threshold. In this case, the unit may, for example, focus on the facial parts (j = 1 to 5) and determine whether the number of those parts with v_j^i equal to "0" is equal to or greater than the threshold.

When the number of unseen parts is greater than the threshold (NO in S135), the attention level calculation unit 120 calculates the position vector a_k^i = (x_k^i, y_k^i)^T of the unseen part k by interpolation (S136). For example, it calculates the position vector a_k^i using the following equation.

In equation (16), a_k1^i, a_k2^i, …, a_kM1^i are the position vectors of the parts visible on the person i, and a_k1^t, a_k2^t, …, a_kM2^t are the position vectors of the parts visible on the person t adjacent to the person i. A_k^{M1,M2} is a matrix with 2M rows and 2M_1 + 2M_2 columns. A_k^{M1,M2} and b_k^{M1,M2} are calculated, for example, from the set P of postures s (training data) using the following equation, in the same manner as equation (6).

For example, the attention level calculation unit 120 performs the following processing. Based on the posture p_i and the posture p_t output from the posture estimation unit 110, it obtains the center coordinates of the person i and the center coordinates of the person t. If the distance between the two center coordinates is within a threshold, it determines that the person t is adjacent to the person i. When it determines that they are adjacent, it reads equation (17) from the internal memory and calculates A_k^{M1,M2} and b_k^{M1,M2}. It then substitutes the position vectors with v_j^i = 1 from the posture p_i, the position vectors with v_j^t = 1 from the posture p_t, and the calculated A_k^{M1,M2} and b_k^{M1,M2} into the right-hand side of equation (16) to calculate the position vector a_k^i of the unseen part k of the person i.
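The adjacency test and neighbor-aided interpolation can be sketched as follows. Equations (16) and (17) are not reproduced in this text, so the sketch assumes that equation (16) is an affine map from the stacked visible part positions of both persons to one 2D position, and that equation (17) is an ordinary least-squares fit over the training postures. Under that assumption the fitted matrix has 2 rows. All function names are hypothetical.

```python
import numpy as np

def are_adjacent(center_i, center_t, dist_threshold):
    """Treat two persons as adjacent if their centers are close enough."""
    diff = np.asarray(center_i, dtype=float) - np.asarray(center_t, dtype=float)
    return bool(np.linalg.norm(diff) <= dist_threshold)

def fit_affine(inputs, targets):
    """Least-squares fit of targets ~= A @ input + b.

    inputs:  N x D array, each row the stacked visible positions (D = 2*M1 + 2*M2).
    targets: N x 2 array, the known position of part k in each training posture.
    """
    design = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights[:-1].T, weights[-1]  # A (2 x D), b (2,)

def interpolate_part(A, b, visible_i, visible_t):
    """Estimate the unseen part from visible parts of person i and neighbor t."""
    z = np.concatenate([np.ravel(visible_i), np.ravel(visible_t)])
    return A @ z + b
```

Here `visible_i` is an M1 x 2 array and `visible_t` an M2 x 2 array, and the part ordering must match the ordering used when fitting.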

Next, the attention level calculation unit 120 calculates the orientation vector q_i of the person i using the position vectors a_k1^i, a_k2^i, …, a_kM1^i of the visible parts and the position vector a_k^i of the unseen part k obtained by interpolation (S137). For example, it calculates the orientation vector q_i using equation (7), as in the first embodiment.

そして、注目度算出部１２０は、注目度算出処理を終了する（Ｓ１３８）。 Then, the attention degree calculation unit 120 ends the attention degree calculation processing (S138).

On the other hand, when the number of unseen parts is equal to or less than the threshold (YES in S135), the attention level calculation unit 120 calculates the orientation vector q_i of the person i based on the posture p_i (S137). In this case, the posture p_i may lack the position coordinates of some parts; if so, the unit adjusts the components of the matrix W and calculates the orientation vector q_i using equation (7) with the adjusted matrix W.

以降、情報処理装置１００は、第１の実施の形態と同様の処理（Ｓ１５からＳ１７）を行う。 Thereafter, the information processing apparatus 100 performs the same processing (S15 to S17) as in the first embodiment.

As described above, in the third embodiment, when the number of unseen parts is greater than the threshold, the information processing apparatus 100 calculates the position vectors of the parts of the person i by interpolation using the position vectors of the person t adjacent to the person i being processed. Therefore, compared with the case where no interpolation is performed, the information processing apparatus 100 of the third embodiment can calculate the positions of the unseen parts more accurately and, in turn, calculate the orientation vector q_i more accurately.

[Fourth Embodiment]
Equations (5) and (7) used in the first embodiment are expressed as linear functions, which may limit how well they can approximate the target. The fourth embodiment therefore uses nonlinear functions in place of equations (5) and (7), so as to improve the approximation ability compared with linear functions.

The information processing apparatus 100 of the fourth embodiment calculates the position vector a_k^i of the unseen part k by interpolation using the following equation instead of equation (5).

In equation (18), D_l1^M is a matrix with l1 rows and 2M columns, D_l2^l1 is a matrix with l2 rows and l1 columns, and D_k^l2 is a matrix with k rows and l2 columns. a_k1^i, a_k2^i, …, a_kM^i are the position vectors of the parts visible on the person i. In equation (19), δ(x) is an activation function, and α and β are predetermined real numbers satisfying α ≠ β.

行列Ｄ_ｌ１ ^Ｍ，Ｄ_ｌ２ ^ｌ１，及びＤ_ｋ ^ｌ２は、例えば、式（６）と同様に、姿勢ｓの集合Ｐを用いて、以下の式を解くことで得られる行列である。 The matrices D _l1 ^M , D _l2 ^l1 , and D _k ^l2 are, for example, matrices obtained by solving the following equations using the set P of postures s, like Equation (6).

For example, the attention level calculation unit 120 performs the following processing as the interpolation processing (S132 in FIG. 8). It reads s_k^i and s_k1^i, s_k2^i, …, s_kM^i stored in the internal memory, substitutes them into equation (20), also read from the internal memory, and obtains the matrices D_l1^M, D_l2^l1, and D_k^l2. It then reads equation (18) from the internal memory and substitutes into it the matrices D_l1^M, D_l2^l1, and D_k^l2 obtained from equation (20) and a_k1^i, a_k2^i, …, a_kM^i extracted from the posture s. The unit thereby calculates the position vector a_k^i of the unseen part k.
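The nonlinear interpolation can be sketched as a forward pass through three matrices. Equations (18) and (19) are not reproduced in this text, so two things here are assumptions: that the matrices are applied in the order D_l1^M, D_l2^l1, D_k^l2 with the activation between them, and that δ(x) is piecewise linear with slope α for x >= 0 and slope β otherwise. The function names and default values are hypothetical, and fitting the matrices (equation (20)) is not shown.

```python
import numpy as np

def activation(x, alpha=1.0, beta=0.1):
    """Assumed activation: slope alpha for x >= 0, slope beta for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, alpha * x, beta * x)

def interpolate_nonlinear(D1, D2, D3, visible):
    """Map visible part positions (M x 2) to an estimate for the unseen part.

    D1: l1 x 2M, D2: l2 x l1, D3: output rows x l2.
    """
    z = np.ravel(visible)        # stacked visible positions, length 2M
    h1 = activation(D1 @ z)      # first hidden layer, length l1
    h2 = activation(D2 @ h1)     # second hidden layer, length l2
    return D3 @ h2               # final linear map
```

Because α differs from β, the activation is not a single linear map, which is what lets the composition approximate relationships that equation (5) cannot.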

The information processing apparatus 100 of the fourth embodiment also calculates the orientation vector q_i of the person i using the following equation instead of equation (7).

In equation (21), W_l1^J is a matrix with l1 rows and 2J columns, W_l2^l1 is a matrix with l2 rows and l1 columns, and W_k^l2 is a matrix with k rows and l2 columns.

行列Ｗ_ｌ１ ^Ｊ，Ｗ_ｌ２ ^ｌ１，Ｗ_ｋ ^ｌ２は、例えば、式（８）と同様に、姿勢ｓの集合Ｐを用いて、以下の式を解くことで得られる行列である。 The matrices W _l1 ^J , W _l2 ^l1 , and W _k ^l2 are, for example, matrices obtained by solving the following equations using the set P of postures s, like equation (8).

For example, the attention level calculation unit 120 performs the following processing as the calculation of the orientation vector q_i (S133 in FIG. 8). It reads q_s^i and s_1^i, s_2^i, …, s_J^i stored in the internal memory, substitutes them into equation (22), also read from the internal memory, and obtains the matrices W_l1^J, W_l2^l1, and W_k^l2. It then reads equation (21) from the internal memory, substitutes into it the matrices W_l1^J, W_l2^l1, and W_k^l2 obtained from equation (22) and a_1^i, a_2^i, …, a_J^i extracted from the posture s, and calculates the orientation vector q_i of the person i.

なお、式（１８）において、３つの行列Ｄ_ｌ１ ^Ｍ，Ｄ_ｌ２ ^ｌ１，Ｄ_ｋ ^ｌ２を用いる例について説明した。例えば、注目度算出部１２０は、このうち２つの行列を用いて式（１８）を計算してもよい。また、例えば、注目度算出部１２０は、３つの行列Ｗ_ｌ１ ^Ｊ，Ｗ_ｌ２ ^ｌ１，Ｗ_ｋ ^ｌ２ではなく、このうち、２つの行列を用いて式（２１）を計算してもよい。 Note that the example using the three matrices D _l1 ^M , D _l2 ^l1 , and D _k ^l2 in Equation (18) has been described. For example, attention level calculation section 120 may calculate equation (18) using two of these matrices. Also, for example, the attention level calculation unit 120 may calculate Equation (21) using two of these matrices instead of the three matrices W _l1 ^J , W _l2 ^l1 and W _k ^l2 .

[Fifth Embodiment]
The first embodiment described an example of calculating the attention level. The fifth embodiment describes an example of detecting a change in the calculated attention level. By detecting such a change, the information processing apparatus 100 can detect, for example, that the direction in which the crowd is looking has suddenly changed, and can also detect the time at which this occurred.

図１７は、第５の実施の形態における情報処理システム１０の構成例を表す図である。 FIG. 17 is a diagram showing a configuration example of the information processing system 10 according to the fifth embodiment.

図１７に示すように、情報処理装置１００は、さらに、変化検知部１５０を備える。変化検知部１５０は、注目度記憶部１４０から注目度を読み出し、例えば、その時間的な変化を検知する。変化検知部１５０は、検知した結果を、例えば、外部の表示装置へ出力し、ユーザへ知らせることが可能である。 As shown in FIG. 17 , the information processing device 100 further includes a change detection section 150 . The change detection unit 150 reads attention levels from the attention level storage unit 140, and detects, for example, changes over time. The change detection unit 150 can output the detection result to, for example, an external display device to notify the user.

図１８は、情報処理装置１００の動作例を表すフローチャートである。図１８において、Ｓ１１からＳ１５までの処理は、第１の実施の形態と同様である。 FIG. 18 is a flow chart showing an operation example of the information processing apparatus 100 . In FIG. 18, the processing from S11 to S15 is the same as in the first embodiment.

情報処理装置１００は、各人物ｉの注目度（又は向きベクトルｑ_ｉ）を注目度記憶部１４０に記録すると（Ｓ１５）、注目度変化検出処理（Ｓ１８）を行う。 When the information processing apparatus 100 records the attention level (or orientation vector q _i ) of each person i in the attention level storage unit 140 (S15), the information processing apparatus 100 performs attention level change detection processing (S18).

図１９は、注目度変化検出処理の例を表すフローチャートである。図１９の各処理は、例えば、変化検知部１５０で行われる。 FIG. 19 is a flowchart showing an example of attention degree change detection processing. Each process in FIG. 19 is performed by the change detection unit 150, for example.

When it starts the attention level change detection processing (S180), the change detection unit 150 divides the attention vectors u_i^t into two sets, one for times (T-n) < t ≤ (T-m) and one for times (T-m) < t (S181).

Here, the attention vector u_i^t is, for example, the normalized orientation vector q_i^t of the person i at time t, and is defined by the following equation.

If the set of attention vectors at times (T-n) < t ≤ (T-m) (where n > m) is denoted U_{T-n<t≤T-m}, this set is defined, for example, by the following equation.

FIG. 20 is a diagram showing an example of the relationship among time t, time (T-n), and time (T-m). Assuming that one attention vector u is calculated at each time t, n attention vectors u are calculated from time t = T-n to the current time t = T. Of these, (n-m) attention vectors are calculated between time (T-n) and time (T-m), and m attention vectors are calculated between time (T-m) and the current time T. As shown in FIG. 20, the period is divided into a first half and a second half at time (T-m), and the set U_{T-n<t≤T-m} represents the attention vectors u_i^t of the first half, from time (T-n) to time (T-m).

The change detection unit 150 performs, for example, the following processing as S181. It reads from the attention level storage unit 140 the n orientation vectors q_i from time (T-n) to the current time T. It then reads equation (23) stored in the internal memory and substitutes the orientation vectors q_i into it to calculate n attention vectors u_i^t. It divides the n attention vectors u_i^t into a set of (n-m) attention vectors for times (T-n) < t ≤ (T-m) and a set of m attention vectors for times (T-m) < t. The former set is expressed, for example, by equation (24).
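The normalization and splitting in S181 can be sketched as follows. The sketch assumes that the normalization divides each orientation vector by its Euclidean norm, that all orientation vectors are nonzero, and that the vectors are stored oldest first. The function names are hypothetical.

```python
import numpy as np

def attention_vectors(q):
    """Normalize each orientation vector (one per row) to unit length.

    Assumes no row of q is the zero vector.
    """
    q = np.asarray(q, dtype=float)
    return q / np.linalg.norm(q, axis=-1, keepdims=True)

def split_windows(u, m):
    """Split n vectors, ordered oldest to newest, into reference and recent sets.

    Returns the first n-m vectors, for (T-n) < t <= (T-m),
    and the last m vectors, for (T-m) < t.
    """
    n = len(u)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")
    return u[:n - m], u[n - m:]
```

The reference set is used to estimate the distribution in S182, and the recent set is scored against it in S183.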

Returning to FIG. 19, the change detection unit 150 estimates the probability distribution p(u_i^t) of the attention vectors u_i^t for the set U_{T-n<t≤T-m} of (n-m) attention vectors at times (T-n) < t ≤ (T-m) (S182). In this processing, the attention vectors are assumed to follow, for example, a mixture of von Mises distributions (or von Mises-Fisher distributions). When the starting point of each attention vector u_i^t is placed at the origin of a d-dimensional space (d is, for example, 2 or 3), the mixture of von Mises distributions describes how the directions of the attention vectors u_i^t are probabilistically distributed.

FIG. 21(A) shows an input image, and FIG. 21(B) shows an example of the probability distribution of the attention vectors u_i^t for that input image. As shown in FIGS. 21(A) and 21(B), the crowd in the image is looking mainly in two directions, so the directions of the attention vectors u_i^t are also distributed mainly in two directions. FIG. 21(B) shows an example of a mixture of von Mises distributions.

For example, the change detection unit 150 estimates the probability distribution p(u_i^t) for the set U_{T-n<t≤T-m} of attention vectors u_i^t using the following equation.

In equation (25), M(u_i^t | μ_j, σ_j) is calculated using, for example, the following equation.

In equation (26), I_ρ(γ) is calculated using, for example, the following equation.

In equations (25) to (27), d is the number of dimensions (2 or 3) of the attention vector u_i^t, and I_ρ(γ) is the modified Bessel function of the first kind of order ρ. α_j, μ_j, and σ_j in equations (25) to (27) are parameters. They can be estimated, for example, from the set U_{T-n<t≤T-m} of attention vectors using the known expectation-maximization method.

For example, the change detection unit 150 performs the following processing in S182. It estimates the parameters α_j, μ_j, and σ_j for the set U_{T-n<t≤T-m} of attention vectors at times (T-n) < t ≤ (T-m), using the expectation-maximization method or the like. It then reads equations (25) to (27) stored in the internal memory and substitutes the estimated parameters α_j, μ_j, and σ_j and the attention vectors u_i^t into them to estimate the probability distribution p(u_i^t).

Returning to FIG. 19, the change detection unit 150 next calculates the degree of abnormality β of the set U_{T-m<t} of attention vectors at times (T-m) < t (S183). The degree of abnormality β is calculated, for example, by the following equation.

式（２８）に示すように、異常度βは、例えば、時刻（Ｔ－ｎ）＜ｔ≦（Ｔ－ｍ）における注目度ベクトルの集合Ｕ_{Ｔ－ｎ＜ｔ≦Ｔ－ｍ}に対する確率分布ｐ（ｕ_ｉ ^ｔ）を基準にして、時刻（Ｔ－ｍ）＜ｔの注目度ベクトルの集合Ｕ_{Ｔ－ｍ＜ｔ}の分布がどれだけ外れているかを表している。注目度ベクトルの集合Ｕ_{Ｔ－ｍ＜ｔ}の分布が、注目度ベクトルの集合Ｕ_{Ｔ－ｎ＜ｔ≦Ｔ－ｍ}に対する確率分布ｐ（ｕ_ｉ ^ｔ）から外れたときは、異常度βの値は大きな値をとり、そうでないときは小さな値をとり得る。 As shown in Equation (28), the degree of abnormality β represents, for example, how far the distribution of the set of attention level vectors U_{T−m<t} at times (T−m) < t deviates from the probability distribution p(u_i^t) obtained for the set of attention level vectors U_{T−n<t≤T−m} at times (T−n) < t ≤ (T−m), which serves as the reference. When the distribution of U_{T−m<t} deviates from that reference distribution, β takes a large value; otherwise, it can take a small value.

例えば、変化検知部１５０は、内部メモリに記憶された式（２８）を読み出して、Ｓ１８２で推定した確率分布ｐ（ｕ_ｉ ^ｔ）を、式（２８）に代入することで、異常度βを計算する。 For example, the change detection unit 150 reads Equation (28) stored in the internal memory and substitutes the probability distribution p(u_i^t) estimated in S182 into Equation (28), thereby calculating the degree of abnormality β.

次に、変化検知部１５０は、異常度βが閾値以上か否かを判定する（Ｓ１８４）。変化検知部１５０は、異常度βが閾値以上のとき（Ｓ１８４でＹＥＳ）、変化をユーザへ知らせる。そして、変化検知部１５０は、注目度変化検出処理を終了する（Ｓ１８６）。一方、変化検知部１５０は、異常度βが閾値より小さいとき（Ｓ１８４でＮＯ）、Ｓ１８４の処理を行うことなく、注目度変化検出処理を終了する（Ｓ１８６）。 Next, the change detection unit 150 determines whether the degree of abnormality β is equal to or greater than a threshold (S184). When β is equal to or greater than the threshold (YES in S184), the change detection unit 150 notifies the user of the change and then ends the attention level change detection process (S186). When β is smaller than the threshold (NO in S184), the change detection unit 150 ends the attention level change detection process without notifying the user (S186).

変化検知部１５０は、例えば、以下の処理を行う。すなわち、変化検知部１５０は、Ｓ１８３で計算した異常度βと、内部メモリに記憶された閾値と比較して、異常度βが閾値以上のとき、変化があったこと、変化があった時刻（例えば、時刻ｔ＝（Ｔ－ｍ））を外部の表示装置へ出力する。一方、変化検知部１５０は、異常度βが閾値より小さいときは、変化を通知することなく処理を終了する。 The change detection unit 150 performs, for example, the following process. The change detection unit 150 compares the degree of abnormality β calculated in S183 with the threshold stored in the internal memory. When β is equal to or greater than the threshold, it outputs to an external display device the fact that a change occurred and the time of the change (for example, time t = (T−m)). When β is smaller than the threshold, the change detection unit 150 ends the process without reporting a change.
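
The flow of S183 and S184 can be sketched as follows. Equation (28) is not reproduced in this text, so the sketch substitutes the mean negative log-likelihood of the recent vectors under the reference distribution as an assumed abnormality measure; it behaves as described (large when the recent vectors fall where the reference density is low) but is not confirmed to be the patent's formula. The names `anomaly_score`, `detect_change`, and `toy_reference_density` are hypothetical, and the toy density merely stands in for the estimated p(u_i^t).

```python
import math
from typing import Callable, Iterable, Optional, Sequence

Vector = Sequence[float]


def anomaly_score(recent: Iterable[Vector],
                  density: Callable[[Vector], float],
                  floor: float = 1e-300) -> float:
    """Assumed stand-in for Equation (28): mean negative log-likelihood of the
    recent attention level vectors under the reference density."""
    scores = [-math.log(max(density(u), floor)) for u in recent]
    if not scores:
        raise ValueError("no recent attention level vectors")
    return sum(scores) / len(scores)


def detect_change(recent: Iterable[Vector],
                  density: Callable[[Vector], float],
                  threshold: float,
                  change_time: float) -> Optional[float]:
    """Return change_time when beta >= threshold (YES in S184), otherwise None."""
    beta = anomaly_score(recent, density)
    return change_time if beta >= threshold else None


def toy_reference_density(u: Vector) -> float:
    """Toy density on the unit circle peaked at direction (1, 0).

    This is a placeholder for illustration, NOT the patent's p(u).
    """
    return (1.0 + u[0]) / (2.0 * math.pi)


# Vectors still pointing along the reference direction: no change reported.
unchanged = detect_change([(1.0, 0.0)] * 3, toy_reference_density,
                          threshold=1.5, change_time=10.0)
# Vectors rotated by 90 degrees: the change time is reported.
changed = detect_change([(0.0, 1.0)] * 3, toy_reference_density,
                        threshold=1.5, change_time=10.0)
```

Passing the density as a callable keeps the threshold logic independent of how p(u_i^t) was estimated in S182.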

図２１（Ａ）から図２１（Ｄ）は、例えば、ある時刻（Ｔ－ｍ）を境に群衆の視線方向が変化している様子を表している。図２１（Ａ）と図２１（Ｃ）に示すように、視線方向に変化が生じると、向きベクトルｑ_ｉも変化し、注目度ベクトルｕ_ｉ ^ｔも変化する。そのため、注目度ベクトルの集合Ｕ_{Ｔ－ｎ＜ｔ≦Ｔ－ｍ}に対する確率分布ｐ（ｕ_ｉ ^ｔ）を基準（図２１（Ｂ））にすると、注目度ベクトルの集合Ｕ_{Ｔ－ｍ＜ｔ}の分布が大きくはずれ（図２１（Ｄ））、異常度βも大きくなる。 FIGS. 21(A) to 21(D) show, for example, how the line-of-sight direction of the crowd changes at a certain time (T−m). As shown in FIGS. 21(A) and 21(C), when the line-of-sight direction changes, the direction vector q_i changes, and so does the attention level vector u_i^t. Therefore, when the probability distribution p(u_i^t) for the set of attention level vectors U_{T−n<t≤T−m} is taken as the reference (FIG. 21(B)), the distribution of the set of attention level vectors U_{T−m<t} deviates greatly from it (FIG. 21(D)), and the degree of abnormality β becomes large.

情報処理装置１００では、このような変化の検知結果を外部の表示装置へ出力することで、ユーザに対して、変化が発生したことや変化が発生した時刻をユーザに通知することができる。これにより、例えば、セキュリティ用の都市監視において、視線方向においてイベントが発生したことやその発生時刻などを、ユーザに知らせることが可能となる。 The information processing apparatus 100 can notify the user of the occurrence of the change and the time at which the change occurs by outputting the detection result of such a change to an external display device. As a result, for example, in city surveillance for security purposes, it is possible to notify the user of the occurrence of an event in the line-of-sight direction, the time of occurrence, and the like.

［その他の実施の形態］ [Other embodiments]
図２２は、情報処理装置１００のハードウェア構成例を表す図である。 FIG. 22 is a diagram showing a hardware configuration example of the information processing apparatus 100.

情報処理装置１００は、インタフェース部１８０、メモリ１８１、ＣＰＵ（Central Processing Unit）１８２、ＲＯＭ（Read Only Memory）１８３、及びＲＡＭ（Random Access Memory）１８４を備える。 The information processing apparatus 100 includes an interface unit 180, a memory 181, a CPU (Central Processing Unit) 182, a ROM (Read Only Memory) 183, and a RAM (Random Access Memory) 184.

インタフェース部１８０は、例えば、撮像装置２００から出力された画像データをメモリ１８１やＣＰＵ１８２へ出力する。 The interface unit 180 outputs image data output from the imaging device 200 to the memory 181 and the CPU 182, for example.

メモリ１８１は、例えば、第１の実施の形態の空間情報記憶部１３０と注目度記憶部１４０に対応する。また、メモリ１８１は、例えば、姿勢推定部１１０、注目度算出部１２０、及び第５の実施の形態の変化検知部１５０における内部メモリに対応する。 The memory 181 corresponds to, for example, the spatial information storage unit 130 and the attention level storage unit 140 of the first embodiment. Also, the memory 181 corresponds to, for example, the internal memory in the posture estimation unit 110, the attention level calculation unit 120, and the change detection unit 150 of the fifth embodiment.

ＣＰＵ１８２は、例えば、ＲＯＭ１８３に記憶されたプログラムを読み出して、ＲＡＭ１８４にロードし、ロードしたプログラムを実行することで、姿勢推定部１１０、注目度算出部１２０、及び変化検知部１５０の機能を実現する。ＣＰＵ１８２は、例えば、姿勢推定部１１０、注目度算出部１２０、及び変化検知部１５０に対応する。 The CPU 182, for example, reads a program stored in the ROM 183, loads it into the RAM 184, and executes the loaded program, thereby realizing the functions of the posture estimation unit 110, the attention level calculation unit 120, and the change detection unit 150. The CPU 182 corresponds to, for example, the posture estimation unit 110, the attention level calculation unit 120, and the change detection unit 150.

なお、ＣＰＵ１８２に代えて、ＭＰＵ（Micro Processing Unit）やＤＳＰ（Digital Signal Processor）、ＦＰＧＡ（Field Programmable Gate Array）などのプロセッサやコントローラなどが用いられてもよい。 In place of the CPU 182, a processor such as an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array), a controller, or the like may be used.

以上まとめると、付記のようになる。 The above is summarized as follows.

（付記１）
入力画像データに対して、人物の部位に関する正解データを利用して、画像に含まれる人物の部位の位置情報を推定し、
前記部位のうち顔の部位の位置情報を推定することができなかったとき、推定することができた他の部位の位置情報に基づいて、画像に含まれる人物の視線方向を推定する、
処理をコンピュータに実行させることを特徴とするプログラム。 (Appendix 1)
estimating the positional information of the parts of the person included in the input image data using the correct data on the parts of the person;
estimating the line-of-sight direction of the person included in the image based on the position information of other parts that could be estimated when the position information of the part of the face could not be estimated among the parts;
A program characterized by causing a computer to execute processing.

（付記２）
前記顔の部位の位置情報には、鼻、左目、右目、左耳、右耳、首、左肩、及び右肩の少なくともいずれか１つの位置情報を含むことを特徴とする付記１記載のプログラム。 (Appendix 2)
The program according to Supplementary Note 1, wherein the positional information of the parts of the face includes positional information of at least one of nose, left eye, right eye, left ear, right ear, neck, left shoulder, and right shoulder.

（付記３）
前記入力画像データに対して、前記正解データを利用して、前記人物の各部位の確率分布を計算し、前記各部位の確率分布に基づいて、前記各部位の候補点を探索し、前記各部位の候補点を人物ごとにグルーピングすることにより、前記人物の部位の位置情報を推定することを特徴とする付記１記載のプログラム。 (Appendix 3)
For the input image data, using the correct data, the probability distribution of each part of the person is calculated, based on the probability distribution of each part, candidate points of each part are searched, The program according to Supplementary Note 1, wherein the position information of the body part of the person is estimated by grouping the candidate points of the body part for each person.

（付記４）
前記正解データを利用して、所定画像の画像データに対するフィルタリング処理を行い、フィルタリング処理後の画像データから、複数画素を含むブロック単位に最大値を抽出するプーリング処理を行い、前記フィルタリング処理と前記プーリング処理とを繰り返して、人物の各部位の確率分布の正解データを生成し、前記人物の各部位の確率分布の正解データを利用して、入力画像データに対する前記フィルタリング処理を行い、フィルタリング処理後の画像データから、前記ブロック単位に最大値を抽出する前記プーリング処理を行い、前記フィルタリング処理と前記プーリング処理とを繰り返して、前記各部位の確率分布を生成することを特徴とする付記３記載のプログラム。 (Appendix 4)
The program according to Supplementary Note 3, wherein: filtering is performed on image data of a predetermined image using the correct data; pooling, which extracts a maximum value for each block of a plurality of pixels, is performed on the filtered image data; the filtering and the pooling are repeated to generate correct data of the probability distribution of each part of a person; the filtering is performed on the input image data using the correct data of the probability distribution of each part of the person; the pooling, which extracts the maximum value for each block, is performed on the filtered image data; and the filtering and the pooling are repeated to generate the probability distribution of each part.

（付記５）
前記各部位の確率分布に基づいて、極大点となっている位置座標を前記各部位の候補点とすることを特徴とする付記３記載のプログラム。 (Appendix 5)
The program according to Supplementary Note 3, wherein, based on the probability distribution of each part, the position coordinates of a local maximum point are set as a candidate point for that part.

（付記６）
前記極大点が閾値より小さいとき、前記部位が前記画像に含まれていないと判定し、前記極大点が閾値以上のとき、前記部位が画像に含まれていると判定することを特徴とする付記５記載のプログラム。 (Appendix 6)
The program according to Supplementary Note 5, wherein the part is determined not to be included in the image when the local maximum is smaller than a threshold, and is determined to be included in the image when the local maximum is equal to or greater than the threshold.

（付記７）
前記入力画像データに対して、各部位を接続した確率分布の正解データを利用して、フィルタリング処理を行い、フィルタリング処理後の画像データから、複数画素を含むブロック単位に最大値を抽出するプーリング処理を行い、前記フィルタリング処理と前記プーリング処理とを繰り返して、前記各部位のつながり度合いを示す確率分布を生成し、生成した確率分布と前記各部位の候補点とに基づいて、人物ごとに前記候補点をグルーピングすることを特徴とする付記３記載のプログラム。 (Appendix 7)
The program according to Supplementary Note 3, wherein: filtering is performed on the input image data using correct data of a probability distribution connecting the parts; pooling, which extracts a maximum value for each block of a plurality of pixels, is performed on the filtered image data; the filtering and the pooling are repeated to generate a probability distribution indicating the degree of connection between the parts; and the candidate points are grouped for each person based on the generated probability distribution and the candidate points of each part.

（付記８）
入力画像データに基づいて、人物の部位に関する正解データを利用して、画像に含まれる人物の部位の位置情報と、前記画像に前記部位が含まれるか否かを示すパラメータとを含む姿勢ベクトルを推定することを特徴とする付記１記載のプログラム。 (Appendix 8)
The program according to Supplementary Note 1, wherein a posture vector, which includes position information of the parts of the person included in the image and a parameter indicating whether each part is included in the image, is estimated based on the input image data using the correct data regarding the parts of a person.

（付記９）
前記顔の部位が前記画像に含まれるか否かを示す前記パラメータに基づいて、前記部位のうち顔の部位の位置情報を推定することができなかったことを判定することを特徴する付記８記載のプログラム。 (Appendix 9)
The program according to Supplementary Note 8, wherein the failure to estimate the position information of a facial part among the parts is determined based on the parameter indicating whether the facial part is included in the image.

（付記１０）
内部メモリから読み出した以下の式（２９）に、推定することができた他の部位の位置情報を表す位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉを代入することで、前記部位のうち顔の部位の位置情報を表す位置ベクトルａ_ｋ ^ｉを算出することを特徴とする付記１記載のプログラム。

（ただし、Ａ _ｋ ^Ｍは２行２Ｍ列の行列、ｂ_ｋ ^Ｍは２行１列の縦ベクトル、Ｍは０＜Ｍ≦ｊを満たす整数をそれぞれ表し、Ａ _ｋ ^Ｍとｂ_ｋ ^Ｍは、姿勢ｓの集合Ｐを用いて、以下の式（３０）を解くことで算出される。）

(Appendix 10)
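The program according to Supplementary Note 1, wherein the position vector a_k^i representing the position information of a facial part among the parts is calculated by substituting the position vectors a_{k1}^i, a_{k2}^i, …, a_{kM}^i, which represent the position information of the other parts that could be estimated, into the following Equation (29) read from the internal memory.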

(where A_k^M is a 2-row, 2M-column matrix, b_k^M is a 2-row, 1-column column vector, and M is an integer satisfying 0 < M ≤ j; A_k^M and b_k^M are calculated by solving the following Equation (30) using the set P of postures s.)

（付記１１）
内部メモリから読み出した以下の式（３１）に、前記顔の部位の位置情報と前記他の部位の位置情報とを表す位置ベクトルａ_１ ^ｉ，ａ_２ ^ｉ，…，ａ_ｊ ^ｉを以下の式（３１）に代入することで、画像に含まれる人物ｉの視線方向を表す向きベクトルｑ_ｉを算出することを特徴とする付記１記載のプログラム。

（ただし、Ｗは２行２Ｊ列の行列、ｗ_０は２行１列の縦ベクトル、ｊは前記人物の部位をそれぞれ表し、姿勢ｓの集合Ｐと、姿勢ｓが持つ向きベクトルｑ_ｓ、及び姿勢ｓの部位番号ｋの部位の位置ベクトルｓ_ｋ＝（ｘ_ｋ，ｙ_ｋ）^Ｔを用いて、Ｗとｗ_０は、以下の式（３２）を解くことで算出される。）

(Appendix 11)
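The program according to Supplementary Note 1, wherein the direction vector q_i representing the line-of-sight direction of person i included in the image is calculated by substituting the position vectors a_1^i, a_2^i, …, a_j^i, which represent the position information of the facial parts and of the other parts, into the following Equation (31) read from the internal memory.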

(where W is a 2-row, 2J-column matrix, w_0 is a 2-row, 1-column column vector, and j denotes a part of the person; W and w_0 are calculated by solving the following Equation (32) using the set P of postures s, the direction vector q_s of posture s, and the position vector s_k = (x_k, y_k)^T of the part with part number k in posture s.)

（付記１２）
内部メモリから読み出した以下の式（３３）に基づいて、画像に含まれる人物ｉの視線方向を表す向きベクトルｑ_ｉを算出することを特徴とする付記１記載のプログラム。

（ただし、式（３３）において、ｗ_１とｗ_１はパラメータを表す） (Appendix 12)
The program according to Supplementary Note 1, wherein the direction vector q_i representing the line-of-sight direction of person i included in the image is calculated based on the following Equation (33) read from the internal memory.

(where w_1 and w_2 represent parameters in Equation (33))

（付記１３）
内部メモリから読み出した式（３３）に、前記顔の部位の位置情報と前記他の部位の位置情報とを表す、鼻、首、左肩、及び右肩の各部位のｘ軸方向の位置座標ｘ_１ ^ｉ，ｘ_６ ^ｉ，ｘ_７ ^ｉ，ｘ_８ ^ｉを代入することで、画像に含まれる人物ｉの視線方向を表す向きベクトルｑ_ｉを算出することを特徴とする付記１１記載のプログラム。 (Appendix 13)
The program according to Supplementary Note 11, wherein the direction vector q_i representing the line-of-sight direction of person i included in the image is calculated by substituting, into Equation (33) read from the internal memory, the x-axis position coordinates x_1^i, x_6^i, x_7^i, and x_8^i of the nose, neck, left shoulder, and right shoulder, which represent the position information of the facial part and of the other parts.

（付記１４）
前記向きベクトルｑ_ｉと、空間情報記憶部から読み出した対象物の位置座標とに基づいて、前記対象物に視線を向けている人物の数をカウントすることを特徴とする付記１１記載のプログラム。 (Appendix 14)
The program according to Supplementary Note 11, wherein the number of persons directing their line of sight at an object is counted based on the direction vector q_i and the position coordinates of the object read from the spatial information storage unit.

（付記１５）
第１のカメラで撮影された第１の画像の第１の画像データに対して、前記正解データを利用して、前記第１の画像に含まれる人物の部位の２次元座標として表される第１の位置座標を推定し、第２のカメラで撮影された第２の画像の第２の画像データに基づいて、前記正解データを利用して、前記第２の画像に含まれる人物の部位の２次元座標として表される第２の位置座標を推定し、
前記第１の位置座標と前記第２の位置座標とを、部位の３次元位置座標へ変換し、
前記３次元位置座標を利用して、前記人物の顔の部位の位置情報を算出し、画像に含まれる人物の視線方向を推定することを特徴とする付記１記載のプログラム。 (Appendix 15)
The program according to Supplementary Note 1, wherein: first position coordinates, expressed as two-dimensional coordinates of the parts of the person included in a first image captured by a first camera, are estimated from first image data of the first image using the correct data, and second position coordinates, expressed as two-dimensional coordinates of the parts of the person included in a second image captured by a second camera, are estimated from second image data of the second image using the correct data;
the first position coordinates and the second position coordinates are converted into three-dimensional position coordinates of the parts; and
the position information of the facial parts of the person is calculated using the three-dimensional position coordinates, and the line-of-sight direction of the person included in the image is estimated.

（付記１６）
前記第１の画像における前記部位の各画素値の第１の出現回数と、前記第２の画像における前記部位の各画素値の第２の出現回数とに基づいて、前記第１の画像内における人物と前記第２の画像内における人物とが同一人物であるか否かを判定し、同一人物であると判定したとき、同一人物の３次元位置座標（Ｘ，Ｙ，Ｚ）への変換を行うことを特徴とする付記１５記載のプログラム。 (Appendix 16)
The program according to Supplementary Note 15, wherein whether the person in the first image and the person in the second image are the same person is determined based on a first number of appearances of each pixel value of the part in the first image and a second number of appearances of each pixel value of the part in the second image, and, when they are determined to be the same person, the conversion to the three-dimensional position coordinates (X, Y, Z) of that person is performed.

（付記１７）
内部メモリから読み出した以下の式（３４）と式（３５）に、前記第１のカメラの原点から前記第１の位置座標の原点までの焦点距離を表すｆ、前記第２のカメラの原点から前記第２の位置座標の原点までの焦点距離を表すｆ’、前記第１のカメラから見た前記第２のカメラの回転角を表すＲ、前記第１のカメラから前記第２のカメラへ向かう並進ベクトルを表すｔを代入して、カメラ行列Ｐ，Ｐ’を算出し、
前記内部メモリから読み出した以下の式（３６）から式（３９）に、カメラ行列Ｐ，Ｐ’の各成分と、前記第１の位置座標を表す（ｘ，ｙ）、前記第２の位置座標を表す（ｘ’，ｙ’）を代入して、式（３６）から式（３９）に示す連立方程式を解くことで、前記部位の３次元位置座標への変換を行うことを特徴とする付記１５記載のプログラム。

(Appendix 17)
The program according to Supplementary Note 15, wherein: camera matrices P and P′ are calculated by substituting, into the following Equations (34) and (35) read from the internal memory, f representing the focal length from the origin of the first camera to the origin of the first position coordinates, f′ representing the focal length from the origin of the second camera to the origin of the second position coordinates, R representing the rotation angle of the second camera as viewed from the first camera, and t representing the translation vector from the first camera to the second camera; and
the conversion to the three-dimensional position coordinates of the part is performed by substituting the components of the camera matrices P and P′, (x, y) representing the first position coordinates, and (x′, y′) representing the second position coordinates into the following Equations (36) to (39) read from the internal memory and solving the simultaneous equations given by Equations (36) to (39).

（付記１８）
推定することができなかった第１の人物の前記部位のうち顔の部位の位置情報の数が閾値よりも多いとき、第１の人物において推定することができた他の部位の位置情報と、第１の人物に隣接する第２の人物において推定することができた部位の位置情報とに基づいて、第１の人物において推定することができなかった前記顔の部位の位置情報を算出することを特徴とする付記１記載のプログラム。 (Appendix 18)
The program according to Supplementary Note 1, wherein, when the number of facial parts of a first person whose position information could not be estimated is greater than a threshold, the position information of those facial parts is calculated based on the position information of the other parts that could be estimated for the first person and the position information of the parts that could be estimated for a second person adjacent to the first person.

（付記１９）
内部メモリから読み出した以下の式（４０）に、推定することができた他の部位の位置情報を表す位置ベクトルａ_ｋ１ ^ｉ，ａ_ｋ２ ^ｉ，…，ａ_ｋＭ ^ｉを代入することで、前記部位のうち顔の部位の位置情報を表す位置ベクトルａ_ｋ ^ｉを算出することを特徴とする付記１記載のプログラム。

（ただし、式（４０）において、δ（ｘ）は、以下の式（４１）に示す活性化関数であり、式（４０）において、行列Ｄ_ｌ１ ^Ｍ，Ｄ_ｌ２ ^ｌ１，及びＤ_ｋ ^ｌ２は、姿勢ｓの集合Ｐを用いて、以下の式（４２）を解くことで得られる行列である。）

(Appendix 19)
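The program according to Supplementary Note 1, wherein the position vector a_k^i representing the position information of a facial part among the parts is calculated by substituting the position vectors a_{k1}^i, a_{k2}^i, …, a_{kM}^i, which represent the position information of the other parts that could be estimated, into the following Equation (40) read from the internal memory.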

(where, in equation (40), δ(x) is the activation function shown in equation (41) below, and in equation (40), matrices D _l1 ^M , D _l2 ^l1 , and D _k ^l2 are It is a matrix obtained by solving the following equation (42) using the set P of postures s.)

（付記２０）
内部メモリから読み出した以下の式（４３）に、前記顔の部位の位置情報と前記他の部位の位置情報を表す位置ベクトルａ_１ ^ｉ，ａ_２ ^ｉ，…，ａ_ｊ ^ｉを代入することで、画像に含まれる人物ｉの視線方向を表す向きベクトルｑ_ｉを算出することを特徴とする付記１記載のプログラム。

（ただし、式（４３）において、δ（ｘ）は、以下の式（４４）に示す活性化関数であり、式（４３）において、行列Ｗ_ｌ１ ^Ｊ，Ｗ_ｌ２ ^ｌ１，Ｗ_ｋ ^ｌ２は、姿勢ｓの集合Ｐを用いて、以下の式（４５）を解くことで得られる行列である。）

(Appendix 20)
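The program according to Supplementary Note 1, wherein the direction vector q_i representing the line-of-sight direction of person i included in the image is calculated by substituting the position vectors a_1^i, a_2^i, …, a_j^i, which represent the position information of the facial parts and of the other parts, into the following Equation (43) read from the internal memory.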

(However, in equation (43), δ(x) is the activation function shown in equation (44) below, and in equation (43), matrices W _l1 ^J , W _l2 ^l1 , and W _k ^l2 are the orientation It is a matrix obtained by solving the following equation (45) using the set P of s.)

（付記２１）
さらに、推定した前記人物の視線方向の変化を検知し、検知結果を出力する
ことを特徴とする付記１記載のプログラム。 (Appendix 21)
The program according to Supplementary Note 1, further detecting a change in the estimated line-of-sight direction of the person and outputting a detection result.

（付記２２）
時刻ｔを現在時刻Ｔとしたとき、推定した前記人物の視線方向を表す向きベクトルを正規化した注目度ベクトルを、時刻（Ｔ－ｎ）から時刻（Ｔ－ｍ）までに取得した第１の注目度ベクトルの集合と、時刻（Ｔ－ｍ）から時刻ｔまでに取得した第２の注目度ベクトルの集合に分けて、前記第１の注目度ベクトルの集合に基づいて、前記第２の注目度ベクトルの集合の異常度を計算し、前記異常度が閾値以上のとき、時刻（Ｔ－ｍ）を境にして視線方向に変化があったことを示す検知結果を出力することを特徴とする付記２１記載のプログラム。 (Appendix 22)
The program according to Supplementary Note 21, wherein, when the time t is the current time T, attention level vectors obtained by normalizing the direction vectors representing the estimated line-of-sight direction of the person are divided into a first set of attention level vectors acquired from time (T−n) to time (T−m) and a second set of attention level vectors acquired from time (T−m) to time t; the degree of abnormality of the second set is calculated based on the first set; and, when the degree of abnormality is equal to or greater than a threshold, a detection result indicating that the line-of-sight direction changed at time (T−m) is output.

（付記２３）
内部メモリから読み出した以下の式（４６）に、時刻（Ｔ－ｎ）から時刻（Ｔ－ｍ）までに取得した前記人物の視線方向を表す向きベクトルｑ_ｉを代入して、時刻（Ｔ－ｎ）から時刻（Ｔ－ｍ）までに取得した第１の注目度ベクトルの集合に含まれる注目度ベクトルｕ_ｉ ^ｔを求め、

前記内部メモリから読み出した以下の式（４７）に、注目度ベクトルｕ_ｉ ^ｔを代入して、時刻（Ｔ－ｎ）から時刻（Ｔ－ｍ）までに取得した第１の注目度ベクトルの集合における注目度ベクトルｕ_ｉ ^ｔの確率分布ｐ（ｕ_ｉ ^ｔ）を算出し、

前記内部メモリから読み出した以下の式（４８）に、確率分布ｐ（ｕ_ｉ ^ｔ）を代入することで、異常度βを算出することを特徴とする付記２１記載のプログラム。

（ただし、式（４６）は、以下の式（４９）と式（５０）を用いて算出され、α_ｊ，μ_ｊ，σ_ｊはパラメータを表す。）

(Appendix 23)
The program according to Supplementary Note 21, wherein: the attention level vectors u_i^t included in the first set of attention level vectors acquired from time (T−n) to time (T−m) are obtained by substituting the direction vectors q_i, which represent the line-of-sight direction of the person acquired from time (T−n) to time (T−m), into the following Equation (46) read from the internal memory;

the probability distribution p(u_i^t) of the attention level vectors u_i^t in the first set of attention level vectors acquired from time (T−n) to time (T−m) is calculated by substituting the attention level vectors u_i^t into the following Equation (47) read from the internal memory; and

the degree of abnormality β is calculated by substituting the probability distribution p(u_i^t) into the following Equation (48) read from the internal memory.

(However, Equation (46) is calculated using Equations (49) and (50) below, and α _j , μ _j , and σ _j represent parameters.)

（付記２４）
前記部位のうち顔の部位の位置情報を推定することができなかったとき、推定することができた他の部位の位置情報を利用して、前記顔の部位の位置情報を算出し、前記顔の部位の位置情報と前記他の部位の位置情報に基づいて、画像に含まれる人物の視線方向を推定することを特徴とする付記１記載のプログラム。 (Appendix 24)
The program according to Supplementary Note 1, wherein, when the position information of a facial part among the parts could not be estimated, the position information of the facial part is calculated using the position information of the other parts that could be estimated, and the line-of-sight direction of the person included in the image is estimated based on the position information of the facial part and the position information of the other parts.

（付記２５）
入力画像データに対して、人物の部位に関する正解データを利用して、画像に含まれる人物の部位の位置情報を推定する姿勢推定部と、
前記部位のうち顔の部位の位置情報を推定することができなかったとき、推定することができた他の部位の位置情報に基づいて、画像に含まれる人物の視線方向を推定する注目度算出部と
を備えることを特徴とする情報処理装置。 (Appendix 25)
An information processing apparatus comprising:
a posture estimation unit that estimates, from input image data, position information of the parts of a person included in an image, using correct data on the parts of a person; and
an attention level calculation unit that estimates, when the position information of a facial part among the parts could not be estimated, the line-of-sight direction of the person included in the image based on the position information of the other parts that could be estimated.

（付記２６）
姿勢推定部と注目度算出部とを有する情報処理装置における情報処理方法であって、
前記姿勢推定部により、入力画像データに対して、人物の部位に関する正解データを利用して、画像に含まれる人物の部位の位置情報を推定し、
前記注目度算出部により、前記部位のうち顔の部位の位置情報を推定することができなかったとき、推定することができた他の部位の位置情報に基づいて、画像に含まれる人物の視線方向を推定する
ことを特徴とする情報処理方法。 (Appendix 26)
An information processing method in an information processing device having a posture estimation unit and an attention level calculation unit,
estimating the position information of the parts of the person included in the image by using the correct data regarding the parts of the person with respect to the input image data by the posture estimation unit;
When the attention level calculation unit cannot estimate the position information of the face part among the parts, the line of sight of the person included in the image is calculated based on the position information of other parts that could be estimated. An information processing method characterized by estimating a direction.

１０：情報処理システム１００：情報処理装置
１１０：姿勢推定部１１１：ＣＮＮ処理部
１１１－１：第１のＣＮＮ処理部１１１－２：第２のＣＮＮ処理部
１１２：候補点算出部１１２－１：第１の候補点算出部
１１２－２：第２の候補点算出部１１３：グルーピング処理部
１１３－１：第１のグルーピング処理部１１３－２：第２のグルーピング処理部
１１４：同一人物特定処理部１１５：カメラ行列計算部
１１６：３次元位置計算部１２０：注目度算出部
１３０：空間情報記憶部１４０：注目度記憶部
１５０：変化検知部２００：撮像装置
３００－１，３００－２：対象物
10: Information processing system
100: Information processing apparatus
110: Posture estimation unit
111: CNN processing unit
111-1: First CNN processing unit
111-2: Second CNN processing unit
112: Candidate point calculation unit
112-1: First candidate point calculation unit
112-2: Second candidate point calculation unit
113: Grouping processing unit
113-1: First grouping processing unit
113-2: Second grouping processing unit
114: Same person identification processing unit
115: Camera matrix calculation unit
116: Three-dimensional position calculation unit
120: Attention level calculation unit
130: Spatial information storage unit
140: Attention level storage unit
150: Change detection unit
200: Imaging device
300-1, 300-2: Objects

Claims

estimating the positional information of the parts of the person included in the input image data using the correct data on the parts of the person;
estimating the line-of-sight direction of the person included in the image based on the estimated position information of the part other than the face when the position information of the part of the face cannot be estimated among the parts;
A program characterized by causing a computer to execute processing.

The program according to claim 1, wherein the position information of the facial parts includes position information of at least one of the nose, left eye, right eye, left ear, right ear, neck, left shoulder, and right shoulder.

The program according to claim 1, wherein a posture vector, which includes position information of the parts of the person included in the image and a parameter indicating whether each part is included in the image, is estimated based on the input image data using the correct data regarding the parts of a person.

The program according to claim 1, wherein the direction vector q_i representing the line-of-sight direction of person i included in the image is calculated based on the following Equation (51) read from the internal memory.

(In Equation (51), w_1 and w_2 represent parameters, and x_j^i denotes the x-coordinate of the position of part j of person i, where j=1 is the nose, j=6 is the neck, j=7 is the left shoulder, and j=8 is the right shoulder.)

The program according to claim 1, wherein: first position coordinates, expressed as two-dimensional coordinates of the parts of the person included in a first image captured by a first camera, are estimated from first image data of the first image using the correct data, and second position coordinates, expressed as two-dimensional coordinates of the parts of the person included in a second image captured by a second camera, are estimated from second image data of the second image using the correct data;
the first position coordinates and the second position coordinates are converted into three-dimensional position coordinates of the parts; and
the position information of the facial parts of the person is calculated using the three-dimensional position coordinates, and the line-of-sight direction of the person included in the image is estimated.

The program according to claim 1, wherein, when the number of facial parts of a first person whose position information could not be estimated is greater than a threshold, the position information of those facial parts is calculated based on the position information of the other parts that could be estimated for the first person and the position information of the parts that could be estimated for a second person adjacent to the first person.

The program according to claim 1, wherein the process further comprises detecting a change in the estimated line-of-sight direction of the person and outputting a detection result.

The program according to claim 7, wherein, when the time t is the current time T, attention level vectors obtained by normalizing the direction vectors representing the estimated line-of-sight direction of the person are divided into a first set of attention level vectors acquired from time (T−n) to time (T−m) and a second set of attention level vectors acquired from time (T−m) to time t; the degree of abnormality of the second set is calculated based on the first set; and, when the degree of abnormality is equal to or greater than a threshold, a detection result indicating that the line-of-sight direction changed at time (T−m) is output.

The program according to claim 1, wherein, when the position information of a facial part among the parts cannot be estimated, the position information of the facial part is calculated using the position information of the other parts that could be estimated, and the line-of-sight direction of the person included in the image is estimated based on the position information of the facial part and the position information of the other parts.

An information processing apparatus comprising:
a posture estimation unit that estimates, from input image data, position information of the parts of a person included in an image, using correct data on the parts of a person; and
an attention level calculation unit that estimates, when the position information of a facial part among the parts cannot be estimated, the line-of-sight direction of the person included in the image based on the estimated position information of the parts other than the face.

An information processing method in an information processing device having a posture estimation unit and an attention level calculation unit,
estimating the position information of the parts of the person included in the image by using the correct data regarding the parts of the person with respect to the input image data by the posture estimation unit;
When the attention degree calculation unit cannot estimate the position information of the face part among the parts, the position information of the part other than the face that can be estimated is used to determine the position information of the person included in the image. An information processing method characterized by estimating a line-of-sight direction.