JP2023080835A

JP2023080835A - Setting program, setting method, and information processing apparatus

Info

Publication number: JP2023080835A
Application number: JP2021194362A
Authority: JP
Inventors: 由枝木村; Yoshie Kimura; 源太鈴木; Genta Suzuki
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2023-06-09

Abstract

To provide a setting program, a setting method, and an information processing apparatus capable of suppressing deterioration in the accuracy of behavior analysis. SOLUTION: An information processing apparatus identifies a first area corresponding to a passage from image data obtained by capturing an interior, and identifies the body direction and face direction of a person moving in the identified first area. When the identified body direction of the person differs from the person's face direction, the information processing apparatus identifies, on the basis of the person's face direction or body direction, a second area adjacent to the first area among a plurality of areas constituting the interior, and sets the identified second area as an area in which an object related to the person is stored. SELECTED DRAWING: Figure 5

Description

The present invention relates to a setting program, a setting method, and an information processing apparatus.

Technologies for analyzing human behavior from video data captured by cameras are being developed. For example, purchasing behavior is analyzed by extracting, from each piece of image data included in the video data, an attention area in which purchasing behavior is likely to occur, and by detecting, as a picking motion, a motion of raising an arm to a certain position within the attention area. In recent years, attention areas have been set for each piece of image data either manually or by using semantic segmentation.

Patent literature: JP 2012-173903 A; JP 2013-50945 A

However, with the above techniques, it is difficult to set the attention area accurately, and as a result, the accuracy of behavior analysis deteriorates. For example, with the manual method, attention areas must be set for a huge amount of image data, which not only takes time but also makes it difficult to prevent human error. With the method using semantic segmentation, the entire aisle along which consumers walk in the store is set as the attention area, so unnecessary picking motions are detected and the accuracy of behavior analysis deteriorates.

In one aspect, an object is to provide a setting program, a setting method, and an information processing apparatus capable of suppressing deterioration in the accuracy of behavior analysis.

In a first proposal, the setting program causes a computer to execute a process of: identifying, from image data capturing an interior, a first area corresponding to a passage; identifying the body orientation and the face orientation of a person moving in the identified first area; when the identified body orientation of the person differs from the person's face orientation, identifying, on the basis of the person's face orientation or body orientation, a second area adjacent to the first area among a plurality of areas constituting the interior; and setting the identified second area as an area in which an object related to the person is stored.

According to one embodiment, deterioration in the accuracy of behavior analysis can be suppressed.

FIG. 1 is a diagram illustrating the overall configuration of a system including an information processing apparatus according to a first embodiment.
FIG. 2 is a diagram illustrating the behavior to be recognized according to the first embodiment.
FIG. 3 is a diagram illustrating detection of an attention area by semantic segmentation.
FIG. 4 is a diagram illustrating detection of an attention area in a reference technique.
FIG. 5 is a functional block diagram illustrating the functional configuration of the information processing apparatus according to the first embodiment.
FIG. 6 is a diagram illustrating generation of a first machine learning model.
FIG. 7 is a diagram illustrating motion analysis using a second machine learning model.
FIG. 8 is a diagram illustrating generation of a user's movement trajectory.
FIG. 9 is a diagram illustrating plotting of face orientation and body orientation.
FIG. 10 is a diagram illustrating extraction of an attention area.
FIG. 11 is a diagram illustrating the setting of product shelf areas.
FIG. 12 is a flowchart illustrating the flow of processing.
FIG. 13 is a diagram illustrating an example hardware configuration.

Hereinafter, embodiments of the setting program, the setting method, and the information processing apparatus disclosed in the present application will be described in detail with reference to the drawings. The present invention is not limited by these embodiments. The embodiments may be combined as appropriate as long as no contradiction arises.

[Overall Configuration]
FIG. 1 is a diagram illustrating the overall configuration of a system including an information processing apparatus 10 according to the first embodiment. As shown in FIG. 1, this system includes a store 1, which is an example of a space, a plurality of cameras 2 installed at different locations within the store 1, and the information processing apparatus 10.

Each of the cameras 2 is an example of a surveillance camera that captures a predetermined area within the store 1, and transmits the data of the captured video to the information processing apparatus 10. In the following description, the data of a video may be referred to as "video data". The video data includes a plurality of time-series image frames, to which frame numbers are assigned in ascending chronological order. One image frame is image data of a still image captured by the camera 2 at a certain timing.

The information processing apparatus 10 is an example of a computer that analyzes the image data captured by each of the cameras 2. Each camera 2 and the information processing apparatus 10 are connected via any of various networks, wired or wireless, such as the Internet or a dedicated line.

In recent years, technologies for analyzing human behavior from video data captured by the cameras 2 have been developed. For example, purchasing behavior is analyzed by extracting, from each piece of image data in the video data, an attention area in which purchasing behavior is likely to occur, and by detecting, as a picking motion, a motion of raising an arm to a certain position within the attention area.

FIG. 2 is a diagram illustrating the behavior to be recognized according to the first embodiment. Assume that region A in FIG. 2 is the attention area. In this case, the recognition target is the picking motion of the person (user) standing in front of the product shelf, shown in (1) of FIG. 2. However, the persons shown in (2) and (3) of FIG. 2, who are in places without a product shelf and are not reaching for a product but are making motions similar to picking, are also recognized, resulting in erroneous detection.

To reduce erroneous detection, one could set the product shelf area into which the hand reaches as the attention area. In that case, however, the person shown in (2) of FIG. 2 is still erroneously detected, because in the image data the person's hand overlaps the shelf. Another approach is to set the passage at the person's feet as the attention area. For example, when region A shown in FIG. 2 is set as the attention area, the persons shown in (2) and (3) of FIG. 2 are no longer detected, and erroneous detection is suppressed.

The foot area is often set as the attention area manually. With manual setting, however, attention areas must be set for a huge amount of image data, which not only takes time but also makes it difficult to prevent human error.

As another approach, automatic setting by semantic segmentation, a technique that classifies what appears in image data on a per-pixel basis, is used. FIG. 3 is a diagram illustrating detection of an attention area by semantic segmentation. As shown in FIG. 3, semantic segmentation inputs image data into a machine learning model (a convolutional encoder-decoder) and obtains an output in which a label is assigned to each region of the image data. However, the label "passage" is assigned not only to the attention area but to all passages, including those outside the attention area, so the persons in (2) and (3) of FIG. 2 also become recognition targets, and their motions are erroneously detected as picking motions.

It is also conceivable to use a reference technique that extracts people's working positions from camera video data and automatically provides an ROI (Region Of Interest) by clustering those positions. FIG. 4 is a diagram illustrating detection of an attention area in the reference technique. As shown in FIG. 4, the reference technique extracts only the areas where purchasing behavior occurred while the person was stationary, so only the positions where the person shown in (B) of FIG. 4 stood still are extracted, and it is difficult to cover attention area A sufficiently. In other words, the reference technique has difficulty detecting the motion of picking up a product while moving slowly (a picking motion).

In general, behavior analysis of a person is performed after a product shelf area, in which products and the like picked up by people are displayed, has been set in the image data based on the area set as the attention area or the area in which picking motions are detected. However, as described above, it is difficult to extract the attention area accurately by manual setting, by semantic segmentation, or by the reference technique. As a result, product shelf areas are set incorrectly, and the accuracy of the final behavior analysis deteriorates.

Therefore, the information processing apparatus 10 according to the first embodiment identifies a first area corresponding to a passage from image data capturing the interior. The information processing apparatus 10 measures the body orientation and the face orientation of a person moving in the identified first area. When the measured body orientation of the person differs from the person's face orientation, the information processing apparatus 10 identifies, based on the person's face orientation or body orientation, a second area adjacent to the first area among the plurality of areas constituting the store 1. The information processing apparatus 10 sets the identified second area as a product shelf area.

In other words, the information processing apparatus 10 exploits the fact that purchasing behavior in a retail store consists mainly of moving and picking products: while moving, a person's face and body point in the same direction, whereas while picking, the face and body orientations diverge. The apparatus extracts the attention area using this divergence. As a result, the information processing apparatus 10 can extract the attention area accurately and thereby suppress deterioration in the accuracy of behavior analysis.

[Functional Configuration]
FIG. 5 is a functional block diagram illustrating the functional configuration of the information processing apparatus 10 according to the first embodiment. As shown in FIG. 5, the information processing apparatus 10 includes a communication unit 11, a storage unit 12, and a control unit 20.

The communication unit 11 is a processing unit that controls communication with other devices and is implemented by, for example, a communication interface. For example, the communication unit 11 receives video data from the cameras 2 and transmits the results of processing by the control unit 20 to a management terminal or the like.

The storage unit 12 stores various data, programs executed by the control unit 20, and the like, and is implemented by a memory, a hard disk, or the like. The storage unit 12 stores a training data DB 13, a first machine learning model 14, a second machine learning model 15, a video data DB 16, a segment result DB 17, an ROI information DB 18, and a setting result DB 19.

The training data DB 13 is a database that stores the training data used to train the first machine learning model 14. Specifically, each piece of training data associates RGB image data, serving as the explanatory variable, with the result of executing semantic segmentation on that image data, serving as the objective variable (ground truth). This result is hereinafter also referred to as a segment result or a segmentation result.

The first machine learning model 14 is a model that performs semantic segmentation. Specifically, the first machine learning model 14 outputs a segmentation result in response to input RGB image data. In the segmentation result, an identified label is set for each region in the image data. For example, a convolutional encoder-decoder can be employed as the first machine learning model 14.
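As a minimal sketch, assuming the segmentation network produces a score for each class at every pixel (a common output form for convolutional encoder-decoders, not something the source specifies), the label map can be obtained by taking the highest-scoring class per pixel. The label set `LABELS` and the function name `scores_to_label_map` are hypothetical; only the label "passage" appears in the source.

```python
from typing import List, Sequence

# Hypothetical label set for illustration; index i corresponds to score channel i.
LABELS = ["passage", "shelf", "wall"]


def scores_to_label_map(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[str]]:
    """Convert per-pixel class scores (H x W x C) into a label map (H x W)."""
    label_map: List[List[str]] = []
    for row in scores:
        labels_in_row = []
        for pixel_scores in row:
            # Choose the class index with the highest score for this pixel.
            best = max(range(len(pixel_scores)), key=lambda c: pixel_scores[c])
            labels_in_row.append(LABELS[best])
        label_map.append(labels_in_row)
    return label_map


scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],
]
print(scores_to_label_map(scores))  # [['passage', 'shelf'], ['passage', 'wall']]
```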

The second machine learning model 15 is a model that performs motion analysis. Specifically, the second machine learning model 15 is a trained model and is an example of a deep learner that estimates two-dimensional joint positions (skeletal coordinates), such as the head, wrists, waist, and ankles, from two-dimensional image data of a person, and recognizes basic motions and user-defined rules. Using the second machine learning model 15, the basic motions of a person can be recognized, and the person's ankle positions, face orientation, and body orientation can be obtained. Basic motions are, for example, walking, running, and stopping. A user-defined rule is, for example, the transition of skeleton information corresponding to each action leading up to picking up a product.

The video data DB 16 is a database that stores the video data captured by each of the cameras 2 installed in the store 1. For example, the video data DB 16 stores video data for each camera 2 or for each capture time period.

The segment result DB 17 is a database that stores the results of executing semantic segmentation. Specifically, the segment result DB 17 stores the output of the first machine learning model 14. For example, the segment result DB 17 stores RGB image data in association with the corresponding semantic segmentation results.

The ROI information DB 18 is a database that stores the ROIs of attention areas, the ROIs of product shelves, and the like, obtained by the control unit 20 described later. For example, the ROI information DB 18 stores, for each piece of RGB image data, the ROI of the attention area, the ROI of the product shelf, and the like in association with that image data.

The setting result DB 19 is a database that stores the results of the control unit 20, described later, setting product shelf areas on the segment results. For example, the setting result DB 19 stores RGB image data in association with the setting information of each label set for that image data.

The control unit 20 is a processing unit that controls the entire information processing apparatus 10 and is implemented by, for example, a processor. The control unit 20 includes a pre-learning unit 21, an acquisition unit 22, a motion analysis unit 23, a variation extraction unit 24, a trajectory generation unit 25, an attention area extraction unit 26, and an area setting unit 27. These units are implemented by, for example, an electronic circuit included in the processor or a process executed by the processor.

The pre-learning unit 21 is a processing unit that generates the first machine learning model 14. Specifically, the pre-learning unit 21 trains the first machine learning model 14 by machine learning using the training data stored in the training data DB 13.

FIG. 6 is a diagram illustrating generation of the first machine learning model 14. As shown in FIG. 6, the pre-learning unit 21 inputs the RGB image data of a piece of training data, which consists of RGB image data and ground truth (a segmentation result), into the first machine learning model 14 and obtains an output (a segmentation result). The pre-learning unit 21 then optimizes the parameters and the like of the first machine learning model 14 so as to minimize the error between the ground truth of the training data and the output.
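The source states only that the error between the ground truth and the output is minimized. A common choice of error for semantic segmentation is the mean pixel-wise cross-entropy; the sketch below computes that loss, assuming the model outputs per-pixel class probabilities (for example, after a softmax) and the ground truth is given as integer class indices. The optimizer that updates the parameters is omitted, and `pixelwise_cross_entropy` is a hypothetical name.

```python
import math
from typing import Sequence


def pixelwise_cross_entropy(
    probs: Sequence[Sequence[Sequence[float]]],
    targets: Sequence[Sequence[int]],
) -> float:
    """Mean negative log-probability of the ground-truth class over all pixels.

    probs:   H x W x C class probabilities (e.g. softmax output of the model)
    targets: H x W ground-truth class indices (the correct segmentation)
    """
    eps = 1e-12  # avoid log(0) when a probability underflows to zero
    total, count = 0.0, 0
    for prob_row, target_row in zip(probs, targets):
        for pixel_probs, target in zip(prob_row, target_row):
            total -= math.log(max(pixel_probs[target], eps))
            count += 1
    return total / count


# A perfect prediction gives zero loss; a less confident one gives a positive loss.
print(pixelwise_cross_entropy([[[1.0, 0.0]]], [[0]]))
print(pixelwise_cross_entropy([[[0.5, 0.5], [0.25, 0.75]]], [[0, 1]]))
```

Training would repeatedly adjust the model parameters to lower this value over the whole training data DB.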

The acquisition unit 22 is a processing unit that acquires video data from each camera 2 and stores it in the video data DB 16. For example, the acquisition unit 22 may acquire video data from each camera 2 as needed or periodically.

The motion analysis unit 23 is a processing unit that analyzes the motion of persons appearing in the video data captured by the cameras 2. Specifically, the motion analysis unit 23 inputs each piece of image data (frame) included in the video data into the second machine learning model 15 and recognizes the motion of the person appearing in each piece of image data.

FIG. 7 is a diagram illustrating motion analysis using the second machine learning model 15. As shown in FIG. 7, the motion analysis unit 23 inputs RGB image data into the second machine learning model 15 and obtains the two-dimensional skeletal coordinates of the person appearing in the image data. The motion analysis unit 23 then identifies the person's ankle positions, face orientation, and body orientation from the two-dimensional skeletal coordinates, and outputs the results to the variation extraction unit 24, the trajectory generation unit 25, and the like.
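The source does not state how face and body orientation are derived from the skeletal coordinates. One plausible sketch, assuming the keypoints have been projected into a top-down floor frame (x to the right, y upward) and that the model provides shoulder, ear, and nose keypoints under the hypothetical names below: body orientation is the left-to-right shoulder vector rotated 90 degrees counter-clockwise, and face orientation is the vector from the midpoint of the ears to the nose. Both are returned as headings in degrees.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def _heading(dx: float, dy: float) -> float:
    """Heading in degrees in [0, 360), counter-clockwise from the +x axis."""
    return math.degrees(math.atan2(dy, dx)) % 360.0


def body_heading(kp: Dict[str, Point]) -> float:
    """Body orientation: left-to-right shoulder vector rotated 90 deg counter-clockwise."""
    (lx, ly), (rx, ry) = kp["left_shoulder"], kp["right_shoulder"]
    sx, sy = rx - lx, ry - ly
    return _heading(-sy, sx)  # (sx, sy) rotated by +90 degrees is (-sy, sx)


def face_heading(kp: Dict[str, Point]) -> float:
    """Face orientation: vector from the midpoint of the ears to the nose."""
    mx = (kp["left_ear"][0] + kp["right_ear"][0]) / 2
    my = (kp["left_ear"][1] + kp["right_ear"][1]) / 2
    nx, ny = kp["nose"]
    return _heading(nx - mx, ny - my)


# Hypothetical keypoints: the body faces +y (north) while the head turns toward +x (east).
keypoints = {
    "left_shoulder": (-1.0, 0.0), "right_shoulder": (1.0, 0.0),
    "left_ear": (0.0, 0.5), "right_ear": (0.0, -0.5), "nose": (0.5, 0.0),
}
print(body_heading(keypoints), face_heading(keypoints))
```

A real system would have to fix the sign convention to match its camera geometry; the rotation direction here is valid only for the assumed y-up frame.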

In this way, the motion analysis unit 23 inputs each piece of image data (for example, 100 frames) included in each piece of video data acquired at predetermined time intervals into the second machine learning model 15, and measures the ankle positions, face orientation, and body orientation of the person in each piece of image data. This makes it possible to identify how the person's ankle positions, face orientation, and body orientation change over time within the video data.

The variation extraction unit 24 is a processing unit that uses the two-dimensional skeletal coordinates of the person identified by the motion analysis unit 23 to extract the variation between the person's body orientation and face orientation. Continuing the above example, the variation extraction unit 24 obtains, from the motion analysis unit 23, the face orientation and body orientation for each piece of image data (for example, 100 frames) in the video data. The variation extraction unit 24 then calculates, as the variation, the angle formed by the face orientation and the body orientation of the person in each piece of image data.
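The variation is the angle between the two orientations. A minimal sketch, assuming each orientation is given as a heading in degrees: the difference is folded into the range 0 to 180 degrees so that, for example, headings of 10 and 350 degrees differ by 20 degrees rather than 340. The function names `orientation_gap` and `per_frame_gaps` are hypothetical.

```python
from typing import List


def orientation_gap(face_deg: float, body_deg: float) -> float:
    """Smallest absolute angle, in [0, 180] degrees, between two headings."""
    diff = abs(face_deg - body_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def per_frame_gaps(face: List[float], body: List[float]) -> List[float]:
    """Variation for each frame, given per-frame face and body headings."""
    return [orientation_gap(f, b) for f, b in zip(face, body)]


# Walking (face and body aligned) versus turning the head toward a shelf.
print(per_frame_gaps([90.0, 10.0, 0.0], [90.0, 350.0, 90.0]))  # [0.0, 20.0, 90.0]
```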

The trajectory generation unit 25 is a processing unit that generates the movement trajectory of each person appearing in the video data. Specifically, the trajectory generation unit 25 generates a person's movement trajectory by plotting the person's ankle positions obtained by the motion analysis unit 23 onto the result of executing semantic segmentation on image data in the video data.

FIG. 8 is a diagram illustrating generation of a user's movement trajectory. As shown in FIG. 8, the trajectory generation unit 25 inputs image data in the video data (for example, the last piece of image data) into the first machine learning model 14. The trajectory generation unit 25 then obtains a segmentation result in which the first machine learning model 14 has identified regions (areas) and set a label for each area.

Next, the trajectory generation unit 25 identifies, from the labels included in the segmentation result, the passage region to which the label "passage" is set. The trajectory generation unit 25 then plots, as a trajectory on the passage region, the ankle positions of each person identified from each piece of image data in the video data, and outputs the plotted result to the attention area extraction unit 26.

In this way, the trajectory generation unit 25 can generate, for the video data, the movement trajectories along which persons appearing in the video data move through the passage region.
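A minimal sketch of placing ankle positions on the passage region, assuming the segmentation result is a label map indexed as `label_map[row][col]` and the ankle positions are pixel coordinates `(x, y)` in the same image. Points that fall outside the image or on non-passage pixels are dropped; the source does not say whether such points are discarded, so that filtering is an assumption, as is the function name `trajectory_on_passage`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def trajectory_on_passage(
    label_map: Sequence[Sequence[str]],
    ankle_points: Sequence[Point],
    passage_label: str = "passage",
) -> List[Point]:
    """Keep, in time order, the ankle positions (x, y) that fall on passage pixels."""
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    kept: List[Point] = []
    for x, y in ankle_points:
        col, row = math.floor(x), math.floor(y)  # pixel containing the point
        inside = 0 <= row < height and 0 <= col < width
        if inside and label_map[row][col] == passage_label:
            kept.append((x, y))
    return kept


label_map = [
    ["shelf", "passage", "passage"],
    ["shelf", "passage", "passage"],
]
ankles = [(0.5, 0.5), (1.2, 0.7), (2.9, 1.1), (5.0, 0.0)]
print(trajectory_on_passage(label_map, ankles))  # [(1.2, 0.7), (2.9, 1.1)]
```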

The attention area extraction unit 26 is a processing unit that extracts the attention area based on the variation between the person's body orientation and face orientation. Specifically, the attention area extraction unit 26 extracts, as the attention area, the region containing those movement trajectories, among the trajectories generated by the trajectory generation unit 25, in which the angle between the person's face orientation and body orientation is equal to or greater than a threshold.

FIG. 9 is a diagram illustrating plotting of face orientation and body orientation, and FIG. 10 is a diagram illustrating extraction of the attention area. As shown in FIG. 9, the attention area extraction unit 26 plots the person's face orientation and body orientation identified by the motion analysis unit 23 onto the movement trajectories generated by the trajectory generation unit 25. The attention area extraction unit 26 then identifies, for each trajectory, the angle between the person's face orientation and body orientation based on the angle (variation) calculated by the variation extraction unit 24.

After that, as shown in FIG. 10, the attention area extraction unit 26 clusters the points of the movement trajectories based on the variation between face orientation and body orientation. The attention area extraction unit 26 then extracts, as attention areas, regions A1 and A2, which were clustered as having an angle equal to or greater than the threshold (large variation), and extracts, as a passage region, region A3, which was clustered as having an angle less than the threshold (small variation).
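The source does not name the clustering algorithm. The sketch below substitutes a simple scheme: trajectory points are first split by the angle threshold, and the high-variation points are then grouped by single-linkage clustering, in which any two points within distance `eps` belong to the same cluster. Each resulting cluster corresponds to an attention-area candidate such as A1 or A2, and the low-variation points correspond to the passage region A3. The threshold, `eps`, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Sample = Tuple[float, float, float]  # (x, y, face/body angle in degrees)


def single_linkage(points: Sequence[Point], eps: float) -> List[List[Point]]:
    """Group points so that any two points within eps share a cluster."""
    unvisited = list(range(len(points)))
    clusters: List[List[Point]] = []
    while unvisited:
        seed = unvisited.pop(0)
        members, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
            members.extend(near)
            frontier.extend(near)
        clusters.append([points[k] for k in sorted(members)])
    return clusters


def split_by_variation(samples: Sequence[Sample], threshold_deg: float, eps: float):
    """Return (attention-area clusters, passage points) for trajectory samples."""
    high = [(x, y) for x, y, angle in samples if angle >= threshold_deg]
    low = [(x, y) for x, y, angle in samples if angle < threshold_deg]
    return single_linkage(high, eps), low


samples = [(0, 0, 50), (1, 0, 60), (10, 10, 45), (11, 10, 70), (5, 5, 5)]
attention, passage = split_by_variation(samples, threshold_deg=30, eps=2)
print(attention)  # [[(0, 0), (1, 0)], [(10, 10), (11, 10)]]
print(passage)    # [(5, 5)]
```

Density-based methods such as DBSCAN would be a natural alternative when noise points must be ignored.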

In this way, the attention area extraction unit 26 can narrow down the attention area, which is the region in the video data subject to behavior analysis of persons and in which picking motions on products are to be detected. The attention area extraction unit 26 is not limited to clustering; it may, for example, extract as attention areas the regions that contain the largest number of trajectories whose angle is equal to or greater than the threshold.

The area setting unit 27 is a processing unit that identifies, among the plurality of areas constituting the interior, a second area adjacent to the first area corresponding to the attention area, and sets the identified second area as an area in which an object related to the person is stored. Specifically, among the labeled areas obtained from the semantic segmentation result, the area setting unit 27 sets the areas in contact with the attention area extracted by the attention area extraction unit 26 as product shelf areas.

FIG. 11 is a diagram illustrating the setting of product shelf areas. As shown in FIG. 11, for each cluster extracted as an attention area by the attention area extraction unit 26, the area setting unit 27 obtains the coordinates of a polygon enclosing the trajectory points included in the cluster. The area setting unit 27 then identifies region C and region D enclosed by these polygons, and stores the coordinates of the polygon forming region C and of the polygon forming region D in the ROI information DB 18 as attention areas.
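The source refers to a polygon enclosing the trajectory points without specifying how it is constructed. Assuming a convex hull is acceptable, Andrew's monotone chain algorithm computes it in O(n log n) time; the sketch returns the hull vertices in counter-clockwise order (in a y-up frame), which could then be stored as the ROI coordinates of a region such as C or D. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Convex hull of 2D points via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


cluster = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(cluster))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

A convex hull can over-cover L-shaped clusters; a concave hull (alpha shape) would follow the trajectory more tightly at the cost of an extra parameter.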

The area setting unit 27 also associates each trajectory point of the point group forming the polygon of region C with the face orientation identified by the variation extraction unit 24. Then, among the areas lying in the face direction, the area setting unit 27 sets areas C1 and C2, which are adjacent to the aisle area (label: aisle) containing region C, as product shelf areas. That is, the area setting unit 27 replaces the labels assigned to areas C1 and C2 in the segmentation result with a label indicating a product shelf.

Similarly, the area setting unit 27 associates each trajectory point of the point group forming the polygon of region D with the face orientation identified by the variation extraction unit 24. Then, among the areas lying in the face direction, the area setting unit 27 sets areas C3 and C4, which are adjacent to the aisle area containing region D, as product shelf areas. That is, the area setting unit 27 replaces the labels assigned to areas C3 and C4 in the segmentation result with a label indicating a product shelf.
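The face-direction step for regions C and D can be sketched as a ray march. From each trajectory point, the sketch steps along the face vector until it leaves the aisle, and it relabels the first non-aisle area it hits as a product shelf. Several details are assumptions for illustration and are not taken from the text: grid coordinates (x = column, y = row, with cell centers at +0.5), the half-cell step size, and the function and label names. The same routine would work with body-orientation vectors instead.

```python
import math


def relabel_shelves(seg, aisle_id, observations, shelf_label="shelf", max_steps=1000):
    """Relabel areas hit by face-direction rays leaving the aisle.

    seg: 2D list of area ids (not modified); aisle_id: id of the aisle area.
    observations: list of ((x, y), (fx, fy)) pairs, a trajectory point inside
    the aisle and its face-orientation vector (assumed format).
    Returns (new_grid, set_of_relabelled_area_ids).
    """
    rows, cols = len(seg), len(seg[0])
    hit = set()
    for (x, y), (fx, fy) in observations:
        norm = math.hypot(fx, fy)
        if norm == 0:
            continue  # no usable direction for this point
        dx, dy = fx / norm * 0.5, fy / norm * 0.5  # half-cell steps
        px, py = x, y
        for _ in range(max_steps):
            px, py = px + dx, py + dy
            r, c = math.floor(py), math.floor(px)
            if not (0 <= r < rows and 0 <= c < cols):
                break  # ray left the image without hitting another area
            if seg[r][c] != aisle_id:
                hit.add(seg[r][c])  # first area beyond the aisle edge
                break
    # A real system might exclude non-shelf labels such as walls here.
    out = [[shelf_label if v in hit else v for v in row] for row in seg]
    return out, hit
```

Returning a new grid leaves the original segmentation result intact, so it can still be compared with the relabelled version.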

After that, the area setting unit 27 stores the segmentation result, with each label set, in the setting result DB 19. In this manner, the area setting unit 27 can set two things on the segmentation result generated from the video data: the attention area in which picking motions are to be identified, and the product shelf areas that are the targets of those picking motions.

The area setting unit 27 can perform the above settings not only on representative image data in the video data but also on each piece of image data. It can also reuse the attention-area and product-shelf-area settings made on the representative image data for every other piece of image data. In other words, using the product shelf areas set on the segmentation result, the area setting unit 27 can set the regions corresponding to areas C1, C2, C3, and C4 as product shelf areas in each piece of image data of the video data.

[Process flow]
FIG. 12 is a flowchart showing the flow of processing. The example described here assumes that the first machine learning model 14 has already been generated by prior training.

As shown in FIG. 12, when the start of processing is instructed (S101: Yes), the motion analysis unit 23 executes motion analysis on the video data acquired by the acquisition unit 22 (S102). Based on the motion analysis, the motion analysis unit 23 then detects the face orientation of each person and the like (S103). For example, the motion analysis unit 23 inputs each piece of image data in the video data to the second machine learning model 15. It identifies the two-dimensional skeleton information of each person in the image data and how that information changes over time, and it detects each person's ankle positions, face orientation, and body orientation.

Subsequently, the trajectory generation unit 25 inputs the image data in the video data to the first machine learning model 14 and acquires the segmentation result, that is, the result of executing semantic segmentation (S104).

The trajectory generation unit 25 then generates the movement trajectory of each person from the image data in the video data (S105). For example, the trajectory generation unit 25 generates each person's movement trajectory by plotting, on the segmentation result, the ankle positions identified for that person in each piece of image data.
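Trajectory generation can be sketched as taking the midpoint of the two ankle keypoints in each frame. This sketch assumes that an upstream tracker already assigns a consistent ID to each person across frames; tracking is not described in this section. The input format and function name are hypothetical.

```python
from collections import defaultdict


def build_trajectories(frames):
    """Group per-frame ankle midpoints into per-person trajectories.

    frames: list (one entry per frame) of dicts mapping an assumed tracker
    person_id -> ((left_x, left_y), (right_x, right_y)) ankle keypoints.
    Returns {person_id: [(frame_index, x, y), ...]} in frame order.
    """
    tracks = defaultdict(list)
    for t, detections in enumerate(frames):
        for pid, ((lx, ly), (rx, ry)) in detections.items():
            # The midpoint between the ankles approximates the floor position.
            tracks[pid].append((t, (lx + rx) / 2, (ly + ry) / 2))
    return dict(tracks)
```

Because only the ankles are used, the foot or shoe position mentioned later as an alternative could be substituted by changing what the keypoints represent.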

Next, the attention area extraction unit 26 plots the face orientation and the body orientation on each movement trajectory in the segmentation result on which the trajectories are plotted (S106). The attention area extraction unit 26 then detects the variation between the face orientation and the body orientation (S107). For example, for each movement trajectory, the attention area extraction unit 26 takes the angle between the face-orientation vector and the body-orientation vector as the variation.
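The variation in S107 is the angle between two 2D vectors, θ = arccos((f·b)/(|f||b|)). Below is a minimal sketch, assuming that the face and body orientations are available as 2D direction vectors in the same coordinate frame.

```python
import math


def face_body_angle(face, body):
    """Unsigned angle in degrees (0 to 180) between 2D face and body vectors."""
    fx, fy = face
    bx, by = body
    nf, nb = math.hypot(fx, fy), math.hypot(bx, by)
    if nf == 0 or nb == 0:
        raise ValueError("orientation vectors must be non-zero")
    cos_theta = (fx * bx + fy * by) / (nf * nb)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))
```

A person walking straight down an aisle gives an angle near 0°, while turning the head toward a shelf gives a larger angle, which is what the later thresholding relies on.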

Subsequently, the attention area extraction unit 26 performs clustering based on the variation between the face orientation and the body orientation (S108) and extracts attention areas based on the clustering result (S109). For example, the attention area extraction unit 26 extracts, as an attention area, a cluster of trajectories whose angles are equal to or greater than a threshold.
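The text does not name a clustering algorithm. As an assumption, the sketch below first keeps only the points whose angle meets the threshold. It then groups those points by spatial proximity using single linkage within a distance `eps`, which is similar in spirit to DBSCAN but without its core-point rule. The parameters `eps` and `min_size` and the input format are illustrative.

```python
import math


def extract_attention_clusters(points, angle_threshold, eps, min_size=2):
    """Cluster trajectory points where the face is turned away from the body.

    points: list of (x, y, angle_deg) trajectory points (assumed format).
    Returns a list of clusters, each a sorted list of (x, y) points.
    """
    turned = [(x, y) for x, y, a in points if a >= angle_threshold]
    unvisited = set(range(len(turned)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        # Grow the cluster through every point within eps of a member.
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(turned[i], turned[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        if len(members) >= min_size:  # drop isolated glances
            clusters.append(sorted(turned[k] for k in members))
    return clusters
```

With scikit-learn available, `sklearn.cluster.DBSCAN` applied to the filtered points would be a drop-in alternative.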

After that, the area setting unit 27 generates the ROI coordinates of each attention area (S110) and sets the product shelf areas (S111). For example, the area setting unit 27 generates, as the ROI coordinates, the coordinates of a polygon enclosing the trajectory points (point group) in the cluster. The area setting unit 27 also sets, as a product shelf area, an area that is adjacent to the attention area and lies in the direction of the face-orientation vector associated with the trajectory.

[Effects]
As described above, the information processing apparatus 10 generates a segmentation result that divides the image data into regions. It re-extracts the aisle region from the segmentation result and the motion analysis result, and it extracts the variation between face orientation and body orientation from the motion analysis result. The information processing apparatus 10 then extracts attention areas by clustering, using the aisle region and the variation information. As a result, the information processing apparatus 10 can automatically provide attention areas that are appropriately sized, neither too large nor too small.

For example, using the information processing apparatus 10 eliminates the need to set attention areas manually. This reduces human error and, compared with manual setting, allows attention areas to be set accurately and quickly across large volumes of image data. In addition, the information processing apparatus 10 extracts as attention areas the areas in which a person turned their face toward something of interest. It can therefore set appropriately sized attention areas, unlike the reference technique of FIG. 4.

Furthermore, the information processing apparatus 10 can identify the areas adjacent to appropriately sized attention areas as product shelves. Unlike the reference technique, it can therefore detect not only picking motions made while the person stands still but also picking motions made while the person walks slowly and picks up a product. As a result, the information processing apparatus 10 can improve the detection accuracy of picking motions and, in turn, the accuracy of behavior analysis and the like.

Although embodiments of the present invention have been described so far, the present invention may be implemented in various forms other than the embodiments described above.

[Numerical values, etc.]
The numerical examples, number of cameras, label names, number of trajectories, and so on used in the above embodiment are merely examples and can be changed arbitrarily. The processing flows described in each flowchart can also be changed as appropriate as long as no contradiction arises. In addition, although the above embodiment was described using a store as an example, the invention is not limited to stores. It can also be applied to, for example, a warehouse, a factory, a classroom, the inside of a train car, or an airplane cabin. In these cases, the product shelf area was only an example of an area in which objects related to a person are stored. Instead of it, an area where objects are placed or an area where luggage is stored is detected and set.

Although the above embodiment uses the positions of a person's ankles, the invention is not limited to this; for example, the position of the feet or of the shoes may be used instead. Likewise, although the above embodiment identifies the area in the face direction as the product shelf area, the area in the body direction may be identified as the product shelf area instead. Each machine learning model may be implemented with a neural network or the like.

[System]
The processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific forms of distribution and integration of each device are not limited to those shown in the drawings. All or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

Furthermore, all or any part of each processing function performed by each device may be implemented by a CPU and a program analyzed and executed by the CPU, or as hardware using wired logic.

[Hardware]
FIG. 13 is a diagram illustrating an example hardware configuration. As shown in FIG. 13, the information processing apparatus 10 includes a communication device 10a, an HDD (Hard Disk Drive) 10b, a memory 10c, and a processor 10d. The units shown in FIG. 13 are interconnected by a bus or the like.

The communication device 10a is a network interface card or the like and communicates with other devices. The HDD 10b stores the programs and DBs that operate the functions shown in FIG. 5.

The processor 10d reads, from the HDD 10b or the like, a program that executes the same processing as each processing unit shown in FIG. 5. It loads the program into the memory 10c, thereby running a process that executes each function described with reference to FIG. 5 and elsewhere. That is, this process executes the same functions as the processing units of the information processing apparatus 10. Specifically, the processor 10d reads from the HDD 10b or the like a program having the same functions as the pre-learning unit 21, the acquisition unit 22, the motion analysis unit 23, the variation extraction unit 24, the trajectory generation unit 25, the attention area extraction unit 26, the area setting unit 27, and so on. The processor 10d then executes a process that performs the same processing as these units.

In this way, the information processing apparatus 10 operates as an information processing apparatus that executes the setting method by reading and executing the program. The information processing apparatus 10 can also realize the same functions as the above embodiment by reading the program from a recording medium with a medium reading device and executing it. The program referred to in this embodiment is not limited to being executed by the information processing apparatus 10. For example, the above embodiment may be applied in the same way when another computer or server executes the program, or when they execute it in cooperation.

This program may be distributed via a network such as the Internet. The program may also be recorded on a computer-readable recording medium, such as a hard disk, flexible disk (FD), CD-ROM, MO (Magneto-Optical disk), or DVD (Digital Versatile Disc). It is then executed by being read from the recording medium by a computer.

10 Information processing apparatus
11 Communication unit
12 Storage unit
13 Training data DB
14 First machine learning model
15 Second machine learning model
16 Video data DB
17 Segment result DB
18 ROI information DB
19 Setting result DB
20 Control unit
21 Pre-learning unit
22 Acquisition unit
23 Motion analysis unit
24 Variation extraction unit
25 Trajectory generation unit
26 Attention area extraction unit
27 Area setting unit

Claims

1. A setting program that causes a computer to execute a process comprising:
identifying, from image data obtained by capturing the inside of a room, a first area corresponding to a passage;
identifying a body orientation and a face orientation of a person moving in the identified first area;
when the identified body orientation of the person differs from the face orientation of the person, identifying, based on the face orientation or the body orientation of the person, a second area adjacent to the first area among a plurality of areas constituting the room; and
setting the identified second area as an area in which an object related to the person is stored.

2. The setting program according to claim 1, further causing the computer to execute a process comprising:
identifying, from each piece of image data in video data including the image data, a position of each person appearing in the video data; and
identifying, based on the face orientation and the body orientation of the person at the position of each person, an attention area in the first area that is a target of behavior analysis of the person,
wherein the identifying of the second area identifies the second area adjacent to the attention area in the first area, and
the setting sets the second area as an area in which an object related to the person is stored.

3. The setting program according to claim 2, wherein the identifying of the attention area:
performs clustering based on the angle formed by the face orientation and the body orientation of the person; and
identifies, as the attention area, an area including the positions of the person that belong to a cluster in which the angle is equal to or greater than a threshold as a result of the clustering.

4. The setting program according to claim 3, wherein the identifying of the second area identifies, as the second area, an area that is among a plurality of areas adjacent to the first area in the image data and that is located in the direction of a vector indicating the face orientation or the body orientation of the person included in the attention area.

5. The setting program according to any one of claims …, wherein
the identifying of the first area identifies the first area based on a segmentation result obtained by identifying each area of the image data through semantic segmentation, and
the identifying of the body orientation and the face orientation inputs the image data to a machine learning model that has undergone machine learning and identifies the body orientation and the face orientation of the person based on an output result of the machine learning model.

6. The setting program according to claim 5, wherein the setting sets the second area on the segmentation result or on the image data from which the segmentation result is generated.

7. A setting method in which a computer executes a process comprising:
identifying, from image data obtained by capturing the inside of a room, a first area corresponding to a passage;
identifying a body orientation and a face orientation of a person moving in the identified first area;
when the identified body orientation of the person differs from the face orientation of the person, identifying, based on the face orientation or the body orientation of the person, a second area adjacent to the first area among a plurality of areas constituting the room; and
setting the identified second area as an area in which an object related to the person is stored.

8. An information processing apparatus comprising a control unit configured to:
identify, from image data obtained by capturing the inside of a room, a first area corresponding to a passage;
identify a body orientation and a face orientation of a person moving in the identified first area;
when the identified body orientation of the person differs from the face orientation of the person, identify, based on the face orientation or the body orientation of the person, a second area adjacent to the first area among a plurality of areas constituting the room; and
set the identified second area as an area in which an object related to the person is stored.