JP2022187215A - Operation determination program, operation determination method, and operation determination device

Info

Publication number: JP2022187215A
Application number: JP2021095110A
Authority: JP
Inventors: 諒石田; Ryo Ishida; 有一村瀬; Yuichi Murase
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2021-06-07
Filing date: 2021-06-07
Publication date: 2022-12-19

Abstract

To accurately determine whether an operation performed by a person on an object is normal. SOLUTION: An operation determination device 1 acquires a photographed image 2 obtained by photographing an operation of a person 3a. The operation determination device 1 then detects, from the acquired photographed image 2, the position of an object 3b related to the person 3a. In addition, the operation determination device 1 detects skeleton information of the person 3a from the acquired photographed image 2. The operation determination device 1 determines whether the operation performed by the person 3a on the object 3b is normal on the basis of the detected position of the object 3b and the detected skeleton information. Consequently, whether the operation is normal can be accurately determined. SELECTED DRAWING: Figure 1

Description

The present invention relates to a motion determination program, a motion determination method, and a motion determination device.

Image recognition technology for recognizing a specific object in an image is widely used. With this technology, for example, the region of a specific object in an image is identified as a bounding box. There are also techniques that perform image recognition of objects using machine learning. Applying such image recognition technology to, for example, monitoring customers' purchasing actions in stores and managing the work of workers in factories is being considered.

Japanese Patent Application Laid-Open No. 2020-53019
Japanese Patent Application Laid-Open No. 2014-132501

When determining from an image whether the motion a person performs on an object is normal, recognizing the position and movement of the person from the image together with the position of the object makes it possible to discriminate fine differences in the person's movement between normal and abnormal cases. However, the above image recognition technology for recognizing a specific object can only recognize the object in the image and cannot recognize the person together with the object. Pose estimation is a technology for recognizing a person from an image, but it cannot recognize an object.

In one aspect, an object of the present invention is to provide a motion determination program, a motion determination method, and a motion determination device capable of determining with high accuracy whether the motion a person performs on an object is normal.

In one proposal, there is provided a motion determination program that causes a computer to execute a process of acquiring a captured image of a person's motion, detecting, from the acquired captured image, the position of an object related to the person and skeleton information of the person, and determining, based on the position of the object and the skeleton information, whether the motion the person performs on the object is normal.

In one proposal, there is also provided a motion determination method in which a computer executes processing similar to the processing based on the above motion determination program.
Furthermore, in one proposal, there is provided a motion determination device that executes processing similar to the processing based on the above motion determination program.

In one aspect, whether the motion a person performs on an object is normal can be determined with high accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS
A diagram showing the motion determination device according to the first embodiment.
A diagram showing a configuration example of the customer monitoring system according to the second embodiment.
A diagram showing a hardware configuration example of the monitoring device.
A diagram showing a comparative example of recognition of a person and an object by HOID.
A diagram showing examples of a beverage can and a packaged product.
A diagram showing examples of images when an erroneous purchasing action is performed.
A diagram showing an example of recognition processing using the first and second comparative examples and its application.
A diagram showing a configuration example of the processing functions of the monitoring device.
A diagram showing an example of data generated by the image feature extraction unit and the skeleton information extraction unit.
A diagram showing an example of the internal configuration of the purchasing action extraction unit.
A diagram for explaining the processing of the determination unit as a predictor.
An example of a flowchart showing the learning procedure of the predictor (determination unit).
An example of a flowchart showing the determination procedure using the predictor.
An example of a flowchart showing the learning procedure of the discriminator (determination unit).
An example of a flowchart showing the determination procedure using the discriminator.
A diagram showing a configuration example of the processing functions of the monitoring device according to the fourth embodiment.
A diagram for explaining the determination rule for determining whether a normal purchasing action has been performed.
A diagram showing an example of correction processing by the environment difference correction unit.
A diagram showing an example of the internal configuration of the person difference correction unit.
A diagram for explaining correction processing by the person difference correction unit.
An example (part 1) of a flowchart showing the determination procedure based on the determination rule.
An example (part 2) of a flowchart showing the determination procedure based on the determination rule.
An example of a flowchart showing the procedure for counting the number of scanned items.

BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First embodiment]
FIG. 1 is a diagram showing a motion determination device according to the first embodiment. The motion determination device 1 shown in FIG. 1 is a device that determines whether the motion a person performs on an object is normal. The motion determination device 1 is realized as a computer such as a personal computer or a server device.

The motion determination device 1 acquires a captured image 2 of a person's motion (step S1). The captured image 2 shows a person 3a performing a motion on an object 3b. As an example, the captured image 2 shown in FIG. 1 is assumed to show the person 3a holding the object 3b in a hand, bringing it close to a reading device 3c, and causing the reading device 3c to read identification information attached to the surface of the object 3b.

The motion determination device 1 detects the position of the object 3b related to the person 3a from the acquired captured image 2 (step S2a). In this detection, for example, the person 3a is recognized in the captured image 2, the object 3b that interacts with the recognized person 3a is also recognized, and the positions of the person 3a and the object 3b are detected. In FIG. 1, as an example, an image region 4a containing the person 3a and an image region 4b containing the object 3b are detected. Such detection can be realized using, for example, HOID (Human Object Interaction Detection).

HOID recognizes a person and an object in a single given image and, when the person interacts with the object, detects the person, the object, and the type of interaction. Therefore, when HOID is applied and the captured image 2 contains an object with which a person interacts, the person, the object, and the type of interaction are detected. An object (corresponding to the object 3b) related to the person (corresponding to the person 3a) is then identified based on the person, the object, and the type of interaction. For example, when a user holds a book, the user, the book, and the fact that the user is holding the book are detected. The interactions include every interaction that can be recognized from the image, regardless of whether the user is conscious of it or not and whether it involves contact or not.

Along with this, the motion determination device 1 detects skeleton information of the person 3a from the acquired captured image 2 (step S2b). In this detection, for example, the positions of a plurality of predetermined joints of the person 3a are detected. In FIG. 1, as an example, the positions of the left and right shoulders, elbows, and wrists are detected as joints of the person 3a. A skeleton line 5a is a line connecting the position of the right shoulder and the position of the right elbow, and a skeleton line 5b is a line connecting the position of the right elbow and the position of the right wrist. A skeleton line 6a is a line connecting the position of the left shoulder and the position of the left elbow, and a skeleton line 6b is a line connecting the position of the left elbow and the position of the left wrist.

Based on the position of the object 3b and the skeleton information detected in this manner, the motion determination device 1 determines whether the motion the person 3a performs on the object 3b is normal (step S3). In the example of FIG. 1, it is determined whether the motion in which the person 3a causes the reading device 3c to read the identification information attached to the object 3b was performed normally.
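The flow of steps S1 to S3 can be sketched as follows. This is only an illustrative outline, not the implementation described in the publication: the names `Observation`, `determine_motion`, `detect_object`, `detect_skeleton`, and `judge` are hypothetical, and the detectors and the judgment logic are passed in as plain functions because the text does not fix them at this point.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Observation:
    """What is extracted from one captured image (all names are hypothetical)."""
    object_box: Optional[Box]   # position of the object related to the person (step S2a)
    joints: Dict[str, Point]    # skeleton information: joint name -> image position (step S2b)


def determine_motion(
    image: Any,
    detect_object: Callable[[Any], Optional[Box]],
    detect_skeleton: Callable[[Any], Dict[str, Point]],
    judge: Callable[[Observation], bool],
) -> bool:
    """Return True if the motion is judged normal (step S3).

    `image` is the captured image already acquired in step S1.
    """
    observation = Observation(
        object_box=detect_object(image),   # step S2a
        joints=detect_skeleton(image),     # step S2b
    )
    return judge(observation)              # step S3
```

Keeping the two detectors separate mirrors the text: the object position and the skeleton information are detected independently from the same image and only combined at the determination step.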

With the above processing, whether the motion the person 3a performs on the object 3b is normal can be determined accurately.
For example, in the present embodiment, the position of an object related to a person is recognized, rather than simply the position of an object in the captured image 2. As a result, an object held in the person's hand, as in the example of FIG. 1, can be reliably recognized, and an object unrelated to the person is not recognized even if it appears in the image. By contrast, image recognition technology that recognizes a specific object in an image cannot extract and recognize only the objects related to a person in this way.

Furthermore, in the present embodiment, using the skeleton information of the human body together with the position of the object makes it possible to discriminate fine differences in the person's movement between normal and abnormal cases. For example, the transition of the position of the object 3b and the transition of the positions of the joints of the person 3a differ between the case where the motion of causing the reading device 3c to read the identification information attached to the object 3b is performed normally and the case where it is not. One conceivable case where this motion is not performed normally is that the angle of the object 3b when it is brought close to the reading device 3c differs from the normal case. In this case, the state of the joints of the hand holding the object 3b (for example, the relative positions between joints) is also considered to differ from the normal case. In the present embodiment, using the skeleton information makes it possible to discriminate such fine differences in the person's movement.
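As a small illustration of "relative positions between joints", the sketch below derives simple arm features from joint coordinates. The joint names (`right_shoulder` and so on), the choice of features, and the function names are assumptions made for this example; the text does not specify which features are actually used.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def relative_position(origin: Point, target: Point) -> Point:
    """Offset of `target` as seen from `origin`, in image coordinates."""
    return (target[0] - origin[0], target[1] - origin[1])


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at joint b between the segments b->a and b->c."""
    v1 = relative_position(b, a)
    v2 = relative_position(b, c)
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("joints must not coincide")
    cos_value = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))


def arm_features(joints: Dict[str, Point], side: str = "right") -> Dict[str, object]:
    """Relative joint positions and the elbow angle for one arm (hypothetical feature set)."""
    shoulder = joints[f"{side}_shoulder"]
    elbow = joints[f"{side}_elbow"]
    wrist = joints[f"{side}_wrist"]
    return {
        "elbow_from_shoulder": relative_position(shoulder, elbow),
        "wrist_from_elbow": relative_position(elbow, wrist),
        "elbow_angle_deg": joint_angle(shoulder, elbow, wrist),
    }
```

Features of this kind, tracked over successive images, are one way the difference between a normal and an abnormal arm movement could be made measurable.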

[Second embodiment]
Next, as a second embodiment, a case in which the processing of the motion determination device 1 is applied to a customer monitoring system in a store will be described.

FIG. 2 is a diagram showing a configuration example of a customer monitoring system according to the second embodiment. The customer monitoring system shown in FIG. 2 is a system for monitoring the purchasing actions of customers in a store where products are sold, and includes a monitoring device 100 and a camera 101 connected to the monitoring device 100. The monitoring device 100 is an example of the motion determination device 1 shown in FIG. 1.

The camera 101 is installed in a store in which a cash register 50 is installed. The cash register 50 is a POS terminal included in a POS (Point Of Sale) system. The cash register 50 is a self-service cash register at which the customer performs the checkout operation, and is sometimes called a "self-checkout".

The cash register 50 includes a barcode scanner 51, a display 52, and a deposit/withdrawal unit 53. The barcode scanner 51 reads a barcode that is attached to a product and indicates its product code. The display 52 displays the price of the product whose barcode has been read, the total price of the products to be purchased, the amount of change, and the like. The deposit/withdrawal unit 53 accepts payment from the customer and dispenses change.

For example, the customer approaches the cash register 50 carrying an in-store basket containing the products to be purchased, and performs a "scanning operation" in which the products in the basket are brought one by one close to the barcode scanner 51 so that their barcodes are read. When the scanning operation has been completed for all products, the customer performs a "settlement operation" to request settlement. For example, if the display 52 is a touch panel, the customer can perform the settlement operation by pressing a settlement button on the touch panel. The customer who has performed the settlement operation inserts the purchase amount into the deposit/withdrawal unit 53 according to the information displayed on the display 52 and, if there is change, receives it from the deposit/withdrawal unit 53.

The camera 101 captures the front of the cash register 50 (in particular, the area around the barcode scanner 51) so that the customer's purchasing action using the cash register 50 is captured. The monitoring device 100 determines from the images captured by the camera 101 whether the customer performed a correct purchasing action, and can issue a warning when it determines that an abnormal purchasing action was performed.

FIG. 3 is a diagram showing a hardware configuration example of the monitoring device. The monitoring device 100 is realized, for example, as a computer as shown in FIG. 3. The monitoring device 100 shown in FIG. 3 includes a processor 111, a RAM (Random Access Memory) 112, an HDD (Hard Disk Drive) 113, a GPU (Graphics Processing Unit) 114, an input interface (I/F) 115, a reading device 116, a network interface (I/F) 117, and a communication interface (I/F) 118.

The processor 111 controls the monitoring device 100 as a whole. The processor 111 is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), or a PLD (Programmable Logic Device). The processor 111 may also be a combination of two or more of a CPU, an MPU, a DSP, an ASIC, and a PLD.

The RAM 112 is used as the main storage device of the monitoring device 100. The RAM 112 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the processor 111. The RAM 112 also stores various data necessary for processing by the processor 111.

The HDD 113 is used as an auxiliary storage device of the monitoring device 100. The HDD 113 stores the OS program, application programs, and various data. Another type of non-volatile storage device, such as an SSD (Solid State Drive), can also be used as the auxiliary storage device.

A display device 114a is connected to the GPU 114. The GPU 114 causes the display device 114a to display images in accordance with instructions from the processor 111. Examples of the display device include a liquid crystal display and an organic EL (ElectroLuminescence) display.

An input device 115a is connected to the input interface 115. The input interface 115 transmits signals output from the input device 115a to the processor 111. Examples of the input device 115a include a keyboard and a pointing device. Examples of the pointing device include a mouse, a touch panel, a tablet, a touch pad, and a trackball.

A portable recording medium 116a is attached to and detached from the reading device 116. The reading device 116 reads data recorded on the portable recording medium 116a and transmits it to the processor 111. Examples of the portable recording medium 116a include an optical disk, a magneto-optical disk, and a semiconductor memory.

The network interface 117 transmits and receives data to and from other devices via a network 117a.
The communication interface 118 transmits and receives data to and from the camera 101.

The processing functions of the monitoring device 100 can be realized by the hardware configuration described above.
Self-service cash registers are spreading rapidly for purposes such as coping with labor shortages caused by population decline, relieving congestion, and preventing viral infection. A problem with self-service cash registers is that customers may perform erroneous purchasing actions, which results in unpaid amounts. Such erroneous purchasing actions may be accidental or intentional, and they take various forms, as described below.

Accidental erroneous purchasing actions include, for example, a "missed scan", in which the customer forgets to scan a product and moves it directly from the in-store basket into the customer's own bag. There is also a "missed basket", in which, when in-store baskets can be placed on both the upper and lower levels of a shopping cart, the customer forgets to scan the products in the lower basket.

Intentional erroneous purchasing actions, on the other hand, include, for example, "barcode hiding", in which the customer pretends to perform a scanning operation while covering only the barcode with a finger. There is also a "barcode scan error", in which, when a package contains a set of several units of the same product, the customer scans the barcode of one product exposed from the package instead of the barcode on the package.

In these erroneous purchasing actions, the appearance of the product and the movement of the body differ depending on the type of action, so automatically recognizing every type as an erroneous purchasing action is technically difficult. One conceivable approach is to recognize purchasing actions using several types of sensors or several sensors of a single type. However, because this approach involves high equipment costs, it is desirable to be able to perform recognition using only a single camera.

Technology for recognizing a specific product from an image captured by a camera is in general use. With this technology, for example, the region of the product in the image (for example, a bounding box) is identified. There are also techniques that perform such product image recognition using machine learning. However, these techniques require templates or training data to be prepared in advance for each individual product. It is therefore not realistic to use them in a store that carries a large number of products and frequently introduces new ones. In addition, these techniques only identify the image region of a specific product and cannot recognize its relationship with a person.

People, on the other hand, have the property that image data is easy to collect and that they often occupy a large area in an image. For this reason, a person can be recognized more easily than objects other than people, for example by machine learning using a neural network (NN). Accordingly, the difficulty of recognizing an object related to a person can be reduced by first recognizing the person in the image and then recognizing the object while focusing on the position of a body part of the person (for example, the position of a hand). HOID is a technology that recognizes a person and a related object based on this idea.

FIG. 4 is a diagram showing a comparative example of recognition of a person and an object by HOID. In HOID, objects are recognized using information about the person in the image. Of the objects in the image, only those that interact with the person are recognized, and objects with no interaction are ignored.

The image 200 illustrated in FIG. 4 shows a person 201 holding some object 202 in a hand. When the image 200 is input to a learning model trained for HOID, the model outputs, for example, the position information of the person, the position information of the object that interacts with the person, the class name of the object, the class name of the interaction, and a confidence score for those class names.

The position information of the person and the object is output, for example, as bounding boxes, each indicating a rectangular region circumscribing the corresponding region. In the image 200a illustrated in FIG. 4, a bounding box 203 indicating the position of the person 201 and a bounding box 204 indicating the position of the object 202 have been detected from the image 200. In this example, a class name indicating "hold" is output as the class name of the interaction. In practice, the confidence score may represent the confidence (probability value) that, given a person region and an object region, the object belongs to a certain object class and a certain interaction class holds between the object and the person.
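The output described above can be pictured as one record per detected person-object pair. The following sketch is only an assumed representation: the record type `HoiDetection`, its field names, the `"hold"` label string, and the 0.5 threshold are hypothetical and are not taken from the publication or from any particular HOID library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class HoiDetection:
    """One person-object pair as output by a HOID model (hypothetical layout)."""
    person_box: Box          # bounding box of the person
    object_box: Box          # bounding box of the interacting object
    object_class: str        # class name of the object
    interaction_class: str   # class name of the interaction, e.g. "hold"
    score: float             # confidence score for the class names, in [0, 1]


def select_related_object(
    detections: List[HoiDetection],
    interaction: str = "hold",
    min_score: float = 0.5,   # assumed threshold, chosen only for illustration
) -> Optional[HoiDetection]:
    """Pick the most confident detection with the given interaction, or None."""
    candidates = [
        d for d in detections
        if d.interaction_class == interaction and d.score >= min_score
    ]
    return max(candidates, key=lambda d: d.score, default=None)
```

Because only pairs with an interaction are output, objects that merely appear in the background never enter `detections` in the first place; the filter here only narrows the result to the interaction of interest.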

By applying HOID, a product related to a person can be recognized from an image without registering product images in advance, provided that the product's appearance does not blend into the background or the hand. That is, an arbitrary object in the image can be recognized as "Something" (some object other than a person, for which it is unknown whether it is a pre-registered object), and a bounding box indicating the region of that object can be estimated. Hereinafter, this recognition processing for a person and an object is referred to as the "first comparative example".

Pose estimation, on the other hand, is a technology for recognizing a person from an image. Pose estimation estimates the posture of a human body by detecting its skeleton information. As the skeleton information, for example, the positions in the image of a plurality of predetermined joints of the human body are detected. Hereinafter, person recognition processing based on skeleton detection is referred to as the "second comparative example".

Next, an example of an erroneous purchasing action using the self-service cash register 50 is presented, together with the processing for recognizing this purchasing action using the first and second comparative examples. The "barcode scan error" described above is used as the example of an erroneous purchasing action, and a beverage can and a packaged product containing such cans are used as examples of products to be purchased.

FIG. 5 is a diagram showing examples of a beverage can and a packaged product. The beverage can 211 shown in FIG. 5 is a can containing a beverage such as beer. A barcode 212 indicating a product code is attached to the side of the beverage can 211. The packaged product 213 shown in FIG. 5 is a product in which a plurality of beverage cans 211 are sold as a set, with the plurality of beverage cans 211 housed inside an outer package. A barcode 214 indicating the product code of the packaged product 213 is attached to the outer surface of the package.

In many such packaged products 213, part of the outer package, such as both ends, is open, and part of a beverage can 211 inside is exposed through the opening. For this reason, as shown in FIG. 5, a customer may intentionally cause the barcode scanner 51 of the cash register 50 to scan a barcode 215 attached to a beverage can 211 exposed from the outer package of the packaged product 213. In this case, a plurality of beverage cans is fraudulently purchased for the price of one.

例えば、不正な購買動作を検出する方法として、スキャンされた商品の数と持ち出された商品の数とを照合する方法がある。しかし、このような「バーコードスキャン誤り」の購買動作が行われた場合、スキャンされた商品の数と持ち出された商品の数とが一致してしまうので、不正動作を容易に検出できない。 For example, as a method of detecting fraudulent purchasing behavior, there is a method of collating the number of products that have been scanned and the number of products that have been taken out. However, when such a "barcode scan error" purchase operation is performed, the number of scanned products matches the number of products taken out, so the fraudulent operation cannot be easily detected.

図６は、誤った購買動作が行われた場合の画像の例を示す図である。図６に示す画像２２１，２２２には、上記のようにパッケージ商品２１３について「バーコードスキャン誤り」の購買動作が行われたときの状況が写っている。 FIG. 6 is a diagram showing an example of an image when an erroneous purchasing action is performed. Images 221 and 222 shown in FIG. 6 show the situation when the purchase operation of "barcode scanning error" is performed for the package product 213 as described above.

画像２２１は、顧客が店内用のカゴ２１６からパッケージ商品２１３を手に持って取り出した状態を示している。画像２２２は、その後に顧客が、手に持ったパッケージ商品２１３をバーコードスキャナ５１に近づけてスキャン操作を行っている状態を示している。ただし、画像２２２において顧客は、パッケージ商品２１３に付加されたバーコードではなく、パッケージ商品２１３内の飲料缶に付加されたバーコードをバーコードスキャナ５１にスキャンさせようとしている。このようなスキャン操作を行う場合、顧客は、手に持ったパッケージ商品２１３を回転させて飲料缶のバーコードのスキャンが可能となる位置まで移動させるという、正常な購買動作時とは異なる動作を行う。例えば、パッケージ商品２１３を回転させるために、腕や指（親指）などが正常時とは異なる特有の動きをする。 An image 221 shows a state in which the customer has picked up the package product 213 by hand and taken it out of the in-store basket 216. An image 222 shows a state in which the customer then brings the hand-held package product 213 close to the barcode scanner 51 and performs a scanning operation. In the image 222, however, the customer is trying to have the barcode scanner 51 scan the barcode attached to a beverage can inside the package product 213, not the barcode attached to the package product 213 itself. To perform such a scanning operation, the customer rotates the hand-held package product 213 and moves it to a position where the barcode of the beverage can can be scanned, which is an action different from that of a normal purchasing operation. For example, in order to rotate the package product 213, the arm and fingers (such as the thumb) move in a characteristic manner that differs from normal movements.

したがって、顧客によるこのような動作を、少なくとも正常時とは異なる動作として画像から認識できれば、そのような動作が行われたときに警告を発することができるようになる。しかしながら、次の図７に示すように、ＨＯＩＤを用いた上記の第１の比較例や、骨格検出を用いた上記の第２の比較例を用いた場合、上記動作を誤った購買動作として認識することは難しい。 Therefore, if such an action by the customer can be recognized from the image at least as an action different from normal ones, a warning can be issued when such an action is performed. However, as shown in FIG. 7 below, when the first comparative example using HOID or the second comparative example using skeleton detection is used, it is difficult to recognize the above action as an erroneous purchasing action.

図７は、第１、第２の比較例を用いた認識処理例とその応用について示す図である。 FIG. 7 is a diagram showing an example of recognition processing using the first and second comparative examples and its application.
図７の上側に示す画像２２１ａ，２２２ａは、図６の画像２２１，２２２を基に第１の比較例を適用して、商品の位置を認識した場合を示している。この場合、ＨＯＩＤにより、まず人物の領域を示すバウンディングボックス（図示せず）が検出され、次に、この人物と相互関係がある商品の領域を示すバウンディングボックス２１７が検出される。実際には、商品は、人物が手に持っている何らかの物体（Ｓｏｍｅｔｈｉｎｇ）として認識されるが、キャッシュレジスタ５０の前で人物が手に持っていることから、物体が商品であると特定できる。 Images 221a and 222a shown on the upper side of FIG. 7 show the case where the first comparative example is applied to the images 221 and 222 of FIG. 6 to recognize the position of the product. In this case, HOID first detects a bounding box (not shown) indicating the region of the person, and then detects a bounding box 217 indicating the region of the product that has an interaction with this person. In practice, the product is recognized only as some object (Something) held in the person's hand, but because the person is holding it in front of the cash register 50, the object can be identified as a product.

このように、ＨＯＩＤを用いた第１の比較例を適用した場合、撮影画像から商品の位置を検出可能である。このため、顧客が商品をバーコードスキャナ５１に近づけたことは認識可能である。しかし、顧客の身体の動きを検出できないので、パッケージ商品２１３を回転させる際の身体の特有の動き（例えば指の動き）を認識することはできない。このため、上記のような顧客の購買動作を正常な購買動作と区別することはできない。 Thus, when the first comparative example using HOID is applied, the position of the product can be detected from the captured image. Therefore, it is possible to recognize that the customer brought the product closer to the barcode scanner 51 . However, since the movement of the customer's body cannot be detected, it is not possible to recognize the characteristic movement of the customer's body (for example, movement of the finger) when rotating the package product 213 . Therefore, the customer's purchasing behavior as described above cannot be distinguished from normal purchasing behavior.

一方、図７の下側に示す画像２２１ｂ，２２２ｂは、図６の画像２２１，２２２を基に第２の比較例を適用して、人物の動きを認識した場合を示している。この場合、骨格検出により人物の関節の位置が検出される。画像２２１ｂ，２２２ｂでは、手首と親指の関節とを結んだ線を太線２１８ａ～２１８ｃで表している。 On the other hand, images 221b and 222b shown on the lower side of FIG. 7 show the case where the movement of a person is recognized by applying the second comparative example based on the images 221 and 222 of FIG. In this case, the positions of the joints of the person are detected by skeleton detection. In the images 221b and 222b, thick lines 218a to 218c represent lines connecting the wrist and the joint of the thumb.

このように、骨格検出を用いた第２の比較例を適用した場合、撮影画像から、不正なスキャン操作のために商品を回転する際の身体の特有の動き（例えば指の動き）を検出可能である。しかし、商品の位置を検出できないので、検出された身体の動きを商品と関連付けて「商品を持った状態での動き」として認識することができない。 Thus, when the second comparative example using skeleton detection is applied, the characteristic body movements (for example, finger movements) made when rotating the product for an unauthorized scanning operation can be detected from the captured image. However, since the position of the product cannot be detected, the detected body movement cannot be associated with the product and recognized as a "movement made while holding the product".

そこで、本実施の形態の監視装置１００は、ＨＯＩＤを用いた第１の比較例の技術と、骨格検出を用いた第２の比較例の技術とを組み合わせることで、商品の位置と身体の特有の動きの両方を検出できるようにする。これにより、監視装置１００は、事前に多数の商品の画像を登録しておくことなく、キャッシュレジスタ５０の前で顧客が手に持っている商品を認識できる。そして、監視装置１００は、その商品の移動軌跡や画像内の見え方に加え、その商品を持ってスキャン操作を行う顧客の身体の動きを正確に認識できる。その結果、監視装置１００は、上記の「バーコードスキャン誤り」をはじめとする各種の誤った購買動作を、正常な購買動作と区別して認識できるようになる。 Therefore, the monitoring device 100 of the present embodiment combines the technique of the first comparative example using HOID with the technique of the second comparative example using skeleton detection, so that both the position of the product and the characteristic movement of the body can be detected. As a result, the monitoring device 100 can recognize the product held by the customer in front of the cash register 50 without registering images of many products in advance. In addition to the movement trajectory of the product and how it appears in the image, the monitoring device 100 can accurately recognize the body movement of the customer who holds the product and performs the scanning operation. Consequently, the monitoring device 100 can recognize various erroneous purchasing actions, including the above-mentioned "barcode scan error", as distinct from normal purchasing actions.

図８は、監視装置が備える処理機能の構成例を示す図である。図８に示すように、監視装置１００は、画像取得部１２１、画像特徴抽出部１２２、骨格情報抽出部１２３、購買動作抽出部１２４、学習部１２５、判定部１２６、画像記憶部１３１および学習モデル記憶部１３２を備える。監視装置１００は、画像取得部１２１、画像特徴抽出部１２２、骨格情報抽出部１２３、購買動作抽出部１２４、学習部１２５および判定部１２６の処理は、例えば、監視装置１００が備えるプロセッサ１１１が所定のプログラムを実行することで実現される。画像記憶部１３１および学習モデル記憶部１３２は、ＲＡＭ１１２やＨＤＤ１１３など、監視装置１００が備える記憶装置の記憶領域によって実現される。 FIG. 8 is a diagram showing a configuration example of the processing functions of the monitoring device. As shown in FIG. 8, the monitoring device 100 includes an image acquisition unit 121, an image feature extraction unit 122, a skeleton information extraction unit 123, a purchasing action extraction unit 124, a learning unit 125, a determination unit 126, an image storage unit 131, and a learning model storage unit 132. The processes of the image acquisition unit 121, the image feature extraction unit 122, the skeleton information extraction unit 123, the purchasing action extraction unit 124, the learning unit 125, and the determination unit 126 are realized, for example, by the processor 111 of the monitoring device 100 executing a predetermined program. The image storage unit 131 and the learning model storage unit 132 are realized by storage areas of storage devices of the monitoring device 100, such as the RAM 112 and the HDD 113.

画像取得部１２１は、カメラ１０１によって撮影された動画像のデータを取得する。学習時においては、取得された動画像のデータは学習データとして画像記憶部１３１に格納された後、画像記憶部１３１から読み出された画像特徴抽出部１２２と骨格情報抽出部１２３とに入力される。一方、購買動作の判定時においては、取得された動画像のデータは画像特徴抽出部１２２と骨格情報抽出部１２３に順次入力される。 The image acquisition unit 121 acquires data of moving images captured by the camera 101. During learning, the acquired moving image data is stored in the image storage unit 131 as learning data, and is then read out from the image storage unit 131 and input to the image feature extraction unit 122 and the skeleton information extraction unit 123. When a purchasing action is being determined, on the other hand, the acquired moving image data is sequentially input to the image feature extraction unit 122 and the skeleton information extraction unit 123.

画像特徴抽出部１２２は、入力された動画像をＨＯＩＤの学習モデル（ここではＮＮとする）に投入して、人物の情報と、その人物と相互作用がある商品の情報とを抽出する。具体的には、商品の外観情報と、人物および商品の位置情報とが抽出される。 The image feature extraction unit 122 inputs the input moving image into a HOID learning model (NN here), and extracts information on a person and information on products interacting with the person. Specifically, the appearance information of the product and the position information of the person and the product are extracted.

骨格情報抽出部１２３は、入力された動画像から骨格検出を行い、人物の骨格情報を抽出する。 The skeleton information extraction unit 123 performs skeleton detection on the input moving image and extracts the skeleton information of the person.
購買動作抽出部１２４は、画像特徴抽出部１２２および骨格情報抽出部１２３によって動画像から抽出された情報に基づいて、購買動作を示す特徴量を生成する。 The purchasing action extraction unit 124 generates a feature amount indicating a purchasing action based on the information extracted from the moving image by the image feature extraction unit 122 and the skeleton information extraction unit 123.

学習部１２５は、学習データとしての多数の動画像から購買動作抽出部１２４によって抽出された特徴量を用いて、正常な購買動作と異常な購買動作とを識別するための識別器を学習する。本実施の形態では、ＮＮを用いたディープラーニングが実行されるものとする。学習部１２５は、学習によって得られた学習モデル（ＮＮ）を示すデータを、学習モデル記憶部１３２に格納する。 The learning unit 125 learns a discriminator for discriminating between a normal buying action and an abnormal buying action using the feature amount extracted by the purchasing action extracting unit 124 from many moving images as learning data. In this embodiment, it is assumed that deep learning using NN is executed. The learning unit 125 stores data indicating the learning model (NN) obtained by learning in the learning model storage unit 132 .

判定部１２６は、学習モデル記憶部１３２に格納された学習モデルのデータに基づく識別器として動作する。判定部１２６は、カメラ１０１によって撮影された動画像から購買動作抽出部１２４によって抽出された特徴量を用いて、正常な購買動作と異常な購買動作とを識別し、異常な購買動作を検出した場合には警告を発する。 The determination unit 126 operates as a discriminator based on the learning model data stored in the learning model storage unit 132 . The determining unit 126 distinguishes between normal purchasing actions and abnormal purchasing actions using the feature amount extracted by the purchasing action extracting unit 124 from the moving image captured by the camera 101, and detects abnormal purchasing actions. issue a warning if necessary.

本実施の形態では、実際の購買動作を録画して得られた動画像が、ラベル付けされずに学習データとして利用される。このような動画像のうち、大多数は正常な購買動作が行われたときの動画像であり、ごく少数は異常な購買動作が行われたときの動画像となる。ただし、理想的には、正常な購買動作が行われた場合の動画像のみが利用されてもよい。学習部１２５は、識別器の一例として、このような動画像に基づき、ある時刻の特徴量から次の時刻の特徴量を予測する予測器を学習するものとする。したがって、判定部１２６は、このような予測器として動作する。 In this embodiment, moving images obtained by recording actual purchasing actions are used as learning data without being labeled. Among such moving images, most are moving images when normal purchasing actions are performed, and a very small number are moving images when abnormal buying actions are performed. However, ideally, only moving images of normal purchasing actions may be used. As an example of a classifier, the learning unit 125 learns a predictor that predicts the feature amount at the next time from the feature amount at a certain time based on such moving images. Therefore, the decision unit 126 operates as such a predictor.

図９は、画像特徴抽出部および骨格情報抽出部で生成されるデータの例を示す図である。画像特徴抽出部１２２および骨格情報抽出部１２３は、フレームのデータが入力されるたびに次のようなデータを生成する。 FIG. 9 is a diagram showing an example of data generated by the image feature extraction section and the skeleton information extraction section. The image feature extraction unit 122 and the skeleton information extraction unit 123 generate the following data each time frame data is input.

画像特徴抽出部１２２は、フレームをＨＯＩＤの学習モデル（ＮＮ）に入力することでＨＯＩＤ情報を出力する。ＨＯＩＤ情報には、フレームから認識された人物と物体との相互作用を示す情報が含まれる。相互作用を示す情報には、例えば、人物を識別する人物ＩＤと、その人物と相互作用がある物体を識別する物体ＩＤと、その物体に対する人物の動作の種別を示す動作ＩＤとが含まれる。 The image feature extraction unit 122 outputs HOID information by inputting the frame to the HOID learning model (NN). The HOID information includes information indicating interaction between a person and an object recognized from the frame. The information indicating interaction includes, for example, a person ID that identifies a person, an object ID that identifies an object that interacts with the person, and an action ID that indicates the type of action of the person with respect to the object.

本実施の形態において、画像特徴抽出部１２２は、動作の種別として「物体を持つ」ことを示す動作ＩＤを含むＨＯＩＤ情報を出力する。例えば、撮影画像から、互いに相互作用がある人物と物体との組み合わせが複数組抽出された場合、動作ＩＤが「物体を持つ」ことを示す組み合わせについてのＨＯＩＤ情報のみが出力される。 In the present embodiment, the image feature extraction unit 122 outputs HOID information including a motion ID indicating "holding an object" as the type of motion. For example, when a plurality of combinations of a person and an object interacting with each other are extracted from the captured image, only HOID information for the combination indicating that the action ID is "holding the object" is output.

また、ＨＯＩＤ情報には、人物に関する情報と物体に関する情報とがさらに含まれる。人物に関する情報には、人物ＩＤと、人物領域の位置情報とが含まれる。物体に関する情報には、物体ＩＤと、物体領域の位置情報と、物体の種別を示す物体種別ＩＤとが含まれる。なお、人物領域および物体領域の位置情報としては、例えば、人物領域および物体領域をそれぞれ示すバウンディングボックスの位置を示す情報（例えば、四隅の座標）が用いられる。 The HOID information further includes information about a person and information about an object. The information about the person includes the person ID and the position information of the person area. The information about the object includes an object ID, position information of the object area, and an object type ID indicating the type of the object. As the position information of the human region and the object region, for example, information indicating the positions of bounding boxes respectively indicating the human region and the object region (for example, the coordinates of the four corners) is used.
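The fields of the HOID information listed above can be pictured as a small record type. The following Python sketch is only an illustration: the class names, field names, the two-corner box representation, and the numeric value of `ACTION_HOLD` are all assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned bounding box as (x_min, y_min, x_max, y_max) in pixels.
# The text mentions corner coordinates; two opposite corners are assumed here.
Box = Tuple[float, float, float, float]

ACTION_HOLD = 1  # hypothetical action ID meaning "holding an object"


@dataclass
class PersonInfo:
    person_id: int
    box: Box  # position information of the person region


@dataclass
class ObjectInfo:
    object_id: int
    box: Box  # position information of the object region
    object_type_id: int


@dataclass
class HoidRecord:
    person: PersonInfo
    obj: ObjectInfo
    action_id: int  # type of action of the person with respect to the object


def keep_holding(records: List[HoidRecord]) -> List[HoidRecord]:
    """Keep only person/object pairs whose action is 'holding an object'."""
    return [r for r in records if r.action_id == ACTION_HOLD]


records = [
    HoidRecord(PersonInfo(1, (0, 0, 100, 200)), ObjectInfo(10, (40, 80, 60, 100), 3), ACTION_HOLD),
    HoidRecord(PersonInfo(1, (0, 0, 100, 200)), ObjectInfo(11, (70, 150, 90, 170), 5), 2),
]
held = keep_holding(records)
```

In this sketch the filter mirrors the statement that only combinations whose action ID indicates "holding an object" are output.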

画像特徴抽出部１２２は、上記のＨＯＩＤ情報から、物体についての外観情報および位置情報を生成して購買動作抽出部１２４に出力する。物体についての外観情報には、フレームから物体領域を切り出した画像が含まれる。この外観情報には、例えば、物体の色や形状、大きさを示す情報が含まれてもよい。物体についての位置情報には、物体領域（バウンディングボックス）の位置情報が含まれる。この位置情報としては、例えば、物体領域の四隅の座標が含まれる。また、外観情報には、さらに、物体領域と人物領域との相対的な位置関係を示す情報が含まれてもよい。 The image feature extraction unit 122 generates appearance information and position information about the object from the HOID information, and outputs the information to the purchase action extraction unit 124 . Appearance information about an object includes an image of the object region cut out from the frame. This appearance information may include, for example, information indicating the color, shape, and size of the object. The positional information about the object includes the positional information of the object area (bounding box). This position information includes, for example, the coordinates of the four corners of the object area. Also, the appearance information may further include information indicating the relative positional relationship between the object area and the person area.

なお、実際には、ＨＯＩＤにより物体として商品が正確に認識されるような補足的な処理が実行されることが望ましい。例えば、撮影画像内の所定位置にＲＯＩ（Region Of Interest）が設定され、認識された物体とＲＯＩとの位置関係に基づいて物体として商品が正確に認識されるようにする。一例として、キャッシュレジスタ５０の前において、商品が店内用のカゴから取り出されたときに位置する領域や、商品が仮置きされる領域がＲＯＩとして設定される。画像特徴抽出部１２２は、ＨＯＩＤによって新たに認識された物体の位置がＲＯＩ内である場合、その物体を商品と認識する一方、新たに認識された物体の位置が人物のバウンディングボックス内である場合、その物体は商品でない（例えば、財布などの顧客の私物）と認識する。また、画像特徴抽出部１２２は、ＲＯＩに物体が進入する前後におけるＲＯＩ内の画像変化（背景差分など）に基づいて、物体として商品を見分けるようにしてもよい。 In practice, it is desirable to execute supplementary processing so that HOID accurately recognizes products as objects. For example, an ROI (Region Of Interest) is set at a predetermined position in the captured image, and a product is accurately recognized as an object based on the positional relationship between the recognized object and the ROI. As an example, in front of the cash register 50, the region where a product is located when it is taken out of the in-store basket and the region where a product is temporarily placed are set as ROIs. When the position of an object newly recognized by HOID is within an ROI, the image feature extraction unit 122 recognizes the object as a product. When the position of the newly recognized object is within the bounding box of the person, it recognizes that the object is not a product (for example, a personal belonging of the customer such as a wallet). The image feature extraction unit 122 may also distinguish products as objects based on image changes in the ROI (such as background subtraction) before and after an object enters the ROI.
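The ROI rule in the preceding paragraph might be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: using the center of the object's bounding box as "the position of the object", checking the ROI before the person's bounding box, and returning an "undetermined" label when neither condition holds are all assumptions made for this example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def center(box: Box) -> Tuple[float, float]:
    """Center point of an axis-aligned box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def contains(region: Box, point: Tuple[float, float]) -> bool:
    """True if the point lies inside the region (borders included)."""
    return region[0] <= point[0] <= region[2] and region[1] <= point[1] <= region[3]


def classify_new_object(obj_box: Box, roi: Box, person_box: Box) -> str:
    """Label a newly recognized object from where it first appears."""
    c = center(obj_box)
    if contains(roi, c):
        return "product"  # appeared in the basket / temporary placement area
    if contains(person_box, c):
        return "personal_item"  # e.g. a wallet taken out by the customer
    return "undetermined"  # case not covered by the text; assumed label


roi = (200.0, 100.0, 400.0, 300.0)
person = (0.0, 0.0, 150.0, 400.0)
label_a = classify_new_object((250.0, 150.0, 300.0, 200.0), roi, person)
label_b = classify_new_object((50.0, 200.0, 80.0, 230.0), roi, person)
```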

一方、骨格情報抽出部１２３は、フレームから人物の骨格情報を生成して購買動作抽出部１２４に出力する。骨格情報には、関節点（関節の中心点）を識別する関節点ＩＤと、関節点の位置情報（座標）との組み合わせが、検出された関節点ごとに含められる。また、骨格情報には、関節点ごとの情報として、関節点の信頼度スコアが含まれてもよい。 On the other hand, the skeleton information extraction unit 123 generates the skeleton information of the person from the frame and outputs it to the purchasing motion extraction unit 124 . The skeleton information includes, for each detected joint point, a combination of a joint point ID for identifying a joint point (center point of the joint) and position information (coordinates) of the joint point. In addition, the skeleton information may include a joint point reliability score as information for each joint point.

なお、人物の骨格情報は、例えば、各関節点の位置と、画像特徴抽出部１２２で検出された人物領域または物体領域の位置との比較結果から、画像特徴抽出部１２２で検出された人物または物体と関連付けられる。画像特徴抽出部１２２で検出された人物のバウンディングボックス内に、骨格情報抽出部１２３によって検出された関節点（例えば所定数以上の関節点）が含まれる場合に、バウンディングボックスに対応する人物と関節点に対応する人物とが同一人物と判定される。この場合、関節点のデータに対して、バウンディングボックスに対応する人物の人物ＩＤが付加される。 The skeleton information of a person is associated with the person or object detected by the image feature extraction unit 122 based on, for example, a comparison between the position of each joint point and the position of the person region or object region detected by the image feature extraction unit 122. When the bounding box of a person detected by the image feature extraction unit 122 contains joint points detected by the skeleton information extraction unit 123 (for example, a predetermined number of joint points or more), the person corresponding to the bounding box and the person corresponding to the joint points are determined to be the same person. In this case, the person ID of the person corresponding to the bounding box is added to the joint point data.
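The association between a detected skeleton and a detected person region could look like the sketch below. The function names, the dictionary layout, the default threshold of three joints, and the tie-breaking rule (the box containing the most joints wins) are assumptions for illustration; the text only states that a predetermined number of joint points inside the bounding box leads to a match.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def joints_inside(box: Box, joints: Dict[int, Point]) -> int:
    """Count joint points that fall inside the bounding box."""
    return sum(
        1
        for (x, y) in joints.values()
        if box[0] <= x <= box[2] and box[1] <= y <= box[3]
    )


def match_person(
    joints: Dict[int, Point],
    person_boxes: Dict[int, Box],
    min_joints: int = 3,  # hypothetical "predetermined number"
) -> Optional[int]:
    """Return the person ID whose box contains enough joints, or None."""
    best_id, best_count = None, 0
    for person_id, box in person_boxes.items():
        n = joints_inside(box, joints)
        if n >= min_joints and n > best_count:
            best_id, best_count = person_id, n
    return best_id


# joint point ID -> image coordinates
skeleton = {0: (50.0, 40.0), 1: (55.0, 90.0), 2: (60.0, 140.0), 3: (300.0, 300.0)}
boxes = {7: (0.0, 0.0, 100.0, 200.0), 8: (250.0, 250.0, 350.0, 350.0)}
person_id = match_person(skeleton, boxes)
```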

図１０は、購買動作抽出部の内部構成例を示す図である。図１０に示すように、購買動作抽出部１２４は、環境差補正部１４１、特徴量算出部１４２および時系列情報処理部１４３を備える。 FIG. 10 is a diagram showing an example of the internal configuration of a purchasing action extraction unit. As shown in FIG. 10 , the purchasing behavior extraction unit 124 includes an environment difference correction unit 141 , a feature amount calculation unit 142 and a time series information processing unit 143 .

環境差補正部１４１は、画像特徴抽出部１２２から入力された物体の外観情報および位置情報と、骨格情報抽出部１２３から入力された人物の骨格情報とを、カメラ１０１による撮影環境に応じて補正する。例えば、カメラ１０１とキャッシュレジスタ５０との相対距離や、撮影画像の画素数に基づき、撮影空間内の基準位置（例えば、キャッシュレジスタ５０の表面の所定箇所）における距離と画像上の画素数とが常に一定になるように、物体の外観情報および位置情報と人物の骨格情報とが補正される。 The environment difference correction unit 141 corrects the appearance information and position information of the object input from the image feature extraction unit 122 and the skeleton information of the person input from the skeleton information extraction unit 123 according to the shooting environment of the camera 101. For example, based on the relative distance between the camera 101 and the cash register 50 and on the number of pixels of the captured image, the appearance information and position information of the object and the skeleton information of the person are corrected so that the relationship between a distance at a reference position in the shooting space (for example, a predetermined location on the surface of the cash register 50) and the number of pixels on the image is always constant.
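One simple way to picture this correction is a uniform rescaling of image coordinates. The sketch below is an assumption-laden illustration, not the correction actually used by the environment difference correction unit 141: it supposes that a reference segment of known physical length on the cash register is visible, measures its length in pixels for a given camera setup, and rescales every coordinate so that the segment always spans the same number of pixels. The names `scale_factor`, `normalize_point`, and the target of 100 pixels are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def scale_factor(ref_len_px: float, target_len_px: float = 100.0) -> float:
    """Scale that maps the measured reference length to the target length."""
    if ref_len_px <= 0:
        raise ValueError("reference length must be positive")
    return target_len_px / ref_len_px


def normalize_point(p: Point, ref_origin: Point, s: float) -> Point:
    """Express a point relative to the reference position and rescale it."""
    return ((p[0] - ref_origin[0]) * s, (p[1] - ref_origin[1]) * s)


# In this camera setup the reference segment spans 50 px, so the scale is 2.
s = scale_factor(50.0)
joint = normalize_point((60.0, 30.0), (10.0, 10.0), s)
```

The same scale would be applied to bounding-box corners and cropped appearance images so that all three kinds of information stay mutually consistent.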

特徴量算出部１４２は、物体の外観情報、物体の位置情報、人物の骨格情報のそれぞれから特徴量ベクトルを算出し、それらの特徴量ベクトルを１つの特徴量ベクトルに統合する。この特徴量算出部１４２は、外観特徴抽出部１５１、物体位置特徴抽出部１５２、骨格特徴抽出部１５３および特徴量ベクトル統合部１５４を備える。 The feature amount calculation unit 142 calculates feature amount vectors from each of the object appearance information, the object position information, and the person's skeleton information, and integrates these feature amount vectors into one feature amount vector. The feature amount calculation unit 142 includes an appearance feature extraction unit 151 , an object position feature extraction unit 152 , a skeleton feature extraction unit 153 and a feature amount vector integration unit 154 .

外観特徴抽出部１５１は、補正後の物体の外観情報を特徴量ベクトル化して、外観特徴量ベクトルを算出する。例えば、ＲＯＩＡｌｉｇｎ法により外観情報（撮影画像から切り出された部分画像）から特徴量ベクトルが生成される。 The appearance feature extraction unit 151 converts the corrected appearance information of the object into a feature amount vector to calculate an appearance feature amount vector. For example, a feature amount vector is generated from appearance information (a partial image cut out from a photographed image) by the ROI Align method.

物体位置特徴抽出部１５２は、補正後の物体の位置情報を特徴量ベクトル化して、物体位置特徴量ベクトルを算出する。例えば、物体領域（バウンディングボックス）の中心座標、横幅、高さ、大きさ、アスペクト比、人体領域に対する物体領域の大きさの比率、人体領域と物体領域との重複率（Intersection over Union：ＩｏＵ）、人体領域の中心と物体領域の中心との距離、物体領域と撮影空間上の基準位置（例えば、キャッシュレジスタ５０の横で商品が一時的に置かれる台）との相対座標のうちの少なくとも１つが、ベクトル化される。そして、生成されたベクトルを線形変換することで、物体位置特徴量ベクトルが生成される。 The object position feature extraction unit 152 converts the corrected position information of the object into a feature amount vector to calculate an object position feature amount vector. For example, at least one of the following is vectorized: the center coordinates, width, height, size, and aspect ratio of the object region (bounding box); the ratio of the size of the object region to that of the human body region; the overlap ratio between the human body region and the object region (Intersection over Union: IoU); the distance between the center of the human body region and the center of the object region; and the relative coordinates between the object region and a reference position in the shooting space (for example, a table beside the cash register 50 on which products are temporarily placed). The object position feature amount vector is then generated by linearly transforming the resulting vector.
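Several of the quantities listed above can be computed directly from two bounding boxes. The following sketch assumes boxes given as (x_min, y_min, x_max, y_max) with positive width and height; the function names and the ordering of the output vector are choices made for this example, and the final linear transformation mentioned in the text is omitted.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_stats(box: Box) -> Tuple[float, float, float, float]:
    """Return (center_x, center_y, width, height) of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0, x1 - x0, y1 - y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def object_position_features(obj_box: Box, person_box: Box) -> List[float]:
    """Raw position features of the object relative to the person.

    Assumes non-degenerate boxes (width and height greater than zero).
    """
    ocx, ocy, ow, oh = box_stats(obj_box)
    pcx, pcy, pw, ph = box_stats(person_box)
    return [
        ocx, ocy,                          # center coordinates
        ow, oh,                            # width and height
        ow * oh,                           # size (area)
        ow / oh,                           # aspect ratio
        (ow * oh) / (pw * ph),             # size ratio object / person
        iou(obj_box, person_box),          # overlap ratio (IoU)
        math.hypot(ocx - pcx, ocy - pcy),  # distance between centers
    ]


features = object_position_features((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```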

骨格特徴抽出部１５３は、補正後の人物の骨格情報を特徴量ベクトル化して、骨格位置特徴量ベクトルを算出する。例えば、各関節点の座標、各関節点の信頼度スコア、ある関節点と他の関節点との相対座標、各関節点と撮影空間上の基準位置との相対座標、各関節点と物体領域の中心との相対座標のうちの少なくとも１つが、ベクトル化される。そして、生成されたベクトルを線形変換することで、骨格位置特徴量ベクトルが生成される。 The skeleton feature extraction unit 153 converts the corrected skeleton information of the person into a feature amount vector to calculate a skeleton position feature amount vector. For example, at least one of the following is vectorized: the coordinates of each joint point, the reliability score of each joint point, the relative coordinates between one joint point and another, the relative coordinates between each joint point and a reference position in the shooting space, and the relative coordinates between each joint point and the center of the object region. The skeleton position feature amount vector is then generated by linearly transforming the resulting vector.
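As a rough illustration of the per-joint quantities named above, the sketch below builds, for each joint point, a raw vector consisting of its coordinates, its reliability score, and its coordinates relative to the center of the object region. The function name, the choice of these three quantities, and the default score of 0.0 for joints without a score are assumptions; the subsequent linear transformation is again left out.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def skeleton_features(
    joints: Dict[int, Point],
    scores: Dict[int, float],
    object_box: Box,
) -> Dict[int, List[float]]:
    """Map each joint point ID to [x, y, score, dx_to_object, dy_to_object]."""
    cx = (object_box[0] + object_box[2]) / 2.0
    cy = (object_box[1] + object_box[3]) / 2.0
    feats = {}
    for joint_id, (x, y) in joints.items():
        score = scores.get(joint_id, 0.0)  # assumed default when no score is given
        feats[joint_id] = [x, y, score, x - cx, y - cy]
    return feats


# joint point ID 4 (e.g. a wrist) located to the lower right of the object
feats = skeleton_features({4: (5.0, 7.0)}, {4: 0.9}, (0.0, 0.0, 4.0, 4.0))
```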

特徴量ベクトル統合部１５４は、算出された外観特徴量ベクトル、物体位置特徴量ベクトルおよび骨格位置特徴量ベクトルを統合して、統合特徴量ベクトルを算出する。このベクトル統合では、各特徴量ベクトルが関節点ごとに統合される。すなわち、１つの関節点についての骨格位置特徴量ベクトルと物体位置特徴量ベクトルおよび外観特徴量ベクトルとが統合されて、その関節点についての統合特徴量ベクトルが算出され、このような統合特徴量ベクトルが関節点ごとに算出される。これにより、各関節点の動きを正確に学習することが可能な学習データとしての特徴量ベクトルが生成されるようにする。 The feature amount vector integration unit 154 integrates the calculated appearance feature amount vector, object position feature amount vector, and skeleton position feature amount vector to calculate an integrated feature amount vector. In this integration, the feature amount vectors are integrated for each joint point. That is, the skeleton position feature amount vector for one joint point is integrated with the object position feature amount vector and the appearance feature amount vector to calculate the integrated feature amount vector for that joint point, and such an integrated feature amount vector is calculated for every joint point. In this way, feature amount vectors are generated as learning data from which the movement of each joint point can be learned accurately.

特徴量ベクトル統合部１５４は、例えば、コンカチネーション（Concatenation）によりそれぞれ複数次元の特徴量ベクトルを合計次元数の特徴量ベクトルに統合し、線形変換や非線形変換を行うことで統合特徴量ベクトルを算出する。例えば、外観特徴量ベクトルをｘ₁、物体位置特徴量ベクトルをｘ₂、関節点ｉの骨格位置特徴量ベクトルをｙ_iとしたとき、関節点ｉの統合特徴量ベクトルｚ_iは次の式（１）で表される。 The feature amount vector integration unit 154, for example, integrates the multidimensional feature amount vectors by concatenation into a single feature amount vector whose number of dimensions is the sum of theirs, and calculates the integrated feature amount vector by applying linear and nonlinear transformations. For example, when the appearance feature amount vector is $x_1$, the object position feature amount vector is $x_2$, and the skeleton position feature amount vector of joint point $i$ is $y_i$, the integrated feature amount vector $z_i$ of joint point $i$ is expressed by the following equation (1).
$$z_i = W_2 \ast \sigma\left(W_1 \ast [x_1, x_2, y_i]\right) \tag{1}$$
なお、式（１）において、Ｗ₁，Ｗ₂は所定の重み係数を表し、［］はコンカチネーションを表し、σは非線形変換を表し、＊は内積を表す。 In equation (1), $W_1$ and $W_2$ represent predetermined weighting coefficients, $[\,]$ represents concatenation, $\sigma$ represents a nonlinear transformation, and $\ast$ represents an inner product.
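Equation (1) can be traced with a few lines of plain Python. This is a sketch under stated assumptions: the weights W₁ and W₂ are treated as matrices applied by matrix-vector product, ReLU is used for the nonlinear transformation σ (the text does not name a specific function), and the tiny dimensions and weight values are made up so the arithmetic can be followed by hand.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product: inner product of each row of W with v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v: Sequence[float]) -> Vector:
    """Element-wise ReLU, used here as an assumed choice for sigma."""
    return [max(0.0, x) for x in v]


def integrate(x1: Vector, x2: Vector, y_i: Vector, W1: Matrix, W2: Matrix) -> Vector:
    """z_i = W2 * sigma(W1 * [x1, x2, y_i]) for one joint point i."""
    concat = list(x1) + list(x2) + list(y_i)  # [ ] : concatenation
    return matvec(W2, relu(matvec(W1, concat)))


# Toy example: 1-dimensional x1, x2, y_i; hidden size 2; output size 1.
W1 = [[1.0, 1.0, 1.0],
      [1.0, 0.0, 0.0]]
W2 = [[2.0, 3.0]]
z_i = integrate([1.0], [2.0], [-4.0], W1, W2)
# W1 * [1, 2, -4] = [-1, 1]; ReLU gives [0, 1]; W2 * [0, 1] = [3]
```

Calling `integrate` once per joint point, with the same `x1` and `x2` and a different `y_i` each time, yields the per-joint integrated feature amount vectors described in the text.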

以上の手順により生成された関節点ごとの統合特徴量ベクトルは、そのまま学習部１２５や判定部１２６に入力されてもよい。一方、本実施の形態では、これらの統合特徴量ベクトルは時系列情報処理部１４３で処理された後、学習部１２５や判定部１２６に入力される。時系列情報処理部１４３は、生成された関節点ごとの統合特徴量ベクトルを時間的な連続性にしたがって処理することで、購買動作全体を表す特徴量ベクトルを生成する。この処理は、例えば、ＳＴ－ＧＣＮ（Spatial Temporal－Graph Convolutional Networks）を用いて実行される。ＳＴ－ＧＣＮでは、関節点ごとの統合特徴量ベクトルから、関節点の位置に基づく空間パターンと時間パターンの両方が学習される。 The integrated feature amount vector for each joint point generated by the above procedure may be directly input to the learning unit 125 and the determination unit 126 . On the other hand, in the present embodiment, these integrated feature amount vectors are processed by time-series information processing section 143 and then input to learning section 125 and determination section 126 . The time-series information processing unit 143 processes the generated integrated feature amount vector for each joint point according to temporal continuity, thereby generating a feature amount vector representing the entire purchase action. This processing is performed using, for example, ST-GCN (Spatial Temporal-Graph Convolutional Networks). In ST-GCN, both spatial patterns and temporal patterns based on joint point positions are learned from integrated feature vector for each joint point.

次に、学習部１２５および判定部１２６について説明する。前述のように、本実施の形態では、判定部１２６は、ある時刻の特徴量から次の時刻の特徴量を予測する予測器として動作する。学習部１２５は、このような予測器を学習する。 Next, learning section 125 and determination section 126 will be described. As described above, in the present embodiment, the determining unit 126 operates as a predictor that predicts the feature quantity at the next time from the feature quantity at a certain time. The learning unit 125 learns such a predictor.

図１１は、予測器としての判定部の処理について説明するための図である。図１１に示す特徴量空間２３１は、購買動作抽出部１２４から出力される統合特徴量ベクトルの各次元を有する座標空間を、二次元の座標空間として簡易的に表したものである。ここでは、フレーム周期を単位時間として、ある時刻Ｔと時刻（Ｔ＋１）との間の特徴量の変化について考える。時刻Ｔのフレームが現フレームの場合、時刻（Ｔ＋１）のフレームは次フレームとなる。 FIG. 11 is a diagram for explaining the processing of the determination unit as a predictor. A feature amount space 231 shown in FIG. 11 is a simple representation of a coordinate space having each dimension of the integrated feature amount vector output from the purchasing action extraction unit 124 as a two-dimensional coordinate space. Here, considering the frame period as a unit time, the change in the feature amount between a certain time T and time (T+1) is considered. If the frame at time T is the current frame, the frame at time (T+1) is the next frame.

学習部１２５は、実際の購買動作を録画して得られた動画像から生成された統合特徴量ベクトルを基に、時刻Ｔから時刻（Ｔ＋１）の特徴量を予測する予測器を学習する。このような学習は、例えば、学習モデルとしてＲＮＮ（Recurrent Neural Network）を用いて実行される。 The learning unit 125 learns a predictor that predicts the feature amount at time (T+1) from the feature amount at time T, based on integrated feature amount vectors generated from moving images obtained by recording actual purchasing actions. Such learning is performed using, for example, an RNN (Recurrent Neural Network) as the learning model.

ここで、学習部１２５に入力される学習データとしての統合特徴量ベクトルには、正常・異常を示すラベルは付加されなくてよい。ただし、このような動画像の大多数は正常な購買動作が行われたときの動画像であり、異常な購買動作が行われたときの動画像はごく少数である。このため、学習部１２５は、正常な購買動作が行われたときの特徴量を予測する予測器を学習することになる。 Here, labels indicating normality/abnormality may not be added to the integrated feature vector as learning data input to the learning unit 125 . However, the majority of such moving images are moving images when normal purchasing actions are performed, and the moving images when abnormal purchasing actions are performed are very few. Therefore, the learning unit 125 learns a predictor that predicts the feature amount when a normal purchase action is performed.

そして、このようにして学習された予測器は、時刻Ｔの特徴量が入力されたとき、正常な購買動作が行われた場合における時刻（Ｔ＋１）の特徴量を予測する。例えば、予測器は、時刻Ｔにおいて商品や指の関節がある位置にある場合、正常な購買動作であれば時刻（Ｔ＋１）において商品や指の関節がどの位置に移動するかを予測できる。逆に、予測器は、時刻Ｔの特徴量が入力されたとき、異常な購買動作が行われた場合における時刻（Ｔ＋１）の特徴量を予測することはできない。 Then, the predictor learned in this way predicts the feature amount at time (T+1) when the feature amount at time T is input, when a normal purchasing action is performed. For example, if the product or the knuckle is in a certain position at time T, the predictor can predict where the product or the knuckle will move at time (T+1) if the purchase is normal. Conversely, when the feature amount at time T is input, the predictor cannot predict the feature amount at time (T+1) when an abnormal purchasing action is performed.

例えば、図１１に示す特徴量空間２３１において、時刻Ｔの特徴量Ｆ_Tが予測器に入力されたとき、予測器は、正常動作の特徴量群の中から時刻（Ｔ＋１）の特徴量Ｆ’_T+1を予測する。しかし、入力された特徴量Ｆ_Tが異常な購買動作が行われた場合の特徴量である場合、予測された時刻（Ｔ＋１）の特徴量Ｆ’_T+1と、時刻（Ｔ＋１）における実際の特徴量Ｆ_T+1との間の特徴量空間２３１における距離Ｄが離れる。したがって、判定部１２６は、このように予測された特徴量Ｆ’_T+1と実際の特徴量Ｆ_T+1との間の空間的な距離Ｄが所定の閾値より大きい場合に、異常な購買動作が行われたと判定することができる。 For example, in the feature amount space 231 shown in FIG. 11, when the feature amount F_T at time T is input to the predictor, the predictor predicts the feature amount F'_{T+1} at time (T+1) from within the group of feature amounts of normal actions. However, if the input feature amount F_T is a feature amount obtained when an abnormal purchasing action is performed, the distance D in the feature amount space 231 between the predicted feature amount F'_{T+1} at time (T+1) and the actual feature amount F_{T+1} at time (T+1) becomes large. Therefore, the determination unit 126 can determine that an abnormal purchasing action has been performed when the spatial distance D between the predicted feature amount F'_{T+1} and the actual feature amount F_{T+1} is greater than a predetermined threshold.
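The threshold test on the distance D can be written compactly. The sketch below assumes that the "spatial distance" is the Euclidean distance between the predicted and actual feature vectors and that the comparison is strict; the function names, the two-dimensional vectors, and the threshold values are illustrative only, and the predictor itself is not modeled.

```python
import math
from typing import Sequence


def prediction_distance(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Distance D between the predicted and the actual feature vector.

    Euclidean distance is an assumption; the text only says 'distance'.
    """
    return math.dist(predicted, actual)


def is_abnormal(predicted: Sequence[float], actual: Sequence[float], threshold: float) -> bool:
    """Flag an abnormal purchasing action when D exceeds the threshold."""
    return prediction_distance(predicted, actual) > threshold


f_pred = [0.0, 0.0]    # predicted feature at time T+1 (from the predictor)
f_actual = [3.0, 4.0]  # feature actually observed at time T+1
d = prediction_distance(f_pred, f_actual)
flag = is_abnormal(f_pred, f_actual, threshold=4.0)
```

Because the predictor is trained almost entirely on normal purchasing actions, a large D indicates that the observed motion departed from what a normal action would have produced.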

このような処理によれば、前述の「バーコードスキャン誤り」のように、商品や関節の動きが正常の場合とは異なる動作を顧客がとった場合に、異常な購買動作が行われたと判定することが可能となる。 With such processing, when a customer performs an action in which the movement of the product or of the joints differs from the normal case, as in the aforementioned "barcode scan error", it becomes possible to determine that an abnormal purchasing action has been performed.

次に、図１２は、予測器（判定部）の学習処理手順を示すフローチャートの例である。 Next, FIG. 12 is an example of a flowchart showing the learning processing procedure of the predictor (determination unit).
［ステップＳ１１］画像取得部１２１は、カメラ１０１によって購買動作が撮影された動画像を収集する。これらの動画像には、正常な購買動作が行われたときの動画像が相対的に多数含まれ、異常な購買動作が行われたときの動画像が少数含まれる。ただし、正常な購買動作が行われたときの動画像のみが収集されてもよい。画像取得部１２１は、収集された各動画像のデータを画像記憶部１３１に格納する。 [Step S11] The image acquisition unit 121 collects moving images in which purchasing actions have been captured by the camera 101. These moving images include a relatively large number of moving images of normal purchasing actions and a small number of moving images of abnormal purchasing actions. However, only moving images of normal purchasing actions may be collected. The image acquisition unit 121 stores the data of each collected moving image in the image storage unit 131.

[Step S12] The feature generation loop up to step S18 is executed for each moving image stored in the image storage unit 131.
[Step S13] The frame processing loop up to step S17 is executed for each frame included in the moving image.

[Step S14] The image feature extraction unit 122 inputs the moving image data to the HOID learning model (NN) to calculate HOID information. Based on the HOID information, the image feature extraction unit 122 extracts appearance information and position information of the object as feature amounts. In addition, the skeleton information extraction unit 123 performs skeleton detection on the moving image and extracts the skeleton information of the person as a feature amount.

[Step S15] The environment difference correction unit 141 of the purchasing action extraction unit 124 corrects the extracted feature amounts (appearance information, position information, and skeleton information) according to the shooting environment.
[Step S16] A feature amount vector is calculated based on the corrected feature amounts.

Specifically, the appearance feature extraction unit 151 calculates an appearance feature amount vector based on the appearance information of the object. The object position feature extraction unit 152 calculates an object position feature amount vector based on the position information of the object. The skeleton feature extraction unit 153 calculates a skeleton position feature amount vector based on the skeleton information of the person. The feature amount vector integration unit 154 then integrates the appearance feature amount vector, the object position feature amount vector, and the skeleton position feature amount vector to calculate an integrated feature amount vector for each joint point. The time-series information processing unit 143 processes the integrated feature amount vectors of the joint points according to their temporal continuity, thereby generating a feature amount vector that represents the entire purchasing action.
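The integration performed for each joint point can be pictured with the sketch below. The description does not say how the three vectors are combined, so plain concatenation is assumed here; the function name `integrate_features` and all array shapes (a 4-element appearance vector, a 4-element bounding-box vector, 21 joints with 2D coordinates) are hypothetical.

```python
import numpy as np


def integrate_features(appearance: np.ndarray,
                       object_position: np.ndarray,
                       skeleton_positions: np.ndarray) -> np.ndarray:
    """Build one integrated vector per joint point.

    appearance:         shape (A,)   object appearance feature vector
    object_position:    shape (P,)   object position feature vector
    skeleton_positions: shape (J, S) one skeleton position vector per joint
    returns:            shape (J, A + P + S)
    """
    joints = skeleton_positions.shape[0]
    # Object features are shared by every joint, so repeat them J times.
    shared = np.tile(np.concatenate([appearance, object_position]), (joints, 1))
    # Concatenation is an assumed integration method.
    return np.concatenate([shared, skeleton_positions], axis=1)


appearance = np.ones(4)
object_position = np.array([10.0, 20.0, 50.0, 80.0])
skeleton_positions = np.arange(42, dtype=float).reshape(21, 2)
integrated = integrate_features(appearance, object_position, skeleton_positions)
print(integrated.shape)
```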

[Step S17] When steps S14 to S16 have been executed for all frames in the moving image, the frame processing loop ends and the process proceeds to step S18.

[Step S18] When steps S13 to S17 have been executed for all moving images stored in the image storage unit 131, the feature generation loop ends and the process proceeds to step S19.

[Step S19] Based on the feature amount vectors representing the entire purchasing action, the learning unit 125 trains a predictor that predicts the feature amount at time (T+1) from the feature amount at time T. The learning model (NN) data generated by this training is stored in the learning model storage unit 132.
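The training data for step S19 consists of pairs (feature at time T, feature at time T+1) taken from each moving image. The sketch below builds such pairs and fits a predictor to them. Note the substitution: the description uses a neural network, whereas this example fits a linear least-squares map purely to keep the illustration short and runnable. All function names are hypothetical.

```python
import numpy as np


def make_training_pairs(sequences):
    """Turn per-video feature sequences of shape (frames, dim) into (F_T, F_(T+1)) pairs."""
    inputs = [seq[:-1] for seq in sequences]   # features at time T
    targets = [seq[1:] for seq in sequences]   # features at time T+1
    return np.vstack(inputs), np.vstack(targets)


def fit_linear_predictor(inputs: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares stand-in for the NN predictor: targets ~= inputs @ weights."""
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights


def predict_next(weights: np.ndarray, feature: np.ndarray) -> np.ndarray:
    """Predict the feature at time T+1 from the feature at time T."""
    return feature @ weights


# Synthetic "normal" sequence generated by a known linear rule, for illustration.
true_map = np.array([[0.0, 1.0], [-1.0, 0.0]])
sequence = np.zeros((6, 2))
sequence[0] = [1.0, 2.0]
for t in range(5):
    sequence[t + 1] = sequence[t] @ true_map

train_inputs, train_targets = make_training_pairs([sequence])
learned = fit_linear_predictor(train_inputs, train_targets)
```

Because the training set is dominated by normal actions, the fitted predictor tends to reproduce normal transitions, which is what makes a large prediction error informative at determination time.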

FIG. 13 is an example of a flowchart showing the determination procedure using the predictor.
[Step S21] The image acquisition unit 121 acquires a frame of the moving image captured by the camera 101.

[Step S22] The image feature extraction unit 122 inputs the frame data to the HOID learning model (NN) to calculate HOID information. This frame is the one acquired in the immediately preceding step S21 or step S29. Based on the HOID information, the image feature extraction unit 122 extracts appearance information and position information of the object as feature amounts. In addition, the skeleton information extraction unit 123 performs skeleton detection on the moving image and extracts the skeleton information of the person as a feature amount.

[Step S23] The environment difference correction unit 141 of the purchasing action extraction unit 124 corrects the extracted feature amounts (appearance information, position information, and skeleton information) according to the shooting environment.
[Step S24] A feature amount vector is calculated based on the corrected feature amounts.

[Step S25] The determination unit 126 inputs the generated feature amount vector to the predictor based on the learning model data stored in the learning model storage unit 132, and predicts the feature amount vector of the next frame. The determination unit 126 temporarily stores the predicted feature amount vector of the next frame in the RAM 112.

[Step S26] If a prediction of the feature amount vector of the current frame, made from the feature amount vector of the previous frame, is stored in the RAM 112, steps S26 to S28 are executed. This prediction was stored in the RAM 112 by step S25 for the previous frame. The determination unit 126 reads this prediction from the RAM 112 and calculates the distance between the predicted feature amount vector and the feature amount vector calculated from the current frame in step S24.

[Step S27] The determination unit 126 compares the calculated distance with a predetermined threshold. If the distance exceeds the threshold, the process proceeds to step S28; if the distance is less than or equal to the threshold, the process proceeds to step S29.

[Step S28] If the distance exceeds the threshold, it is determined that an abnormal purchasing action has been performed. The determination unit 126 executes processing to warn that an abnormal purchasing action has occurred.
For example, the determination unit 126 causes the display device 114a to display image information indicating that an abnormal purchasing action has occurred. If the monitoring device 100 can communicate with the cash register 50, the determination unit 126 may display such image information on the display 52 of the cash register 50.

An audible warning may also be given. For example, the determination unit 126 causes a speaker connected to the monitoring device 100 to output a warning sound indicating that an abnormal purchasing action has occurred. In addition, when a store clerk is wearing an earphone that can receive audio via wireless communication, the determination unit 126 may transmit audio information warning of the abnormal purchasing action and have the earphone output it, thereby notifying the clerk of the abnormality.

[Step S29] The image acquisition unit 121 acquires the next frame captured by the camera 101.
[Step S30] The image feature extraction unit 122 or the skeleton information extraction unit 123 determines whether the same person as in the previous frame is detected. If the same person is detected, the process proceeds to step S22. If the same person is not detected (that is, the person has moved out of the shooting area), the determination processing of FIG. 13 ends.

In the second embodiment described above, learning based on the product detection results obtained by HOID and the detection results of the person's skeleton information makes it possible to realize a determination unit 126 that can determine with high accuracy whether a correct purchasing action has been performed. In particular, a fraudulent purchasing action in which a barcode scan operation is actually performed on the product, such as the barcode scan error described above, can be determined not to be a correct purchasing action.

[Third Embodiment]
In the third embodiment, part of the processing of the monitoring device 100 in the second embodiment is modified. In the second embodiment, the determination unit 126 operates as a predictor that predicts the feature amount at time (T+1) from the feature amount at time T, and determines whether an abnormal action has occurred from the difference between the predicted and actual feature amounts at time (T+1). In contrast, in the third embodiment, moving images labeled as showing a normal action or an abnormal action are used as learning data to train a classifier that explicitly distinguishes normal actions from abnormal actions. The processing of the determination unit 126 is modified so that it operates as this classifier.

FIG. 14 is an example of a flowchart showing the learning procedure of the classifier (determination unit). In FIG. 14, processing steps with the same content as in FIG. 12 are given the same step numbers. In the classifier learning process shown in FIG. 14, step S41 is executed between steps S11 and S12 of FIG. 12, and step S42 is executed instead of step S19 of FIG. 12.

[Step S41] Each moving image collected in step S11 and stored in the image storage unit 131 is annotated. For example, when the classifier distinguishes normal actions from abnormal actions, either a normal label indicating a normal action or an abnormal label indicating an abnormal action is added to the moving image data. Preferably, the abnormal label is added only to the frames in the period during which the abnormal purchasing action is being performed, and the normal label is added to the frames in all other periods. For example, in a moving image showing the "barcode scan error" action for the packaged product 213 described with reference to FIGS. 6 and 7, the abnormal label may be added to the frames in the period from when the customer, while holding the packaged product 213 and bringing it toward the barcode scanner 51, starts to rotate the packaged product 213 until the barcode of the beverage can is scanned.

If the classifier also identifies the type of abnormal action, a separate label for each type of abnormal action is added to the moving image as the abnormal label. For example, to identify the four types of abnormal action described above, namely "scan omission", "cart omission", "barcode hiding", and "barcode scan error", four types of abnormal label are used. In this case, each frame of the moving image is given one of five labels: the normal label or one of the four abnormal labels.
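The five-label, per-frame annotation scheme of step S41 can be sketched as below. The enum `ActionLabel`, the function `label_frames`, and the representation of an annotation as an inclusive (start frame, end frame, label) tuple are assumptions made for illustration; the description does not specify a data format.

```python
from enum import Enum
from typing import List, Tuple


class ActionLabel(Enum):
    NORMAL = "normal"
    SCAN_OMISSION = "scan omission"
    CART_OMISSION = "cart omission"
    BARCODE_HIDING = "barcode hiding"
    BARCODE_SCAN_ERROR = "barcode scan error"


def label_frames(frame_count: int,
                 abnormal_spans: List[Tuple[int, int, ActionLabel]]) -> List[ActionLabel]:
    """Give every frame the normal label except frames inside an annotated abnormal span."""
    labels = [ActionLabel.NORMAL] * frame_count
    for start, end, label in abnormal_spans:
        # Spans are inclusive and are clipped to the valid frame range.
        for index in range(max(start, 0), min(end, frame_count - 1) + 1):
            labels[index] = label
    return labels


# Frames 2 and 3 of a 6-frame clip show a barcode scan error.
frame_labels = label_frames(6, [(2, 3, ActionLabel.BARCODE_SCAN_ERROR)])
print([label.value for label in frame_labels])
```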

Note that in step S11, unlike the case of FIG. 12, it is desirable to collect at least a certain number of moving images of abnormal purchasing actions. In particular, when the types of abnormal action are to be identified as described above, it is desirable to collect, for each type, a certain number of moving images in which that type of abnormal action is performed.

[Step S42] The feature amount vector of each moving image calculated by the purchasing action extraction unit 124 is input to the learning unit 125 with the above labels attached. Based on the input feature amount vectors and labels, the learning unit 125 trains a classifier that identifies purchasing actions. The learning model (NN) data generated by this training is stored in the learning model storage unit 132.

FIG. 15 is an example of a flowchart showing the determination procedure using the classifier. In FIG. 15, processing steps with the same content as in FIG. 13 are given the same step numbers. In the determination process shown in FIG. 15, steps S51 to S53 are executed instead of steps S25 to S28 of FIG. 13.

[Step S51] The determination unit 126 inputs the generated feature amount vector to the classifier based on the learning model data stored in the learning model storage unit 132, and determines whether the action is normal or abnormal.

[Step S52] If an abnormal action has been performed, the process proceeds to step S53; if a normal action has been performed, the process proceeds to step S29.
[Step S53] The determination unit 126 executes processing to warn that an abnormal purchasing action has occurred. The warning can be issued in the same way as in step S28 of FIG. 13.

Here, assume that the classifier has also identified the type of abnormal action. In this case, a warning suited to the type of abnormal action can be issued. For example, when the warning is given by display information or audio as in step S28 of FIG. 13, the display information or audio reports the type of abnormal action. The warning method can also be changed according to the type of abnormal action. For example, when an abnormal action caused by customer negligence occurs, such as "scan omission" or "cart omission", a warning is issued to the customer through the display 52 of the cash register 50 or through audio output, prompting the customer to redo the scan operation. On the other hand, when an abnormal action that the customer performs intentionally occurs, such as "barcode hiding" or "barcode scan error", a warning is issued to the store clerk by display information or audio.

Furthermore, to avoid trouble with the customer, when an intentional abnormal action is performed, processing may be carried out that does not issue any warning the customer would notice and instead saves information such as the current time, the identification number of the cash register 50, and the type of abnormal action in a storage device, or saves the captured moving image data in a storage device as evidence.
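The type-dependent warning policy described above can be sketched as a small dispatch function. The enum `AbnormalAction`, the function `choose_response`, the `silent_logging` flag, and the returned strings are hypothetical names introduced for illustration; the grouping into negligent and intentional actions follows the description.

```python
from enum import Enum


class AbnormalAction(Enum):
    SCAN_OMISSION = "scan omission"
    CART_OMISSION = "cart omission"
    BARCODE_HIDING = "barcode hiding"
    BARCODE_SCAN_ERROR = "barcode scan error"


# Actions the description attributes to customer negligence.
NEGLIGENT_ACTIONS = {AbnormalAction.SCAN_OMISSION, AbnormalAction.CART_OMISSION}


def choose_response(action: AbnormalAction, silent_logging: bool = False) -> str:
    """Pick how to react to a detected abnormal action."""
    if action in NEGLIGENT_ACTIONS:
        # Prompt the customer to redo the scan via the register display or audio.
        return "warn_customer"
    if silent_logging:
        # Record time, register ID, action type, or video without alerting the customer.
        return "log_evidence"
    # Intentional action: notify the store clerk.
    return "warn_clerk"


print(choose_response(AbnormalAction.SCAN_OMISSION))
print(choose_response(AbnormalAction.BARCODE_HIDING))
print(choose_response(AbnormalAction.BARCODE_SCAN_ERROR, silent_logging=True))
```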

In the third embodiment described above, learning based on the product detection results obtained by HOID and the detection results of the person's skeleton information makes it possible to generate a classifier that can identify abnormal purchasing actions in which the product or the joints of the hand move in a characteristic way. In particular, when the learning data is labeled for each type of abnormal action, a classifier that can explicitly distinguish multiple types of abnormal action can be generated. Using both the HOID product detection results and the skeleton information detection results makes it possible to discriminate fine differences between purchasing actions, so that a classifier capable of identifying multiple types of abnormal action with high accuracy can be generated.

[Fourth Embodiment]
In the fourth embodiment, part of the processing of the monitoring device 100 in the second embodiment is modified. Specifically, whether a normal purchasing action has been performed is determined according to a predetermined determination rule based on the position, on the image, of the object (product) detected by the image feature extraction unit 122. In addition, the content of the determination rule is corrected according to the customer's physique and standing position, based on the skeleton information detected by the skeleton information extraction unit 123.

FIG. 16 is a diagram showing a configuration example of the processing functions of the monitoring device according to the fourth embodiment. In FIG. 16, processing functions that perform the same processing as in FIG. 8 are given the same reference numerals. The monitoring device 100a shown in FIG. 16 includes, in addition to the image acquisition unit 121, the image feature extraction unit 122, and the skeleton information extraction unit 123 shown in FIG. 8, a determination rule storage unit 161, an environment difference correction unit 162, a person difference correction unit 163, a customer operation recognition unit 164, and a purchasing action determination unit 124a.

The determination rule storage unit 161 is implemented as a storage area of a storage device included in the monitoring device 100a. The determination rule storage unit 161 stores information indicating a determination rule for determining whether a normal purchasing action has been performed (determination rule information). For example, the determination rule information includes information indicating a plurality of determination areas that are set on the image in order to recognize, from the position of the product, that the customer has performed a specific action.

The processing of the environment difference correction unit 162, the person difference correction unit 163, the customer operation recognition unit 164, and the purchasing action determination unit 124a is realized, for example, by a processor of the monitoring device 100a executing a predetermined program.

The environment difference correction unit 162 corrects the image acquired by the image acquisition unit 121 according to the shooting environment so that the determination rule can be applied correctly.
The person difference correction unit 163 corrects the determination rule information stored in the determination rule storage unit 161 based on the skeleton information detected by the skeleton information extraction unit 123. Through this correction, the determination rule information is adapted to the customer's physique and standing position.

The customer operation recognition unit 164 recognizes operations that the customer performs on the cash register 50. For example, it recognizes the product scan operation and the checkout start operation performed after scanning is complete.

The customer operation recognition unit 164 recognizes these operations based on, for example, images captured by the camera 101. For example, a lamp mounted on the cash register 50 may light up when a scan operation or a checkout start operation is performed. A different lamp may light up for each operation, or the lamp may light up in a different color for each operation. In such cases, the customer operation recognition unit 164 can recognize that one of these operations has been performed by detecting, from the captured image, that the lamp is lit and in which color.

The customer operation recognition unit 164 may also recognize each operation by detecting, from the captured image, the change in the content shown on the display 52 of the cash register 50 that accompanies the operation.

Furthermore, when the cash register 50 emits a different notification sound for each operation, the customer operation recognition unit 164 can recognize that an operation has been performed by detecting the notification sound through a microphone. When the monitoring device 100a and the cash register 50 can communicate with each other, the customer operation recognition unit 164 may instead receive a notification from the cash register 50 when an operation is performed.

Based on the position information of the object (product) output from the image feature extraction unit 122, the purchasing action determination unit 124a determines whether a normal purchasing action has been performed, according to the determination rule indicated by the determination rule information corrected by the person difference correction unit 163. The purchasing action determination unit 124a also counts the number of scan operations detected through this determination, and separately counts the number of scan operations recognized by the customer operation recognition unit 164. When the customer performs the checkout start operation, the purchasing action determination unit 124a compares the two counts and, if they do not match, executes processing to warn that a correct purchasing action has not been performed.
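The count comparison at checkout can be sketched as follows. The class `ScanCounter` and its method names are hypothetical; the sketch only shows the bookkeeping, with one counter fed by the image-based determination and the other by the customer operation recognition unit 164.

```python
from dataclasses import dataclass


@dataclass
class ScanCounter:
    """Keeps the two scan counts that are compared when checkout starts."""
    visual_scans: int = 0     # scan actions detected from product positions in the image
    register_scans: int = 0   # scan operations recognized on the cash register

    def on_visual_scan(self) -> None:
        self.visual_scans += 1

    def on_register_scan(self) -> None:
        self.register_scans += 1

    def on_checkout_start(self) -> bool:
        """Return True when the counts differ, i.e. when a warning should be issued."""
        return self.visual_scans != self.register_scans


counter = ScanCounter()
counter.on_visual_scan()
counter.on_register_scan()
counter.on_visual_scan()   # a product was moved past the scanner but not registered
print(counter.on_checkout_start())
```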

FIG. 17 is a diagram for explaining the determination rule for determining whether a normal purchasing action has been performed.
The image 241 shown in FIG. 17 is an example of a captured image acquired by the image acquisition unit 121. The image 241 shows the vicinity of the front of the cash register 50 (the side on which the barcode scanner 51 is mounted), captured from above the cash register 50. The image 241 also illustrates the product area (bounding box) 242 of the product 213, as indicated by the HOID information output from the image feature extraction unit 122, together with the center position 243 of the product area 242.

The determination rule uses, for example, a plurality of determination areas, each for recognizing that the customer has performed a specific action. In this case, the determination rule information stored in the determination rule storage unit 161 includes information indicating the position of each determination area. As an example, assume here that two determination areas are set: a take-out area R1 and a carry-out area R2. The image 241a shown in FIG. 17 shows the take-out area R1 and the carry-out area R2 superimposed on the image 241.

The take-out area R1 is an area for detecting the action of the customer picking up a product in order to scan it. In many cases, the customer approaches the cash register 50 with the products to be purchased in the in-store basket 216, then takes the products out of the basket 216 one at a time and scans them. In this case, the take-out area R1 is an area for detecting that the customer has taken a product out of the basket 216 in order to scan it. In the following description, the action to be detected using the take-out area R1 is therefore called the "take-out action". Here, the take-out action is determined to have started when the center position 243 of the product area 242 newly enters the take-out area R1, and to have ended when the center position 243 moves outside the take-out area R1.

The take-out area R1 is set at a position a certain distance away from the front of the cash register 50. In addition, in the example of FIG. 17, it is also set in the part of the area adjacent to the front of the cash register 50 that lies between the area where the barcode scanner 51 is located and the area where the basket 216 is placed.

However, the take-out area R1 is set so as not to include the area where the basket 216 is placed. For example, multiple products are often close together inside the basket 216. As a result, while the customer is carrying the basket 216, the HOID processing may detect that the customer is holding multiple products in the basket 216. Also, when the customer sets down the basket 216 and takes out a product that was lying underneath another product, it may be erroneously detected that the customer is holding the upper product, which is not the one being taken out. Setting the take-out area R1 so that it does not include the area where the basket 216 is placed prevents such erroneous detection.

The carry-out area R2, on the other hand, is an area for detecting the action of scanning the product that the customer has picked up, and is set at a position close to the barcode scanner 51. The carry-out area R2 can also be described as an area for detecting that the customer has performed the normal action required to carry the picked-up product out of the store. The carry-out area R2 is set so that at least one of its boundary lines touches a boundary line of the take-out area R1. Specifically, among the boundary lines of the carry-out area R2, the one that separates the customer from the cash register 50 (the boundary line on the cash register 50 side, parallel to the front of the cash register 50) is set so as to touch a boundary line of the take-out area R1.

このような取り出し領域Ｒ１および持ち出し領域Ｒ２と、商品領域２４２の中心位置２４３との関係から、例えば次のような判定ルールによって正常な購買動作が行われたか否かが判定される。購買動作判定部１２４ａは、中心位置２４３が取り出し領域Ｒ１内に移動すると、商品が１つ取り出された（取り出し動作が行われた）と認識する。購買動作判定部１２４ａは、その状態から中心位置２４３が取り出し領域Ｒ１から持ち出し領域Ｒ２内に移動すると、スキャン操作が行われた（正しい持ち出し動作が行われた）と認識する。このような一連の動作が検出された場合に、正常な購買動作が行われたと判定される。 Based on the relationship between the take-out area R1 and take-out area R2, and the center position 243 of the product area 242, it is determined whether or not the purchase operation was performed normally according to the following determination rule, for example. The purchasing motion determination unit 124a recognizes that one product has been taken out (a taking-out motion has been performed) when the center position 243 moves into the taking-out region R1. When the center position 243 moves from the take-out region R1 to the take-out region R2 from this state, the purchase motion determination unit 124a recognizes that the scanning operation has been performed (the correct take-out motion has been performed). When such a series of actions are detected, it is determined that a normal purchase action has been performed.

If, on the other hand, the center position 243 does not move into the carry-out region R2, it is determined that a normal purchasing motion has not been performed. For example, if the center position 243 moves from inside the take-out region R1 to outside the take-out region R1 without moving into the carry-out region R2, the customer is presumed to have put the product into a pocket or bag without scanning it. In such a case, the purchasing motion determination unit 124a can recognize that an abnormal purchasing motion has been performed and execute processing to issue a warning.
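The determination rule above can be illustrated with a minimal Python sketch. The rectangular region format `(x0, y0, x1, y1)`, the function names, and the three result labels are assumptions made for illustration; the publication does not specify a data representation.

```python
def contains(region, point):
    """Return True if point (x, y) lies in region (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = region
    x, y = point
    return x0 <= x < x1 and y0 <= y < y1


def classify_trajectory(centers, r1, r2):
    """Classify a sequence of product-region center positions.

    "normal":       the center entered R1 and then moved into R2.
    "abnormal":     the center entered R1 and then left R1 without entering R2.
    "undetermined": the center never entered R1, or is still inside R1.
    """
    entered_r1 = False
    for center in centers:
        if contains(r1, center):
            entered_r1 = True
        elif entered_r1 and contains(r2, center):
            return "normal"
        elif entered_r1:
            return "abnormal"
    return "undetermined"
```

Half-open rectangles are used here so that a point on the shared boundary of R1 and R2 belongs to exactly one region; this is a design choice of the sketch, not something stated in the source.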

During the above determination, the fact that the same product is being held across a plurality of consecutive frames can be recognized by, for example, a method based on the trajectory of the product's position or a method based on the image information of the product region. As a method using image information, for example, recognition by comparing luminance or color histograms within the bounding boxes can be used.
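As one possible reading of the histogram comparison, the following sketch compares normalized luminance histograms of two bounding boxes by histogram intersection. The choice of histogram intersection, the bin count, and the `threshold` value are all assumptions; the publication only says that histograms are compared.

```python
def luminance_histogram(pixels, bins=8, max_value=256):
    """Build a normalized histogram from a flat list of luminance values (0..255)."""
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def looks_like_same_product(pixels_a, pixels_b, threshold=0.8):
    """Hypothetical decision: treat two boxes as the same product if similar enough."""
    similarity = histogram_intersection(
        luminance_histogram(pixels_a), luminance_histogram(pixels_b)
    )
    return similarity >= threshold
```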

In the above example, the purchasing motion is determined from the relationship between the regions set on the image and the position of the product. In addition, for example, the image change within a set region before and after the product enters it (background difference) may also be used to determine the purchasing motion. For example, detecting the image change in the take-out region R1 makes it possible to determine accurately that the recognized object is a product and to trace the movement trajectory of the product.

FIG. 18 is a diagram illustrating examples of correction processing by the environment difference correction unit. FIG. 18(A) shows a first correction processing example, and FIG. 18(B) shows a second correction processing example.
The environment difference correction unit 162 corrects the image acquired by the image acquisition unit 121 according to the shooting environment so that the determination rule can be applied correctly in the determination processing.

For example, the environment difference correction unit 162 corrects the captured image based on shooting environment information, in which information indicating the shooting environment of the camera 101 is recorded. The shooting environment information records, for example, the number of pixels of an image captured by the camera 101, the shooting direction of the camera 101, and the distance between the camera 101 and a reference position in real space (for example, a predetermined position on the front face of the cash register 50). Reference values for the items recorded in the shooting environment information are registered in the determination rule storage unit 161. By comparing the shooting environment information with the reference values, the environment difference correction unit 162 corrects the captured image so that its number of pixels, shooting direction, and distance to the reference position match the reference values.

The environment difference correction unit 162 may also perform such correction using a plurality of markers placed in real space instead of the shooting environment information. In this case, reference values for the positional relationship between the markers on the captured image (inter-marker distances and relative positions) are registered in the determination rule storage unit 161. The environment difference correction unit 162 detects the position of each marker appearing in the captured image and corrects the captured image so that the positional relationship between the markers on the captured image matches the reference values registered in the determination rule storage unit 161.

For example, FIG. 18(A) shows a case in which the inter-marker distance on the captured image 251 is smaller than the reference value. In this example, markers M1 to M4 are placed in real space and appear in the captured image 251.

Suppose that, in the determination rule information, the boundary line L1 of the take-out region R1 that separates the customer from the cash register 50 is set 300 pixels from the top edge of the image. If the inter-marker distance on the image 251 captured by the camera 101 is smaller than the reference value, using the position of the boundary line L1 set in the determination rule information as it is would prevent a correct determination of whether the product is located in the take-out region R1. The environment difference correction unit 162 therefore enlarges the captured image 251 so that the inter-marker distance matches the reference value. Executing the determination processing on the corrected (enlarged) captured image 251a allows the determination to be made accurately.
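The enlargement described above amounts to a scale factor derived from one pair of markers. The sketch below assumes a uniform scale about the image origin and hypothetical function names; the publication does not state how the scale is computed or applied.

```python
import math


def scale_to_reference(marker_a, marker_b, reference_distance):
    """Scale factor that makes the observed inter-marker distance match the reference."""
    observed = math.dist(marker_a, marker_b)
    if observed == 0:
        raise ValueError("markers coincide; scale is undefined")
    return reference_distance / observed


def scale_point(point, factor):
    """Map a pixel coordinate of the captured image into the enlarged image."""
    return (point[0] * factor, point[1] * factor)
```

For instance, if two markers appear 150 pixels apart while the reference value is 300 pixels, the factor is 2.0, and a product center seen at y = 150 in the captured image lands at y = 300 in the corrected image, where it can be compared with a boundary line defined in reference coordinates.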

When markers are used, a correction that rotates the captured image can also be performed, as shown in FIG. 18(B). For the captured image 252 shown in FIG. 18(B), comparing the positional relationship between the markers M1 to M4 with the reference values registered in the determination rule storage unit 161 reveals that the camera 101 is rotated with respect to the lens optical axis. In such a case, the environment difference correction unit 162 rotates the captured image 252 so that the angular relationship between the markers matches the reference value, and then cuts out, from the rotated captured image 252, an image region 253 in which the inter-marker distance matches the reference value. This correction yields the corrected captured image 252a.
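A rotation angle of this kind could be estimated from the direction of the line joining two markers. The following sketch is an assumption about one way to do so; the function name and the use of a single marker pair are hypothetical, and angles are measured with `atan2` in image coordinates (y increasing downward).

```python
import math


def rotation_correction_deg(marker_a, marker_b, reference_angle_deg):
    """Angle (degrees, in (-180, 180]-style wrap) to add to the observed
    marker direction so that it matches the reference direction."""
    observed = math.degrees(
        math.atan2(marker_b[1] - marker_a[1], marker_b[0] - marker_a[0])
    )
    diff = reference_angle_deg - observed
    # Wrap into [-180, 180) so the smaller of the two possible rotations is chosen.
    return (diff + 180.0) % 360.0 - 180.0
```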

FIG. 19 is a diagram illustrating an example of the internal configuration of the person difference correction unit. As shown in FIG. 19, the person difference correction unit 163 includes an individual difference correction unit 171 and a person position correction unit 172.
The individual difference correction unit 171 estimates the physique of the person from the skeleton information detected by the skeleton information extraction unit 123, and corrects the determination rule information stored in the determination rule storage unit 161 so that an accurate determination can be made regardless of differences in physique between persons.

The person position correction unit 172 recognizes the position where the person is standing (standing position) based on the position information of the person detected by the image feature extraction unit 122 or the skeleton information detected by the skeleton information extraction unit 123. The person position correction unit 172 corrects the determination rule information stored in the determination rule storage unit 161 based on the recognized standing position so that an accurate determination can be made regardless of where the person stands.

FIG. 20 is a diagram for explaining the correction processing by the person difference correction unit. As described above, the purchasing motion determination unit 124a detects a take-out motion when the position of the product enters the take-out region R1, and detects execution of a scan operation when the position of the product subsequently enters the carry-out region R2. The position at which a normal take-out motion is performed and the position at which a normal scan operation is performed are largely determined by, for example, the lengths between the person's joint points (in particular, the arm length from elbow to wrist) and the person's standing position. Therefore, the boundary line between the take-out region R1 and the carry-out region R2, which in effect distinguishes the take-out motion from the scan operation, is adjusted according to the length between predetermined joint points of the person in the captured image and the person's standing position, which enables accurate motion determination.

The captured image 261 shown in FIG. 20 illustrates the take-out region R1 and the carry-out region R2. The boundary line L2 between the take-out region R1 and the carry-out region R2 is set horizontally on the captured image 261 so as to separate the customer from the cash register 50. The person difference correction unit 163 corrects the vertical position of the boundary line L2 according to the length between predetermined joint points of the person and the person's standing position.

For example, the individual difference correction unit 171 calculates the length of the person's arm based on the skeleton information detected by the skeleton information extraction unit 123. If the person's arm is long, the vertical range covered when the carry-out motion is performed correctly can extend further downward (toward the cash register 50). Therefore, the individual difference correction unit 171 corrects the boundary line L2 downward on the image when the person's arm is longer than a predetermined threshold, and corrects the boundary line L2 upward on the image when the arm length is equal to or less than the threshold. The correction amount may be determined, for example, according to the difference between the arm length and the threshold.

The person position correction unit 172 detects the standing position of the person in the captured image 261. For example, the standing position can be detected from the position information of the person detected by the image feature extraction unit 122. Alternatively, when the skeleton information extraction unit 123 has detected joint points of the person's legs (for example, ankles or knees), the positions of those joint points can be used as the person's standing position.

The lower the person's standing position is on the image, the further downward (toward the cash register 50) the vertical range covered by a correctly performed carry-out motion can extend. Therefore, the person position correction unit 172 corrects the boundary line L2 downward on the image when the y-coordinate (vertical coordinate) indicating the person's standing position is larger than a predetermined reference vertical coordinate (that is, when the person is positioned lower), and corrects the boundary line L2 upward on the image when that y-coordinate is equal to or less than the reference vertical coordinate. The correction amount may be determined, for example, according to the difference between the y-coordinate indicating the standing position and the reference vertical coordinate.
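The two corrections of the boundary line L2 (by arm length and by standing position) can be sketched together as follows. The linear form, the gain parameters, and the summing of both shifts are assumptions for illustration; the publication only says that each correction amount may depend on the respective difference. Image y-coordinates are taken to increase downward, as in the text.

```python
import math


def corrected_boundary_y(boundary_y, elbow, wrist, standing_y,
                         arm_threshold, reference_y,
                         arm_gain=1.0, position_gain=1.0):
    """Return the corrected vertical position of boundary line L2.

    A positive shift moves the line downward on the image; a negative
    shift moves it upward.
    """
    arm_length = math.dist(elbow, wrist)             # elbow-to-wrist length in pixels
    arm_shift = arm_gain * (arm_length - arm_threshold)
    position_shift = position_gain * (standing_y - reference_y)
    return boundary_y + arm_shift + position_shift
```

With this form, an arm exactly at the threshold or a standing position exactly at the reference coordinate produces no shift, which is one way of reading the "equal to or less than" case when the correction amount is proportional to the difference.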

Next, the processing of the monitoring device 100a according to the fourth embodiment will be described using flowcharts.
FIGS. 21 and 22 are an example of a flowchart showing the procedure of determination processing based on the determination rule. In FIGS. 21 and 22, it is determined whether a correct purchasing motion has been performed. In addition, the number of times a barcode scan operation was recognized by that determination is compared with the actual number of times a scan operation was performed at the cash register 50, and notification or warning processing is executed according to the comparison result.

[Step S61] First, as preprocessing for the determination processing, the correction parameters used by the environment difference correction unit 162 are set. At this time, a plurality of markers are placed at predetermined positions within the shooting range, and the shooting range including the markers is captured by the camera 101. When the image acquisition unit 121 acquires the captured image, the environment difference correction unit 162 determines correction parameters for correcting captured images by comparing the positional relationship of the markers on the captured image with the reference values of that positional relationship registered in the determination rule storage unit 161. The correction parameters determined include, for example, an image enlargement/reduction ratio and an image rotation angle. The environment difference correction unit 162 saves the determined correction parameters in a storage device such as the HDD 113.

When shooting environment information indicating the shooting environment of the camera 101 is provided, the correction parameters may instead be determined, without capturing an image with the camera 101, by comparing the shooting environment information with the reference value of each item it contains.

Once the correction parameters have been set as described above, the camera 101 starts capturing images for the actual determination processing. At this time, the markers are removed from the shooting range.
[Step S62] An image captured by the camera 101 is corrected by the environment difference correction unit 162 based on the correction parameters set in step S61 and is input to the image feature extraction unit 122 and the skeleton information extraction unit 123. Thereafter, each time a captured image is acquired, it is corrected by the environment difference correction unit 162, and the corrected captured image is input to the image feature extraction unit 122 and the skeleton information extraction unit 123.

[Step S63] The purchasing motion determination unit 124a determines whether the image feature extraction unit 122 has recognized a person and an object (product) that interacts with the person. If a person has been recognized but no object has been recognized, or if no person has been recognized, the processing enters a waiting state and step S63 is executed again for the next frame. If both a person and an object have been recognized, the processing proceeds to step S64.

The following steps S64 to S69 are executed for each frame.
[Step S64] The person difference correction unit 163 corrects the determination rule information stored in the determination rule storage unit 161 based on the skeleton information detected by the skeleton information extraction unit 123. For example, as described with reference to FIG. 20, the ranges of the take-out region R1 and the carry-out region R2 are adjusted according to the length between predetermined joint points of the person and the person's standing position.

[Step S65] The purchasing motion determination unit 124a executes product-count processing using the determination rule information corrected in step S64. In this processing, the number of times the purchasing motion determination unit 124a determines that a scan operation has been performed correctly is counted as the product count. The details of step S65 are described later with reference to FIG. 23.

[Step S66] The purchasing motion determination unit 124a counts, as the scan count, the number of times a product barcode has been scanned at the cash register 50. In this processing, the scan count is incremented when the customer operation recognition unit 164 recognizes that a barcode has been scanned.

[Step S67] The purchasing motion determination unit 124a determines whether the product count is less than the scan count. If the product count is less than the scan count, the processing proceeds to step S68; if the product count and the scan count match, the processing proceeds to step S69.

[Step S68] The purchasing motion determination unit 124a issues a notification prompting the customer to perform a correction operation. For example, a sound indicating that there is an error is output, or display information indicating that there is an error is shown on the display 52 of the cash register 50. The customer then performs a correction operation. For example, the customer makes an input to the cash register 50 for redoing the scan operation, and picks up again the product held immediately before. When, for example, the customer operation recognition unit 164 detects the input operation for redoing the scan operation, the purchasing motion determination unit 124a counts down (decrements) the scan count. The processing then proceeds to step S64.

[Step S69] The purchasing motion determination unit 124a determines whether the customer is going to perform settlement processing. For example, when the customer operation recognition unit 164 detects that the customer has performed an operation on the cash register 50 to request settlement processing, it is determined that the customer is going to perform settlement processing. If it is not determined that the customer is going to perform settlement processing, the processing proceeds to step S64. If it is determined that the customer is going to perform settlement processing, the processing proceeds to step S71 in FIG. 22.

The description continues below with reference to FIG. 22.
[Step S71] The cash register 50 executes the settlement processing. For example, the total price of the scanned products is displayed on the display 52 of the cash register 50.

[Step S72] The purchasing motion determination unit 124a determines whether the product count is less than the scan count. If the product count is less than the scan count, the processing proceeds to step S73. If the product count and the scan count match, the settlement processing continues, and when it is completed (that is, when payment is finished), the processing proceeds to step S75.

[Step S73] The purchasing motion determination unit 124a executes processing to warn that the number of scanned products is insufficient. For example, warning information is displayed on the display 52 of the cash register 50. Alternatively, the warning information is output as audio to an earphone worn by a store clerk.

[Step S74] The customer performs a correction operation. For example, the customer makes an input to the cash register 50 for redoing the scan operation and requests settlement processing again. When the settlement processing is completed, the processing proceeds to step S75.

[Step S75] It is determined whether to end the determination processing. If the determination processing is to continue, the processing proceeds to step S63 in FIG. 21, and recognition processing for a new person and object is executed. If the determination processing is to end, the processing ends.
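The branching in steps S67 to S69 can be summarized in a small sketch. The function name and the returned step labels are illustrative only. The case in which the product count exceeds the scan count is not described in the steps above, so this sketch simply treats every case other than "product count less than scan count" as proceeding to step S69; that handling is an assumption.

```python
def next_step_after_counting(product_count, scan_count, settlement_requested):
    """Return the label of the step that follows steps S65/S66 in FIG. 21."""
    if product_count < scan_count:   # step S67
        return "S68"                 # prompt the customer to correct the scan
    if settlement_requested:         # step S69
        return "S71"                 # continue with settlement in FIG. 22
    return "S64"                     # keep processing the next frame
```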

FIG. 23 is an example of a flowchart showing the procedure of the product-count processing. The processing of FIG. 23 corresponds to step S65 in FIG. 21. In the processing of FIG. 23, information indicating each newly recognized object (object information) is recorded sequentially in the RAM 112. Flag information indicating whether the object has been scanned is attached to the object information.

[Step S81] The purchasing motion determination unit 124a determines whether the center position of the object region (bounding box) is within the take-out region R1. If the center position is within the take-out region R1, the processing proceeds to step S82; otherwise, the processing proceeds to step S86.

[Step S82] The purchasing motion determination unit 124a determines whether the object determined in step S81 to be within the take-out region R1 is an already recorded object. An already recorded object is one whose object information is recorded in the RAM 112 as the most recent object information. If the object is already recorded, the object has remained within the take-out region R1 since the previous frame, and the processing of FIG. 23 ends. If the object is not already recorded, the processing proceeds to step S83.

[Step S83] The purchasing motion determination unit 124a newly registers in the RAM 112 the object information of the object determined in step S81 to be within the take-out region R1. At this time, flag information indicating that the object has not been scanned is attached to the object information.

[Step S84] The purchasing motion determination unit 124a refers to the object information registered immediately before the object information newly registered in step S83 (that is, the object information of the previous object determined to be within the take-out region R1). Based on the flag information attached to this object information, the purchasing motion determination unit 124a determines whether the corresponding object has not been scanned. If the object has not been scanned, the processing proceeds to step S85; if it has been scanned, the processing of FIG. 23 ends.

[Step S85] A "Yes" determination in step S84 means that the previous object moved out of the take-out region R1 without moving from the take-out region R1 into the carry-out region R2. In this case, the product may have been carried out without being scanned. The purchasing motion determination unit 124a therefore executes, for example, processing to warn the customer that the scan was not performed correctly. For example, a sound indicating that the scan was not performed correctly is output, or display information indicating this is shown on the display 52 of the cash register 50. The processing of FIG. 23 then ends.

In step S85, for example, information indicating that an abnormal purchasing motion has been performed may be recorded in the RAM 112 instead of issuing a warning. In this case, based on the recorded information, processing to warn of the abnormality may be executed when the product count and the scan count are determined to be equal in step S67 of FIG. 21 or step S72 of FIG. 22.

[Step S86] The purchasing motion determination unit 124a determines whether the center position of the object region (bounding box) is within the carry-out region R2. If the center position is within the carry-out region R2 and the corresponding object is already recorded in the RAM 112, the processing proceeds to step S87; this means that the object has moved from the take-out region R1 into the carry-out region R2. If the center position is outside the carry-out region R2, or if it is within the carry-out region R2 but the object information of the corresponding object is not recorded in the RAM 112, the processing of FIG. 23 ends.

[Step S87] The purchasing motion determination unit 124a increments the product count.
[Step S88] The purchasing motion determination unit 124a records the object as a scanned object. That is, the purchasing motion determination unit 124a updates the flag information attached to the object information of the object to indicate that it has been scanned.
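Steps S81 to S88 can be sketched as a per-frame function operating on a list of object records. This is a minimal sketch under several assumptions: regions are rectangles `(x0, y0, x1, y1)`, an `object_id` is supplied by some upstream tracking step (for example, trajectory or histogram matching), and the product count is incremented only once per object by checking the scanned flag before counting. The last guard is not stated in the steps above and is added here so that an object lingering in R2 over several frames is not counted repeatedly.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    object_id: int
    scanned: bool = False          # flag information: scan executed or not


@dataclass
class CounterState:
    records: list = field(default_factory=list)   # stands in for the records in RAM
    product_count: int = 0


def contains(region, point):
    x0, y0, x1, y1 = region
    x, y = point
    return x0 <= x < x1 and y0 <= y < y1


def count_step(state, object_id, center, r1, r2):
    """Process one frame; return True if a warning (step S85) should be issued."""
    latest = state.records[-1] if state.records else None
    if contains(r1, center):                                    # S81
        if latest is not None and latest.object_id == object_id:
            return False                                        # S82: already recorded
        state.records.append(ObjectRecord(object_id))           # S83
        # S84: was the previously registered object left unscanned?
        return latest is not None and not latest.scanned        # True -> S85
    if (contains(r2, center) and latest is not None             # S86
            and latest.object_id == object_id
            and not latest.scanned):                            # assumed guard
        state.product_count += 1                                # S87
        latest.scanned = True                                   # S88
    return False
```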

According to the fourth embodiment described above, a product related to a person is detected, and whether a correct purchasing motion has been performed is determined based on the position of the product. In addition, the determination rule information used for determining the purchasing motion is optimized based on the result of detecting the skeleton information. As a result, whether a person has performed a correct purchasing motion can be determined accurately without using templates or training data for each product, and regardless of the person's physique or position within the shooting range.

The processing functions of the devices described in the above embodiments (for example, the motion determination device 1 and the monitoring device 100) can be implemented by a computer. In that case, a program describing the processing content of the functions that each device should have is provided, and the processing functions are implemented on the computer by executing the program. The program describing the processing content can be recorded on a computer-readable recording medium. Computer-readable recording media include magnetic storage devices, optical discs, and semiconductor memories. Magnetic storage devices include hard disk drives (HDD) and magnetic tapes. Optical discs include CDs (Compact Discs), DVDs (Digital Versatile Discs), and Blu-ray Discs (BD, registered trademark).

When the program is distributed, for example, portable recording media such as DVDs and CDs on which the program is recorded are sold. The program can also be stored in a storage device of a server computer and transferred from the server computer to another computer via a network.

A computer that executes the program stores, for example, the program recorded on a portable recording medium or the program transferred from the server computer in its own storage device. The computer then reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to it. In addition, each time the program is transferred from a server computer connected via a network, the computer can sequentially execute processing according to the received program.

REFERENCE SIGNS LIST
1 motion discrimination device
2 photographed image
3a person
3b object
3c reader
4a, 4b image area
5a, 5b, 6a, 6b skeleton line

Claims

1. A motion discrimination program that causes a computer to execute a process comprising:
acquiring a photographed image in which a motion of a person is captured;
detecting, from the acquired photographed image, a position of an object related to the person and skeleton information of the person; and
determining, based on the position of the object and the skeleton information, whether an action performed by the person on the object is normal.

2. The motion discrimination program according to claim 1, wherein the determining includes:
calculating a first feature amount based on the position of the object detected from a first frame of the photographed image and the skeleton information;
calculating a predicted value of the feature amount in a second frame following the first frame by inputting the first feature amount into a predictor that predicts the feature amount in the second frame;
calculating a second feature amount based on the position of the object detected from the second frame and the skeleton information; and
determining whether the action performed by the person on the object is normal based on a distance between the second feature amount and the predicted value.
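
The prediction-based determination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the Euclidean distance, the fixed `threshold`, and the stand-in `identity_predictor` are all hypothetical choices, since the text only says that the determination is based on the distance between the second feature amount and the predicted value.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Distance between two feature vectors of equal length (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_action_normal(
    first_feature: Vector,
    second_feature: Vector,
    predictor: Callable[[Vector], Vector],
    threshold: float,
) -> bool:
    """Judge the action normal when the observed second-frame feature
    stays close to the value predicted from the first-frame feature."""
    predicted = predictor(first_feature)
    return euclidean_distance(second_feature, predicted) <= threshold


def identity_predictor(feature: Vector) -> List[float]:
    """Hypothetical stand-in predictor: assumes the feature does not change
    between frames. A learned predictor would replace this."""
    return list(feature)
```

With a predictor trained only on normal actions, a large distance suggests that the observed motion departs from what normal motion would have produced in the next frame.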

3. The motion discrimination program according to claim 2, wherein the predictor is generated by learning that uses, as learning data, the first feature amount detected from each of a plurality of past images captured in the past.

4. The motion discrimination program according to claim 3, wherein each of the plurality of past images is an image capturing a state in which the action performed by the person on the object was performed normally.

5. The motion discrimination program according to any one of claims 1 to 4, wherein the determining determines whether an operation in which the person holds the object in a hand and causes a scanner mounted on a cash register to read product information attached to the object was performed normally.

6. The motion discrimination program according to claim 2, wherein the determining includes:
calculating a first feature amount based on the position of the object detected from the photographed image and the skeleton information; and
identifying, by inputting the first feature amount into a classifier, whether the action performed by the person on the object is a normal action or one of a plurality of types of abnormal actions.

7. The motion discrimination program according to claim 6, wherein the classifier is generated by learning that uses, as learning data, the first feature amount detected from each of a plurality of past images captured in the past and a label that is attached to each of the plurality of past images and indicates either the normal action or one of the plurality of types of abnormal actions.

8. The motion discrimination program according to any one of claims 2, 3, 4, 6 and 7, wherein the determining includes:
calculating a first feature amount vector based on the position of the object and a second feature amount vector for each joint of the person based on the skeleton information; and
calculating the first feature amount by integrating the first feature amount vector and the second feature amount vector for each joint.
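
One way to picture the integration of the object vector and the per-joint vectors is simple concatenation. The sketch below assumes concatenation in a fixed joint order; the text does not specify how the vectors are integrated, and the function name and joint names are hypothetical.

```python
from typing import Dict, List, Sequence


def integrate_features(
    object_vector: Sequence[float],
    joint_vectors: Dict[str, Sequence[float]],
) -> List[float]:
    """Build one feature amount by concatenating the object vector with
    each joint's vector. Joints are visited in sorted name order so the
    layout of the result is the same for every frame (an assumption)."""
    feature = list(object_vector)
    for joint_name in sorted(joint_vectors):
        feature.extend(joint_vectors[joint_name])
    return feature


example = integrate_features(
    [1.0, 2.0],
    {"wrist": [3.0, 4.0], "elbow": [5.0, 6.0]},
)
```

Keeping the joint order fixed matters because a predictor or classifier expects each position in the vector to carry the same meaning across frames.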

9. The motion discrimination program according to claim 1, wherein area information indicating a plurality of image areas to be set in the photographed image is stored in a storage unit of the computer, and the determining includes:
correcting set positions of the plurality of image areas in the photographed image based on the skeleton information; and
determining whether the action performed by the person on the object is abnormal based on a time-series relationship between the plurality of image areas whose set positions have been corrected and the position of the object.
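
As a rough sketch of a determination based on the time-series relationship between image areas and the object position, the code below checks whether the tracked object position visits the areas in an expected order. The rectangle representation, the area names, and the ordering rule are assumptions made for illustration; the text does not fix the concrete rule.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area_of(point: Point, areas: Dict[str, Rect]) -> Optional[str]:
    """Return the name of the first area containing the point, if any."""
    x, y = point
    for name, (x0, y0, x1, y1) in areas.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def passed_in_order(
    track: Sequence[Point],
    areas: Dict[str, Rect],
    expected_order: List[str],
) -> bool:
    """True when the object positions, taken frame by frame, enter the
    areas in the expected order (other positions in between are ignored)."""
    next_index = 0
    for point in track:
        if next_index < len(expected_order) and area_of(point, areas) == expected_order[next_index]:
            next_index += 1
    return next_index == len(expected_order)


areas = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10)}
```

Under this assumed rule, a track that never reaches the second area, or reaches the areas in the wrong order, would be treated as a candidate abnormal action.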

10. The motion discrimination program according to claim 9, wherein the determining determines whether a scan operation, in which the person holds the object in a hand and causes a scanner mounted on a cash register to scan product information attached to the object, was performed normally.

11. The motion discrimination program according to claim 10, wherein two image areas arranged adjacent to each other along a direction from an area of the photographed image in which the person can be positioned toward the cash register are set as the plurality of image areas, and the correcting of the set positions includes:
calculating a distance between two specific joints of the person in the photographed image based on the skeleton information; and
repositioning a boundary between the two image areas based on the distance.
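
The boundary correction can be illustrated with a small calculation. The sketch below assumes that the boundary offset from a fixed anchor position scales linearly with the ratio of the observed joint distance to a reference joint distance; the linear rule, the anchor, and all parameter names are hypothetical, since the text only states that the boundary is repositioned based on the distance.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_distance(joint_a: Point, joint_b: Point) -> float:
    """Pixel distance between two joints in the photographed image."""
    return math.hypot(joint_a[0] - joint_b[0], joint_a[1] - joint_b[1])


def corrected_boundary_x(
    anchor_x: float,
    reference_boundary_x: float,
    reference_distance: float,
    observed_distance: float,
) -> float:
    """Scale the boundary offset from the anchor by how large the person
    appears relative to the reference physique (assumed linear rule)."""
    if reference_distance <= 0:
        raise ValueError("reference_distance must be positive")
    scale = observed_distance / reference_distance
    return anchor_x + (reference_boundary_x - anchor_x) * scale
```

For example, with an anchor at x = 100, a reference boundary at x = 200, and a person whose joint distance is half the reference, the boundary moves to x = 150, so a smaller or more distant person does not need to reach as far for the object to cross it.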

12. The motion discrimination program according to claim 10 or 11, further causing the computer to execute a process comprising:
counting a first number of times that the scan operation was determined by the determining to have been performed normally and a second number of times that a scan operation was notified by the cash register; and
outputting predetermined warning information when the first number of times and the second number of times do not match.
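
The count comparison can be sketched in a few lines. The function name and the wording of the warning are hypothetical; the text only requires that predetermined warning information be output when the two counts differ.

```python
from typing import Optional


def check_scan_counts(detected_scans: int, notified_scans: int) -> Optional[str]:
    """Compare the number of scan operations judged normal from the images
    with the number of scans notified by the cash register.
    Returns None when they match, otherwise a warning message."""
    if detected_scans == notified_scans:
        return None
    return (
        f"warning: {detected_scans} scan operations detected from images, "
        f"but {notified_scans} notified by the cash register"
    )
```

A mismatch in either direction is reported, since the claim wording does not distinguish between the two cases.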

13. The motion discrimination program according to claim 1, further causing the computer to execute a process comprising:
detecting, when the photographed image shows one or more objects with which the person interacts, a type of interaction between the person and each of the one or more objects; and
identifying, based on the detected type, the object related to the person from among the one or more objects.

14. A motion discrimination method in which a computer:
acquires a photographed image in which a motion of a person is captured;
detects, from the acquired photographed image, a position of an object related to the person and skeleton information of the person; and
determines, based on the position of the object and the skeleton information, whether an action performed by the person on the object is normal.

15. A motion discrimination device comprising a processing unit that:
acquires a photographed image in which a motion of a person is captured;
detects, from the acquired photographed image, a position of an object related to the person and skeleton information of the person; and
determines, based on the position of the object and the skeleton information, whether an action performed by the person on the object is normal.