JP2021077177A

JP2021077177A - Operation recognition apparatus, operation recognition method, and operation recognition program

Info

Publication number: JP2021077177A
Application number: JP2019204279A
Authority: JP
Inventors: Haike Guan (関 海克)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2019-11-11
Filing date: 2019-11-11
Publication date: 2021-05-20

Abstract

PROBLEM TO BE SOLVED: To improve accuracy in recognizing a plurality of monitoring targets. SOLUTION: A recognition unit recognizes a plurality of monitoring targets based on a captured image. An identification number setting unit sets an identification number for each of the monitoring targets. A tracking processing unit tracks each monitoring target that has been recognized in the captured image and assigned an identification number, based on the region of each monitoring target recognized in the captured image. A motion recognition processing unit recognizes the motion of each monitoring target in the region of the captured image that results from tracking that target. A recognition result output unit outputs the recognition result of the motion of each recognized monitoring target. SELECTED DRAWING: Figure 4

Description

The present invention relates to a motion recognition device, a motion recognition method, and a motion recognition program.

Behavior recognition devices are known today that capture the behavior or motion of a monitoring target, such as a person, an animal, or a work machine, with a camera device and analyze the captured image so that the behavior of the person, animal, work machine, or the like can be visualized and recognized.

For example, Patent Document 1 (Japanese Unexamined Patent Publication No. 2011-100175) discloses a person behavior determination device that can determine the behavior of a person in a crowded scene by video processing alone. This device detects a center-of-gravity trajectory as a feature amount, based on the center-of-gravity position and trajectory of a person determined to be the same person by video processing. It then determines the person's behavior by collating the detected feature amount with trajectory feature amounts registered in advance for each behavior.

However, with the person behavior determination device disclosed in Patent Document 1, when a plurality of monitoring targets are working or moving, it becomes difficult to keep identifying the same monitoring target continuously, and the recognition accuracy for the monitoring targets falls. In particular, when monitoring targets overlap one another in the captured image, or when a target is temporarily hidden behind an object and then reappears, continuously identifying the same target becomes difficult and the recognition accuracy drops markedly.

The present invention has been made in view of the above problem, and its object is to provide a motion recognition device, a motion recognition method, and a motion recognition program that improve the recognition accuracy for a plurality of monitoring targets.

To solve the above problem and achieve the object, the present invention has: a recognition unit that recognizes a plurality of monitoring targets based on a captured image; an identification number setting unit that sets an identification number for each monitoring target recognized by the recognition unit; a tracking processing unit that tracks each monitoring target that is recognized in the captured image and has an identification number set, based on the region of each monitoring target recognized in the captured image; a motion recognition processing unit that recognizes the motion of each monitoring target in the resulting region of the captured image in which that target has been tracked; and a recognition result output unit that outputs the recognition result of the motion of each recognized monitoring target.

According to the present invention, the recognition accuracy for a plurality of monitoring targets can be improved.

FIG. 1 is a system configuration diagram of the behavior recognition system of the embodiment.
FIG. 2 is a hardware configuration diagram of the behavior recognition device provided in the behavior recognition system of the embodiment.
FIG. 3 is a functional block diagram of the behavior recognition device.
FIG. 4 is a flowchart showing the flow of the behavior recognition processing for a monitoring target in the behavior recognition device.
FIG. 5 is a diagram for explaining the worker recognition operation performed by the recognition unit of the behavior recognition device.
FIG. 6 is a diagram for explaining the feature amount calculation performed by the recognition unit of the behavior recognition device.
FIG. 7 is a diagram showing the hierarchical structure of the worker authentication processing performed by the recognition unit of the behavior recognition device.
FIG. 8 is a diagram showing the IDs set for each authenticated worker by the initial ID setting unit of the behavior recognition device.
FIG. 9 is a diagram showing captured images of a worker that are input sequentially to the tracking processing unit of the behavior recognition device.
FIG. 10 is a diagram for explaining the tracking operation performed by the tracking processing unit of the behavior recognition device.
FIG. 11 is another diagram for explaining the tracking operation performed by the tracking processing unit of the behavior recognition device.
FIG. 12 is a diagram for explaining the operation in which the tracking processing unit of the behavior recognition device evaluates the feature amounts of the predicted regions of a worker and obtains weighting coefficients.
FIG. 13 is a diagram showing captured images of a worker in a plurality of frames that are input from the camera device to the behavior recognition device.
FIG. 14 is a diagram for explaining spatiotemporal image data composed of captured images of a plurality of frames.
FIG. 15 is a diagram showing an example of a worker behavior recognition result recognized by the behavior recognition processing unit of the behavior recognition device.
FIG. 16 is a diagram showing an example of a behavior recognition result in which, within a series of behaviors from walking to shelving, the worker became difficult to detect while walking.
FIG. 17 is a diagram showing an example of a behavior recognition result in which, within a series of behaviors from walking to shelving, the worker became difficult to detect while shelving.

A behavior recognition system according to an embodiment, as an application example of the motion recognition device, the motion recognition method, and the motion recognition program, is described below.

(System configuration)
FIG. 1 is a system configuration diagram of the behavior recognition system of the embodiment. As shown in FIG. 1, the behavior recognition system has a behavior recognition device 1 and a camera device 2. The camera device 2 captures images of one or more monitoring targets, such as persons, animals, or robots.

The behavior recognition device 1 has an input interface unit 3 and a behavior recognition processing unit 4. The input interface unit 3 acquires captured images of the monitoring target from the camera device 2. The behavior recognition processing unit 4 recognizes the behavior (motion) of the monitoring target based on the captured images acquired via the input interface unit 3, and outputs the behavior recognition result to an external device such as a monitor device viewed by an observer.

(Hardware configuration of the behavior recognition device)
FIG. 2 is a hardware configuration diagram of the behavior recognition device 1. As shown in FIG. 2, the behavior recognition device 1 has a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a communication unit 14, an HDD (Hard Disk Drive) 15, the input interface unit 3, and an output interface unit 17. These units 3, 11 to 15, and 17 are connected to one another via a bus line 18.

In addition to the camera device 2 described above, an operation unit 20 such as a keyboard device and a mouse device is connected to the input interface unit 3. A monitor device (display unit) for displaying the behavior recognition result is connected to the output interface unit 17. The behavior recognition result may also be output via the output interface unit 17 to an external storage device such as an HDD or a semiconductor memory.

A server device 22 is connected to the communication unit 14 via a network, for example a wide area network such as the Internet or a private network such as a LAN (Local Area Network). The communication unit 14 transmits the behavior recognition result to the server device 22 for storage. An administrator or the like can then access the server device 22 via a communication device such as a smartphone, a tablet terminal, or a personal computer to obtain the behavior recognition result, and can monitor the monitoring target remotely.

The HDD 15 stores a behavior recognition program that performs behavior recognition processing on the monitoring target. By executing this program, the CPU 11 realizes the functions described below and executes the behavior recognition processing on the monitoring target.

(Behavior recognition function)
FIG. 3 is a functional block diagram of the functions realized when the CPU 11 executes the behavior recognition program. As shown in FIG. 3, by executing the behavior recognition program the CPU 11 realizes an input unit 31, a recognition unit 32, an initial identification number setting unit (initial ID setting unit) 33, a tracking processing unit 34, a behavior recognition processing unit 35, a recognition result output unit 36, a monitoring target recognition dictionary input unit 37, and a behavior recognition dictionary input unit 38. The initial ID setting unit 33 is an example of the identification number setting unit. The behavior recognition processing unit 35 is an example of the motion recognition processing unit.

Although the input unit 31 through the behavior recognition dictionary input unit 38 are realized here in software, some or all of them may be realized in hardware such as an IC (Integrated Circuit).

The behavior recognition program may be provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM or a flexible disk (FD). It may also be provided recorded on a computer-readable recording medium such as a CD-R, a DVD (Digital Versatile Disk), a Blu-ray Disc (registered trademark), or a semiconductor memory. The behavior recognition program may further be provided by installation over a network such as the Internet, or provided pre-installed in a ROM or the like in the device.

(Behavior recognition processing)
The behavior recognition processing performed on the monitoring target by the input unit 31 through the behavior recognition dictionary input unit 38 is described with reference to the flowchart of FIG. 4. The CPU 11 executes each process shown in the flowchart of FIG. 4 by loading the behavior recognition program.

First, the input unit 31 acquires a captured image from the camera device 2 (step S1). This captured image shows, for example, one or more workers, who are an example of the monitoring target, performing work such as placing products on shelves in a workplace.

Next, the monitoring target recognition dictionary input unit 37 inputs a monitoring target recognition dictionary for recognizing workers to the recognition unit 32 (step S2). This dictionary is a data set holding the feature amounts and weighting coefficients used to calculate the evaluation value at each layer of the recognition unit 32, together with the evaluation value threshold for each layer. It is built by learning in advance, from captured images of persons and captured images of non-person objects, the vertex coordinates of the rectangles used to calculate the feature amounts described below, the weighting coefficients, and the evaluation thresholds for each layer.

Next, the recognition unit 32 refers to this monitoring target recognition dictionary and recognizes the workers appearing in the captured image acquired by the input unit 31 (step S3). FIG. 5 is a diagram for explaining the worker recognition operation performed by the recognition unit 32. As shown in FIG. 5, the recognition unit 32 cuts out blocks 51, 52, ... of a predetermined shape, for example rectangular, within the captured image 50. The upper-left coordinates (Xs, Ys) and the lower-right coordinates (Xe, Ye) of block 51 are determined by the position of block 51 in the captured image 50 and the size of the rectangle.

The recognition unit 32 selects blocks in order from the largest size to the smallest and performs the feature amount calculations described below. The reason is that a large block and a small block take the same time to process, while the captured image 50 contains few large blocks and many small blocks. Selecting blocks from large to small therefore allows objects (monitoring targets) to be detected quickly.
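The large-to-small block selection can be sketched as follows. This is a minimal illustration only: the function name `iter_blocks`, the candidate sizes, and the half-block scan step are assumptions that do not appear in the specification, and (Xe, Ye) is treated here as an exclusive lower-right corner.

```python
def iter_blocks(img_w, img_h, sizes, step_ratio=0.5):
    """Yield blocks (xs, ys, xe, ye), visiting the largest block size first.

    sizes is a list of (width, height) candidates (hypothetical values);
    step_ratio sets the scan step as a fraction of the block size.
    """
    # Sort candidate sizes by area so that large blocks are evaluated first.
    for w, h in sorted(sizes, key=lambda s: s[0] * s[1], reverse=True):
        step_x = max(1, int(w * step_ratio))
        step_y = max(1, int(h * step_ratio))
        for ys in range(0, img_h - h + 1, step_y):
            for xs in range(0, img_w - w + 1, step_x):
                yield (xs, ys, xs + w, ys + h)


blocks = list(iter_blocks(8, 8, [(2, 2), (4, 4)]))
```

With an 8 x 8 image, the nine 4 x 4 blocks are produced before the forty-nine 2 x 2 blocks, which mirrors the observation that large blocks are few and small blocks are many.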

FIGS. 6(a) to 6(b) are diagrams for explaining the feature amount calculation. As shown in FIGS. 6(a) to 6(b), the recognition unit 32 refers to the input monitoring target recognition dictionary, sums the pixel values in the white region of the black-and-white rectangular pattern within the block, and calculates the difference between that sum and the sum of the pixel values in the black region as the feature amount h(x) of the block. The recognition unit 32 then calculates an evaluation value f(x) by applying a predetermined weight to this feature amount h(x).
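The white-minus-black rectangle feature h(x) can be illustrated with the sketch below. The use of an integral image (summed-area table) to obtain each rectangle sum is an assumption made here for efficiency and is not stated in the specification; the names `integral_image`, `rect_sum`, and `rect_feature` are hypothetical.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, xs, ys, xe, ye):
    """Sum of pixels with xs <= x < xe and ys <= y < ye."""
    return ii[ye][xe] - ii[ys][xe] - ii[ye][xs] + ii[ys][xs]


def rect_feature(ii, white, black):
    """Feature h(x): white-region pixel sum minus black-region pixel sum."""
    return rect_sum(ii, *white) - rect_sum(ii, *black)


image = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
ii = integral_image(image)
h_x = rect_feature(ii, white=(0, 0, 2, 2), black=(2, 0, 4, 2))
```

In this toy image the white rectangle covers the four bright pixels and the black rectangle covers the four dark pixels, so the feature is the difference of the two sums.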

Equation (1) below is the formula for this evaluation value f(x).

Here, as shown in FIG. 7, the recognition unit 32 calculates the evaluation value f(x) at each layer, from the first layer to the n-th layer (n is a natural number). When the calculated evaluation value f(x) is smaller than the preset threshold given in the monitoring target recognition dictionary, the recognition unit 32 determines that the block contains an object other than a person (a non-person block) and stops evaluating that block. When the calculated evaluation value f(x) is equal to or greater than the preset threshold given in the monitoring target recognition dictionary, the recognition unit 32 determines that the block is one in which a person appears.
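The layer-by-layer evaluation with early rejection can be sketched as follows. Equation (1) is not reproduced in this text, so the sketch assumes that f(x) is a weighted sum of the feature amounts of a layer, and it reads the description as requiring every layer to pass before a block is accepted; the function name and the numeric values are hypothetical.

```python
def evaluate_block(features_per_layer, layers):
    """Return True if the block passes every layer, False at the first failure.

    features_per_layer[i] holds the feature amounts h(x) for layer i.
    layers[i] is (weights, threshold) taken from the recognition dictionary.
    """
    for h_values, (weights, threshold) in zip(features_per_layer, layers):
        # Assumed form of equation (1): weighted sum of feature amounts.
        f = sum(w * h for w, h in zip(weights, h_values))
        if f < threshold:
            return False  # non-person block: stop evaluating this block
    return True


# Hypothetical two-layer dictionary.
layers = [([1.0, 0.5], 1.0), ([2.0], 3.0)]
person = evaluate_block([[1.0, 1.0], [2.0]], layers)
rejected_early = evaluate_block([[0.2, 0.2], [2.0]], layers)
rejected_late = evaluate_block([[1.0, 1.0], [1.0]], layers)
```

Stopping at the first failed layer means that most non-person blocks cost only the cheap first-layer calculation.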

When blocks in which a person appears have been recognized in this way, the initial ID setting unit 33 sets an identification number for each block (step S4). In the example of FIG. 8, three blocks are recognized as blocks in which a person appears, and the initial ID setting unit 33 sets the identification numbers ID1, ID2, and ID3 for the respective blocks (initial regions).

Next, the tracking processing unit 34 tracks the worker appearing in each block (step S5). Specifically, as shown in FIG. 9, the captured images acquired via the input unit 31 are supplied to the tracking processing unit 34 sequentially (in time series): the first frame F0 is supplied, the next frame F1 is supplied after a time Δt, and so on.

In the first frame, the tracking processing unit 34 defines the worker state S(x, y, vx, vy, Hx, Hy, M). Here, x and y are the coordinates of the upper-left point A of the block containing the worker (tracking target), shown enclosed in a rectangular frame in FIG. 9. Hx and Hy are the horizontal and vertical sizes of the block containing the worker. vx and vy are the speeds at which the worker in the block moves in the horizontal and vertical directions (initial value 0). M is the change in the scaling factor of the worker in the block (the rate of change of the worker's size relative to the previous frame; initial value 0).
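The state S(x, y, vx, vy, Hx, Hy, M) can be represented as a small record, as in the sketch below. The class name `WorkerState` and the helper `initial_state` are illustrative assumptions; only the seven fields and their initial values come from the description.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    x: float   # upper-left x of the block (point A)
    y: float   # upper-left y of the block (point A)
    vx: float  # horizontal speed
    vy: float  # vertical speed
    hx: float  # horizontal block size
    hy: float  # vertical block size
    m: float   # rate of change of the worker's size versus the previous frame


def initial_state(x, y, hx, hy):
    """State in the first frame: speeds and scale change start at 0."""
    return WorkerState(x=x, y=y, vx=0.0, vy=0.0, hx=hx, hy=hy, m=0.0)


s0 = initial_state(10, 20, 30, 60)
```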

As shown in FIGS. 10 and 11, the tracking processing unit 34 tracks a worker by repeating a cycle of predicting, observing, and correcting the worker's state. In FIG. 11, "St" denotes the state of the worker (tracking target) at time t, and "Yt" denotes the observation result at time t.

The tracking processing unit 34 calculates the state change from the worker's state S_{k-1} to the state S_k by equation (2) below.

The tracking processing unit 34 predicts the worker's state at step k from the worker's state at step k-1 by equations (3) to (9) below.

Next, the tracking processing unit 34 calculates the observation data Z_k for the worker's state S_k by equation (10) below.

The observation data is the color histogram within the worker's block shown in FIG. 9 (the region containing the tracking target). The tracking processing unit 34 calculates the color histogram by equation (11) below. In equation (11), k() is the kernel used to calculate the color histogram, a is the scaling factor, h(x_i) is the color pixel value, and P is the frequency of the color histogram.

The kernel k in equation (11) is calculated by equation (12) below.

With the kernel k calculated by equation (12), large values are obtained at the center of the block (subject region) and progressively smaller values are obtained toward the periphery of the block. This reduces the influence of the area around the edge of the block (subject region).
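A kernel-weighted color histogram with this center-heavy behavior can be sketched as follows. Equations (11) and (12) are not reproduced in this text, so the kernel form k(r) = 1 - r^2 for r < 1 (and 0 otherwise), the use of a as a normalizing radius, and the pre-quantized color bin index per pixel are all assumptions made for illustration.

```python
import math


def kernel(r):
    """Assumed kernel: largest at the center (r = 0), zero at r >= 1."""
    return 1.0 - r * r if r < 1.0 else 0.0


def weighted_histogram(pixels, center, a, n_bins):
    """Color histogram in which pixels near the block center count more.

    pixels is a list of (x, y, bin_index); a normalizes the distance.
    """
    cx, cy = center
    hist = [0.0] * n_bins
    for x, y, b in pixels:
        r = math.hypot(x - cx, y - cy) / a
        hist[b] += kernel(r)
    total = sum(hist)
    # Normalize so that the histogram frequencies sum to 1.
    return [v / total for v in hist] if total > 0 else hist


p = weighted_histogram([(0, 0, 0), (1, 0, 1), (2, 0, 1)],
                       center=(0, 0), a=2.0, n_bins=2)
```

Here the single center pixel in bin 0 outweighs the two off-center pixels in bin 1, one of which lies on the kernel boundary and contributes nothing.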

Next, the tracking processing unit 34 observes the predicted states S_k^(i) and calculates the weighted average of the observation evaluations by equation (13) below. In doing so, it evaluates the feature amounts of N predicted regions, shown as multiple frames in FIG. 12, and obtains their weighting coefficients. The weighted average taken with these weighting coefficients is the tracking result.

Specifically, the tracking processing unit 34 predicts the worker's state S_k by equations (14) to (20) below, which are equations (3) to (9) above with random variables added. Gaussian random variables, for example, can be used as r_1 to r_7.
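The prediction step with added random variables might look like the sketch below. Equations (3) to (9) and (14) to (20) are not reproduced in this text, so the constant-velocity motion model, the size update Hx(1 + M), and the noise standard deviations are assumptions; only the seven state components and the Gaussian terms r_1 to r_7 follow the description.

```python
import random


def predict(state, dt, sigmas, rng):
    """Propagate one state (x, y, vx, vy, hx, hy, m) by one frame.

    sigmas holds the standard deviations of the Gaussian terms r_1..r_7.
    """
    x, y, vx, vy, hx, hy, m = state
    r = [rng.gauss(0.0, s) for s in sigmas]
    return (
        x + vx * dt + r[0],       # position advances with velocity
        y + vy * dt + r[1],
        vx + r[2],                # velocity is perturbed only
        vy + r[3],
        hx * (1.0 + m) + r[4],    # block size scales by the change rate
        hy * (1.0 + m) + r[5],
        m + r[6],
    )


rng = random.Random(0)
state = (10.0, 20.0, 2.0, -1.0, 30.0, 60.0, 0.0)
# With zero noise the prediction is deterministic.
noiseless = predict(state, 1.0, [0.0] * 7, rng)
# N predicted states (N = 50 here, a hypothetical value).
particles = [predict(state, 1.0, [2.0, 2.0, 0.5, 0.5, 1.0, 1.0, 0.01], rng)
             for _ in range(50)]
```

Drawing N noisy predictions from one state yields the N predicted regions whose feature amounts are then evaluated.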

Next, the tracking processing unit 34 calculates the weighting coefficient π^i of each predicted state as follows. First, the tracking processing unit 34 uses the histogram P calculated from the worker's tracking region as the model and, taking q as the color histogram of each of the N predicted regions, calculates the Bhattacharyya coefficient by equation (21) below.

This Bhattacharyya coefficient represents the similarity between the color histogram of the worker's tracking region (the model) and the color histogram of a predicted region. A large Bhattacharyya coefficient therefore means that the two are highly similar. Using this Bhattacharyya coefficient, the tracking processing unit 34 calculates the weighting coefficient π^i of the predicted state by equation (22) below.

The quantity d in equation (22) is calculated by equation (23) below.
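The similarity and weight calculation can be sketched as follows. The Bhattacharyya coefficient of two normalized histograms is the sum of the square roots of their bin-wise products. Equations (22) and (23) are not reproduced in this text, so the distance d = sqrt(1 - ρ) and the Gaussian-shaped weight with a hypothetical spread `sigma` are assumptions borrowed from common color-based particle filters.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (0 to 1)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def weight(rho, sigma=0.2):
    """Assumed weight: Gaussian in the distance d = sqrt(1 - rho)."""
    d_squared = max(0.0, 1.0 - rho)
    return (math.exp(-d_squared / (2.0 * sigma * sigma))
            / (math.sqrt(2.0 * math.pi) * sigma))


rho_same = bhattacharyya([0.5, 0.5], [0.5, 0.5])
rho_disjoint = bhattacharyya([1.0, 0.0], [0.0, 1.0])
```

Under these assumptions a predicted region whose histogram matches the model more closely receives a larger weight.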

Using the weighting coefficients π^i calculated in this way and the predicted states S_k^(i), the tracking processing unit 34 calculates the worker's tracking result by equation (13) above.
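The weighted average of the predicted states can be sketched as below. Equation (13) is not reproduced in this text, so the normalization by the sum of the weights is an assumption, and the function name is hypothetical.

```python
def weighted_mean_state(states, weights):
    """Component-wise average of predicted states, weighted by pi^i."""
    total = sum(weights)
    dim = len(states[0])
    return tuple(
        sum(w * s[i] for s, w in zip(states, weights)) / total
        for i in range(dim)
    )


# Two predicted states reduced to (x, y) for brevity.
result = weighted_mean_state([(0.0, 0.0), (10.0, 20.0)], [1.0, 3.0])
```

The second state carries three times the weight of the first, so the result lies three quarters of the way toward it.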

Next, the tracking processing unit 34 takes the similarity ρ[p, q] of the tracking result, given by the Bhattacharyya coefficient calculated by equation (21) above, as the confidence of the worker tracking. This confidence takes a value from 0 to 1.0; the higher the value of ρ[p, q], the higher the confidence.

If this confidence is equal to or greater than a preset threshold, the tracking processing unit 34 determines that tracking has succeeded and keeps the ID of each worker shown in FIG. 8 unchanged. The tracking processing unit 34 also saves worker information in a storage unit, for example the position information (rectangle information) of the worker for whom tracking succeeded and the image data of the worker's rectangular region. When the image information of the next frame is input, the tracking processing unit 34 performs the tracking processing described above and, if tracking succeeds, keeps the worker's ID and updates the worker information saved in the storage unit.
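The success decision and the update of the stored worker information can be sketched as follows. The threshold value 0.7, the dictionary-based store, and the function name `update_track` are hypothetical; the description only states that a preset threshold is compared with the confidence.

```python
def update_track(store, worker_id, confidence, rect, patch, threshold=0.7):
    """Update stored worker info on success; leave it untouched on failure.

    Returns True when tracking is judged successful.
    """
    if confidence >= threshold:
        store[worker_id] = {"rect": rect, "patch": patch}
        return True
    # Tracking failed: the info from the last successful frame is kept.
    return False


store = {}
ok = update_track(store, 3, 0.9, (10, 20, 40, 80), "patch_f0")
lost = update_track(store, 3, 0.4, (200, 5, 230, 65), "patch_f1")
```

Because a failed frame does not overwrite the entry, the store always holds the information from the last frame in which tracking succeeded.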

During such tracking, workers may overlap one another, move behind an object, or move out of the imaging range of the camera device 2. The tracked worker then disappears from the captured image or becomes difficult to detect, and tracking that worker can become difficult. In such a case, it also becomes difficult to acquire data for the worker whose ID has disappeared or become difficult to detect.

Note that "when tracking becomes difficult" is synonymous with "when tracking is not possible", "when tracking is discontinued", "when tracking fails", and the like.

While tracking of a worker is succeeding, the tracking processing unit 34 keeps the ID of the tracked worker and updates the worker information in the storage unit, as described above. When tracking becomes difficult, the tracking processing unit 34 instead retains the ID and worker information of that worker as detected in the last frame in which tracking succeeded. For example, if tracking of the worker with ID3 in the example of FIG. 8 becomes difficult, the tracking processing unit 34 retains ID3 and the worker information detected in the last frame in which tracking succeeded.

When tracking of a worker becomes difficult, the recognition unit 32 performs the recognition processing for the workers again, and an ID is reset for each worker. In this ID reset, among the re-recognized workers, the worker whose similarity is closest to that of the worker who became difficult to track is given that worker's ID. The same ID can thus be set for the same worker.

This ID reset operation is described concretely as follows. When tracking of a worker becomes difficult, the recognition unit 32 performs the worker recognition processing again, based on the captured image acquired via the input unit 31. Suppose that this re-recognition recognizes, for example, three workers A, B, and C.

The initial ID setting unit 33 calculates, by equation (21) above, the similarity (Bhattacharyya coefficient) of each of the three re-recognized workers A, B, and C and the similarity of the worker retained when tracking became difficult, for example the worker with ID3. The initial ID setting unit 33 then sets "ID3" for whichever of the three workers A, B, and C has the similarity closest to that of the retained ID3 worker. As a result, when a worker who had become difficult to track becomes detectable again, the same ID that was set when tracking became difficult can be set again and tracking can continue.

Similarly, when tracking of two workers, for example ID1 and ID2, becomes difficult, the initial ID setting unit 33 sets "ID1" for the worker, among A, B, and C, whose similarity is closest to that of the ID1 worker maintained when tracking became difficult. It likewise sets "ID2" for the worker, among A, B, and C, whose similarity is closest to that of the ID2 worker maintained when tracking became difficult. Each worker can thus be tracked with the same ID even after re-recognition.
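The ID resetting described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: equation (21) is not reproduced in this section, so the sketch assumes that each worker is represented by a histogram (for example a color histogram of the worker's region) and that the Bhattacharyya coefficient is computed directly between the histogram stored for a lost ID and the histogram of each re-recognized worker. The function names `bhattacharyya` and `reassign_ids`, and the greedy one-to-one assignment, are hypothetical choices made for the example.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two histograms (normalized internally)."""
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        return 0.0
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))


def reassign_ids(lost, candidates):
    """Give each lost ID to the most similar re-recognized worker.

    lost:       {worker_id: histogram kept when tracking became difficult}
    candidates: {label: histogram of a re-recognized worker}
    Returns {label: worker_id}. Assumption: one ID per worker, resolved
    greedily from the highest coefficient downwards.
    """
    scored = sorted(
        ((bhattacharyya(h_lost, h_cand), lost_id, label)
         for lost_id, h_lost in lost.items()
         for label, h_cand in candidates.items()),
        key=lambda item: item[0],
        reverse=True,
    )
    assigned, used_ids = {}, set()
    for _score, lost_id, label in scored:
        if lost_id in used_ids or label in assigned:
            continue
        assigned[label] = lost_id
        used_ids.add(lost_id)
    return assigned
```

With one lost worker (ID3) and three re-recognized workers A, B, and C, the call returns a mapping from the best-matching label to 3; re-recognized workers that receive no lost ID are left out of the mapping.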

Next, the behavior recognition processing unit 35 performs the worker behavior recognition process (step S7) based on the behavior recognition dictionary for recognizing worker behavior, which is input from the behavior recognition dictionary input unit 38 in step S6 of the flowchart of FIG. 4.

Specifically, as shown in FIG. 13, the behavior recognition processing unit 35 extracts spatiotemporal features of the worker's rectangular region over a plurality of frames input via the input unit 31. The horizontal axis x and the vertical axis y of each frame shown in FIG. 13 are spatial coordinates. FIG. 13 shows frames F1, F2, ... arranged in time series along the time axis t. That is, each frame is spatiotemporal (x, y, t) image data, and each pixel value I(x, y, t) in this space-time is a function of the spatial coordinates (x, y) and time t.

When the worker moves, change points arise in the spatiotemporal image data shown in FIG. 13. The behavior recognition processing unit 35 recognizes a specific behavior of the worker based on these change points (that is, spatiotemporal feature points), which it detects as follows.

FIG. 14 shows the spatiotemporal image data. The large cube shown in FIG. 14 is the spatiotemporal image data. The horizontal axis represents the spatial coordinate x (pixels), and the vertical axis represents the spatial coordinate y (pixels). The time axis t is the time-series axis of the captured images, which are input at a predetermined frame rate such as 30 frames per second.

The behavior recognition processing unit 35 divides the spatiotemporal image data shown in FIG. 14 into blocks of size M × N × T: M pixels in the x direction, N pixels in the y direction, and T frames in the t direction. When the worker performs a specific motion, the feature amount of the blocks corresponding to that motion becomes large (a large change occurs in space-time). As described below, the behavior recognition processing unit 35 extracts blocks with a large amount of change as feature points.

When extracting feature points from the spatiotemporal image data, the behavior recognition processing unit 35 first smooths the data by evaluating the following equation (24), in order to remove noise in the spatial (x, y) directions.

In equation (24), I(x, y, t) denotes the pixel value at coordinates (x, y) in the frame at time t, and g(x, y) is the smoothing kernel. The symbol "*" denotes convolution. The smoothing may be performed by averaging pixels or by using a Gaussian smoothing filter.

Next, the behavior recognition processing unit 35 filters the smoothed spatiotemporal image data along the time axis. The Gabor filtering shown in the following equation (25) is used for this filtering.

In equation (25), g_ev and g_od are the Gabor filter kernels given by the following equations (26) and (27), respectively. The symbol "*" denotes convolution. τ and ω are the parameters of the Gabor filter kernels.
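Equations (25) to (27) are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes a commonly used temporal Gabor quadrature pair, g_ev(t) = -cos(2πtω)·exp(-t²/τ²) and g_od(t) = -sin(2πtω)·exp(-t²/τ²), and assumes that the filter response is the sum of the squares of the two convolution outputs. The function names, the default parameter values, and the zero padding at the ends of the sequence are hypothetical. The input is the time series of one (already smoothed) pixel.

```python
import math


def gabor_kernels(tau, omega, half_width):
    """Even and odd temporal Gabor kernels sampled at t = -half_width..half_width."""
    ts = range(-half_width, half_width + 1)
    g_ev = [-math.cos(2 * math.pi * t * omega) * math.exp(-(t * t) / (tau * tau))
            for t in ts]
    g_od = [-math.sin(2 * math.pi * t * omega) * math.exp(-(t * t) / (tau * tau))
            for t in ts]
    return g_ev, g_od


def convolve_same(signal, kernel):
    """1-D convolution with zero padding; output has the length of `signal`."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i - (j - half)  # kernel sample j corresponds to t = j - half
            if 0 <= idx < len(signal):
                acc += signal[idx] * k
        out.append(acc)
    return out


def temporal_response(signal, tau=2.0, omega=0.25, half_width=4):
    """Squared quadrature response of one pixel's time series."""
    g_ev, g_od = gabor_kernels(tau, omega, half_width)
    even = convolve_same(signal, g_ev)
    odd = convolve_same(signal, g_od)
    return [a * a + b * b for a, b in zip(even, odd)]
```

For a pixel whose value is constant at zero and then jumps, the response is zero well inside the constant part and positive around the jump, which is the behavior the text relies on when it treats large responses as change points.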

Next, after applying the filtering of equation (25) above to all pixels of the spatiotemporal image data shown in FIG. 13, the behavior recognition processing unit 35 calculates the average value of each block obtained by the division shown in FIG. 14, using the following equation (28).

If the block average M(x, y, t) calculated by equation (28) is greater than or equal to a predetermined threshold (Thre), as shown in the following equation (29), the behavior recognition processing unit 35 extracts that block as a feature point.
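The block averaging and thresholding steps can be sketched as below. Equations (28) and (29) are not reproduced in this text, so the sketch assumes that the block average is the plain arithmetic mean of the filtered values in a non-overlapping M × N × T block. The function name `feature_blocks` and the nested-list layout `response[t][y][x]` are hypothetical.

```python
def feature_blocks(response, m, n, t_len, threshold):
    """Return origins (x0, y0, t0) of blocks whose mean response >= threshold.

    response[t][y][x] holds the filtered value of each pixel.
    Blocks are non-overlapping, M pixels wide, N pixels high, T frames long;
    partial blocks at the borders are ignored (an assumption of this sketch).
    """
    frames = len(response)
    height = len(response[0])
    width = len(response[0][0])
    hits = []
    for t0 in range(0, frames - t_len + 1, t_len):
        for y0 in range(0, height - n + 1, n):
            for x0 in range(0, width - m + 1, m):
                total = sum(
                    response[t][y][x]
                    for t in range(t0, t0 + t_len)
                    for y in range(y0, y0 + n)
                    for x in range(x0, x0 + m)
                )
                if total / (m * n * t_len) >= threshold:
                    hits.append((x0, y0, t0))
    return hits
```

Each returned origin identifies one block that would be treated as a spatiotemporal feature point in the following steps.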

Next, the behavior recognition processing unit 35 calculates spatiotemporal edge information for the pixels of each block extracted as a feature point from the spatiotemporal image data, by performing the differentiation of the following equation (30).

In the example shown in FIG. 14, one block contains M × N × T pixels, so M × N × T × 3 derivative values are obtained. Each block can therefore be described by a vector of M × N × T × 3 derivative values; that is, each feature point can be described by an (M × N × T × 3)-dimensional vector.
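A sketch of how one block yields M × N × T × 3 derivative values is given below. Equation (30) is not reproduced in this text; the sketch assumes it is the gradient (∂I/∂x, ∂I/∂y, ∂I/∂t) and approximates each partial derivative by a central difference, reusing the nearest pixel at the borders of the data. The function name `block_descriptor` and the layout `volume[t][y][x]` are hypothetical.

```python
def block_descriptor(volume, x0, y0, t0, m, n, t_len):
    """Concatenate (dI/dx, dI/dy, dI/dt) for every pixel of one block.

    volume[t][y][x] holds pixel values. The result has M*N*T*3 entries.
    """
    frames = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])

    def at(t, y, x):
        # Clamp indices so that border pixels reuse their nearest neighbour.
        t = min(max(t, 0), frames - 1)
        y = min(max(y, 0), height - 1)
        x = min(max(x, 0), width - 1)
        return volume[t][y][x]

    descriptor = []
    for t in range(t0, t0 + t_len):
        for y in range(y0, y0 + n):
            for x in range(x0, x0 + m):
                descriptor.append((at(t, y, x + 1) - at(t, y, x - 1)) / 2.0)
                descriptor.append((at(t, y + 1, x) - at(t, y - 1, x)) / 2.0)
                descriptor.append((at(t + 1, y, x) - at(t - 1, y, x)) / 2.0)
    return descriptor
```

For a 2 × 2 × 2 block the descriptor has 2 × 2 × 2 × 3 = 24 entries, matching the dimension count given in the text.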

The behavior recognition dictionary input from the behavior recognition dictionary input unit 38 shown in FIG. 3 to the behavior recognition processing unit 35 is (M × N × T × 3)-dimensional vector information calculated (learned) in advance by applying equation (30) to feature points detected from N frames of captured images of a specific behavior, such as a worker carrying a load, walking, or placing a load on a shelf.

When creating the behavior recognition dictionary, the behavior recognition dictionary input unit 38 classifies the feature points, which are (M × N × T × 3)-dimensional vectors, into, for example, K types using a method such as k-means clustering. This classification groups feature points with similar features into the same type.

Next, for the K types of classified feature points, the behavior recognition dictionary input unit 38 averages the (M × N × T × 3)-dimensional edge vectors of the feature points of each type to calculate K mean vectors Vk. Each mean vector is a recognition vector that represents the feature points of its type. Feature points obtained from captured images of a specific worker behavior are distributed near the mean vectors Vk obtained from the learning data for the same behavior.

Using this property, the behavior recognition dictionary input unit 38 counts the total number of blocks in each of the K feature point groups and calculates the recognition histogram H(k), which gives the frequency of each group. As described above, the distribution of the feature points to be recognized approximates the distribution of the feature points of the learning data. The recognition histogram of, for example, a worker to be recognized therefore approximates the learning histogram of the learning data for the same behavior (motion). Consequently, a behavior recognition dictionary for recognizing a specific behavior of a worker or the like can be created from the histogram H(k) obtained from the learning data.

As one example, the behavior recognition dictionary can be created using the SVM (Support Vector Machine) machine learning method. In that case, the dictionary is created from positive learning data, learned from captured images of the specific worker behavior to be recognized, and negative learning data, learned from captured images of worker behaviors other than the specific behavior.

The behavior recognition dictionary may also be created using machine learning methods other than SVM, such as the k-nearest neighbor method or a multilayer perceptron.

To summarize the behavior recognition operation described above: the behavior recognition processing unit 35 extracts the spatiotemporal feature points described above from N frames of spatiotemporal image data input as the captured images (video) of the worker to be recognized. It obtains the (M × N × T × 3)-dimensional derivative vector of each feature point block. It then calculates the distance between this derivative vector and each of the K learned mean vectors Vk obtained from the input learning data, and assigns the feature point block to the type of the nearest learned mean vector Vk. Classifying the feature point blocks in this way sorts them into K types. Based on the frequency of occurrence of each type of feature point block, the behavior recognition processing unit 35 creates a feature point histogram T(k) for the captured images (video) to be recognized.
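The nearest-mean classification and the histogram T(k) summarized above can be sketched as follows. The text does not specify the distance measure, so the squared Euclidean distance is used here as an assumption; the function names `nearest_center` and `feature_histogram` are hypothetical.

```python
def nearest_center(vector, centers):
    """Index k of the learned mean vector Vk closest to `vector`."""
    best_k, best_dist = 0, float("inf")
    for k, center in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(vector, center))
        if dist < best_dist:
            best_k, best_dist = k, dist
    return best_k


def feature_histogram(descriptors, centers):
    """Count how many feature point blocks fall into each of the K types."""
    histogram = [0] * len(centers)
    for vector in descriptors:
        histogram[nearest_center(vector, centers)] += 1
    return histogram
```

The resulting list plays the role of T(k): entry k is the number of feature point blocks assigned to type k. In practice the counts would usually be normalized before being passed to a classifier, which is a design choice the text leaves open.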

Then, based on the behavior recognition dictionary input from the behavior recognition dictionary input unit 38 and the feature point histogram T(k) of the captured images to be recognized, the behavior recognition processing unit 35 recognizes the worker's specific behavior using the SVM machine learning method described above. The SVM recognition process outputs a recognition result indicating either the worker's specific behavior or a behavior other than the specific behavior.

The recognition result output unit 36 shown in FIG. 3 outputs the recognition result (the worker's specific behavior or a behavior other than the specific behavior) to the monitor device 23 via, for example, the output interface unit 17 (step S8). This makes it possible to monitor a monitoring target such as a worker via the monitor device 23. In step S9 of the flowchart of FIG. 4, the CPU 11 determines whether the recognition process has ended. If it determines that the recognition process has not ended (step S9: No), the process returns to step S1, and the recognition of the worker's specific behavior described above is repeated. If it determines that the recognition process has ended (step S9: Yes), all processing in the flowchart of FIG. 4 ends.

The recognition result output unit 36 may also transmit the recognition result to the server device 22 via the communication unit 14 and the network 21. In this case, an administrator or the like accesses the server device 22 via a communication device such as a smartphone, a tablet terminal, or a personal computer, and obtains the recognition result. This makes it possible to monitor a monitoring target such as a worker remotely.

Next, the form in which the recognition result output unit 36 outputs the recognition result is described, using as an example of a recognition target a worker who walks to a shelf and then shelves the products being carried. In this case, as shown in FIG. 15, the recognition result output unit 36 outputs the start time and duration of each behavior based on the recognition result of the worker's behavior from the behavior recognition processing unit 35. The difference (t2 - t1) between the walking end time t2 and the walking start time t1 is the walking time.

Likewise, the difference (t3 - t2) between the shelving end time t3 and the shelving start time t2 is the shelving action time. The time for the shelving work as a whole is the sum of the worker's walking time and the shelving action time, that is, the difference (t3 - t1) between the shelving end time t3 and the walking start time t1. The recognition result output unit 36 outputs, for each worker, the walking time, the shelving action time, and the time for the shelving work as a whole.

Next, when a worker becomes difficult to recognize, for example because the worker overlaps with another worker or the worker's posture changes, the worker recognition process of the behavior recognition processing unit 35 is interrupted. FIG. 16 shows an example in which a worker became difficult to recognize partway through walking during a product shelving task. The interval between time t2 and time t3 shown in FIG. 16 is the time during which the walking worker was difficult to recognize.

In such a case, the behavior recognition processing unit 35 determines whether the time difference t3 - t2, during which the walking worker was difficult to recognize, is less than or equal to a predetermined threshold Thre_w. A walking worker often becomes temporarily difficult to recognize, for example for 2 seconds or 5 seconds, because of overlap with another worker. The behavior recognition processing unit 35 therefore sets the threshold Thre_w to, for example, 2 seconds or 5 seconds, and if the time difference t3 - t2 is less than or equal to this threshold, it recognizes the worker as having been walking throughout the interval from t2 to t3.

That is, if the time during which the worker was difficult to recognize is less than or equal to a predetermined time, the behavior recognition processing unit 35 recognizes that the behavior (motion) recognized before recognition became difficult continued throughout that time. As a result, from the walking start time t1 to the shelving start time t4 shown in FIG. 16, the worker is recognized as having been walking continuously, even though recognition became difficult partway through.

FIG. 17 shows an example in which the worker temporarily became difficult to recognize partway through shelving. The interval from time t3 to time t4 shown in FIG. 17 is the time during which the worker was temporarily difficult to recognize. In this case as well, if that time is less than or equal to the threshold Thre_w (for example, 2 seconds or 5 seconds), the behavior recognition processing unit 35 recognizes that the worker continued the shelving work during that time. As a result, from the shelving start time t2 to the shelving end time t5 shown in FIG. 17, the worker is recognized as having been shelving continuously, even though recognition became difficult partway through.

In this way, when the time during which the worker was difficult to recognize is less than or equal to a predetermined time, the behavior (motion) recognized before recognition became difficult is treated as having continued during that time. This allows the work time and similar quantities to be measured correctly even if the worker becomes difficult to recognize partway through.
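The gap handling described above can be sketched as a merge of recognized time segments. This is an illustrative sketch only: it represents each recognized behavior as a hypothetical `(label, start, end)` tuple in seconds, and it assumes that the same behavior is recognized again after the gap, which the text implies for FIG. 16 and FIG. 17 but does not state as a condition. `max_gap` plays the role of the threshold Thre_w.

```python
def bridge_gaps(segments, max_gap):
    """Merge consecutive segments of the same behavior across short gaps.

    segments: list of (label, start, end), sorted by start time.
    max_gap:  longest unrecognized interval (same unit as the times) that is
              treated as a continuation of the preceding behavior.
    """
    merged = []
    for label, start, end in segments:
        if (merged
                and merged[-1][0] == label
                and start - merged[-1][2] <= max_gap):
            # Short gap: extend the previous segment over the gap.
            merged[-1] = (label, merged[-1][1], end)
        else:
            merged.append((label, start, end))
    return merged


def durations(segments):
    """Duration (end - start) of each segment, keyed by position and label."""
    return [(label, end - start) for label, start, end in segments]
```

With a 2-second threshold, a walking segment from 0 s to 4 s and another from 6 s to 10 s are reported as one 10-second walk; with a 1-second threshold they remain two separate segments and the unrecognized interval is excluded from the walking time.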

For each worker, the recognition result output unit 36 outputs, as that worker's behavior recognition result, the work start time, the work end time, the work time (the time from the start to the end of a series of work (motions), that is, the required time), and the like.

(Effects of the embodiment)
As is clear from the above description, the behavior recognition system of the embodiment recognizes each monitoring target, such as each of a plurality of workers, based on captured images of the monitoring targets, and sets an ID for each. Each monitoring target is tracked based on its ID; when a monitoring target becomes difficult to recognize, its ID and related information (position information and image data) are maintained. In this state, equation (21) above is evaluated to calculate the similarity of the monitoring target that became difficult to recognize and the similarity of each re-recognized monitoring target. Then, among the re-recognized monitoring targets, the one whose similarity is closest to that of the monitoring target that became difficult to recognize is given the ID of that monitoring target.

As a result, a plurality of monitoring targets can be recognized simultaneously, and even if some or all of them temporarily become difficult to recognize, the same ID can be set for each monitoring target before and after the difficulty when the targets are re-recognized, so that tracking can continue. A plurality of monitoring targets can therefore be monitored accurately.

In addition, the behavior recognition system of the embodiment extracts spatiotemporal feature points of the monitoring target region from a plurality of frames of captured images and, based on the extracted spatiotemporal feature points, detects feature amounts corresponding to the behavior (motion) of each monitoring target. Based on these feature amounts, it recognizes the behavior (motion) of each monitoring target and outputs, for example, the behavior start time, behavior end time, and required time of each monitoring target as the recognition result. The behavior of each of a plurality of monitoring targets can thereby be visualized.

Finally, the embodiment described above is presented as an example and is not intended to limit the scope of the present invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention.

For example, although the monitoring target is a worker in the above description of the embodiment, it may be another monitoring target such as an animal, a passerby on a road, people gathered at a specific place, or a robot. The same effects as described above can be obtained in these cases as well.

The embodiment and its modifications are included in the scope and gist of the invention, and are also included in the invention described in the claims and its equivalents.

1 Behavior recognition device
2 Camera device
3 Interface unit
11 CPU
15 HDD
31 Input unit
32 Recognition unit
33 Initial ID setting unit
34 Tracking processing unit
35 Behavior recognition processing unit
36 Recognition result output unit
37 Monitoring target recognition dictionary input unit
38 Behavior recognition dictionary input unit

Japanese Unexamined Patent Publication No. 2011-100175

Claims

A motion recognition device comprising:
a recognition unit that recognizes a plurality of monitoring targets based on a captured image;
an identification number setting unit that sets an identification number for each of the monitoring targets recognized by the recognition unit;
a tracking processing unit that tracks each of the monitoring targets recognized in the captured image and assigned an identification number, based on the region of each monitoring target recognized in the captured image;
a motion recognition processing unit that recognizes the motion of each monitoring target in the region of the captured image obtained as the result of tracking that monitoring target; and
a recognition result output unit that outputs the recognition result of the recognized motion of each monitoring target.

The motion recognition device according to claim 1, wherein the recognition result output unit outputs, as the recognition result, the start time of the motion of each monitoring target and the time from the start to the end of a series of motions.

The motion recognition device according to claim 1 or claim 2, wherein the tracking processing unit maintains the identification number, position information, and image information of each monitoring target, and updates the position information and the image information each time tracking succeeds.

The motion recognition device according to claim 3, wherein the tracking processing unit maintains the identification number, position information, and image information of a monitoring target that has become difficult to track, and
the identification number setting unit calculates the similarity of the monitoring target that has become difficult to track, calculates the similarity of each monitoring target re-recognized by the recognition unit as a result of the tracking difficulty, and sets, for the re-recognized monitoring target whose similarity is closest to the similarity of the monitoring target that has become difficult to track, the same identification number as the one that was set for the monitoring target that has become difficult to track.

The motion recognition device according to any one of claims 1 to 4, wherein the motion recognition processing unit extracts spatiotemporal feature points from a plurality of frames of the captured image and recognizes the motion of each monitoring target based on the extracted spatiotemporal feature points.

The motion recognition device according to any one of claims 1 to 5, wherein, when the time during which the tracking processing unit has difficulty tracking a monitoring target during a series of motions of that monitoring target is less than or equal to a predetermined time, the recognition result output unit recognizes that the series of motions continued during that time as well, and outputs time information corresponding to the motion of the monitoring target as the recognition result.

A motion recognition method comprising:
a recognition step in which a recognition unit recognizes a plurality of monitoring targets based on a captured image;
an identification number setting step in which an identification number setting unit sets an identification number for each of the recognized monitoring targets;
a tracking processing step in which a tracking processing unit tracks each of the monitoring targets recognized in the captured image and assigned an identification number, based on the region of each monitoring target recognized in the captured image;
a motion recognition processing step in which a motion recognition processing unit recognizes the motion of each monitoring target in the region of the captured image obtained as the result of tracking that monitoring target; and
a recognition result output step in which a recognition result output unit outputs the recognition result of the recognized motion of each monitoring target.

A motion recognition program that causes a computer to function as:
a recognition unit that recognizes a plurality of monitoring targets based on a captured image;
an identification number setting unit that sets an identification number for each of the monitoring targets recognized by the recognition unit;
a tracking processing unit that tracks each of the monitoring targets recognized in the captured image and assigned an identification number, based on the region of each monitoring target recognized in the captured image;
a motion recognition processing unit that recognizes the motion of each monitoring target in the region of the captured image obtained as the result of tracking that monitoring target; and
a recognition result output unit that outputs the recognition result of the recognized motion of each monitoring target.