JP2020021323A - Determination device and determination method

Info

Publication number: JP2020021323A
Application number: JP2018145315A
Authority: JP
Inventors: Makoto Otsuka; Koji Shiizaki; Masashi Monchi; Takuya Akashi; Shogo Minami; Shin Nakajima; Takahiro Hagino; Yusuke Takeuchi; 健次中山
Original assignee: Mitsubishi Logisnext Co Ltd
Current assignee: Mitsubishi Logisnext Co Ltd
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2020-02-06
Anticipated expiration: 2038-08-01
Also published as: JP6613343B1

Abstract

PROBLEM TO BE SOLVED: To provide a determination device and a determination method for detecting a scene in which an inappropriate operation has been performed. SOLUTION: A determination device (1) includes: a scene detection unit (103) that detects, from a moving image capturing work performed with a forklift (3), a scene in which a predetermined operation is performed; and an operation suitability determination unit (104) that determines whether the predetermined operation is appropriate, based on a result obtained by inputting the scene into a learned model generated by machine learning that uses, as teacher data, moving images capturing scenes in which the predetermined operation was appropriately performed. SELECTED DRAWING: Figure 1

Description

The present invention relates to a determination device and the like for determining whether an operation performed by a worker on a transport vehicle that transports a transport target is appropriate.

In work performed by operating a transport vehicle such as a forklift, whether the work is done properly depends on the worker's operation. There is therefore demand for technology that detects when a worker's operation was not performed properly, and such technology is under development. For example, Patent Literature 1 below describes a technique for detecting that a worker has picked from the wrong placement section.

Patent Literature 1: JP 2011-73876 A

The conventional technique described above prevents picking targets from being mixed up. Until now, there has been no technique for detecting, from a moving image capturing work, a scene in which an inappropriate operation was performed. Recently, however, there is a growing need to check the work of inexperienced workers such as new hires, and to instruct and train them, while reducing the human burden at the work site. A technique for detecting, from a moving image capturing work, a scene in which an inappropriate operation was performed is therefore needed.

An object of one aspect of the present invention is to realize a determination device and the like that can detect, from a moving image capturing work, a scene in which an inappropriate operation was performed.

To solve the above problem, a determination device according to one aspect of the present invention includes: a scene detection unit that detects, from a moving image capturing work performed by operating a transport vehicle that transports a transport target, a scene in which a predetermined operation is being performed on the transport vehicle; and an operation suitability determination unit that determines whether the predetermined operation performed in the scene is appropriate, based on a result obtained by inputting the scene detected by the scene detection unit into a learned model generated by machine learning that uses, as teacher data, moving images capturing scenes in which the predetermined operation was appropriately performed.

Also to solve the above problem, a determination method according to one aspect of the present invention is a determination method performed by a determination device, and includes: a scene detection step of detecting, from a moving image capturing work performed by operating a transport vehicle that transports a transport target, a scene in which a predetermined operation is being performed on the transport vehicle; and an operation suitability determination step of determining whether the predetermined operation performed in the scene is appropriate, based on a result obtained by inputting the scene detected in the scene detection step into a learned model generated by machine learning that uses, as teacher data, moving images capturing scenes in which the predetermined operation was appropriately performed.

According to one aspect of the present invention, a scene in which an inappropriate operation was performed can be detected from a moving image capturing work.

FIG. 1 is a block diagram showing an example of the main configuration of a determination device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an outline of a determination system including the determination device.
FIG. 3 is a diagram showing a worker operating a forklift to move it forward.
FIG. 4 is a diagram showing an example of an image obtained by photographing a worker and a forklift with the camera of a drive recorder.
FIG. 5 is a diagram showing examples of images captured from the worker's viewpoint.
FIG. 6 is a flowchart showing an example of processing executed by the determination device.
FIG. 7 is a diagram showing an outline of a determination system according to Embodiment 2 of the present invention.

[Embodiment 1]
(System overview)
An outline of the determination system of the present embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing an outline of the determination system 100. The determination system 100 determines, for work in which a worker operates a transport vehicle that transports a transport target, whether the operation of the transport vehicle was performed appropriately.

The determination system 100 shown in FIG. 2 includes a determination device 1 that performs the above determination, a photographing device 2 that photographs the work, and a forklift 3 as an example of a transport vehicle. Although FIG. 2 shows the forklift 3 as the example, the transport vehicle included in the determination system 100 may be any vehicle that transports a transport target in accordance with a worker's operation, and is not limited to the forklift 3.

In the example of FIG. 2, worker A is operating the forklift 3, and the photographing device 2 captures this as a moving image. The photographing device 2 may be placed at a position from which a series of work by the forklift 3 can be photographed from a fixed point. The moving image captured by the photographing device 2 is transmitted to the determination device 1, which detects from it a scene in which a predetermined operation is being performed. The determination device 1 then determines whether the predetermined operation on the forklift 3 was performed appropriately in the detected scene.

When the determination device 1 determines that the operation was not performed appropriately, it notifies worker A to that effect. In the example of FIG. 2, the notification is made via the forklift 3. That is, the forklift 3 has a function of communicating with the determination device 1 and a function of notifying worker A of information. The notification may be given by display, by sound, or by a combination of the two.

Thus, with the determination system 100, the determination device 1 can detect a scene in which an inappropriate operation was performed. Notifying worker A makes worker A aware of the scene in which the operation was not performed appropriately, so that worker A can improve the operation in that scene.

(Main configuration of the determination device)
The main configuration of the determination device 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing an example of the main configuration of the determination device 1. As illustrated, the determination device 1 includes a control unit 10 that centrally controls each unit of the determination device 1, and a storage unit 20 that stores various data used by the determination device 1. The determination device 1 further includes an input unit 30 that accepts input operations on the determination device 1, an output unit 40 through which the determination device 1 outputs information, and a communication unit 50 through which the determination device 1 communicates with other devices.

The control unit 10 includes a moving image acquisition unit 101, a learned model 102, a scene detection unit 103, an operation suitability determination unit 104, and a notification unit 105. The storage unit 20 stores a moving image 201. As described with reference to FIG. 2, the moving image 201 is generated by the photographing device 2 photographing the work of the forklift 3. The determination device 1 stores the moving image acquired from the photographing device 2 by wired or wireless communication in the storage unit 20 as the moving image 201.

The moving image acquisition unit 101 acquires a moving image and generates input data for the learned model 102. The moving image acquisition unit 101 of the present embodiment acquires the moving image 201 stored in the storage unit 20. The moving image used as input data is preferably one in which the forklift 3 and worker A were photographed under the same shooting conditions as the moving images used as teacher data for the machine learning of the learned model 102. The shooting conditions include, for example, the positional relationship between the photographing device 2 and the forklift 3 and worker A, the background behind the forklift 3 and worker A, and the ambient brightness. To align the shooting conditions, it is preferable, for example, to generate the teacher data from moving images shot by a photographing device 2 fixed at a position where the series of work falls within its angle of view, and to use moving images shot by the same photographing device 2 as the input data. The generation of the input data will be described later.

The learned model 102 is a learned model generated by machine learning that uses, as teacher data, moving images capturing scenes in which a predetermined operation was appropriately performed. When the input data generated by the moving image acquisition unit 101 is input to the learned model 102, the learned model 102 classifies the input moving image into one of a plurality of scenes and outputs information indicating the confidence of the classification (a probability value in the present embodiment). As the learned model 102, it is preferable to use, for example, a learned model that combines a CNN (Convolutional Neural Network) with an RNN (Recurrent Neural Network), or a CNN with an LSTM (Long Short-Term Memory). Combining a model suited to time-series data, such as an RNN or LSTM, with a CNN, which has high image recognition performance, allows moving images to be classified with high accuracy.

The scene detection unit 103 detects, from a moving image capturing work performed by operating the forklift 3, a scene in which a predetermined operation is being performed on the forklift 3. As described in detail later, the scene detection unit 103 detects, based on the probability values output by the learned model 102, which of the machine-learned scenes the scene in the input data corresponds to.

The operation suitability determination unit 104 determines whether the predetermined operation performed by worker A in a scene is appropriate, based on the result obtained by inputting the scene detected by the scene detection unit 103 into the learned model 102. Specifically, the operation suitability determination unit 104 determines, from the output data of the learned model 102, whether worker A's operation procedure is appropriate and whether the operation content is appropriate. The "operation content" includes not only the operations accepted by the forklift 3 (moving forward, reversing, raising and lowering the lift, and so on) but also any movement or posture of worker A during the operation. The operation suitability determination unit 104 of the present embodiment determines both the suitability of the operation procedure and the suitability of the operation content, but these determinations may be performed by separate blocks. The operation suitability determination unit 104 may also determine only one of the two.

The notification unit 105 gives notice that worker A's operation was not performed appropriately. Specifically, the notification unit 105 communicates with the forklift 3 via the communication unit 50 and causes the forklift 3 to give notice that worker A's operation was not performed appropriately. The notification destination is not limited to the forklift 3. For example, the notification unit 105 may notify a terminal device carried by worker A, or a terminal device of a manager who supervises worker A's work.

(Generation of the learned model)
The learned model 102 of the present embodiment is generated by supervised machine learning so that it can be used to determine whether a predetermined operation was performed appropriately. The generation of the learned model 102 is described below.

To generate the learned model 102, moving images capturing scenes in which a predetermined operation was appropriately performed can be used as teacher data. The learned model 102 of the present embodiment is also used to detect scenes in a moving image. For this reason, the teacher data consists of moving images in which a series of work by the forklift 3 is divided into a plurality of scenes, each given its own label.

For example, when the determination device 1 is to evaluate cargo handling work performed with the forklift 3, the teacher data can be generated from moving images capturing cargo handling work that was performed appropriately. Specifically, a moving image is first divided into a plurality of scenes, and each scene is given its own label. Each scene only needs to show worker A performing a predetermined operation. For example, labels may be given to a scene of "bringing the forklift 3 close to the load", a scene of "adjusting the height of the forks", a scene of "inserting the forks into the fork pockets", and so on. Which scenes are labeled is arbitrary, but it is particularly preferable to label scenes in which the operation content tends to differ between workers who can operate the forklift 3 appropriately (for example, experienced workers) and workers who tend to operate it inappropriately (for example, new workers).

Because the labeled moving images capture cargo handling work that was performed appropriately, each labeled scene shows the appropriate operation for that scene. The learned model 102 can be generated by creating such teacher data from each of a plurality of moving images and performing machine learning with the resulting teacher data.
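The labeling described above can be pictured as a list of labeled time ranges per video. The following is a minimal sketch under stated assumptions: the class `LabeledScene`, the helper `group_by_label`, the video identifiers, the label strings, and the times are all hypothetical and are not taken from the publication.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabeledScene:
    """One labeled scene cut from a video of correctly performed work (hypothetical layout)."""
    video_id: str   # hypothetical identifier of the source video
    start_s: float  # scene start, in seconds from the start of the video
    end_s: float    # scene end, in seconds from the start of the video
    label: str      # scene label, one per kind of scene


def group_by_label(scenes: List[LabeledScene]) -> Dict[str, List[LabeledScene]]:
    """Collect the labeled scenes from all videos under their scene label."""
    grouped: Dict[str, List[LabeledScene]] = {}
    for scene in scenes:
        if scene.end_s <= scene.start_s:
            raise ValueError(f"empty or reversed scene: {scene}")
        grouped.setdefault(scene.label, []).append(scene)
    return grouped


# Illustrative values only: two videos of appropriately performed cargo handling.
teacher_data = [
    LabeledScene("video_001", 0.0, 6.5, "approach_load"),
    LabeledScene("video_001", 6.5, 9.0, "adjust_fork_height"),
    LabeledScene("video_001", 9.0, 14.0, "insert_fork_into_pocket"),
    LabeledScene("video_002", 0.0, 5.0, "approach_load"),
]
by_label = group_by_label(teacher_data)
```

Grouping by label is only one possible organization; it makes it easy to see how many correctly performed examples exist for each scene before training.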

(Scene detection)
Scene detection using the output of the learned model 102 is described next. When a moving image is input to the learned model 102 generated by the machine learning described above, the model outputs, for each labeled scene, the probability that the scene shown in the input moving image corresponds to it. The scene detection unit 103 detects a predetermined scene in the captured moving image based on these probability values.

Specifically, the moving image 201 stored in the storage unit 20 is updated as shooting progresses and becomes longer, so the moving image acquisition unit 101 changes the range of the moving image 201 that it inputs to the learned model 102 as shooting progresses. For example, after inputting the range of the moving image 201 from the shooting start time t_0 to time t_1 into the learned model 102, the moving image acquisition unit 101 may, once the shooting time has advanced by a predetermined time Δt, input the range from time t_0 to time t_1 + Δt. Extending the range by the predetermined time in this way causes the learned model 102 to output the probability of each label for each range.

The scene detection unit 103 then determines that the scene with the highest probability among the per-scene probabilities output by the learned model 102 is the scene shown in the moving image 201 that was input to the learned model 102. The scene detection unit 103 also detects the range of the moving image 201 for which the probability of that scene is highest as the range in which the scene appears.

For example, suppose that, in the moving image 201 capturing a series of work, the operation of bringing the forklift 3 close to the load was captured in the range shot from time t_0 to t_1. In this case, as the range of the moving image 201 input to the learned model 102 approaches the range from time t_0 to t_1, the probability of the "bringing the forklift close to the load" scene increases and becomes the highest among all scenes. Once the end of the input range passes t_1, the probability of that scene begins to fall.

The scene detection unit 103 can therefore detect the "bringing the forklift close to the load" scene based on the outputs obtained by inputting into the learned model 102 ranges of the moving image 201 that all start at t_0 and whose end times are later by the predetermined time each. The scene detection unit 103 can detect other scenes in the same way.
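The growing-range search described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in the publication: `classify` is a hypothetical stand-in for the learned model 102 that returns a probability per label for a time range, `toy_classifier` is invented so the example runs, and times are in seconds.

```python
from typing import Callable, Dict, Tuple


def detect_scene(
    classify: Callable[[float, float], Dict[str, float]],
    t0: float,
    dt: float,
    n_steps: int,
) -> Tuple[str, float, float]:
    """Feed ranges [t0, t0 + k*dt] for k = 1..n_steps to the classifier.

    Returns (label, end_time, probability) for the range whose top-scoring
    label has the highest probability over all ranges tried.
    """
    best_label, best_end, best_prob = "", t0, -1.0
    for k in range(1, n_steps + 1):
        end = t0 + k * dt
        probs = classify(t0, end)
        label = max(probs, key=probs.get)  # most probable scene for this range
        if probs[label] > best_prob:
            best_label, best_end, best_prob = label, end, probs[label]
    return best_label, best_end, best_prob


def toy_classifier(start: float, end: float) -> Dict[str, float]:
    """Invented stand-in: 'approach_load' peaks when the range ends at 3.0 s."""
    p = max(0.0, 1.0 - abs(end - 3.0) / 4.0)
    return {"approach_load": p, "adjust_fork_height": 1.0 - p}


result = detect_scene(toy_classifier, t0=0.0, dt=1.0, n_steps=6)
```

With the toy classifier, the probability of `approach_load` rises as the end of the range approaches 3.0 s and falls afterwards, so the search reports that label with the range ending at 3.0 s, mirroring the rise and fall around t_1 described in the text.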

(Operation suitability determination)
Because the operation procedure for work with the forklift 3 is fixed, predetermined scenes are detected in a predetermined order when the work is performed in the correct procedure. The operation suitability determination unit 104 therefore determines, from the detection results of the scene detection unit 103, whether the work is being performed in the correct procedure. The correct procedure (the correct order in which scenes are detected) can be specified, for example, when the teacher data for the learned model 102 is generated.

For example, suppose the correct procedure is "pull out the load with the forks", "tilt the forks toward the operator", and then "transport". If the "transport" scene is detected immediately after the "pull out the load with the forks" scene, the operation suitability determination unit 104 determines that the procedure is incorrect (the operation of tilting the forks was skipped).
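The order check in this example can be sketched as a comparison of the detected scene sequence against the expected one. The function name `first_missing_step` and the label strings are assumptions made for illustration; the publication does not specify how the comparison is implemented.

```python
from typing import List, Optional

# Expected scene order for the example in the text (label names are hypothetical).
EXPECTED_ORDER = ["pull_out_load", "tilt_fork_toward_operator", "transport"]


def first_missing_step(detected: List[str], expected: List[str]) -> Optional[str]:
    """Return the expected scene at the first position where the detected
    sequence deviates from the expected order, or None if there is no deviation.
    """
    for position, scene in enumerate(detected):
        if position >= len(expected):
            return None  # scenes beyond the known procedure are ignored in this sketch
        if scene != expected[position]:
            return expected[position]
    return None


skipped = first_missing_step(["pull_out_load", "transport"], EXPECTED_ORDER)
```

Here `skipped` names the tilt step, matching the case in the text where "transport" is detected immediately after the load is pulled out. A sequence that is a correct prefix of the procedure returns `None`, so the check can run while work is still in progress.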

When the operation suitability determination unit 104 determines that the procedure is correct, it also determines whether the operation content is appropriate. This is described with reference to FIG. 3. FIG. 3 is a diagram showing worker A operating the forklift 3 to move it forward. An image such as that in FIG. 3 can be captured, for example, by a photographing device 2 fixed so that the position where the forklift 3 works (for example, near the load to be loaded or unloaded) falls within its angle of view.

When moving a typical stand-up forklift 3 forward, worker A should preferably stand with the body angled obliquely to the front direction of the forklift 3 (the direction in which the forks extend), as shown in FIG. 3(a). Operating as shown in FIG. 3(b), with worker A's body facing the front direction of the forklift 3 and squarely facing the control lever, makes it harder to change direction or reverse than in the state of FIG. 3(a), and tends to tire worker A. For this reason, the teacher data described above uses moving images in which the work was performed in the posture of FIG. 3(a).

Suppose a moving image capturing the forklift 3 being moved forward in the correct posture is input to the learned model 102, which was machine-learned using moving images of work performed in the posture of FIG. 3(a) as teacher data. In that case, the probability value output by the learned model 102 is high. On the other hand, when a moving image capturing the forklift 3 being moved forward in the posture of FIG. 3(b) is input, the probability value output by the learned model 102 is lower.

In this way, how high or low the probability value output by the learned model 102 is reflects whether the operation content is appropriate. The operation suitability determination unit 104 can therefore determine whether the operation content is appropriate based on the probability output by the learned model 102. Specifically, the operation suitability determination unit 104 determines that the operation content is appropriate if the probability value of the scene detected by the scene detection unit 103 is greater than or equal to a threshold, and inappropriate if it is below the threshold.
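The threshold rule can be written in a few lines. This is a sketch only: the function name and the default threshold of 0.8 are assumptions, since the publication does not give a threshold value.

```python
def is_operation_appropriate(scene_probability: float, threshold: float = 0.8) -> bool:
    """Treat the operation content as appropriate when the detected scene's
    probability is greater than or equal to the threshold.

    The default of 0.8 is an assumed value, not one stated in the publication.
    """
    return scene_probability >= threshold


# A posture like FIG. 3(a) is expected to score high, one like FIG. 3(b) lower
# (the probabilities below are illustrative, not measured).
correct_posture_ok = is_operation_appropriate(0.93)
facing_front_ok = is_operation_appropriate(0.55)
```

In practice the threshold would presumably be tuned on recordings of both appropriate and inappropriate operation, since it trades missed detections against false alarms.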

(Other examples of images)
FIG. 3 shows the forklift 3 viewed from the side, but the moving images used as teacher data and as input to the learned model 102 may be any in which the work can be recognized, and are not limited to those shot from the side of the forklift 3. For example, images such as those shown in FIGS. 4 and 5 may be used as teacher data and as input to the learned model 102.

FIG. 4 shows an example of a frame image cut out from a moving image obtained by photographing worker A and the forklift 3 with the camera of a drive recorder. Because this image was taken from directly above worker A and the forklift 3 by the camera of a drive recorder mounted on the forklift 3, the movement of worker A's hands, the direction of the face, and so on are easy to recognize. In addition, because the drive recorder's images are reused, there is no need to provide a separate photographing device (such as the photographing device 2 in FIG. 2) for the determination device 1 to acquire the moving images it needs, and shooting can continue wherever the forklift 3 moves.

FIG. 5 shows examples of frame images cut out from a moving image captured from worker A's viewpoint. In FIG. 5(a), the forks of the forklift 3 are in the lowered position, and the fork tips are located slightly below the center of the image. In FIG. 5(b), the forks of the forklift 3 are in the raised position, and the fork tips are again located slightly below the center of the image, as in FIG. 5(a).

FIG. 5(c) is an image of the forks being inserted into the fork pockets of a pallet P. The pallet P is a platform for storing and transporting loads, and the fork pockets are the holes into which the forks are inserted. In FIG. 5(c), a line L appears where laser light emitted by the forklift 3 for fork alignment is projected onto the pallet P. The line L is located slightly below the center of the image. Also, in FIG. 5(c), the forks are viewed obliquely from behind rather than from directly behind.

As described above, a moving image captured from worker A's viewpoint shows where worker A is looking. When such a moving image is used, it is therefore possible to determine whether worker A is looking at the correct position during the operation. Images from the worker's viewpoint such as those in FIG. 5 can be captured, for example, by fixing a camera to worker A's head. They can also be captured by having worker A wear an eyeglass-type wearable device.

(Processing flow)
The flow of the processing executed by the determination device 1 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the processing (determination method) executed by the determination device 1. As in the example of FIG. 2, a moving image of the worker A operating the forklift 3, captured by the photographing device 2, is transmitted to the determination device 1 and recorded in the storage unit 20 as the moving image 201.

In S1, the moving image acquisition unit 101 starts acquiring the moving image 201 recorded in the storage unit 20 as described above. Then, in S2, the moving image acquisition unit 101 inputs the moving image 201 whose acquisition started in S1 to the learned model 102, and the learned model 102 outputs the probability that the input moving image 201 corresponds to each scene.

In S3 (scene detection step), the scene detection unit 103 performs scene detection based on the output values of the learned model 102. The scene detection method is as described above under "Scene detection", so the description is not repeated here. Likewise, the details of S4 and S5 below are as described above under "Operation suitability determination", so the description is not repeated here.

In S4 (operation suitability determination step), the operation suitability determination unit 104 determines whether the operation procedure of the worker A is appropriate. If the operation suitability determination unit 104 determines that the operation procedure is appropriate (YES in S4), the processing proceeds to S5. If it determines that the operation procedure is not appropriate (NO in S4), the processing proceeds to S6.

In S5 (operation suitability determination step), the operation suitability determination unit 104 determines whether the operation content of the worker A is appropriate. If the operation suitability determination unit 104 determines that the operation content is appropriate (YES in S5), the processing proceeds to S7. If it determines that the operation content is not appropriate (NO in S5), the processing proceeds to S6.

In S6, the notification unit 105 notifies the worker A. Specifically, the notification unit 105 transmits a command to the forklift 3 via the communication unit 50 and causes the forklift 3 to notify the worker A that the operation procedure or the operation content was not appropriate. When the determination in S4 was NO, the notification unit 105 has the worker A notified that the operation procedure was incorrect; when the determination in S5 was NO, it has the worker A notified that the operation content was incorrect. The manner in which the forklift 3 gives the notification is not particularly limited. For example, when the forklift 3 includes an output device such as a speaker, the notification may be given through that output device. When the forklift 3 includes a display device, the notification may be given through that display device.

When the forklift 3 includes a display device, the notification unit 105 may also transmit to the forklift 3 a moving image of the scene detected in S3 being performed correctly (for example, a moving image used as teacher data) and cause the display device to display it. Furthermore, the notification unit 105 may cause a moving image of the worker A's own operation to be displayed together with the moving image of the correct operation. This allows the worker A to compare his or her own operation with the correct operation at a glance and easily recognize the points to be improved. In this case, the differences between the worker A's operation and the correct operation, which are the points the worker A should improve, may be highlighted on the image.

In S7, the scene detection unit 103 determines whether the operation of the worker A has ended. Specifically, the scene detection unit 103 determines that the operation has ended if it has detected all of the series of scenes constituting one piece of work, and determines that the operation has not ended if any scene remains undetected. If the scene detection unit 103 determines that the operation has ended (YES in S7), the illustrated processing ends. If it determines that the operation has not ended (NO in S7), the processing returns to S2.
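The flow of S2 to S7 can be sketched as a simple loop. The following is only an illustration under stated assumptions, not the implementation of the determination device 1: the function name `run_determination`, the representation of each model output as a dict of per-scene probabilities (one dict per scene clip), the fixed `expected_order` used for the procedure check, the threshold value, and the choice to continue with the next clip after a notification are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def run_determination(
    clip_outputs: Iterable[Dict[str, float]],
    expected_order: List[str],
    threshold: float = 0.8,  # assumed value; the text only says "predetermined threshold"
) -> List[Tuple[str, str]]:
    """Return a log of (scene, result) pairs following S2 to S7."""
    log: List[Tuple[str, str]] = []
    next_index = 0  # index in expected_order of the scene expected next
    for probs in clip_outputs:                      # S2: model output for one clip
        scene = max(probs, key=probs.get)           # S3: scene with the highest probability
        if scene != expected_order[next_index]:     # S4: procedure check (assumed rule)
            log.append((scene, "procedure_error"))  # S6: notify about the procedure
        elif probs[scene] < threshold:              # S5: content check
            log.append((scene, "content_error"))    # S6: notify about the content
            next_index += 1
        else:
            log.append((scene, "ok"))
            next_index += 1
        if next_index == len(expected_order):       # S7: every scene has been detected
            break
    return log
```

Each "procedure_error" or "content_error" entry corresponds to a point at which the notification unit 105 would be asked to notify the worker A.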

In the example of FIG. 6, notification is performed when an inappropriate operation occurs, but notification may also be performed while appropriate operations are being carried out. For example, when the forklift 3 includes a display device, the notification unit 105 may cause the display device to display information indicating that appropriate operation is in progress during such periods. For example, the notification unit 105 may cause the display device to display blue or green characters, figures, symbols, or the like while appropriate operations are being performed and, when an inappropriate operation is performed, change those characters, figures, symbols, or the like and change their display color to yellow or red. The characters may be the probability value output by the learned model 102. In that case, the worker A works while taking care that the probability value does not drop, and can therefore be kept attentive throughout the work.
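As a small illustration of this kind of continuous feedback, the sketch below maps a determination result to what a display device might show. The function name `status_display`, the returned dict layout, and the specific choice of green and red are assumptions made here; the text allows blue or green for appropriate operation and yellow or red otherwise.

```python
def status_display(appropriate: bool, probability: float) -> dict:
    """Build a hypothetical display payload: the probability value as text, colored by status."""
    color = "green" if appropriate else "red"
    return {"text": f"{probability:.0%}", "color": color}
```

For example, `status_display(True, 0.92)` yields the text "92%" shown in green.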

In the example of FIG. 6, the presence or absence of an inappropriate operation is determined while the worker A is being photographed, but it may instead be determined after the work has ended. In that case, the suitability of the operation in each scene may be fed back to the worker A after the work is finished.

(Summary of Embodiment 1)
As described above, the scene detection unit 103 of the present embodiment detects a scene in which a predetermined operation is being performed from a moving image of work performed by operating the forklift 3. The operation suitability determination unit 104 then determines whether the predetermined operation performed in that scene is appropriate, based on the result obtained by inputting the scene detected by the scene detection unit 103 to the learned model 102. The learned model 102 is a learned model generated by machine learning that uses, as teacher data, moving images of scenes in which the predetermined operation was performed appropriately. Therefore, the determination device 1 of the present embodiment can detect, from a moving image of the work, a scene in which an inappropriate operation was performed.

The learned model 102 of the present embodiment is generated by machine learning that uses, as teacher data, a moving image of work in which the predetermined operations were performed appropriately and in which each scene has already been classified. The learned model 102 classifies a moving image input to it into a plurality of scenes and outputs a probability value indicating the certainty of the classification. The scene detection unit 103 detects, based on the probability value, which of the plurality of learned scenes the scene input to the learned model 102 corresponds to. Furthermore, the operation suitability determination unit 104 determines that the predetermined operation is not appropriate when the probability value is less than a predetermined threshold. Both scene detection and operation suitability determination can therefore be performed using the moving image 201 of the work and a single learned model 102.
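The way a single set of probability values serves both purposes can be sketched as follows. This is a minimal illustration, assuming the model output is available as a dict mapping scene names to probabilities; the function name `detect_and_judge`, the scene names, and the threshold value are hypothetical.

```python
from typing import Dict, Tuple


def detect_and_judge(probs: Dict[str, float], threshold: float = 0.8) -> Tuple[str, bool]:
    """Return (detected scene, whether the operation is judged appropriate)."""
    scene = max(probs, key=probs.get)       # scene detection: most probable scene
    appropriate = probs[scene] >= threshold  # suitability: below the threshold means not appropriate
    return scene, appropriate
```

With `{"approach": 0.15, "insert": 0.85}` the sketch reports the scene "insert" as appropriate; with `{"approach": 0.45, "insert": 0.55}` it still reports "insert" but judges the operation not appropriate.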

[Embodiment 2]
Another embodiment of the present invention will be described below. For convenience of description, members having the same functions as the members described in Embodiment 1 are given the same reference numerals, and their description is not repeated. The determination system 100 of the present embodiment differs from Embodiment 1 in its scene detection method. This will be described with reference to FIG. 7. FIG. 7 is a diagram showing an outline of the determination system 100 according to Embodiment 2 of the present invention.

As shown in FIG. 7, in the determination system 100 of the present embodiment, sensing data indicating the operating state of the forklift 3 is input to the determination device 1. The scene detection unit 103 of the determination device 1 then performs scene detection using the sensing data.

The sensing data used need only contain the information necessary to identify a scene. For example, the sensing data may be acceleration data detected by an acceleration sensor provided on the forklift 3. In this case, the scene detection unit 103 detects, from the acceleration data acquired from the acceleration sensor, the acceleration variation pattern characteristic of each scene. For example, in the scene "bringing the forklift close to the load", the forklift 3 accelerates toward the load, then decelerates near the load and stops. The scene detection unit 103 can therefore detect the scene "bringing the forklift close to the load" from the variation pattern of acceleration, deceleration, and stopping.
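The accelerate, decelerate, stop pattern can be illustrated with a small check on a sampled acceleration series. This is a sketch under assumptions that are not in the text: the forklift starts at rest, acceleration is sampled along the travel direction at a fixed interval `dt`, speed is obtained by simple summation, and the tolerance `eps` is arbitrary. The function name `is_approach_scene` is hypothetical.

```python
from typing import Sequence


def is_approach_scene(accel: Sequence[float], dt: float = 0.1, eps: float = 0.05) -> bool:
    """True if the series shows speeding up, then slowing down, then standing still."""
    speed = 0.0       # assumed: the vehicle starts at rest
    peak = 0.0        # highest speed reached so far
    saw_decel = False
    for a in accel:
        speed += a * dt              # integrate acceleration into speed
        peak = max(peak, speed)
        if a < -eps and peak > eps:  # deceleration after having moved forward
            saw_decel = True
    # accelerated at some point, decelerated afterwards, and ended (almost) stopped
    return peak > eps and saw_decel and abs(speed) <= eps
```

A series that only accelerates, or that never moves, is rejected, which is the kind of distinction a scene-specific variation pattern is meant to capture.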

The scene detection unit 103 may also perform scene detection using multiple types of data. For example, if the forklift 3 is provided with a distance sensor that measures the distance to the load, the scene "bringing the forklift close to the load" can be detected more reliably. In addition, information indicating the operations accepted by the forklift 3 (forward, reverse, raising and lowering the lift, and so on) may be acquired, for example by receiving it from the forklift 3, and used for scene detection.

In the determination system 100 of the present embodiment, after the scene detection unit 103 detects a scene, the moving image acquisition unit 101 inputs the moving image 201 of that scene to the learned model 102 and obtains output data. The operation suitability determination unit 104 then determines whether the operation procedure is appropriate according to which scene has the highest probability value in the output data. If the operation procedure is correct, the operation suitability determination unit 104 further determines whether that probability value is equal to or greater than a threshold: if it is, the operation content is determined to be appropriate, and if it is less than the threshold, the operation content is determined to be inappropriate.

(Summary of Embodiment 2)
As described above, the scene detection unit 103 of the present embodiment detects a scene using data indicating the operating state of the forklift 3 when the forklift 3 is operated. Combining two different technical elements in this way, namely scene detection using data indicating the operating state and suitability determination using a learned model, makes more reliable determination possible. For example, even during a period in which part of the input moving image 201 contains noise and the accuracy of determination based on the moving image 201 has dropped, scene detection using sensing data or the like can still be performed without problems. It is therefore possible to adopt processing such as skipping the suitability determination for that scene so that unreliable determination results are not output.
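The option of skipping the suitability determination for an unreliable scene can be sketched as below. How the noise in the moving image 201 is recognized is not described in the text, so the `video_reliable` flag is an assumed input; the function name `judge_or_skip` and the threshold value are likewise hypothetical.

```python
from typing import Dict, Optional


def judge_or_skip(
    probs: Dict[str, float],
    video_reliable: bool,
    threshold: float = 0.8,
) -> Optional[bool]:
    """Return True/False for appropriate/inappropriate, or None when the determination is skipped."""
    if not video_reliable:
        return None  # skip: no low-reliability determination result is output
    return max(probs.values()) >= threshold
```

Returning `None` rather than `False` keeps a skipped scene distinguishable from a scene that was actually judged inappropriate.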

[About distributed processing]
Part of the processing executed by the determination device 1 described in the above embodiments may be executed by one or more devices communicatively connected to the determination device 1. For example, the processing executed by the learned model 102 may be executed by an AI server communicatively connected to the determination device 1. In this case, the determination device 1 generates input data from the moving image 201 and transmits it to the AI server, receives output data from the AI server, and then performs scene detection and operation suitability determination.

[About input data]
As input data for the learned model 102, the moving image 201 may be used as is, or the moving image 201 may be used after predetermined processing has been applied to it. The predetermined processing may be any processing that reduces information unrelated to the feature points of each scene without losing those feature points. For example, if the moving image 201 is a color image, it may be converted to grayscale and used as input data. Alternatively, only the regions of the moving image 201 in which there is motion may be extracted and used as input data.
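Both kinds of preprocessing can be illustrated on frames held as nested lists of pixels. This is a sketch only: the text does not say how grayscale conversion or motion extraction is carried out, so the ITU-R BT.601 luma weights and simple frame differencing used here are techniques chosen for illustration, and the names `to_gray`, `motion_mask`, and `min_diff` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0 to 255


def to_gray(frame: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB frame to grayscale using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def motion_mask(prev: List[List[int]], curr: List[List[int]], min_diff: int = 20) -> List[List[bool]]:
    """Mark pixels whose gray level changed by at least min_diff between two frames."""
    return [
        [abs(c - p) >= min_diff for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]
```

Pixels marked `True` in the mask approximate the regions with motion; everything else could be blanked out before the frames are passed to the model.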

Furthermore, for example, the forklift 3 and its worker A may be detected among the objects appearing in the moving image 201, and only the regions in which the detected forklift 3 and worker A appear may be used as input data. This eliminates the influence of the background behind the forklift 3 and the worker A, so the determination accuracy can be improved. A learned model such as a CNN can be used, for example, to detect the forklift 3 and the worker A.
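Once a detector has produced bounding boxes for the forklift 3 and the worker A, restricting the input to those regions amounts to a crop. The sketch below assumes boxes in `(x0, y0, x1, y1)` form with exclusive upper bounds and crops to the smallest rectangle containing all boxes; the box format, the use of a single enclosing rectangle, and the names `union_box` and `crop` are assumptions for illustration, and the detector itself is not shown.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest rectangle that contains every detected box."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def crop(frame: List[list], box: Box) -> List[list]:
    """Keep only the rows and columns of the frame that fall inside the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]
```

For example, `crop(frame, union_box([forklift_box, worker_box]))` discards the background outside the two detections, where `forklift_box` and `worker_box` stand for hypothetical detector outputs.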

[Modification]
The above embodiments described examples in which a learned model is generated by machine learning that uses, as teacher data, moving images of work performed appropriately. However, the teacher data may instead be moving images of work in which the operation content was inappropriate. In this case, when determining the suitability of the operation content, the operation content is determined to be inappropriate if the probability that it corresponds to the inappropriate operation content is equal to or greater than a predetermined threshold.

For example, suppose that a learned model is generated using, as teacher data, moving images of work in which a load collapsed during loading or unloading. In this case, if, in a loading or unloading scene, the probability output by the learned model (that is, the probability that the operation content in the scene is similar to the operation content that caused a load to collapse) is equal to or greater than a threshold, it can be determined that an operation with a high likelihood of causing a load collapse was performed.
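The difference from the earlier embodiments is only the direction of the threshold comparison, which the following sketch makes explicit. The function name `operation_inappropriate` and the `trained_on` parameter are hypothetical and exist only to contrast the two cases.

```python
def operation_inappropriate(probability: float, threshold: float, trained_on: str) -> bool:
    """Decide whether an operation is inappropriate, depending on the kind of teacher data."""
    if trained_on == "appropriate":
        # model learned correct operations: a low probability signals a problem
        return probability < threshold
    if trained_on == "inappropriate":
        # model learned e.g. load-collapse operations: a high probability signals a problem
        return probability >= threshold
    raise ValueError("trained_on must be 'appropriate' or 'inappropriate'")
```

With a threshold of 0.7, a probability of 0.9 is acceptable for a model trained on appropriate operations but indicates a risky operation for a model trained on load-collapse footage.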

In addition, moving images of, for example, a collision with a person or an object while a load is being transported, or of the fork accidentally striking the load or the pallet during an attempt to insert the fork into a fork pocket, may also be used. This makes it possible to determine, in a load transport scene or a scene of inserting the fork into a fork pocket, whether an inappropriate operation with a high likelihood of causing an accident or the like was performed.

[Example of software implementation]
The control blocks of the determination device 1 (in particular, the units included in the control unit 10) may be realized by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the determination device 1 includes a computer that executes the instructions of a program, which is software realizing each function. This computer includes, for example, one or more processors and a computer-readable recording medium storing the program. The object of the present invention is achieved when, in the computer, the processor reads the program from the recording medium and executes it. As the processor, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) can be used. As the recording medium, a "non-transitory tangible medium" can be used, for example a ROM (Read Only Memory), as well as a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, or the like. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may also be supplied to the computer via any transmission medium capable of transmitting the program (a communication network, broadcast waves, or the like). One aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Determination device
102 Learned model
103 Scene detection unit
104 Operation suitability determination unit

Claims

A determination device comprising:
a scene detection unit that detects, from a moving image of work performed by operating a transport vehicle that transports a transport target, a scene in which a predetermined operation is being performed on the transport vehicle; and
an operation suitability determination unit that determines whether the predetermined operation performed in the scene is appropriate, based on a result obtained by inputting the scene detected by the scene detection unit to a learned model generated by machine learning that uses, as teacher data, a moving image of a scene in which the predetermined operation was performed appropriately.

The determination device according to claim 1, wherein the learned model is generated by machine learning that uses, as teacher data, a moving image of the work in which the predetermined operation was performed appropriately and in which each scene has been classified, and the learned model classifies a moving image input to it into a plurality of scenes and outputs information indicating the certainty of the classification,
the scene detection unit detects, based on the certainty of the classification, which of the plurality of scenes the scene input to the learned model corresponds to, and
the operation suitability determination unit determines that the predetermined operation is not appropriate when the certainty of the classification is less than a predetermined threshold.

The determination device according to claim 1, wherein the scene detection unit detects the scene using data indicating an operating state of the transport vehicle when the transport vehicle is operated.

A determination method performed by a determination device, the method comprising:
a scene detection step of detecting, from a moving image of work performed by operating a transport vehicle that transports a transport target, a scene in which a predetermined operation is being performed on the transport vehicle; and
an operation suitability determination step of determining whether the predetermined operation performed in the scene is appropriate, based on a result obtained by inputting the scene detected in the scene detection step to a learned model generated by machine learning that uses, as teacher data, a moving image of a scene in which the predetermined operation was performed appropriately.