JP2023078983A

JP2023078983A - Work management method or system

Info

Publication number: JP2023078983A
Application number: JP2021192348A
Authority: JP
Inventors: 旗城周; Kijo Shu
Original assignee: MTI Ltd
Current assignee: MTI Ltd
Priority date: 2021-11-26
Filing date: 2021-11-26
Publication date: 2023-06-07

Abstract

To provide a method or system that makes it possible to grasp the steps of a work without much labor. SOLUTION: A method executed by a system including one or more computing devices includes a process of playing back a first moving image recording a first work composed of a plurality of steps, a process of having an explainer orally explain the first work displayed in the first moving image and acquiring the explainer's voice, and a measurement process of analyzing the voice to measure a first step time set indicating the time required for each of the plurality of steps. SELECTED DRAWING: Figure 8

Description

The present invention relates to a work management method or system.

A device for managing work processes is known in the prior art (see, for example, Patent Document 1).

Patent Document 1: JP-A-2000-24850

However, with the above technology, it has been difficult to accurately grasp the time required for each process or the name of each process, and to draw up a process chart, without much labor.

In view of this problem, one aspect of the present invention provides a method executed by a system including one or more computing devices, the method including: a process of playing back a first moving image recording a first work composed of a plurality of steps; a process of having an explainer orally explain the first work displayed in the first moving image and acquiring the explainer's voice; and a measurement process of analyzing the voice to measure a first step time set indicating the time required for each of the plurality of steps.

It is possible to provide a method or system with which processes can be grasped without much labor.

FIG. 1 is an overall configuration diagram of the information processing system according to the present embodiment.
FIG. 2 is a diagram showing the hardware configuration of the information processing apparatus according to the present embodiment.
FIG. 3 is a diagram showing the functional configurations of (a) the distribution server and (b) the machine learning server according to the present embodiment.
FIG. 4 is a diagram showing the structure of the learning device according to the present embodiment.
FIG. 5 is a diagram showing the process information according to the present embodiment.
FIG. 6 is a diagram showing the learning data according to the present embodiment.
FIG. 7 is a sequence diagram showing the process chart creation processing according to the present embodiment.
FIG. 8 is a diagram showing an outline of the voice analysis according to the present embodiment.
FIG. 9 is a diagram showing a process chart created by the processing according to the present embodiment.
FIG. 10 is a flowchart showing the learning processing according to the present embodiment.
FIG. 11 is a diagram showing an outline of the image processing in the present embodiment.
FIG. 12 is a flowchart showing the estimation processing in the present embodiment.

At least the following matters will become apparent from the description in this specification and the accompanying drawings. The present invention is described below on the basis of one embodiment, with reference to the accompanying drawings.

FIG. 1 shows the configuration of an information processing system 1 according to one embodiment of the present invention. The information processing system 1 includes the following computing devices: a distribution server 10, a machine learning server 20, an imaging device 40, and one or more user terminals 30. The distribution server 10, the machine learning server 20, the imaging device 40, and the user terminals 30 are connected via a communication network 5 so that they can exchange data with one another. The communication network 5 is a wireless or wired communication means such as the Internet, a WAN (Wide Area Network), a LAN (Local Area Network), a public communication network, or a dedicated line. Although the information processing system 1 of this embodiment is composed of the plurality of devices listed above, the present invention does not limit the number of devices. The information processing system 1 can therefore be configured with one or more devices, as long as it provides the functions described below.

The user terminal 30 is an information processing device operated by a user (such as a worker or the explainer described later), for example a smartphone, a tablet, a mobile phone, or a personal computer.

The distribution server 10 is an information processing device that manages processes and creates process charts for various kinds of work performed by humans or robots, such as product assembly and the construction of structures.

The machine learning server 20 has a function of executing machine learning to create a learning model and a function of creating a process chart using the learning model.

The imaging device 40 is a device that films work and generates a video, for example a digital video camera.

FIG. 2 shows an example of the hardware used to implement the distribution server 10, the machine learning server 20, and the user terminal 30 (hereinafter referred to as the "information processing apparatus 100"). As shown in the figure, the information processing apparatus 100 includes a processor 101, a main storage device 102, an auxiliary storage device 103, an input device 104, an output device 105, and a communication device 106. These are communicably connected to one another via a communication means such as a bus (not shown).

Note that the information processing apparatus 100 need not be implemented entirely in hardware; all or part of its configuration may be implemented by virtual resources such as a cloud server of a cloud system.

The processor 101 is configured using a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like. The functions of the distribution server 10, the machine learning server 20, the user terminal 30, and the imaging device 40 are realized by the processor 101 reading and executing programs stored in the main storage device 102.

The main storage device 102 is a device that stores programs and data, such as a ROM (Read Only Memory), a RAM (Random Access Memory), or a non-volatile semiconductor memory (NVRAM (Non-Volatile RAM)).

The auxiliary storage device 103 is, for example, an SSD (Solid State Drive), any of various non-volatile memories such as an SD memory card, a hard disk drive, an optical storage device (CD (Compact Disc), DVD (Digital Versatile Disc), etc.), or a storage area of a cloud server. Programs and data stored in the auxiliary storage device 103 are read into the main storage device 102 as needed.

The input device 104 is an interface that accepts input of information, for example a keyboard, a mouse, a touch panel, a card reader, a voice input device (such as a microphone), or a voice recognition device. The information processing apparatus 100 may also be configured to accept input of information from other devices via the communication device 106.

The output device 105 is an interface that outputs various kinds of information, for example a screen display device (liquid crystal monitor, LCD (Liquid Crystal Display), graphics card, etc.), a printing device, a voice output device (such as a speaker), or a voice synthesis device. The information processing apparatus 100 may also be configured to output information to other devices via the communication device 106.

The communication device 106 is a wired or wireless communication interface that enables communication with other devices via the communication network 5, for example a NIC (Network Interface Card), a wireless communication module, a USB (Universal Serial Bus) module, or a serial communication module.

[Functional configuration of the distribution server]
FIG. 3(a) shows the main functions (software configuration) of the distribution server 10. As shown in the figure, the distribution server 10 has a storage area 110 and a management unit 120. These functions are realized by the processor 101 of the distribution server 10 reading and executing programs stored in the main storage device 102 or the auxiliary storage device 103 of the distribution server 10. In addition to these functions, the distribution server 10 also has functions such as an operating system, a file system, device drivers, and a DBMS (DataBase Management System).

Of the above functions, the storage area 110 is formed in the main storage device 102 or the auxiliary storage device 103 of the distribution server 10. The storage area 110 stores a work video 151 obtained by filming the work, process information 152, and work videos 153 and 154. The storage area 110 stores these data as, for example, database tables or files managed by the file system.

The process information 152 holds a plurality of combinations of the shooting time of the work video 151 (or the playback time of the video), the time required for a process, and the process name (FIG. 5). Details such as how the process information 152 and the work videos 153 and 154 are generated are described later.
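The combination held in the process information 152 (a time position in the video, the time required, and the process name) can be pictured as a simple record. The following is a minimal sketch only: the class name `ProcessRecord`, its field names, and the sample values are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    """One row of the process information (hypothetical layout)."""
    start_s: float     # shooting/playback time at which the process begins, in seconds
    duration_s: float  # time required for the process, in seconds
    name: str          # process name obtained from the explainer's speech


# Illustrative values only.
process_info = [
    ProcessRecord(start_s=0.0, duration_s=2.0, name="screw tightening"),
    ProcessRecord(start_s=2.0, duration_s=3.5, name="product transfer"),
]


def total_duration(records):
    """Sum the time required over all recorded processes."""
    return sum(r.duration_s for r in records)
```

A list of such records could equally be stored as rows of a database table or as a file, as the description of the storage area 110 allows.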

The management unit 120 has a function of recognizing a human voice and converting it into text data, a function of creating a process, and the like.

[Functional configuration of the machine learning server]
FIG. 3(b) shows the main functions (software configuration) of the machine learning server 20. As shown in the figure, the machine learning server 20 has a storage area 210 and a management unit 220. These functions are realized by the processor 101 of the machine learning server 20 reading and executing programs stored in the main storage device 102 of the machine learning server 20. In addition to these functions, the machine learning server 20 also has functions such as an operating system, a file system, device drivers, and a DBMS (DataBase Management System).

The storage area 210 is formed in the main storage device 102 or the auxiliary storage device 103 of the machine learning server 20. As shown in FIG. 3(b), the storage area 210 stores learning data 251 used for machine learning of a learning device 221 (described later). The learning data 251 has a plurality of combinations of a frame and a process name label (FIG. 6). Details such as how the learning data 251 is generated are described later.

The management unit 220 has various functions described later, and includes at least the learning device 221. The learning device 221 has a function of learning the image features of input images and of outputting, for an input image, information indicating the result of estimating elements in the image. Models of various types and structures can be adopted as a learning device 221 with such a function; the learning device 221 of this embodiment, as shown in FIG. 4, builds a neural network such as a convolutional neural network and performs deep learning.

The learning device 221 has an input layer that accepts an input image, an output layer that outputs the estimation result for the element of interest, and intermediate layers that extract the features of the input image. Each of the input layer, the output layer, and the intermediate layers has nodes (shown as white circles in the figure), and the nodes of these layers are connected by edges (shown as arrows in the figure). The configuration of the learning device 221 shown in FIG. 4 is an example; the numbers of nodes and edges, the number of intermediate layers, and so on can be changed as appropriate.
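The layered structure of nodes and edges can be illustrated with a very small forward pass. This sketch assumes a fully connected network with a single intermediate layer, not the convolutional network of the embodiment, and every weight, function name, and process name in it is a hypothetical value chosen for illustration.

```python
import math


def dense(weights, bias, inputs):
    """Weighted sum over incoming edges for every node in a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> intermediate layer (features) -> output layer (estimate)."""
    hidden = relu(dense(hidden_w, hidden_b, inputs))
    return softmax(dense(out_w, out_b, hidden))


# Hypothetical, hand-picked weights; a real model would learn these.
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUT_W = [[2.0, 0.0], [0.0, 2.0]]
OUT_B = [0.0, 0.0]
PROCESS_NAMES = ["screw tightening", "product transfer"]

probs = forward([0.9, 0.1], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
predicted = PROCESS_NAMES[probs.index(max(probs))]
```

The output layer here yields one probability per process name, and the most probable name is taken as the estimate for the input.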

[Process chart creation processing]
The processing executed by the information processing system 1 configured as described above is explained below, mainly with reference to the sequence diagram of FIG. 7. As an example, the following describes processing that creates a process chart for an article assembly work composed of a plurality of processes.

The processing performed in the information processing system 1 is realized by each of the distribution server 10, the machine learning server 20, the user terminal 30, and the imaging device 40 reading and executing programs stored in its main storage device 102 or auxiliary storage device 103. In the following, processing by the management units 120 and 220, which are generated by the programs, may be described as being executed by the distribution server 10 and the machine learning server 20.

In step S1, the assembly work is filmed. At least the work relating to one process is filmed, but it is preferable to film the entire work. The video data obtained is transmitted from the imaging device 40 and stored as the work video 151 in the storage area 110 of the distribution server 10 (S2).

In step S3, the user terminal 30 plays back the video and the explainer explains the work. Specifically, the work video 151 is played back via the user terminal 30, and an explainer who knows the assembly work well explains each process shown in the video while watching it. The explanation may be the name of the process or the details of the work; below, however, whatever the explainer says in this processing is uniformly treated as the "process name".

The distribution server 10 acquires the explainer's voice via the user terminal 30 (S4). On acquisition, the voice may be saved in the storage area 110, or the next processing (step S5) may be performed concurrently without saving it.

The distribution server 10 then creates the process information 152 (S5). As shown in FIG. 5, the process information 152 is created by associating the time required for each process with the name of each process.

Specifically, as shown in FIG. 8, the explainer's voice is analyzed by the distribution server 10 (management unit 120) and converted into text. In parallel, the time required for a process is measured by measuring the duration of the explainer's utterance. For example, if the utterance "screw tightening" lasts 2 seconds, the distribution server 10 sets the time for the screw tightening process to 2 seconds. The process information 152 is then created by associating the text "screw tightening" obtained by this processing with the time required for the process (FIG. 5).
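The duration-based measurement can be sketched as follows. The sketch assumes that speech recognition has already produced, for each utterance, its text together with start and end times in seconds; that tuple layout, the function name `measure_by_duration`, and the sample values are assumptions made for illustration.

```python
def measure_by_duration(utterances):
    """Treat the length of each utterance as the time required for the process.

    utterances: list of (text, start_s, end_s) tuples, assumed to come from
    a speech recognizer that reports timestamps.
    """
    return [
        {"name": text, "start_s": start, "duration_s": end - start}
        for text, start, end in utterances
    ]


# Illustrative values: "screw tightening" is spoken for 2 seconds.
utterances = [("screw tightening", 0.0, 2.0), ("product transfer", 3.0, 7.5)]
records = measure_by_duration(utterances)
```

Each resulting record pairs the recognized text with a measured time, which corresponds to one row of the process information.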

Alternatively, the interval between utterances may be measured as the time required for a process. For example, when the utterance "screw tightening" and the utterance "product transfer" are acquired in sequence, the distribution server 10 takes the interval from the time at which "screw tightening" was uttered to the time at which "product transfer" was uttered as the time for the screw tightening process.
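The interval-based alternative can be sketched in the same style. The specification does not say how the last process is closed, so this sketch assumes it ends when the video ends; that assumption, the function name `measure_by_interval`, and the sample values are hypothetical.

```python
def measure_by_interval(onsets, video_end_s):
    """Treat the gap between consecutive utterance onsets as the process time.

    onsets: list of (text, onset_s) tuples sorted by onset time.
    video_end_s: assumed end point for the final process (not in the source).
    """
    records = []
    for i, (text, onset) in enumerate(onsets):
        if i + 1 < len(onsets):
            next_onset = onsets[i + 1][1]
        else:
            next_onset = video_end_s  # assumption: last process runs to the end
        records.append(
            {"name": text, "start_s": onset, "duration_s": next_onset - onset}
        )
    return records


onsets = [("screw tightening", 1.0), ("product transfer", 4.0)]
interval_records = measure_by_interval(onsets, video_end_s=10.0)
```

With this approach the explainer only needs to name each process once as it begins, rather than keep speaking for its whole duration.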

Note that "silence" as shown in FIG. 8 and elsewhere does not necessarily mean the absence of sound. For example, the audio may be treated as silent when the volume in the frequency band of the human voice is at or below a threshold, or when the voice analysis yields a waveform that cannot be converted into text.
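The two silence criteria can be expressed as a small predicate. This is a sketch under stated assumptions: the voice-band level is assumed to be precomputed per time window on an arbitrary scale, the threshold value is hypothetical, and combining the two criteria with a logical OR is a design choice made here, since the specification presents them as separate examples.

```python
def is_silent(voice_band_level, recognized_text, level_threshold=0.1):
    """Return True when a time window should be treated as "silence".

    voice_band_level: volume within the human-voice frequency band
        (assumed precomputed; the scale and threshold are hypothetical).
    recognized_text: recognizer output for the window, or None/"" when the
        waveform could not be converted into text.
    """
    if voice_band_level <= level_threshold:
        return True
    return not recognized_text


# (level, recognized text) per window; illustrative values only.
windows = [(0.02, None), (0.80, "screw tightening"), (0.60, ""), (0.05, "hum")]
silence_flags = [is_silent(level, text) for level, text in windows]
```

The third window shows the point of the second criterion: machine noise can be loud yet still count as silence because no text can be recovered from it.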

The distribution server 10 can create a process chart such as that shown in FIG. 9 on the basis of the created process information 152 (S7). As shown in FIG. 5, the process information 152 stores the shooting time of the video (or the playback time of the video; the same applies hereinafter), the name of each process, and the time required for each process in association with one another. The distribution server 10 creates the process chart using these associations.

The process chart can take various forms. For example, it may be a flowchart as shown in FIG. 9, or a Gantt chart, a line diagram, a table as in FIG. 5, or another form, chosen according to the user's wishes.
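Of the formats mentioned, the table form is the simplest to sketch. The following example renders process records as plain text; the column layout, the function name `render_process_table`, and the sample records are hypothetical and are not the layout of FIG. 5.

```python
def render_process_table(records):
    """Render process records as a plain-text table, one process per row."""
    lines = ["No. | Process | Start (s) | Duration (s)"]
    for number, record in enumerate(records, start=1):
        lines.append(
            f"{number} | {record['name']} | "
            f"{record['start_s']:.1f} | {record['duration_s']:.1f}"
        )
    return "\n".join(lines)


sample_records = [
    {"name": "screw tightening", "start_s": 0.0, "duration_s": 2.0},
    {"name": "product transfer", "start_s": 2.0, "duration_s": 3.5},
]
table = render_process_table(sample_records)
```

A flowchart or Gantt chart could be generated from the same records, since each row already carries the order, name, and duration of a process.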

[Learning]
The work video 151 and the process information 152 created by the above processing are used to train the learning device 221, as described below. The learning device 221 is trained according to the flowchart of FIG. 10.

In step S11, the management unit 220 of the machine learning server 20 acquires the work video 151 from the distribution server 10 and labels it. In this processing, each frame, that is, each still image making up the work video 151, is labeled with a process name, indicating which process the work shown in the frame corresponds to (FIG. 6). This processing may be executed by the distribution server 10 instead of the machine learning server 20.

For labeling, the management unit 220 reads the name of each process and the time required for each process from the process information 152. The management unit 220 then labels the frames with process names according to the shooting time of the work video 151.
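Labeling by shooting time can be sketched as a lookup from each frame's timestamp into the process records. The sketch assumes a constant frame rate and half-open time ranges; the function name `label_frames`, the "unlabeled" default, and the sample values are hypothetical.

```python
def label_frames(num_frames, fps, records, default="unlabeled"):
    """Assign a process name to every frame according to its timestamp.

    A frame at time t receives the name of the process whose range
    [start_s, start_s + duration_s) contains t.
    """
    labels = []
    for index in range(num_frames):
        t = index / fps  # timestamp of the frame, assuming a constant frame rate
        label = default
        for record in records:
            if record["start_s"] <= t < record["start_s"] + record["duration_s"]:
                label = record["name"]
                break
        labels.append(label)
    return labels


label_records = [
    {"name": "screw tightening", "start_s": 0.0, "duration_s": 2.0},
    {"name": "product transfer", "start_s": 2.0, "duration_s": 1.0},
]
frame_labels = label_frames(num_frames=8, fps=2.0, records=label_records)
```

Frames that fall outside every recorded process keep the default label, which gives a user an obvious place to apply manual corrections.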

In the next step S13, the management unit 220 performs image processing, including edge processing, on the work video 151. Edge processing makes the worker's outline stand out clearly, which facilitates learning. As a result of the processing up to step S13, the edge-processed work video 151, with each frame associated with a process name and a shooting time, is stored in the storage area 210 as the learning data 251 (FIG. 6). At this point, the learning data 251 may be corrected by the user.

Edge processing generally extracts pixels with large luminance changes in a frame and creates a binarized image that distinguishes them from the other pixels (FIG. 11). In the image processing (S13), in addition to edge processing, processing such as dilation and erosion may be used to remove noise and sharpen contours.
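The idea of marking pixels with large luminance changes, followed by dilation, can be sketched in plain Python. This is not the embodiment's actual image processing: the neighbor-difference edge test, the threshold of 50, the 3x3 dilation window, and the function names are assumptions, and a practical system would more likely use an image processing library.

```python
def edge_binarize(frame, threshold=50):
    """Mark a pixel 1 when luminance jumps to its right or lower neighbor."""
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(frame[y][x] - frame[y][x + 1]) if x + 1 < width else 0
            dy = abs(frame[y][x] - frame[y + 1][x]) if y + 1 < height else 0
            if max(dx, dy) > threshold:
                out[y][x] = 1
    return out


def dilate(binary):
    """3x3 dilation: a pixel becomes 1 if any pixel in its neighborhood is 1."""
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = max(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
    return out


# A dark region (10) next to a bright region (200): one vertical edge.
frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = edge_binarize(frame)
thick_edges = dilate(edges)
```

Dilation thickens the detected contour; erosion is the mirror operation (taking the minimum over the neighborhood) and can be combined with it to suppress isolated noise pixels.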

In the next step S15, the management unit 220 trains the learning device 221 using the learning data 251. The learning device 221 acquires the features of each frame showing a worker, a tool, a product, or the like, and features relating to changes between frames, and learns them together with the process names.

Through this learning processing, the learning device 221 comes to function as a trained model that estimates which process the work shown in a video corresponds to.

Note that the learning processing may be executed multiple times on the basis of multiple work videos, not just the single work video 151. Using videos of different work may make it possible to create a trained model that handles a variety of work, or may improve the accuracy of the estimation processing (described later).

[Estimation]
With the trained learning device 221, estimation processing that estimates processes from a video without any explanation by an explainer can be executed. The estimation processing is described in detail below with reference to the flowchart of FIG. 12.

In step S21, the learning device 221 acquires, from the imaging device 40, a new work video 153 different from the work video 151. The work video 153 shows a worker performing work that spans a plurality of processes. On acquiring the work video 153, the learning device 221, as in step S13, applies image processing including edge processing to the work video 153 as shown in FIG. 11, generating a work video 154 in which the outlines of the worker and objects are made clear (S22).

Next, the learning device 221 reads the frames making up the work video 154 and estimates the process names of the work shown in the work video 154 and the time required for each process (S23). Specifically, the learning device 221 acquires the image features of each frame showing a worker, a tool, a product, or the like, or features relating to changes between frames, and from these features estimates the process name of the work shown in the frame.

At the same time, the learning device 221 can determine the time required for each process from the shooting times of the frames. Alternatively, the time required for a process can be calculated from the number of frames given the same process name and the frame rate. In this way, each process name and the time required for each process are estimated.
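The frame-count calculation amounts to dividing the length of each run of identically named frames by the frame rate. The sketch below assumes the per-frame process names have already been estimated and that the frame rate is constant; the function name `processes_from_frame_labels` and the sample values are hypothetical.

```python
from itertools import groupby


def processes_from_frame_labels(labels, fps):
    """Collapse consecutive frames with the same process name into one process.

    duration = (number of frames in the run) / (frames per second)
    """
    processes = []
    frame_index = 0
    for name, run in groupby(labels):
        count = len(list(run))
        processes.append(
            {"name": name, "start_s": frame_index / fps, "duration_s": count / fps}
        )
        frame_index += count
    return processes


# 60 frames of one process followed by 90 frames of another, at 30 fps.
estimated_labels = ["screw tightening"] * 60 + ["product transfer"] * 90
estimated = processes_from_frame_labels(estimated_labels, fps=30.0)
```

Because `groupby` only merges adjacent equal labels, a process that recurs later in the video appears as a separate entry, which preserves the order needed for a process chart.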

The learning device 221 creates a process chart using the process names estimated for each frame (S25). The process chart is created on the basis of each process name and the time required for each process estimated in step S23. As with the processing of step S5 (FIG. 9), the format of the process chart is chosen as appropriate according to the user's requests, settings, and the like. A table in the same format as the process information 152 may also be created.

<Modifications>
In the embodiment, the devices are communicably connected via the communication network 5, but they need not be connected by communication means. For example, the imaging device 40 may be independent of the communication network 5, with data such as the work video 151 moved to each device via a storage medium such as a memory. The same applies to the other devices.

The processing and functions of each device described in the embodiment may also be executed by another device. For example, part or all of the labeling and the image processing including edge processing (S11, S13, S22) may be executed by the distribution server 10, the imaging device 40, or the user terminal 30. Similarly, part or all of the learning (S15) and the process estimation (S23) may be executed by the distribution server 10, the imaging device 40, or the user terminal 30. The same applies to other processing and functions.

＜効果＞
上記実施形態において情報処理システム１は、複数の工程で構成される作業を記録した作業動画１５１（第１動画に相当）を再生する処理（Ｓ３）と、作業動画１５１に表示される作業を説明者に口頭で説明させ、説明者の音声を取得する処理（Ｓ４）と、配信サーバ１０に説明者の音声を解析させ、複数の工程それぞれにかかる時間を計測し、図５に示されるような工程時間セットを有する工程情報１５２を取得する計測処理（Ｓ５）と、を実行する。 <effect>
In the above-described embodiment, the information processing system 1 performs a process (S3) of reproducing a work video 151 (corresponding to a first video) in which work composed of a plurality of steps is recorded, and the work displayed in the work video 151 will be described. A process (S4) of having the person give an oral explanation and acquiring the voice of the explainer, and having the distribution server 10 analyze the voice of the explainer, measuring the time required for each of the plurality of steps, and performing the processing as shown in FIG. A measurement process (S5) for acquiring process information 152 having a process time set is executed.

上記構成では、説明者が工程の名称を口頭で述べるだけで工程表が作成される。説明者が工程の名称をキーボード入力したり、各工程にかかる時間をストップウォッチで計測し、メモを取ったりする手間が無い。従来では、そのような手間や労力のため、工程把握をするためには多くの時間と人手を必要としていた。また、時間計測を計測する際の誤差や作業内容の記入ミスなどに起因する品質不良が発生する虞もあった。一方、上記構成では、労力を必要とせず、迅速にまたは正確に工程を把握することができる。 In the above configuration, the process chart is created only by the explainer verbally stating the name of the process. There is no need for the explainer to enter the name of the process from the keyboard, measure the time required for each process with a stopwatch, and take notes. Conventionally, due to such trouble and labor, much time and manpower were required to grasp the process. In addition, there is a possibility that quality defects may occur due to an error in measuring the time or an error in entering the details of the work. On the other hand, with the above configuration, the process can be grasped quickly or accurately without requiring labor.

計測処理（Ｓ５）において、配信サーバ１０は、説明者の音声の継続時間に基づいて複数の工程それぞれにかかる時間を計測する。 In the measurement process (S5), the distribution server 10 measures the time required for each of the multiple steps based on the duration of the voice of the explainer.

このような方法を採ることにより、各工程を遂行するために必要な時間を容易にまたは正確に計測することができる。 By adopting such a method, it is possible to easily or accurately measure the time required to perform each step.

The distribution server 10 analyzes the explainer's voice and acquires process information 152 that holds, as text (that is, as characters), a name set indicating the name of each of the plurality of steps (S5).

In the above configuration, the explainer's voice is recognized and text data is created automatically, so the explainer does not need to type the step names on a keyboard, and the data can be created quickly.

From the process information 152, the distribution server 10 creates a process chart that shows the name of each step in association with its process time (S7).
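As a rough illustration of step S7, the following sketch pairs a name set with a process time set and renders the result as a plain-text chart. The function name `build_process_chart`, the column layout, and the total row are assumptions made for this example; the embodiment does not specify the chart format.

```python
def build_process_chart(names, times_s):
    """Pair each step name with its time and render a plain-text chart.

    names:   list of step names (the name set)
    times_s: list of times in seconds (the process time set), same order
    Returns (rows, chart_text).
    """
    if len(names) != len(times_s):
        raise ValueError("name set and process time set must have the same length")
    rows = list(zip(names, times_s))
    # Column width: wide enough for the longest name and the "Total" label.
    width = max([len("Total")] + [len(n) for n in names])
    lines = [f"{'No.':<4}{'Step':<{width}}  Time (s)"]
    for i, (name, t) in enumerate(rows, start=1):
        lines.append(f"{i:<4}{name:<{width}}  {t:.1f}")
    lines.append(f"{'':<4}{'Total':<{width}}  {sum(times_s):.1f}")
    return rows, "\n".join(lines)


rows, chart = build_process_chart(["cut", "weld"], [3.0, 5.5])
print(chart)
```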

The information processing system 1 executes processes (S11, S13) of creating learning data 251 using the work video 151 and the process information 152. In the machine learning server 20, a process (S15) of training the learning device 221 with the learning data 251 to create a trained model is executed.

With the above configuration, the work video 151 can be used as machine learning data to create a learning model that estimates steps or creates a process chart.
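One conceivable way to derive learning data from the work video and the process information is to label each video frame with the step in progress at that moment. The sketch below assumes that steps run back to back from the start of the video and uses frame indices in place of actual images. The function `build_learning_data` and its parameters are hypothetical; the actual format of the learning data 251 is not specified here.

```python
def build_learning_data(steps, fps, total_frames):
    """Label each frame index with the step in progress at that time.

    steps: list of (name, duration_s) in work order
    fps:   frames per second of the video (assumed constant)
    Returns a list of (frame_index, step_name); frames after the last
    step ends are left unlabeled and skipped.
    """
    # Cumulative end time of each step, assuming the steps are contiguous.
    boundaries = []
    elapsed = 0.0
    for name, duration_s in steps:
        elapsed += duration_s
        boundaries.append((elapsed, name))

    samples = []
    for frame in range(total_frames):
        t = frame / fps
        for end_s, name in boundaries:
            if t < end_s:
                samples.append((frame, name))
                break
    return samples


samples = build_learning_data([("cut", 1.0), ("weld", 0.5)], fps=4, total_frames=8)
```

In a real pipeline each frame index would be replaced by the frame image or features derived from it before being fed to a learning device.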

The trained learning device 221 executes a process (S21) of reading a work video 153 that records work composed of a plurality of steps, and an estimation process (S23) of estimating a name set indicating the name of each of the steps constituting the work and a process time set indicating the time required for each step of the work.

By executing such processes, the step names and times can be acquired even if no explainer gives an oral explanation. Apart from filming the work, data on the steps can be acquired without human intervention.
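To make the outputs of the estimation process concrete, the sketch below assumes that a trained model emits one step label per video frame (an assumption; the embodiment does not state the model's output form) and collapses consecutive identical labels into a name set and a process time set. The function name `summarize_predictions` is hypothetical.

```python
from itertools import groupby


def summarize_predictions(frame_labels, fps):
    """Collapse per-frame step labels into a name set and a process time set.

    frame_labels: predicted step name for each frame, in playback order
    fps:          frames per second of the video (assumed constant)
    """
    names, times_s = [], []
    # groupby yields one group per run of consecutive identical labels.
    for label, run in groupby(frame_labels):
        names.append(label)
        times_s.append(sum(1 for _ in run) / fps)
    return names, times_s


names, times_s = summarize_predictions(["cut"] * 4 + ["weld"] * 2, fps=4)
```

A practical system would likely smooth out isolated misclassified frames before grouping; that step is omitted here.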

The learning device 221 creates a process chart that shows each step name in association with its process time (S25).

With the above configuration, a process chart can be created easily without human intervention, and the time required for each step and the work content can be expressed in an appropriate format.

In the estimation process, the work video 153 is edge-processed (S21). The learning device 221 then acquires a feature amount of the image and estimates at least one of the step name and the process time (S23).

By performing edge processing as in the above configuration, the learning device 221 can grasp the worker's contour or changes in that contour and acquire the feature amount accurately. This makes it easier to estimate the work content of each step or the time required for each step.
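The idea of edge processing followed by feature acquisition can be sketched as follows. The example uses a simple brightness-difference edge detector on a grayscale frame held as a 2D list, and takes as a feature the fraction of pixels whose edge state changed between two frames. Both the detector and the feature are stand-ins chosen for illustration; the specific edge-processing method and feature amount used by the learning device 221 are not stated, and `edge_image` and `edge_change` are hypothetical names.

```python
def edge_image(gray, threshold=50):
    """Mark a pixel as an edge (1) when the brightness difference to its
    right and lower neighbours is large.

    gray: 2D list of brightness values (0-255).
    The last row and column have no such neighbours and stay 0.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def edge_change(prev_edges, cur_edges):
    """Feature: fraction of pixels whose edge state differs between two frames."""
    total = len(cur_edges) * len(cur_edges[0])
    changed = sum(
        p != c
        for prev_row, cur_row in zip(prev_edges, cur_edges)
        for p, c in zip(prev_row, cur_row)
    )
    return changed / total


# Made-up 4x4 frames: an empty frame, then a bright square appearing.
blank = [[0] * 4 for _ in range(4)]
square = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
]
feature = edge_change(edge_image(blank), edge_image(square))
```

A large value of this feature suggests that the contour in the frame moved, which is the kind of cue the paragraph above describes as helping to separate one step from the next.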

Information processing system 1
Server 10
Machine learning server 20
User terminal 30
Imaging device 40

Claims

1. A method performed in a system comprising one or more computing devices, the method comprising:
a process of reproducing a first moving image recording a first work composed of a plurality of steps;
a process of causing an explainer to verbally explain the first work displayed in the first moving image and acquiring the voice of the explainer; and
a measurement process of analyzing the voice and measuring a first process time set indicating the time required for each of the plurality of steps.

2. The method of claim 1, wherein, in the measurement process, the time required for each of the plurality of steps is measured based on the duration of the voice.

3. The method of claim 1 or 2, further comprising a process of analyzing the voice to obtain, as characters, a first name set indicating the name of each of the plurality of steps.

4. The method according to claim 3, further comprising creating a process chart showing each name in said first name set and each time in said first process time set in association with each other.

5. The method of claim 3 or 4, further comprising:
a process of creating learning data using the first moving image, the first name set, and the first process time set; and
a process of training the computing device with the learning data to create a trained model.

6. The method of claim 5, wherein the trained model performs:
a process of reading a second moving image recording a second work composed of a plurality of steps; and
an estimation process of estimating a second name set indicating the name of each of the plurality of steps constituting the second work, and a second process time set indicating the time required for each step of the second work.

7. The method according to claim 6, wherein the trained model further performs a process of creating a process chart showing each name of the second name set and each time of the second process time set in association with each other.

8. The method according to claim 6 or 7, wherein the trained model, in the estimation process, performs:
a process of acquiring an image obtained by performing edge processing on the second moving image;
a process of acquiring a feature amount of the image; and
a process of estimating at least one of the second name set and the second process time set based on the image.

9. A system comprising one or more computing devices, the system executing:
a process of reproducing a first moving image recording a first work composed of a plurality of steps;
a process of causing an explainer to verbally explain the first work displayed in the first moving image and acquiring the voice of the explainer; and
a measurement process of analyzing the voice and measuring a first process time set indicating the time required for each of the plurality of steps.