WO2024142193A1 - Information processing device, information processing method, and information processing program

Information processing device, information processing method, and information processing program

Info

Publication number
WO2024142193A1
WO2024142193A1 PCT/JP2022/048071 JP2022048071W WO2024142193A1 WO 2024142193 A1 WO2024142193 A1 WO 2024142193A1 JP 2022048071 W JP2022048071 W JP 2022048071W WO 2024142193 A1 WO2024142193 A1 WO 2024142193A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
work
worker
similarity
work object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/048071
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
雅人 佐羽内
尚吾 清水
勝大 草野
孝之 小平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Priority to JP2024552757A (patent JP7599625B2, ja)
Priority to PCT/JP2022/048071 (patent WO2024142193A1, ja)
Publication of WO2024142193A1
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • An operation process is an element that constitutes a work procedure.
  • A single work procedure is constituted by combining multiple operation processes.
  • An operation process is also referred to as a work element below.
  • The operation mode (hereinafter sometimes simply referred to as the mode) needs to be switched according to the characteristics of the worker and the characteristics of the work object.
  • To switch the mode, it is necessary to select an appropriate estimation model according to the characteristics of the worker and/or the characteristics of the work object from among a plurality of estimation models for estimating the work process being performed.
  • Such switching of operation modes is performed manually, which is laborious, and there is a demand for automating the switching.
  • FIG. 13 is a diagram showing a configuration example of a task element specifying device according to a third modification of the first embodiment.
  • FIG. 13 is a diagram showing an example of the configuration of a task element specifying device according to a second embodiment.
  • FIG. 13 is a flowchart showing pre-operation processing according to the second embodiment.
  • FIG. 13 is a diagram showing an example of a process for generating a work element table according to the second embodiment.
  • FIG. 13 is a diagram showing an example of a work element table according to the second embodiment.
  • the task element specifying device 10 is a computer.
  • the task element identification device 10 includes, as hardware, a processor 11, a memory 12, a storage 13, and a communication interface 14.
  • the processor 11 is connected to other hardware via signal lines and controls the other hardware.
  • the processor 11 is an integrated circuit (IC) that performs processing.
  • the processor 11 is, for example, a central processing unit (CPU), a digital signal processor (DSP), or a graphics processing unit (GPU).
  • the memory 12 is a storage device that temporarily stores data, and is, for example, a static random access memory (SRAM) or a dynamic random access memory (DRAM).
  • the storage 13 is a storage device that stores data.
  • the storage 13 is, for example, a hard disk drive (HDD).
  • the storage 13 may also be a portable recording medium such as a Secure Digital (registered trademark) memory card, a CompactFlash (registered trademark) memory card, a NAND flash memory, a flexible disk, an optical disk, a compact disk, a Blu-ray (registered trademark) disk, or a Digital Versatile Disk (DVD).
  • the communication interface 14 is an interface for communicating with an external device, and is, for example, an Ethernet (registered trademark), USB (Universal Serial Bus), or HDMI (registered trademark, High-Definition Multimedia Interface) port.
  • the task element identifying device 10 includes, as functional components, a video acquisition unit 21, a skeleton detection unit 22, a product detection unit 23, a movement similarity calculation unit 24, a product similarity calculation unit 25, a mode selection unit 26, and a task element identifying unit 27.
  • the functions of the functional components of the task element identifying device 10 are realized by software.
  • the storage 13 stores a program that realizes each functional component of the work element identification device 10. This program is loaded into the memory 12 by the processor 11 and executed by the processor 11. In this way, each functional component of the work element identification device 10 is realized.
  • The storage 13 also stores a learning video 31, a true value 32 of the learning video, and a work element identification model 33.
  • In FIG. 1, there is one processor 11. However, there may be multiple processors 11, and the multiple processors 11 may work together to execute programs that realize each functional component.
  • the learning video 31 is a video showing one or more workers performing assembly work and a product that is the work target. In this embodiment, there are a plurality of learning videos 31. Each learning video 31 corresponds to one of a plurality of work procedures.
  • the true value 32 of the learning video is the label information of the work element being performed in each frame of the learning video 31.
  • The work element identification model 33 is a model generated for each learning video 31.
  • Each work element identification model 33 corresponds to one of the plurality of learning videos 31. That is, each work element identification model 33 corresponds to one of the plurality of work procedures.
  • the work element identification model 33 is an example of an estimation model.
  • In this embodiment, the worker's handedness (right-handed or left-handed) is assumed as the worker's characteristic.
  • The product model (model i or model ii) is assumed as the work object's characteristic.
  • Among the learning videos 31, there is also a learning video 31 in which a left-handed worker works on a product of model i (hereinafter referred to as the "left-handed & model i learning video"). There is also a learning video 31 in which a left-handed worker works on a product of model ii (hereinafter referred to as the "left-handed & model ii learning video").
  • The same applies to the work element identification models 33 as to the learning videos 31. In other words, there are four work element identification models 33 that cover all patterns of worker handedness and product models. Specifically, in this embodiment, there is a work element identification model 33 that is applied when a right-handed worker works on a product of model i (hereinafter referred to as the "right-handed & model i model"). There is also a work element identification model 33 that is applied when a right-handed worker works on a product of model ii (hereinafter referred to as the "right-handed & model ii model").
  • There is also a work element identification model 33 that is applied when a left-handed worker works on a product of model i (hereinafter referred to as the "left-handed & model i model").
  • There is also a work element identification model 33 that is applied when a left-handed worker works on a product of model ii (hereinafter referred to as the "left-handed & model ii model").
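The four-way correspondence between worker handedness, product model, and work element identification model can be sketched as a small lookup table. This is only an illustration: the names `HANDEDNESS`, `PRODUCT_MODELS`, and `model_index`, and the use of integer index values, are assumptions and do not appear in the source.

```python
from itertools import product

# Worker characteristic and work object characteristic assumed in this embodiment.
HANDEDNESS = ("right-handed", "left-handed")
PRODUCT_MODELS = ("model i", "model ii")

# Hypothetical index information: one entry per learning video / identification model.
model_index = dict(enumerate(product(HANDEDNESS, PRODUCT_MODELS)))

for index, (hand, model) in model_index.items():
    print(index, f"{hand} & {model} model")
```

Keying every learning video and every identification model by the same index is one way to realise the "index information" that the detection units attach to skeleton information and product images.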
  • The true value 32 of the learning video indicates the name of the work element corresponding to each frame of the learning video 31.
  • the work procedure corresponding to the learning video 31 includes work elements a, b, and c.
  • the true value 32 of the learning video indicates, for example, that frames 1 to 10 are frames of work element a, frames 11 to 20 are frames of work element b, and frames 21 to 30 are frames of work element c.
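The frame-level labelling in this example can be sketched as follows. The `(start, end, name)` tuple layout and the function name `expand_true_value` are assumptions made for illustration; the source does not specify how the true value 32 is stored.

```python
def expand_true_value(segments):
    """Expand (start_frame, end_frame, work_element) segments into per-frame labels.

    Frame numbers are 1-based and inclusive, matching the example in the text.
    """
    labels = {}
    for start, end, name in segments:
        for frame in range(start, end + 1):
            labels[frame] = name
    return labels


# Example from the text: frames 1-10 are work element a, 11-20 are b, 21-30 are c.
segments = [(1, 10, "a"), (11, 20, "b"), (21, 30, "c")]
labels = expand_true_value(segments)
print(labels[10], labels[11], labels[30])
```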
  • The video acquisition unit 21 acquires a video via the communication interface 14.
  • The video acquired by the video acquisition unit 21 via the communication interface 14 is hereinafter referred to as the input video 46.
  • The input video 46 is a video of a worker performing work on a product, which is the work target.
  • The skeleton detection unit 22 acquires skeleton information, which is motion information of the worker, from the learning video 31 and the input video 46.
  • the skeleton detection unit 22 corresponds to an acquisition unit.
  • the process performed by the skeleton detection unit 22 corresponds to an acquisition process.
  • the skeleton information acquired by the skeleton detection unit 22 from the input video 46 corresponds to worker motion information.
  • The product detection unit 23 acquires product images from the learning video 31 and the input video 46.
  • the product detection unit 23 also corresponds to an acquisition unit.
  • the process performed by the product detection unit 23 also corresponds to an acquisition process.
  • the product image acquired by the product detection unit 23 from the input video 46 corresponds to a work object image.
  • The movement similarity calculation unit 24 calculates the similarity between the skeleton information acquired from the learning video 31 and the skeleton information acquired from the input video 46.
  • the movement similarity calculation unit 24 corresponds to a similarity calculation unit.
  • The product similarity calculation unit 25 calculates the similarity between the product image acquired from the learning video 31 and the product image acquired from the input video 46.
  • the product similarity calculation unit 25 also corresponds to a similarity calculation unit.
  • the mode selection unit 26 uses the movement similarity calculated by the movement similarity calculation unit 24 and the product similarity calculated by the product similarity calculation unit 25 to select a work element identification model 33 to be used to identify a work element from among the multiple work element identification models 33.
  • the mode selection unit 26 corresponds to a model selection unit.
  • the process performed by the mode selection unit 26 corresponds to a model selection process.
  • The work element identification model 33 selected by the mode selection unit 26 corresponds to a utilization estimation model.
  • the work element identification unit 27 uses the work element identification model 33 selected by the mode selection unit 26 to identify the work element being performed by the worker.
  • the task element identifying device 10 performs pre-operation processing and operation processing.
  • In the pre-operation processing, the task element identification device 10 uses the learning video 31 to prepare for performing the operation processing.
  • In the operation processing, the task element identification device 10 analyzes the input video 46 and selects the work element identification model 33 to be used for identifying the work element.
  • the skeleton detection unit 22 determines one task identification target from one or more workers 41 shown in the learning video 31. Then, the skeleton detection unit 22 detects skeletal information 42 of the task identification target. Furthermore, the skeleton detection unit 22 assigns index information of the task element identification model 33 to the skeletal information 42 of the task identification target in association with the learning video 31 from which the skeletal information 42 was detected. Then, the skeleton detection unit 22 writes the skeletal information 42 together with the index information to the memory 12.
  • Step S12 Product detection process
  • The product detection unit 23 reads all the learning videos 31 from the storage 13. As shown in FIG. 4, the product detection unit 23 detects a product 44 to be assembled from a representative frame 43 of each learning video 31, and cuts out a product image 45 showing the product 44 to be assembled.
  • the product detection unit 23 assigns index information of the work element identification model 33 to the cut-out product image 45 in association with the learning video 31 from which the product image 45 was cut out. Then, the product detection unit 23 writes the product image 45 together with the index information into the memory 12.
  • Methods for selecting the representative frame 43 include a method in which the product detection unit 23 selects the representative frame 43 from the learning video 31 at regular time intervals, and a method that uses the true value 32 of the learning video.
  • When the true value 32 of the learning video is used, the product detection unit 23 reads the true value 32 of the learning video from the memory 12, and identifies the frame where each work element starts and the frame where each work element ends from the true value 32 of the learning video. Then, the product detection unit 23 selects, for example, the central frame between the start frame and the end frame of each work element as a representative frame 43. One work element may have multiple start and end frames. That is, one work element may be performed multiple times in the learning video 31.
  • In that case, the product detection unit 23 may prepare a representative frame 43 for each occurrence. The product detection unit 23 may also select multiple representative frames 43 from between the frames where each work element starts and ends.
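The true-value-based selection of representative frames can be sketched as below. This is a minimal illustration, assuming the true value is available as a per-frame list of work element names; the function name `representative_frames` and the rule of rounding the centre down are assumptions, not details from the source.

```python
def representative_frames(frame_labels):
    """Return (work_element, central_frame) for each contiguous run of one work element.

    frame_labels: list of work element names, where index 0 is frame 1.
    A work element performed several times yields one frame per occurrence.
    """
    reps = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # A run ends at the last frame or where the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            end = i - 1
            center = (start + end) // 2
            reps.append((frame_labels[start], center + 1))  # convert to 1-based
            start = i
    return reps


# Hypothetical video in which work element a is performed twice.
frame_labels = ["a"] * 10 + ["b"] * 10 + ["a"] * 10
reps = representative_frames(frame_labels)
print(reps)
```

Treating each contiguous run separately is what produces one representative frame per occurrence when a work element is repeated.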
  • Methods for cutting out the product image 45 include a method using deep learning, a method of cutting out the area around the coordinates of the wrist of the specific task target person detected in step S11, a method of cutting out a predetermined range, and the like.
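Of the cut-out methods listed, the wrist-centred one can be sketched as a simple bounding-box computation. The function name `wrist_crop_box`, the square window, and the default half-size of 64 pixels are assumptions for illustration; the source does not state the size or shape of the cut-out region.

```python
def wrist_crop_box(wrist_xy, frame_size, half_size=64):
    """Return (left, top, right, bottom) of a square region around the wrist.

    The box is clipped to the frame so it never extends past the image border.
    wrist_xy: (x, y) pixel coordinates of the detected wrist.
    frame_size: (width, height) of the frame in pixels.
    """
    x, y = wrist_xy
    width, height = frame_size
    left = max(0, x - half_size)
    top = max(0, y - half_size)
    right = min(width, x + half_size)
    bottom = min(height, y + half_size)
    return left, top, right, bottom


box = wrist_crop_box((100, 50), (640, 480))
print(box)
```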
  • Step S21 Video acquisition process
  • the video acquisition unit 21 acquires the input video 46 via the communication interface 14.
  • the video acquisition unit 21 writes the input video 46 into the memory 12.
  • The input video 46 is a video of arbitrary duration that shows one or more workers 41 performing assembly work and a product 44 to be assembled.
  • Step S22 End determination process
  • the video acquisition unit 21 determines whether or not the input video 46 has been acquired. If the input video 46 cannot be acquired, the operation process ends.
  • Step S23 Skeleton detection process
  • The skeleton detection unit 22 reads the input video 46 from the memory 12.
  • the skeleton detection unit 22 detects skeleton information 42 of one worker 41 from an input video 46.
  • Fig. 6 shows skeleton information 42 for one joint.
  • the skeleton detection unit 22 detects skeleton information 42 for a plurality of joints.
  • the skeleton detection unit 22 determines one specific task target person from one or more workers 41 shown in the input video 46. Then, the skeleton detection unit 22 detects skeletal information 42 of the specific task target person, and further, the skeleton detection unit 22 writes the skeletal information 42 of the specific task target person to the memory 12.
  • The skeletal information 42 may be detected by, for example, a method using deep learning.
  • The specific task target person may be determined by, for example, selecting the person for whom the Euclidean distance between the detected coordinates of the right shoulder and the left shoulder is the largest.
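The shoulder-distance rule for choosing the specific task target person can be sketched as follows. The dictionary keys `right_shoulder` and `left_shoulder`, the `id` field, and the function name `select_target_worker` are hypothetical; the coordinates are made-up pixel values.

```python
import math


def select_target_worker(workers):
    """Pick the worker whose shoulders are farthest apart in the image.

    workers: list of dicts with 'right_shoulder' and 'left_shoulder' (x, y) coordinates.
    """
    def shoulder_width(worker):
        return math.dist(worker["right_shoulder"], worker["left_shoulder"])

    return max(workers, key=shoulder_width)


# Hypothetical detections for two workers in one frame.
workers = [
    {"id": "far", "right_shoulder": (300, 120), "left_shoulder": (340, 122)},
    {"id": "near", "right_shoulder": (100, 200), "left_shoulder": (220, 205)},
]
target = select_target_worker(workers)
print(target["id"])
```

A larger shoulder distance in pixels generally means the person is closer to the camera or facing it more directly, which is presumably why this heuristic is offered as one option.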
  • The mode selection unit 26 selects, on a work element basis, the skeletal information 42 of the learning video 31 detected in step S11 that has the greatest similarity to the skeletal information 42 of the input video 46 detected in step S23. In other words, the mode selection unit 26 selects, from the work elements of the multiple pieces of skeletal information 42 of the learning video 31, the work element that has the greatest similarity. Then, the mode selection unit 26 specifies the work element identification model 33 indicated by the index information assigned to the selected work element as a first candidate model.
  • The mode selection unit 26 selects, from among the product images 45 in the learning video 31 cut out in step S12, the product image 45 that has the highest similarity to the product image 45 in the input video 46 cut out in step S24. Then, the mode selection unit 26 specifies the work element identification model 33 indicated by the index information assigned to the selected product image 45 as a second candidate model.
  • The mode selection unit 26 compares a first worker characteristic (handedness), which is the worker characteristic corresponding to the first candidate model, with a second worker characteristic (handedness), which is the worker characteristic corresponding to the second candidate model.
  • The mode selection unit 26 also compares a first work object characteristic (model), which is the work object characteristic corresponding to the first candidate model, with a second work object characteristic (model), which is the work object characteristic corresponding to the second candidate model. Then, when the first worker characteristic matches the second worker characteristic and the first work object characteristic does not match the second work object characteristic, the mode selection unit 26 selects one of the candidate models by prioritizing the calculation result of the similarity of the skeleton information 42.
  • The mode selection unit 26 normalizes the similarity of the skeleton information 42 and the similarity of the product image 45 to a score between 0 and 1, and selects one of the candidate models by prioritizing the calculation result with the larger total score. In other words, the mode selection unit 26 selects, from the first candidate model and the second candidate model, the candidate model with the larger total score obtained by normalizing the similarity of the skeleton information 42 and the similarity of the product image 45.
  • The mode selection unit 26 selects a candidate model with a high degree of similarity in the skeletal information 42. Furthermore, when the first worker characteristic and the second worker characteristic do not match, as with the "right-handed & model i model" and the "left-handed & model ii model," and the first work object characteristic and the second work object characteristic also do not match, the mode selection unit 26 normalizes the similarity for the skeleton information 42 and the similarity for the product image 45 to scores between 0 and 1. Then, the mode selection unit 26 selects the candidate model with the larger total of the score for the skeleton information 42 and the score for the product image 45.
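The candidate comparison described above can be sketched as follows. This is a rough illustration under several assumptions that the source does not confirm: similarities are given per model as plain numbers, normalization to the 0 to 1 range is done by min-max scaling across the models, and every case in which only one of the two characteristics differs is resolved in favour of the skeleton-based candidate. The function names `min_max` and `select_model` are hypothetical.

```python
def min_max(scores):
    """Scale raw similarities to [0, 1] across all models (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def select_model(skeleton_sim, product_sim):
    """Select a model key of the form (handedness, product_model).

    skeleton_sim / product_sim: dicts mapping each model key to a raw similarity.
    """
    first = max(skeleton_sim, key=skeleton_sim.get)   # first candidate (skeleton)
    second = max(product_sim, key=product_sim.get)    # second candidate (product image)
    if first == second:
        return first
    worker_match = first[0] == second[0]
    object_match = first[1] == second[1]
    if not worker_match and not object_match:
        # Both characteristics differ: compare total normalized scores.
        s, p = min_max(skeleton_sim), min_max(product_sim)
        return max((first, second), key=lambda m: s[m] + p[m])
    # Only one characteristic differs: prioritize the skeleton result (assumption
    # for the worker-mismatch case; stated in the text for the object-mismatch case).
    return first


skeleton_sim = {("right", "i"): 0.9, ("right", "ii"): 0.5,
                ("left", "i"): 0.2, ("left", "ii"): 0.6}
product_sim = {("right", "i"): 0.3, ("right", "ii"): 0.4,
               ("left", "i"): 0.1, ("left", "ii"): 0.8}
selected = select_model(skeleton_sim, product_sim)
print(selected)
```

In the sample data the two candidates disagree on both handedness and product model, so the totals decide: the "left-handed & model ii" entry has the larger sum of normalized scores and is selected.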
  • Step S28 Work element identification process
  • the work element identification unit 27 uses the selected work element identification model 33 to identify the work element being performed by the worker 41, and outputs the name of the work element.
  • the work element identification device 10 also calculates the similarity of the product image 45 between the input video 46 and the learning video 31, and determines the work element identification mode based on the calculated similarity.
  • the similarity of the product image 45 is high if the products 44 are of the same model, and low if the products are of different models. Therefore, according to this embodiment, it is possible to select an appropriate work element identification model 33 depending on the model of the product 44.
  • the task element identifying device 10 is a single device as shown in Fig. 1.
  • the task element identifying device 10 may be configured by a plurality of devices.
  • the task element identifying device 10 may be composed of two devices, a detection device 110 and an identifying device 120.
  • the detection device 110 acquires skeleton information 42 and a product image 45 from a learning video 31.
  • the identifying device 120 identifies a task element from an input video 46.
  • the learning video 31, the true value 32 of the learning video, and the work element identification model 33 may be stored in storage provided outside the detection device 110 and the identification device 120.
  • the learning video 31, the true value 32 of the learning video, and the work element identification model 33 may be stored in storage of either the detection device 110 or the identification device 120.
  • the hardware of the detection device 110 and the identification device 120 is omitted. Similar to the work element identifying device 10, the detecting device 110 and the identifying device 120 each include a processor, a memory, a storage device, and a communication interface as hardware.
  • each functional component of the task element identifying device 10 is realized by software.
  • each functional component of the task element identifying device 10 may be realized by hardware.
  • the third modification will be described below with respect to the differences from the first embodiment.
  • Referring to FIG. 14, a configuration example of a work element identifying device 10 according to the third modification will be described.
  • the task element identification device 10 includes an electronic circuit 15 instead of the processor 11, the memory 12, and the storage 13.
  • the electronic circuit 15 is a dedicated circuit for realizing the functions of each functional component, the memory 12, and the storage 13.
  • The search map generating unit 28 treats work elements whose skeleton information 42 has a similarity equal to or greater than a threshold as the same work element, and generates a work element table 47 as shown in FIG. 18. In the work element table 47, the work element numbers indicate the order of the work elements.
  • Learning video A corresponds to work procedure A.
  • In work procedure A, the work elements are performed in the order of work elements a, c, d, f, and h. It should be noted that work element numbers 1 to 5 in FIG. 18 constitute one cycle of the work procedure.
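The threshold-based grouping behind the work element table can be sketched as below. This is a loose illustration only: the greedy first-match grouping, the scalar features, the similarity function, and the name `build_work_element_table` are all assumptions, since the source does not describe how the search map generating unit 28 performs the comparison.

```python
def build_work_element_table(videos, similarity, threshold):
    """Group similar work elements across learning videos.

    videos: dict mapping a learning video name to an ordered list of per-work-element
            features (stand-ins for skeleton information).
    Returns a dict mapping each video name to a list of (work_element_number, group_id),
    where work elements with similarity >= threshold share a group_id.
    """
    representatives = []  # one feature per distinct work element found so far
    table = {}
    for name, features in videos.items():
        rows = []
        for number, feature in enumerate(features, start=1):
            for group_id, rep in enumerate(representatives):
                if similarity(feature, rep) >= threshold:
                    break
            else:
                # No existing group is similar enough: start a new one.
                representatives.append(feature)
                group_id = len(representatives) - 1
            rows.append((number, group_id))
        table[name] = rows
    return table


# Toy data: scalar features and a made-up similarity measure.
videos = {"A": [0.0, 1.0, 2.0], "B": [0.05, 2.0, 3.0]}
table = build_work_element_table(videos, lambda a, b: 1 - abs(a - b), threshold=0.9)
print(table)
```

Here the first work element of video B falls in the same group as the first work element of video A, while its third work element forms a new group.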

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Factory Administration (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)
PCT/JP2022/048071 2022-12-27 2022-12-27 Information processing device, information processing method, and information processing program Ceased WO2024142193A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
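JP2024552757A JP7599625B2 (ja) 2022-12-27 2022-12-27 Information processing device, information processing method, and information processing program
PCT/JP2022/048071 WO2024142193A1 (ja) 2022-12-27 2022-12-27 Information processing device, information processing method, and information processing program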

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/048071 WO2024142193A1 (ja) 2022-12-27 2022-12-27 Information processing device, information processing method, and information processing program

Publications (1)

Publication Number Publication Date
WO2024142193A1 (ja) 2024-07-04

Family

ID=91717000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/048071 Ceased WO2024142193A1 (ja) 2022-12-27 2022-12-27 Information processing device, information processing method, and information processing program

Country Status (2)

Country Link
JP (1) JP7599625B2
WO (1) WO2024142193A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
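JP2016147010A (ja) * 2015-02-13 2016-08-18 日本電信電話株式会社 Work arousal level estimation device, method, and program
JP2021157431A (ja) * 2020-03-26 2021-10-07 株式会社 情報システムエンジニアリング Information processing device and information processing method
JP2022136068A (ja) * 2021-03-05 2022-09-15 株式会社 情報システムエンジニアリング Information display device, information display system, information display program, learning method, and data structure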

Also Published As

Publication number Publication date
JP7599625B2 (ja) 2024-12-13
JPWO2024142193A1 2024-07-04

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22969987; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 2024552757; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 22969987; Country of ref document: EP; Kind code of ref document: A1)