WO2024180682A1 - Machine learning program, method, and device - Google Patents
Machine learning program, method, and device
- Publication number
- WO2024180682A1 (PCT/JP2023/007396)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- label
- frame
- machine learning
- learning model
- assigned
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Definitions
- the disclosed technology relates to a machine learning program, a machine learning method, and a machine learning device.
- the movements of people in video are estimated using machine learning models.
- videos with correct labels indicating the type (class) of movement are used as training data.
- the ideal case for training data is one in which correct labels are assigned to each frame (hereinafter referred to as "full annotation").
- full annotation has two problems. The first is that assigning correct labels to every frame incurs a huge work cost.
- the second is that the temporal boundaries at which the type of movement changes may be unclear, so different annotators may assign different labels to frames near the boundaries. In this case, the data may be biased.
- in response, a method called timestamp annotation has been proposed, in which, instead of labeling all frames, a label is assigned to one of the multiple frames included in a section showing one action.
- This method reduces the work cost of labeling compared to full annotation. Furthermore, this method also reduces label inconsistencies at temporal boundaries, as annotators can select reliable timestamps for labeling.
- the disclosed technology aims to improve the accuracy of machine learning models for estimating the movements of people in video footage without full annotation.
- the disclosed technology uses a video including a plurality of frames, in which a label indicating a type of a person's movement is assigned to a representative frame included in each section divided according to the type of the person's movement in the video.
- the disclosed technology generates a combined label by combining the first label and the second label for each frame in the video between a first representative frame assigned with a first label and a second representative frame assigned with a second label.
- the disclosed technology trains the machine learning model to maximize the probability that the label of each of the frames estimated by the machine learning model is the first label or the second label included in the combined label generated for each of the frames.
- the machine learning model estimates the label of each frame included in the input video.
- in one aspect, the disclosed technology has the effect of improving the accuracy of machine learning models for estimating the movements of people in video footage without performing full annotation.
- FIG. 1 is a functional block diagram of a machine learning device.
- FIG. 2 is a schematic diagram showing an example of a training video.
- FIG. 3 is a diagram for explaining generation of a combined label.
- FIG. 4 is a diagram for explaining training of a machine learning model using combined labels.
- FIG. 5 is a block diagram showing a schematic configuration of a computer that functions as a machine learning device.
- FIG. 6 is a flowchart illustrating an example of a machine learning process.
- FIG. 7 is a flowchart illustrating an example of an estimation process.
- FIG. 8 is a diagram for explaining a comparison of estimation results between this method and comparison method 1.
- FIG. 9 is a diagram for explaining a comparison of estimation results between this method and comparison method 2.
- FIG. 10 is a diagram for explaining an example of application of the machine learning device according to the present embodiment to a scoring system for gymnastics.
- when training the machine learning model, a training video is input to the machine learning device 10 according to this embodiment; when estimating a movement, an estimation target video is input.
- FIG. 2 is a diagram showing an example of a training video.
- the top diagram in Figure 2 is a schematic diagram of some of the frames included in the video arranged in chronological order from left to right
- the middle diagram is a schematic diagram of the labels assigned through full annotation
- the bottom diagram is a schematic diagram of the labels assigned through timestamp annotation.
- in the label schematics of the middle and bottom diagrams, the width shown at the leftmost part of the middle diagram corresponds to one frame, and differences between the labels of the frames are indicated by different hatching.
- in full annotation, labels are assigned to all frames included in a video.
- frames to which the same label (c1, c2, c3, or c4 in the example of Fig. 2) is assigned are represented by blocks.
- full annotation has problems in that the work cost of labeling is huge, and the time boundary at which the type of action switches (the dashed line part in the middle part of Fig. 2) becomes unclear, which may cause inconsistencies in the labels assigned by annotators.
- in timestamp annotation, a label is assigned to only one frame out of the multiple frames included in a section showing one action. This reduces the work cost of labeling and eliminates label inconsistencies at time boundaries.
- pseudo labels the two-dot chain line in the lower diagram of Figure 2
- These pseudo labels are less reliable as correct labels because every label that the machine learning model can output is a candidate. Therefore, the estimation accuracy of the trained machine learning model is inferior to that of a machine learning model trained with fully annotated training videos.
- training a machine learning model using training videos labeled with timestamp annotation is referred to as "timestamp semi-supervised learning".
- in this embodiment, a machine learning model is trained by generating combined labels (described in detail below) that are more reliable than the pseudo labels generated during timestamp semi-supervised learning.
- the machine learning device 10 according to this embodiment is described in detail below.
- the machine learning device 10 functionally includes a machine learning unit 12 and an estimation unit 18.
- the machine learning unit 12 further includes a generation unit 14 and a training unit 16.
- a machine learning model 20 is stored in a specified storage area of the machine learning device 10.
- the machine learning model 20 is a model that estimates the label of each frame included in the input video, and is, for example, a model such as a deep neural network.
- the generation unit 14 acquires training video input to the machine learning device 10.
- the generation unit 14 generates a combined label that combines the first label and the second label for each frame between a first representative frame to which a first label has been assigned and a second representative frame to which a second label has been assigned in the acquired training video.
- the generation unit 14 assigns the first label to each frame from the first representative frame, in the direction of the second representative frame, up to the frame immediately preceding the second representative frame.
- the generation unit 14 also assigns the second label to each frame from the second representative frame, in the direction of the first representative frame, back to the frame immediately following the first representative frame.
- the generation unit 14 then generates a combined label by combining the multiple labels assigned to each frame.
- the representative frame is a frame to which a label has been assigned using a timestamp annotation.
- the generation unit 14 repeatedly assigns the label c1 to the next frame in chronological order, starting from the frame to which the label c1 was assigned by the timestamp annotation, up to the frame immediately preceding the frame to which the label c2 is assigned. Also, as shown in B of Fig. 3, the generation unit 14 repeatedly assigns the label c1 to the previous frame in reverse chronological order, starting from the frame to which the label c1 is assigned, back to the first frame. As a result, as shown in D of Fig. 3, the label c1 is assigned to each frame from the first frame to the frame immediately preceding the frame to which the label c2 is assigned.
- the generation unit 14 repeats assigning the label c2 to the next frame in chronological order from the frame to which the label c2 has been assigned, up to the frame immediately preceding the frame to which the label c3 (not shown) has been assigned. Also, as shown in F of Fig. 3, the generation unit 14 repeats assigning the label c2 to the previous frame in reverse chronological order from the frame to which the label c2 has been assigned, up to the frame immediately following the frame to which the label c1 has been assigned. As a result, as shown in G of Fig. 3, the label c2 is assigned to each frame from the frame immediately following the frame to which the label c1 has been assigned to the frame immediately preceding the frame to which the label c3 has been assigned.
- the generation unit 14 executes the above process for all frames to which labels were added by timestamp annotation, i.e., all representative frames. Then, for the frame shown in H of Fig. 3, for example, the generation unit 14 generates a combined label c1 ∪ c2 by combining the assigned labels c1 and c2.
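The propagation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `generate_combined_labels`, the dictionary format for timestamp annotations, and the use of a set to represent a combined label are all hypothetical. It also assumes that the label of the last representative frame is propagated forward to the final frame, mirroring the backward propagation of c1 to the first frame.

```python
from typing import Dict, List, Set


def generate_combined_labels(num_frames: int, timestamps: Dict[int, str]) -> List[Set[str]]:
    """Build a per-frame set of candidate labels from timestamp annotations.

    timestamps maps a representative frame index to its label (hypothetical format).
    """
    if not timestamps:
        raise ValueError("at least one representative frame is required")
    reps = sorted(timestamps.items())  # representative frames in chronological order
    combined: List[Set[str]] = [set() for _ in range(num_frames)]
    for k, (_, label) in enumerate(reps):
        # Backward propagation stops right after the previous representative frame
        # (or at the first frame when there is no previous one).
        start = reps[k - 1][0] + 1 if k > 0 else 0
        # Forward propagation stops right before the next representative frame
        # (or at the last frame when there is no next one; this is an assumption).
        stop = reps[k + 1][0] if k + 1 < len(reps) else num_frames
        for f in range(start, stop):
            combined[f].add(label)
    return combined


# Frames 2 and 6 are representative frames labeled c1 and c2.
labels = generate_combined_labels(9, {2: "c1", 6: "c2"})
```

In this example, the frames strictly between the two representative frames receive both candidates, while each representative frame and the frames outside the pair keep a single label.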
- the training unit 16 trains the machine learning model 20 to maximize the probability that the label of each frame is the first label or the second label included in the combined label generated for that frame.
- the machine learning model 20 estimates the probability that the label of each frame is each of multiple labels indicating the type of action, with a value between 0 and 1.
- the training unit 16 trains the machine learning model 20 to minimize a loss function that becomes smaller as the sum of the probability that the label of the frame for which the combined label was generated is the first label and the probability that it is the second label approaches 1.
- the training unit 16 defines a loss function L_{au} for minimizing the difference between the probability of the combined label, based on the probability p(y_{i,f}) estimated by the machine learning model 20, and the true probability of the combined label, for example, by using the mean square error, as shown in equation (2).
- N_{C_{pos}} is the number of labels c_i included in the combined label
- the numerator in the parentheses on the right side of equation (2) represents the sum of the probabilities p(y_{i,f}) estimated by the machine learning model 20 for the labels c_i included in the combined label. Since the denominator in the parentheses on the right side of equation (2) is 1, the closer the numerator is to 1, the smaller the loss function L_{au} is.
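Equation (2) itself is not reproduced in this text, so the following Python sketch only illustrates the behavior described above: a mean squared error between 1 and the summed probability of the labels in each frame's combined label. The function name `combined_label_loss`, the list-of-lists probability layout, and the averaging over frames are assumptions made for illustration.

```python
from typing import Sequence, Set


def combined_label_loss(probs: Sequence[Sequence[float]],
                        combined: Sequence[Set[int]]) -> float:
    """Mean squared error between 1 and the summed probability of the candidate labels.

    probs[f][i] is the estimated probability that frame f has label i.
    combined[f] is the set of label indices in the combined label of frame f.
    """
    total = 0.0
    for frame_probs, candidates in zip(probs, combined):
        # Sum of the probabilities of the labels included in the combined label.
        in_sum = sum(frame_probs[i] for i in candidates)
        # The loss shrinks as this sum approaches 1.
        total += (in_sum - 1.0) ** 2
    return total / len(probs)


probs = [[0.5, 0.5, 0.0],   # all mass on the two candidate labels
         [0.2, 0.3, 0.5]]   # half the mass on a label outside the combined label
loss = combined_label_loss(probs, [{0, 1}, {0, 1}])
```

The first frame contributes nothing to the loss because its candidate probabilities already sum to 1; the second frame is penalized regardless of how its mass is split between the two candidates.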
- a case where the machine learning model 20 is trained using a training video including frames to which labels c1, c2, c3, and c4 are assigned as representative frames will be described.
- a case where timestamp semi-supervised learning is performed using this training video will be described.
- a frame that is not a representative frame such as the frames shown by K and M in FIG.
- a loss function is used in which the sum of the probabilities of the labels included in the combined label approaches 1 and the sum of the probabilities of the labels not included in the combined label approaches 0.
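A loss with both properties, pushing the in-label sum toward 1 and the out-of-label sum toward 0, might look like the sketch below. The squared-error form, the equal weighting of the two terms, and the name `in_out_label_loss` are assumptions; the source only states the two targets.

```python
from typing import Sequence, Set


def in_out_label_loss(probs: Sequence[Sequence[float]],
                      combined: Sequence[Set[int]]) -> float:
    """Penalize candidate-label mass below 1 and non-candidate mass above 0."""
    total = 0.0
    for frame_probs, candidates in zip(probs, combined):
        inside = sum(p for i, p in enumerate(frame_probs) if i in candidates)
        outside = sum(p for i, p in enumerate(frame_probs) if i not in candidates)
        # Two squared-error terms: inside should reach 1, outside should reach 0.
        total += (inside - 1.0) ** 2 + outside ** 2
    return total / len(probs)


good = in_out_label_loss([[0.5, 0.5, 0.0]], [{0, 1}])
bad = in_out_label_loss([[0.5, 0.0, 0.5]], [{0, 1}])
```

If the model's outputs are normalized to sum to 1 per frame, the two terms carry the same information; the second term matters when each label's probability is estimated independently.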
- the training unit 16 stores the trained machine learning model 20 in a specified storage area of the machine learning device 10.
- the estimation unit 18 acquires an estimation target video input to the machine learning device 10.
- the estimation unit 18 inputs the estimation target video to a trained machine learning model 20, and estimates an action indicated by each frame included in the estimation target video.
- the estimation unit 18 estimates the action indicated by the label c_i that maximizes p(c_{i,f}) as the action of frame f, based on the output Y[i,f] of the machine learning model, and outputs it as the estimation result.
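The per-frame selection described above is an argmax over labels. A minimal sketch, assuming the model output is available as a nested list `Y` indexed as `Y[i][f]` (label i, frame f) and that label names are supplied separately; both the layout and the function name `estimate_actions` are illustrative assumptions.

```python
from typing import List, Sequence


def estimate_actions(Y: Sequence[Sequence[float]], label_names: Sequence[str]) -> List[str]:
    """Return, for each frame f, the label whose probability Y[i][f] is largest."""
    num_frames = len(Y[0])
    result = []
    for f in range(num_frames):
        # Index of the label with the highest probability for this frame.
        best = max(range(len(label_names)), key=lambda i: Y[i][f])
        result.append(label_names[best])
    return result


Y = [[0.9, 0.4, 0.1],   # probabilities of c1 for frames 0..2
     [0.1, 0.6, 0.9]]   # probabilities of c2 for frames 0..2
actions = estimate_actions(Y, ["c1", "c2"])
```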
- the machine learning device 10 may be realized, for example, by a computer 40 shown in FIG. 5.
- the computer 40 includes a CPU (Central Processing Unit) 41, a GPU (Graphics Processing Unit) 42, a memory 43 as a temporary storage area, and a non-volatile storage device 44.
- the computer 40 also includes an input/output device 45 such as an input device and a display device, and an R/W (Read/Write) device 46 that controls the reading and writing of data from and to a storage medium 49.
- the computer 40 also includes a communication I/F (Interface) 47 that is connected to a network such as the Internet.
- the CPU 41, GPU 42, memory 43, storage device 44, input/output device 45, R/W device 46, and communication I/F 47 are connected to each other via a bus 48.
- the storage device 44 is, for example, a hard disk drive (HDD), a solid state drive (SSD), flash memory, etc.
- the storage device 44 which serves as a storage medium, stores a machine learning program 50 for causing the computer 40 to function as the machine learning device 10.
- the machine learning program 50 has generation process control instructions 54, training process control instructions 56, and estimation process control instructions 58.
- the storage device 44 also has an information storage area 60 in which information constituting the machine learning model 20 is stored.
- the CPU 41 reads the machine learning program 50 from the storage device 44, expands it in the memory 43, and sequentially executes the control instructions of the machine learning program 50.
- the CPU 41 operates as the generation unit 14 shown in FIG. 1 by executing the generation process control instruction 54.
- the CPU 41 also operates as the training unit 16 shown in FIG. 1 by executing the training process control instruction 56.
- the CPU 41 also operates as the estimation unit 18 shown in FIG. 1 by executing the estimation process control instruction 58.
- the CPU 41 also reads information from the information storage area 60 and expands the machine learning model 20 in the memory 43.
- the computer 40 that has executed the machine learning program 50 functions as the machine learning device 10.
- the CPU 41 that executes the program is hardware. Also, part of the program may be executed by the GPU 42.
- the functions realized by the machine learning program 50 may be realized, for example, by a semiconductor integrated circuit, more specifically, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), etc.
- the machine learning device 10 executes the machine learning process shown in FIG. 6.
- the machine learning device 10 executes the estimation process shown in FIG. 7. Note that the machine learning process is an example of a machine learning method of the disclosed technology.
- in step S10, the generation unit 14 acquires the training video input to the machine learning device 10.
- in step S12, the generation unit 14 assigns the label of each representative frame assigned by the timestamp annotation to each subsequent frame, in chronological order, up to the frame immediately preceding the adjacent representative frame.
- the generation unit 14 also assigns the label of each representative frame assigned by the timestamp annotation to each preceding frame, in reverse chronological order, back to the frame immediately following the adjacent representative frame.
- the generation unit 14 then generates a combined label for each frame by combining the multiple labels assigned to that frame.
- in step S14, the training unit 16 trains the machine learning model 20 to maximize the probability that the label of each frame is the first label or the second label included in the combined label generated for that frame.
- the training unit 16 then stores the trained machine learning model 20 in a specified storage area of the machine learning device 10, and ends the machine learning process.
- in step S20, the estimation unit 18 acquires the estimation target video input to the machine learning device 10.
- in step S22, the estimation unit 18 inputs the estimation target video to the trained machine learning model 20, estimates the action indicated by each frame included in the estimation target video, and outputs the estimation result, whereupon the estimation process ends.
- the machine learning device uses, as training video, video including a plurality of frames in which a label indicating the type of movement is assigned to a representative frame included in each section divided according to the type of movement of a person in the video.
- the machine learning device generates a combined label combining the first label and the second label for each frame in the training video between a first representative frame assigned with a first label and a second representative frame assigned with a second label.
- the machine learning device then trains the machine learning model to maximize the probability that the label of each frame estimated by the machine learning model is the first label or the second label included in the combined label generated for each frame. This makes it possible to improve the accuracy of the machine learning model for estimating the movement of a person in a video without performing full annotation.
- Figure 8 shows the comparison results between the correct labels, the labels estimated by comparison method 1, and the labels estimated by the method of this embodiment (hereinafter referred to as "this method") for each of videos 1 to 3.
- in Figure 8, as in Figures 2 to 4 described above, differences in labels are represented by different hatching. The same applies to Figure 9 described below.
- Comparison method 1 is a method of training a machine learning model using training videos that have been labeled by full annotation. The estimation results of this method are very close to the correct answer, and an estimation accuracy is obtained that can be said to be within an acceptable range for use in an application.
- Figure 9 also shows the results of comparing the correct labels, the labels estimated by comparison method 2, and the labels estimated by this method for each of videos 1 to 3.
- Comparison method 2 is timestamp semi-supervised learning. It can be seen that this method has improved estimation accuracy compared to comparison method 2, particularly in the areas surrounded by the thick line frames in Figure 9.
- the probability that the label indicating the motion of each frame, which is the output of the machine learning model, is each of the multiple labels, i.e., Y[i, f], may be output as the estimation result.
- the machine learning unit and the estimation unit are configured in a single computer, but the machine learning unit and the estimation unit may be configured in separate computers.
- the above embodiment can also be applied to, for example, interactions between humans and robots.
- a robot captures human movements with a camera, and estimates the human movements from the captured video using a machine learning model trained as in the above embodiment.
- the robot is then controlled to assist the human's actions or imitate the human's actions according to the estimated movements.
- the above embodiment can also be applied to, for example, a scoring system for gymnastics.
- an overview of an example of the processing of a scoring system for gymnastics will be described with reference to FIG. 10.
- the scoring system detects a person's area from each image included in the multi-viewpoint image.
- the scoring system tracks a person by matching areas showing the same person in the time-series multi-viewpoint images between multiple frames from a single viewpoint.
- the scoring system also determines whether the person shown in the detected area is an athlete or a non-athlete, identifies the area showing the athlete, and matches the tracked athlete between the multiple viewpoints, i.e., between the images.
- the scoring system recognizes the athlete's two-dimensional skeletal information from each of the tracked series of images using a recognition model or the like.
- the scoring system estimates three-dimensional skeletal information from the two-dimensional skeletal information using camera parameters.
- the scoring system then performs post-processing such as smoothing on the time-series three-dimensional skeletal information, estimates the phase (break) of the performance, and then recognizes the technique.
- a machine learning model trained by the machine learning device according to the above embodiment can be applied to this technique recognition.
- the application of the disclosed technology is not limited to the above-mentioned human-robot interaction, gymnastics scoring systems, etc., but can be used as a general motion recognition application.
- in the above embodiment, the machine learning program is pre-stored (installed) in the storage device, but the disclosed technology is not limited to this.
- the program according to the disclosed technology may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, or USB memory.
- Reference Signs List: 10 Machine learning device; 12 Machine learning unit; 14 Generation unit; 16 Training unit; 18 Estimation unit; 20 Machine learning model; 30 Estimation unit; 40 Computer; 41 CPU; 42 GPU; 43 Memory; 44 Storage device; 45 Input/output device; 46 R/W device; 47 Communication I/F; 48 Bus; 49 Storage medium; 50 Machine learning program; 54 Generation process control instructions; 56 Training process control instructions; 58 Estimation process control instructions; 60 Information storage area
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/007396 WO2024180682A1 (ja) | 2023-02-28 | 2023-02-28 | Machine learning program, method, and device |
| JP2025503311A JPWO2024180682A1 | 2023-02-28 | 2023-02-28 | |
| US19/279,194 US20250348793A1 (en) | 2023-02-28 | 2025-07-24 | Machine learning program, method, and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/007396 WO2024180682A1 (ja) | 2023-02-28 | 2023-02-28 | Machine learning program, method, and device |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/279,194 Continuation US20250348793A1 (en) | 2023-02-28 | 2025-07-24 | Machine learning program, method, and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024180682A1 (ja) | 2024-09-06 |
Family
ID=92589475
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/007396 Ceased WO2024180682A1 (ja) | 2023-02-28 | 2023-02-28 | Machine learning program, method, and device |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20250348793A1 |
| JP (1) | JPWO2024180682A1 |
| WO (1) | WO2024180682A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2026083587A1 (ja) * | 2024-10-18 | 2026-04-23 | 富士通株式会社 | Periodic work recognition program, method, and device |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2019078857A (ja) * | 2017-10-24 | 2019-05-23 | 国立研究開発法人情報通信研究機構 | Acoustic model learning method and computer program |
| JP2022190920A (ja) * | 2021-06-15 | 2022-12-27 | キヤノン株式会社 | Information processing device, class determination method, and program |
2023
- 2023-02-28: WO application PCT/JP2023/007396, published as WO2024180682A1 (ja), not active, Ceased
- 2023-02-28: JP application JP2025503311A, published as JPWO2024180682A1 (ja), active, Pending

2025
- 2025-07-24: US application US19/279,194, published as US20250348793A1 (en), active, Pending
Non-Patent Citations (1)
| Title |
|---|
| Davide Moltisanti; Sanja Fidler; Dima Damen: "Action Recognition from Single Timestamp Supervision in Untrimmed Videos", arXiv.org, Cornell University Library, 9 April 2019 (2019-04-09), XP081167088 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20250348793A1 (en) | 2025-11-13 |
| JPWO2024180682A1 | 2024-09-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Zhang et al. | Tokenhpe: Learning orientation tokens for efficient head pose estimation via transformers | |
| CN110781843B (zh) | Classroom behavior detection method and electronic device | |
| Villegas et al. | Learning to generate long-term future via hierarchical prediction | |
| CN117425916A (zh) | Occlusion-aware multi-object tracking | |
| CN102227750B (zh) | Moving object detection device and moving object detection method | |
| JP2021190128A (ja) | System for generating whole-body poses | |
| Fu et al. | Moflow: One-step flow matching for human trajectory forecasting via implicit maximum likelihood estimation based distillation | |
| Dundar et al. | Unsupervised disentanglement of pose, appearance and background from images and videos | |
| CN113920170A (zh) | Pedestrian trajectory prediction method, system, and storage medium combining scene context and pedestrian social relationships | |
| Zhang et al. | Sequential 3D Human Pose Estimation Using Adaptive Point Cloud Sampling Strategy. | |
| CN114067371B (zh) | Cross-modal generative pedestrian trajectory prediction framework, method, and device | |
| JPWO2019111932A1 (ja) | Model learning device, model learning method, and computer program | |
| CN119317972A (zh) | Video-based surgical skill assessment using tool tracking | |
| US20250348793A1 (en) | Machine learning program, method, and device | |
| WO2022024294A1 (ja) | Action identification device, action identification method, and action identification program | |
| KR20240018161A (ko) | Skeleton graph-based action recognition system and method using data augmentation and contrastive learning | |
| CN120839772A (zh) | Method and device for predicting robot manipulation trajectories without action annotation based on embodied flow representation | |
| JP2023553630A (ja) | Keypoint-based action localization | |
| CN119964205A (zh) | Animal pose estimation method and system based on implicit-coding neural network representation | |
| Kourbane et al. | A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks | |
| Zordan et al. | Interactive dynamic response for games | |
| US11620479B2 (en) | System for determining diverting availability of object recognition model | |
| JP6714058B2 (ja) | Method, device, and program for predicting motion | |
| Rajendran et al. | Virtual character animation based on data-driven motion capture using deep learning technique | |
| EP4560572A1 (en) | Task cycle inference device, task cycle inference method, and task cycle inference program |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23925239; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2025503311; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025503311; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23925239; Country of ref document: EP; Kind code of ref document: A1 |