WO2024089772A1 - Data generation method, data generation program, and data generation device - Google Patents

Data generation method, data generation program, and data generation device

Info

Publication number
WO2024089772A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
teacher data
inference
machine learning
information
Legal status
Ceased
Application number
PCT/JP2022/039766
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
創輔 山尾
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to PCT/JP2022/039766 (WO2024089772A1)
Priority to JP2024552557A (JP7794329B2)
Priority to EP22963424.1A (EP4610896A4)
Priority to CN202280099673.XA (CN119790410A)
Publication of WO2024089772A1
Priority to US19/054,422 (US20250191155A1)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 Two-dimensional [2D] image generation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20036 Morphological image processing
    • G06T2207/20044 Skeletonization; Medial axis transform
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The present invention relates to a data generation method and the like.
  • A technology has been established that uses image data of a person captured by a camera to detect the person's skeletal information.
  • Skeletal information is information indicating the coordinates of each of a person's joints.
  • In this technology, multiple pieces of training data are prepared, and supervised learning is performed on a machine learning model such as a deep learning network.
  • One data augmentation technique feeds back gradient information on the inference error of the machine learning model during training and, at each training stage, generates training data that maximizes the performance of the machine learning model.
  • In this technique, a 3D model of a person is projected onto a 2D plane to generate image data for the training data.
  • The 3D model is rotated, translated, and so on based on the gradient information to generate image data that increases the inference error of the machine learning model.
  • A classifier is also provided that excludes image data in which the person's pose is one the person cannot physically assume. Generating image data from a 3D model using gradient information in this way is called differentiable data augmentation.
  • Examples of such data augmentation techniques include techniques (1) to (3) described below.
  • In data augmentation technique (1), Neural Radiance Fields (NeRF) are used to generate image data that increases the inference error of a machine learning model (an object detection model).
  • In technique (1), the bin selection probability is trained at the same time in the direction that maximizes the inference error.
  • Data augmentation technique (2) targets a machine learning model that converts 2D skeletal information into 3D skeletal information, and generates pairs of 2D and 3D skeletal information that increase the inference error of that model.
  • Technique (2) uses a multi-layer perceptron to represent learnable augmentation operations on existing 3D skeletal information.
  • The augmentation operations include perturbations of joint angles, perturbations of bone lengths, and perturbations of rotation and translation.
  • Technique (2) learns the augmentation operations in the direction that maximizes the inference error while training the machine learning model.
  • In data augmentation technique (3), new 3D skeletons are generated from existing 3D skeletons and added to enlarge the training data set.
  • Technique (3) exchanges partial skeletons between two pieces of 3D skeletal information and perturbs joint angles.
  • Training data in the field of gymnastics is limited to the 3D skeletal information a person may take while doing gymnastics. Consequently, when a machine learning model trained on gymnastics training data is used in another field, its inference accuracy may decrease for skeletal information that is not included in the training data.
  • The present invention aims to provide a data generation method, a data generation program, and a data generation device that can generate new teacher data that deviates from the distribution of existing teacher data.
  • In one aspect, a computer executes the following processing.
  • The computer inputs multiple pieces of training data into a machine learning model and obtains, for each piece, an inference result of skeletal information that includes an error for each part of the skeleton.
  • The computer identifies, among the multiple pieces of training data, first training data whose error for a first part is larger than the error for the first part in the other training data.
  • The computer identifies, among the multiple pieces of training data, second training data whose error for a second part is larger than the error for the second part in the other training data.
  • The computer generates third training data by replacing the information about the second part included in the first training data with the information about the second part included in the second training data.
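The claimed steps can be illustrated with a minimal Python sketch. The sketch makes two assumptions purely for illustration: each piece of training data is a dictionary mapping a part name to that part's joint coordinates, and the per-part errors are already available. The names `pick_worst` and `generate_third` are hypothetical and do not come from the source.

```python
from typing import Dict, List

# Hypothetical layout: part name -> list of joint coordinates for that part.
Skeleton = Dict[str, List[List[float]]]


def pick_worst(errors: List[Dict[str, float]], part: str) -> int:
    """Index of the training data whose error for `part` is the largest."""
    return max(range(len(errors)), key=lambda i: errors[i][part])


def generate_third(data: List[Skeleton], errors: List[Dict[str, float]],
                   first_part: str, second_part: str) -> Skeleton:
    i1 = pick_worst(errors, first_part)   # first training data
    i2 = pick_worst(errors, second_part)  # second training data
    # Copy the first training data so the original stays unchanged.
    third = {p: [list(c) for c in coords] for p, coords in data[i1].items()}
    # Replace its second part with the second part of the second training data.
    third[second_part] = [list(c) for c in data[i2][second_part]]
    return third
```

With two hypothetical samples, suppose the first has the larger "head" error and the second has the larger "armL" error. In that case, the result keeps the first sample's head and takes the second sample's left arm.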
  • FIG. 1 is a diagram showing an example of a human body model.
  • FIG. 2 is a diagram showing an example of joint names.
  • FIG. 3 is a diagram for explaining the process of the data generating device according to the present embodiment.
  • FIG. 4A is a diagram for explaining attributes.
  • FIG. 4B is a diagram showing an example of attributes and augmented data.
  • FIG. 5 is a diagram for explaining a body part p.
  • FIG. 6 is a diagram (1) for explaining the processing of the integration unit.
  • FIG. 7 is a diagram (2) for explaining the processing of the integration unit.
  • FIG. 8 is a diagram showing an example of augmented data generated based on attributes and weak point attributes.
  • FIG. 9 is a functional block diagram showing the configuration of a data generating device according to the present embodiment.
  • FIG. 10 is a flowchart illustrating a processing procedure of the data generating device according to the present embodiment.
  • FIG. 11 is a flowchart showing the processing procedure of the integration process.
  • FIG. 12 is a diagram for explaining the effect of the data generating device according to the present embodiment.
  • FIG. 13 is a diagram illustrating an example of a hardware configuration of a computer that realizes the same functions as the data generating device according to the embodiment.
  • FIG. 1 is a diagram showing an example of a human body model.
  • The human body model is defined by 21 joints, ar0 to ar20.
  • Two-dimensional or three-dimensional coordinates are set for each of the joints ar0 to ar20 defined in the human body model.
  • FIG. 2 is a diagram showing an example of joint names.
  • The joint name of joint ar0 is "SPINE_BASE".
  • The joint names of joints ar1 to ar20 are as shown in FIG. 2, and their explanation is omitted.
  • FIG. 3 is a diagram for explaining the processing of the data generating device according to this embodiment.
  • The data generating device uses teacher data 30.
  • The teacher data 30 is existing data.
  • The teacher data includes image data and the attributes of a person.
  • The attributes include skeletal information, camera parameters, and appearance.
  • The skeletal information consists of the joint coordinates described with reference to FIG. 1 and indicates the coordinates of each joint of the person included in the image data.
  • The coordinates of each joint are two-dimensional or three-dimensional.
  • The camera parameters indicate the viewpoint position of the camera that captured the image data. The appearance is information about the looks of the person included in the image data and about the person's background.
  • FIG. 4A is a diagram for explaining attributes.
  • As shown in FIG. 4A, attribute A1 of a certain piece of teacher data includes skeletal information a1-1, camera parameters a1-2, and appearances a1-3 and a1-4.
  • Camera parameters a1-2 are shown as a conceptual illustration; in practice, they contain information on the viewpoint position of the camera that captured the image data.
  • Appearance a1-3 sets the person's appearance (physique, uniform color, and so on).
  • Appearance a1-4 sets the person's background information.
  • The data generating device inputs the teacher data 30 to the data augmentation unit 151.
  • A parameter θ1 is set in the data augmentation unit 151, which augments the attributes of the teacher data 30 based on θ1.
  • The data augmentation unit 151 outputs the augmented attribute information to the image generation unit 152.
  • The image generation unit 152, described later, generates augmented data 40 from the attributes augmented with θ1. The augmented data 40 is input to the training target model 50, and the inference error is calculated.
  • Based on the gradient information of the inference error fed back from the training target model 50, the data augmentation unit 151 trains θ1 in the direction that increases the inference error obtained when the augmented data 40 is input to the training target model 50.
  • Based on θ1, the data augmentation unit 151 changes the joint angles of the skeletal information included in the attributes, and the bone lengths between joints, in the direction that increases the inference error.
  • The data augmentation unit 151 may also augment the data by changing the camera parameters and appearance, based on θ1, in the direction that increases the inference error.
  • When augmenting attributes, the data augmentation unit 151 guarantees the plausibility of the data. For example, it changes each joint angle of the skeletal information only within the range of motion of that joint, and it changes each bone length only within a predetermined range.
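A minimal sketch of the plausibility constraint follows. It treats the augmented joint angles and bone lengths directly as learnable values and enforces the limits by clipping after each gradient-ascent step (projected gradient ascent). The source states only that values stay within the joint's range of motion or a predetermined range. The clipping strategy, the `Range` class, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Range:
    lo: float  # e.g. lower limit of a joint's range of motion
    hi: float  # e.g. upper limit, or the bound on a bone length

    def clamp(self, v: float) -> float:
        return min(self.hi, max(self.lo, v))


def ascend_and_clamp(values: List[float], grads: List[float],
                     ranges: List[Range], lr: float) -> List[float]:
    """One gradient-ascent step on the inference error, followed by
    projection of each value back into its plausible range."""
    return [r.clamp(v + lr * g) for v, g, r in zip(values, grads, ranges)]
```

Consider a hypothetical elbow angle of 150 degrees limited to [0, 160] and a bone length of 0.30 limited to [0.25, 0.35]. The call `ascend_and_clamp([150.0, 0.30], [200.0, 1.0], [Range(0.0, 160.0), Range(0.25, 0.35)], lr=0.1)` returns `[160.0, 0.35]`, because both updates overshoot and are pulled back to their limits.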
  • The image generation unit 152 generates augmented data 40 from the attribute information augmented by the data augmentation unit 151 or by the integration unit 153.
  • The image generation unit 152 is a differentiable image generator such as NeRF.
  • The image generation unit 152 generates a person model and a background model from the skeletal information and appearance included in the attributes.
  • For a model that combines the person model and the background model, the image generation unit 152 generates image data (augmented data 40) from the viewpoint given by the camera parameters in the attribute information.
  • FIG. 4B is a diagram showing an example of attributes and augmented data.
  • Attribute A1 includes skeletal information a1-1, camera parameters a1-2, and appearances a1-3 and a1-4.
  • The image generation unit 152 generates augmented data 40-1 from attribute A1.
  • Attribute A2 includes skeletal information a2-1, camera parameters a2-2, and appearances a2-3 and a2-4.
  • The image generation unit 152 generates augmented data 40-2 from attribute A2.
  • Attribute A3 includes skeletal information a3-1, camera parameters a3-2, and appearances a3-3 and a2-4.
  • The image generation unit 152 generates augmented data 40-3 from attribute A3.
  • The data generating device trains the training target model 50 using the augmented data 40 and the skeletal information used to generate it (that is, the skeletal information of the augmented attributes).
  • The training target model 50 is, for example, a neural network (NN).
  • A parameter θ2 is set in the training target model 50.
  • The data generating device inputs the augmented data 40 to the training target model 50 and obtains the inference result output by the model.
  • The data generating device updates the parameter θ2 of the training target model 50 so as to reduce the inference error between the inference result and the correct label.
  • The data generating device feeds back the gradient information of the inference error to the data augmentation unit 151.
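The alternating update (θ2 descends the inference error, θ1 ascends it) can be shown with a scalar toy problem. This is a hypothetical sketch, not the device's implementation. A one-weight linear model `w` stands in for θ2, and an additive input offset `delta` stands in for θ1. The label is assumed to follow the augmented input, as the skeletal information of an augmented attribute does, and `delta` is clipped to a plausible range.

```python
from typing import Tuple


def adversarial_training(steps: int = 100, lr: float = 0.05, x: float = 1.0,
                         bound: float = 1.0) -> Tuple[float, float]:
    """Toy min-max loop: the model weight w minimizes the squared error,
    and the augmentation offset delta maximizes it within [-bound, bound]."""
    w, delta = 0.0, 0.0  # stand-ins for theta2 (model) and theta1 (augmentation)
    for _ in range(steps):
        x_aug = x + delta          # augmented input
        y_true = 2.0 * x_aug       # label derived from the augmented attribute
        err = w * x_aug - y_true   # residual; loss = err ** 2
        grad_w = 2.0 * err * x_aug            # d loss / d w
        grad_delta = 2.0 * err * (w - 2.0)    # d loss / d delta
        w -= lr * grad_w                      # model: gradient descent
        delta = max(-bound, min(bound, delta + lr * grad_delta))  # ascent + clip
    return w, delta
```

In this toy, `delta` is pushed to the edge of its allowed range, where the residual is largest. Meanwhile `w` still converges to the true slope of 2.0, so the model learns from the hardest plausible inputs.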
  • For each joint in the skeletal information, the data generating device pairs inference result information with the information on the attributes (augmented attributes) used to generate the augmented data 40. The inference result information indicates the relationship between the inference result and the true value (correct label). The device outputs each pair to the integration unit 153.
  • The integration unit 153 executes the following processing based on the inference result information and the augmented attribute information.
  • Hereinafter, an "augmented attribute" is simply referred to as an "attribute".
  • The integration unit 153 waits until the parameter θ2 of the training target model 50 has been updated multiple times, and acquires multiple pairs of inference result information and attribute information.
  • The integration unit 153 identifies the inference error for each body part p based on the inference result information.
  • FIG. 5 is a diagram for explaining body part p.
  • The body parts p are "head", "armL", "armR", "legL", and "legR".
  • That is, p ∈ {head, armL, armR, legL, legR}.
  • The body part "head" corresponds to joints ar3 and ar18.
  • The body part "armL" corresponds to joints ar4, ar5, ar6, and ar19.
  • The body part "armR" corresponds to joints ar7, ar8, ar9, and ar20.
  • The body part "legL" corresponds to joints ar10, ar11, ar12, and ar13.
  • The body part "legR" corresponds to joints ar14, ar15, ar16, and ar17.
  • The integration unit 153 identifies the inference error for each body part p in each piece of inference result information. That is, from one piece of inference result information, the inference errors for the body parts "head", "armL", "armR", "legL", and "legR" are each identified.
  • The inference error for the body part "head" is the MSE (mean squared error) between the inference results and the true values for joints ar3 and ar18.
  • The inference error for the body part "armL" is the MSE between the inference results and the true values for joints ar4, ar5, ar6, and ar19.
  • The inference error for the body part "armR" is the MSE between the inference results and the true values for joints ar7, ar8, ar9, and ar20.
  • The inference error for the body part "legL" is the MSE between the inference results and the true values for joints ar10, ar11, ar12, and ar13.
  • The inference error for the body part "legR" is the MSE between the inference results and the true values for joints ar14, ar15, ar16, and ar17.
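The per-part error computation can be sketched as follows, using the joint-to-part mapping listed above. The sketch assumes that the MSE averages the squared differences over every coordinate of every joint in the part. The source does not say whether it instead averages squared Euclidean distances per joint, which would differ by a factor of the coordinate dimension. Joints ar0 to ar2 belong to no listed part and are ignored here.

```python
from typing import Dict, List, Sequence

# Joint indices per body part, as listed above (ar3 -> 3, and so on).
BODY_PARTS: Dict[str, List[int]] = {
    "head": [3, 18],
    "armL": [4, 5, 6, 19],
    "armR": [7, 8, 9, 20],
    "legL": [10, 11, 12, 13],
    "legR": [14, 15, 16, 17],
}

Coords = Sequence[Sequence[float]]  # 21 joints, each (x, y) or (x, y, z)


def part_errors(pred: Coords, true: Coords) -> Dict[str, float]:
    """MSE per body part, averaged over every coordinate of the part's joints."""
    errors = {}
    for part, joints in BODY_PARTS.items():
        sq = [(p - t) ** 2
              for j in joints
              for p, t in zip(pred[j], true[j])]
        errors[part] = sum(sq) / len(sq)
    return errors
```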
  • The integration unit 153 compares the inference errors for each body part p calculated from the individual pieces of inference result information. For each body part p, it identifies the maximum inference error and the attribute corresponding to the inference result information that yielded it.
  • FIG. 6 is a diagram (1) for explaining the processing of the integration unit.
  • The inference result information obtained by inputting the augmented data generated from attribute A1 into the training target model 50 is denoted R1. The inference errors obtained from R1 for the body parts "head", "armL", "armR", "legL", and "legR" are denoted E1-1, E1-2, E1-3, E1-4, and E1-5, respectively.
  • The inference result information obtained from attribute A2 in the same way is denoted R2. Its inference errors for "head", "armL", "armR", "legL", and "legR" are denoted E2-1, E2-2, E2-3, E2-4, and E2-5, respectively.
  • In general, the inference result information obtained from attribute An is denoted Rn, where n is a natural number of 3 or more. Its inference errors for "head", "armL", "armR", "legL", and "legR" are denoted En-1, En-2, En-3, En-4, and En-5, respectively.
  • The integration unit 153 compares the inference errors E1-1 to En-1 for the body part "head" and identifies the maximum. In this example, E1-1 is the maximum and its attribute is A1, so the integration unit 153 identifies A1 as the weak point attribute of "head".
  • For the body part "armL", the integration unit 153 compares E1-2 to En-2. In this example, E2-2 is the maximum and its attribute is A2, so A2 is the weak point attribute of "armL".
  • For the body part "armR", comparing E1-3 to En-3 identifies E3-3 as the maximum, so attribute A3 is the weak point attribute of "armR". Attribute A3 is not shown in FIG. 6.
  • For the body part "legL", comparing E1-4 to En-4 identifies E4-4 as the maximum, so attribute A4 is the weak point attribute of "legL". Attribute A4 is not shown in FIG. 6.
  • For the body part "legR", comparing E1-5 to En-5 identifies E5-5 as the maximum, so attribute A5 is the weak point attribute of "legR". Attribute A5 is not shown in FIG. 6.
  • By executing the processing shown in FIG. 6, the integration unit 153 thus identifies the weak point attribute of each body part p: A1 for "head", A2 for "armL", A3 for "armR", A4 for "legL", and A5 for "legR".
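Identifying the weak point attribute of each body part amounts to a per-part argmax over the inference results. The following is a minimal sketch that assumes the per-part errors of each inference result are stored in a dictionary. Ties are resolved in favor of the earliest result, a choice the source does not specify.

```python
from typing import Dict, List

PARTS = ["head", "armL", "armR", "legL", "legR"]


def weak_point_attributes(errors: List[Dict[str, float]],
                          attribute_ids: List[str]) -> Dict[str, str]:
    """For each body part, the attribute whose inference result has the largest error."""
    weak = {}
    for part in PARTS:
        # max() returns the first index among equal maxima.
        worst = max(range(len(errors)), key=lambda i: errors[i][part])
        weak[part] = attribute_ids[worst]
    return weak
```

With made-up errors in which result k has its largest error on the k-th part, the function reproduces the FIG. 6 example: A1 for "head", A2 for "armL", and so on through A5 for "legR".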
  • FIG. 7 is a diagram (2) for explaining the processing of the integration unit.
  • The integration unit 153 generates weak point attribute A'1 by integrating weak point attributes A1 to A5 with weak point attribute A1 as the base.
  • Like an attribute, weak point attribute A'1 includes skeletal information, camera parameters, and appearance.
  • The skeletal information of A'1 combines the joint coordinates of the body part "head" from A1, "armL" from A2, "armR" from A3, "legL" from A4, and "legR" from A5.
  • The camera parameters and appearance of A'1 are reused from those of the base weak point attribute A1.
  • In the same way, the integration unit 153 generates weak point attributes A'2, A'3, A'4, and A'5 using A2, A3, A4, and A5, respectively, as the base. Their skeletal information combines the same body parts from A1 to A5 as that of A'1. Their camera parameters and appearance are reused from the respective base weak point attribute.
  • By executing the processing described with reference to FIG. 7, the integration unit 153 thus generates weak point attributes A'1 to A'5.
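The integration step can be sketched as below. Several details are assumptions made for illustration: the `Attribute` class and the string placeholders for camera parameters and appearance. The same goes for the handling of joints ar0 to ar2, which belong to no body part; this sketch keeps them from the base attribute.

```python
from dataclasses import dataclass
from typing import Dict, List

# Joint indices per body part (ar3 -> 3, and so on), as listed for FIG. 5.
BODY_PARTS: Dict[str, List[int]] = {
    "head": [3, 18],
    "armL": [4, 5, 6, 19],
    "armR": [7, 8, 9, 20],
    "legL": [10, 11, 12, 13],
    "legR": [14, 15, 16, 17],
}


@dataclass
class Attribute:
    skeleton: Dict[int, List[float]]  # joint index -> coordinates
    camera: str                       # placeholder for camera parameters
    appearance: str                   # placeholder for appearance/background


def integrate(weak: Dict[str, Attribute]) -> List[Attribute]:
    """Build one integrated attribute A'k per weak point attribute used as the base."""
    integrated = []
    for base in weak.values():
        # Start from the base skeleton so joints outside the five parts are kept.
        skeleton = {j: list(c) for j, c in base.skeleton.items()}
        # Overwrite each part with the joints of that part's weak point attribute.
        for part, joints in BODY_PARTS.items():
            for j in joints:
                skeleton[j] = list(weak[part].skeleton[j])
        # Camera parameters and appearance are reused from the base.
        integrated.append(Attribute(skeleton, base.camera, base.appearance))
    return integrated
```

In this sketch, every A'k takes its body-part joints from the same five weak point attributes. The integrated skeletons therefore differ only in the joints outside the five parts, and the remaining variation among A'1 to A'5 comes from each base's camera parameters and appearance.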
  • the integration unit 153 inputs the weak point attributes A'1 to A'5 to the image generation unit 152, thereby generating extended data corresponding to the weak point attributes A'1 to A'5 , respectively.
  • FIG. 8 is a diagram showing an example of extended data generated by attributes and weak point attributes.
  • extended data Im10 is image data obtained by inputting attribute A2 to the image generating unit 152.
  • the inference error of the body part "armL” is the maximum value compared to other inference errors.
  • Extended data Im11 is image data obtained by inputting weak point attribute A'2 to the image generating unit 152.
  • the inference errors of each of the body parts "head”, “armL”, “armR”, “legL”, and “legR” are the maximum values. In other words, new teacher data that is far from the distribution of existing teacher data can be generated.
  • the data generation device generates a set of augmented data obtained by inputting the weak point attributes into the image generation unit 152 and skeletal information contained in the weak point attributes as teacher data, and uses this data in the machine learning of the training target model 50.
  • the data generating device identifies an inference error for each body part p based on the inference result information obtained by inputting the extended data 40 into the training target model 50.
  • the data generating device compares the inference errors for each body part p calculated from each piece of inference result information, and identifies the attribute (weak attribute) of the inference result information for which the inference error is at the maximum value.
  • the data generating device integrates the weak attribute for each body part p, and generates extended data based on the integrated weak attribute. This makes it possible to generate new training data that is far from the distribution of existing training data.
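  • The identification of a weak attribute for each body part p can be sketched as below. The sketch assumes that each piece of inference result information is available as predicted and true 3D joint coordinates, and that the per-part inference error is the mean Euclidean joint error; the source does not fix the error metric. `BODY_PARTS`, `part_errors`, and `find_weak_attributes` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[float, float, float]

# Hypothetical joint indices per body part (not specified in the source).
BODY_PARTS: Dict[str, List[int]] = {
    "head": [0, 1],
    "armL": [2, 3, 4],
    "armR": [5, 6, 7],
    "legL": [8, 9, 10],
    "legR": [11, 12, 13],
}


def part_errors(pred: Sequence[Joint], truth: Sequence[Joint]) -> Dict[str, float]:
    """Mean Euclidean joint error for each body part p (assumed error metric)."""
    return {
        part: sum(math.dist(pred[j], truth[j]) for j in joints) / len(joints)
        for part, joints in BODY_PARTS.items()
    }


def find_weak_attributes(
    results: List[Tuple[str, Sequence[Joint], Sequence[Joint]]],
) -> Dict[str, Tuple[str, float]]:
    """Return, for each body part, (attribute id, maximum error) over all results.

    Each entry of `results` is (attribute_id, predicted joints, true joints),
    i.e. one piece of inference result information paired with its attribute.
    """
    weak: Dict[str, Tuple[str, float]] = {}
    for attr_id, pred, truth in results:
        for part, err in part_errors(pred, truth).items():
            if part not in weak or err > weak[part][1]:
                weak[part] = (attr_id, err)
    return weak
```

  • Ties keep the first attribute encountered; the source does not say how ties are broken.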
  • the data generation device identifies and integrates weak attributes based on the inference result information obtained by inputting the extended data 40 into the training target model 50, but the processing is not limited to this.
  • the teacher data 30 may be directly input into the training target model 50, and weak attributes may be identified and integrated based on the inference result information obtained as a result of inputting the teacher data 30 into the training target model 50.
  • FIG. 9 is a functional block diagram showing the configuration of a data generating device according to this embodiment.
  • this data generating device 100 has a communication unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.
  • the communication unit 110 executes data communication with external devices etc. via a network.
  • the communication unit 110 is a NIC (Network Interface Card) etc.
  • the control unit 150, which will be described later, exchanges data with external devices via the communication unit 110.
  • the input unit 120 is an input device that inputs various information to the control unit 150 of the data generating device 100.
  • the input unit 120 corresponds to a keyboard, a mouse, a touch panel, etc.
  • the display unit 130 is a display device that displays information output from the control unit 150.
  • the storage unit 140 has a training target model 50 and a teacher dataset 141.
  • the storage unit 140 is a storage device such as a memory.
  • the training target model 50 is a machine learning model that takes image data (augmented data) as input and outputs inference results of skeletal information.
  • the training target model 50 is a neural network, etc.
  • the teacher data set 141 has multiple teacher data.
  • the teacher data includes image data of a person and attributes.
  • the attributes include skeletal information, camera parameters, and appearance.
  • the control unit 150 has a data expansion unit 151, an image generation unit 152, an integration unit 153, and a learning unit 154.
  • the control unit 150 is a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), etc.
  • the data extension unit 151 has a parameter θ1 set, and extends the attributes of the teacher data based on the parameter θ1.
  • the data extension unit 151 outputs information on the extended attributes to the image generation unit 152.
  • based on gradient information of the inference error fed back from the training target model 50, the data extension unit 151 trains the parameter θ1 in the direction that increases the inference error obtained when the extended data is input to the training target model 50.
  • The rest of the description of the data extension unit 151 is the same as that of the data extension unit 151 described with reference to FIG. 3.
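  • A single update of θ1 "in the direction that increases the inference error" is gradient ascent. The sketch below is a toy instance under loud assumptions: θ1 is modeled as an additive offset on a flattened skeleton vector, the image generation step is a scalar gain `r`, the training target model is a scalar gain `w`, and `bound` is a hypothetical clip that keeps the extension small. None of these forms, nor the function names, come from the source.

```python
from typing import List


def inference_error(theta1: List[float], skeleton: List[float],
                    w: float, r: float) -> float:
    """Squared error of a toy pipeline: render with gain r, infer with gain w.

    The extended skeleton (skeleton + theta1) serves as the correct label.
    """
    ext = [s + t for s, t in zip(skeleton, theta1)]
    return sum((w * r * e - e) ** 2 for e in ext)


def grad_theta1(theta1: List[float], skeleton: List[float],
                w: float, r: float) -> List[float]:
    """Analytic gradient of inference_error with respect to theta1."""
    k = (w * r - 1.0) ** 2
    return [2.0 * k * (s + t) for s, t in zip(skeleton, theta1)]


def ascend(theta1: List[float], skeleton: List[float], w: float, r: float,
           lr: float = 0.1, bound: float = 0.5) -> List[float]:
    """One gradient-ascent step that increases the error, clipped to +/-bound."""
    g = grad_theta1(theta1, skeleton, w, r)
    return [max(-bound, min(bound, t + lr * gi)) for t, gi in zip(theta1, g)]
```

  • The sign of the step is the only difference from ordinary training: the model parameters descend the error, while θ1 ascends it. When the toy model is perfect (`w * r == 1`), the gradient vanishes and θ1 stops moving.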
  • the image generation unit 152 generates extended data based on the attribute information extended by the data extension unit 151.
  • the image generation unit 152 also generates extended data based on the weak attribute information generated by the integration unit 153.
  • the image generation unit 152 may add a pair of the weak attribute information and the extended data to the teacher dataset 141 as new teacher data.
  • the integration unit 153 identifies an inference error for each body part p for each piece of inference result information.
  • the integration unit 153 compares the inference errors for each body part p calculated from each piece of inference result information, and identifies, for each body part p, the maximum inference error and the attribute (weak point attribute) corresponding to the inference result information in which that maximum occurs.
  • the integration unit 153 generates an integrated weak point attribute by integrating the weak point attributes identified for the respective body parts p.
  • The rest of the description of the integration unit 153 is the same as that of the integration unit 153 described with reference to FIGS. 3 and 5 to 7.
  • the learning unit 154 executes machine learning of the training target model 50 based on the teacher dataset 141. For example, the learning unit 154 uses backpropagation to update a parameter θ2 of the training target model 50 so that, when image data is input to the training target model 50, the error between the inference result output from the training target model 50 and the correct label decreases.
  • the pair of image data and correct label that the learning unit 154 inputs to the training target model 50 is a pair of first image data and a first correct label, or a pair of second image data and a second correct label, which will be described next.
  • the first correct label is the skeletal information obtained when the data extension unit 151 extends the attributes of the teacher data.
  • the first image data is extended data generated by the image generation unit 152 based on the attributes of the teacher data extended by the data extension unit 151.
  • the second correct label is skeletal information of the weak point attribute integrated by the integration unit 153.
  • the second image data is extended data generated by the image generation unit 152 based on the weak point attribute.
  • the learning unit 154 feeds back the gradient information of the inference error to the data expansion unit 151. The learning unit 154 also outputs to the integration unit 153 a set consisting of the attribute information and the inference result information, which indicates, for each joint of the skeletal information, the relationship between the inference result and the true value (correct label).
  • Fig. 10 is a flowchart showing the processing procedure of the data generating device according to the present embodiment.
  • the data extension unit 151 of the data generating device 100 acquires teacher data from the teacher data set 141 (step S101).
  • the data extension unit 151 extends the attributes of the teacher data based on the parameter θ1 in the direction that increases the inference error of the training target model 50 (step S102).
  • the image generation unit 152 of the data generating device 100 generates extended data based on the extended attributes (step S103).
  • the learning unit 154 of the data generating device 100 performs machine learning of the training target model 50 based on the extended data and the correct label (step S104).
  • the integration unit 153 of the data generating device 100 executes the integration process (step S105).
  • the data expansion unit 151 of the data generating device 100 receives feedback of the gradient information of the inference error and updates the parameter θ1 (step S106).
  • if the data generating device 100 continues the process (step S107, Yes), the process returns to step S101. If the data generating device 100 does not continue the process (step S107, No), the process ends.
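  • The flowchart can be read as an alternating min-max loop: the learning unit descends the inference error with respect to θ2, while the data extension unit ascends it with respect to θ1. The toy sketch below maps each step onto one line. It assumes scalar stand-ins for skeletal attributes, an additive θ1, a linear "renderer" gain `r`, a linear model `w * x`, a fixed iteration budget for step S107, and it omits the integration process of step S105; all of these, and the name `run`, are hypothetical.

```python
import random
from typing import List, Tuple


def run(teacher: List[float], r: float = 2.0, steps: int = 200,
        lr_model: float = 0.01, lr_aug: float = 0.05, bound: float = 0.5,
        seed: int = 0) -> Tuple[float, float]:
    """Toy alternating loop mirroring steps S101 to S107 (S105 omitted).

    teacher: scalar stand-ins for skeletal attributes (assumed positive here).
    r: gain of a stand-in linear "renderer"; the model predicts w * x.
    Returns the learned model parameter (theta_2) and augmentation offset (theta_1).
    """
    rng = random.Random(seed)
    w = 0.0        # theta_2: parameter of the toy training target model
    theta1 = 0.0   # theta_1: additive augmentation offset (hypothetical form)
    for _ in range(steps):                       # S107: fixed budget instead of a stop test
        s = rng.choice(teacher)                  # S101: acquire teacher data
        e = s + theta1                           # S102: extend the attribute (label = e)
        x = r * e                                # S103: generate extended "image" data
        pred = w * x                             # inference by the model
        w -= lr_model * 2.0 * (pred - e) * x     # S104: gradient descent on theta_2
        # S105 (weak point integration) is omitted in this sketch.
        g = 2.0 * (w * r - 1.0) ** 2 * e         # S106: d(error)/d(theta_1) fed back
        theta1 = max(-bound, min(bound, theta1 + lr_aug * g))  # ascend, then clip
    return w, theta1
```

  • With the default settings, `w` converges toward `1 / r`, and θ1 stays within `[-bound, bound]`. The clip stands in for whatever plausibility limit a real system would place on the extended attributes.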
  • FIG. 11 is a flowchart showing the processing steps of the integration process.
  • the integration unit 153 of the data generating device 100 identifies an inference error for each body part p for each piece of inference result information (step S201).
  • the integration unit 153 compares the inference errors for each body part p and identifies the weak point attribute for each body part p (step S202). The integration unit 153 integrates the weak point attributes of the body parts p (step S203).
  • the image generation unit 152 of the data generation device 100 generates extended data based on the integrated weak point attributes (step S204).
  • the learning unit 154 of the data generating device 100 performs machine learning of the training target model 50 based on the augmented data and the correct label (skeletal information of the weak attribute) (step S205).
  • the data generating device 100 identifies an inference error for each body part p based on inference result information obtained by inputting extended data into the training target model.
  • the data generating device 100 compares the inference errors for each body part p calculated from each piece of inference result information, and identifies the attribute (weak attribute) of the inference result information for which the inference error is maximum.
  • the data generating device 100 integrates the weak attribute for each body part p, and generates extended data based on the integrated weak attribute. This makes it possible to generate new training data that is far from the distribution of existing training data.
  • FIG. 12 is a diagram for explaining the effect of the data generation device according to the present embodiment.
  • Image data Im20 in FIG. 12 is extended data generated by the image generation unit 152 based on the integrated weak point attributes.
  • when the learning unit 154 inputs the image data Im20 to the training target model 50, an inference result 60 is output.
  • in inference result 60, inference fails for the joints ar9 and ar20 and for the joints ar5, ar6, and ar7.
  • by training on such data, the inference accuracy of the training target model 50 can be improved.
  • inference results 60a and 60b are obtained from the training target model 50 after machine learning is performed using new image data (teacher data) that is far from the distribution of the existing teacher data.
  • Inference result 60a still fails at joint ar19, but the inference accuracy is improved at joints ar5 and ar6.
  • Inference result 60b shows improved inference accuracy at joints ar9 and ar20, at joints ar5, ar6, and ar19, and at joint ar11.
  • the processing of the data generating device 100 is not limited to the above.
  • the data generating device 100 can execute a body detection task and a body area extraction (segmentation) task from image data.
  • the data generating device 100 executes a body detection task and a body area extraction task, and by referencing body parts with large inference errors, can identify weak attributes for each body part and apply the above-mentioned processing.
  • the processing of the data generating device 100 can also be applied to tasks that target more general articulated bodies, such as quadrupedal animals, rather than human bodies.
  • the processing of the data generating device 100 can also be applied to 2D body skeleton estimation tasks and 3D body skeleton estimation tasks.
  • the processing of the data generating device 100 can also be applied to 2D-to-3D skeleton estimation tasks that do not require image data.
  • the data generating device 100 may use a mechanism for evaluating the likelihood of skeletal information obtained by integrating weak point attributes to discard or correct data that has been synthesized into unlikely skeletal information.
  • the data generating device 100 uses VPoser, which evaluates the distance in the latent space of the pose generator, penetration loss, which evaluates the penetration of the body model, and hyper-bending loss, which evaluates the bending of elbows and knees in the opposite direction.
  • technology related to VPoser is described in the literature "G. Pavlakos et al., "Expressive Body Capture: 3D Hands, Face, and Body from a Single Image," CVPR 201".
  • the data generating device 100 may immediately discard the unlikely skeletal information. Alternatively, the data generating device 100 may project the unlikely skeletal information onto a manifold of plausible skeletons (e.g., the latent space of VPoser) and correct it into plausible skeletal information. The data generating device 100 may also use combinatorial optimization, such as a greedy method, to select only some of the body parts, within the range of plausible skeletons, so as to maximize the total inference error over all parts.
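  • The greedy option for selecting only some body parts can be sketched as follows. The plausibility predicate is passed in as a function because the source leaves its form open (a VPoser latent-space distance, a penetration loss, a hyper-bending loss, or a range-of-motion check). `greedy_select` and its arguments are hypothetical names. Like any greedy method, this maximizes the total inference error only approximately.

```python
from typing import Callable, Dict, FrozenSet, List


def greedy_select(part_gain: Dict[str, float],
                  is_plausible: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Greedily choose which body parts take their weak point attribute.

    part_gain: inference error contributed by each part's weak point attribute.
    is_plausible: hypothetical predicate telling whether the skeleton assembled
    from the chosen parts is a plausible skeleton.
    Parts are tried in descending order of gain and kept only if the result
    stays plausible.
    """
    chosen: List[str] = []
    for part in sorted(part_gain, key=part_gain.get, reverse=True):
        if is_plausible(frozenset(chosen + [part])):
            chosen.append(part)
    return chosen
```

  • For example, if swapping in both arms at once is judged implausible, the arm with the larger error is kept, the other arm is skipped, and the remaining parts are still considered.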
  • when the image generating unit 152 acquires information on the weak point attribute generated by the integrating unit 153 (the integrated weak point attribute), it determines whether the skeletal information of the integrated weak point attribute is plausible. For example, the image generating unit 152 holds information on the range of motion (working area) of each joint, and determines that the skeletal information is plausible when every joint in the skeletal information is within its range of motion.
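  • A range-of-motion check of this kind might look like the sketch below. It assumes that the skeletal information can be expressed as one angle per joint in degrees; the joint names and limits in `RANGE_OF_MOTION` are hypothetical placeholders, not values from the source.

```python
from typing import Dict, Tuple

# Hypothetical range of motion per joint angle, in degrees (placeholder values).
RANGE_OF_MOTION: Dict[str, Tuple[float, float]] = {
    "elbowL": (0.0, 150.0),
    "elbowR": (0.0, 150.0),
    "kneeL": (0.0, 140.0),
    "kneeR": (0.0, 140.0),
}


def is_plausible(joint_angles: Dict[str, float]) -> bool:
    """Return True if every joint with a known range lies within its range of motion.

    Joints without an entry in RANGE_OF_MOTION are treated as unconstrained.
    """
    for joint, angle in joint_angles.items():
        lo, hi = RANGE_OF_MOTION.get(joint, (float("-inf"), float("inf")))
        if not lo <= angle <= hi:
            return False
    return True
```

  • A negative elbow or knee angle here plays the role of bending in the opposite direction, which the hyper-bending loss mentioned above is also meant to penalize.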
  • the data generating device 100 may identify and use Nw (>1) weak attribute(s) for each body part in descending order of inference error.
  • the data generating device 100 may generate new weak attribute(s) for all Nw^Np combinations (Np is the number of body parts), obtained by choosing one of the Nw weak attributes for each body part.
  • the data generating device 100 may select the optimal combination of weak attribute(s) from among all Nw^Np combinations by combinatorial optimization, with the above-mentioned likelihood of the skeletal information as a constraint.
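  • When Nw and Np are small, the simplest combinatorial optimization is exhaustive search. Picking one of the top-Nw candidates for each of the Np body parts gives Nw ** Np combinations, and the search keeps the plausible combination with the largest total inference error. The sketch uses Python's `itertools.product`. The candidate format, the plausibility predicate, and the name `best_combination` are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple


def best_combination(
    candidates: Dict[str, List[Tuple[str, float]]],
    is_plausible: Callable[[Dict[str, str]], bool],
) -> Optional[Tuple[Dict[str, str], float]]:
    """Exhaustively search every combination that picks one candidate per part.

    candidates[part] lists (attribute_id, inference_error) for the top Nw weak
    attributes of that part, so there are Nw ** Np combinations in total.
    Returns the plausible combination with the largest total inference error,
    or None if no combination is plausible.
    """
    parts = list(candidates)
    best: Optional[Tuple[Dict[str, str], float]] = None
    for choice in itertools.product(*(candidates[p] for p in parts)):
        combo = {p: attr_id for p, (attr_id, _) in zip(parts, choice)}
        if not is_plausible(combo):
            continue
        total = sum(err for _, err in choice)
        if best is None or total > best[1]:
            best = (combo, total)
    return best
```

  • With Nw = 3 and Np = 5 there are 3 ** 5 = 243 combinations, which is cheap to enumerate. For larger settings, the number of combinations grows exponentially in Np, and a greedy or other approximate method is the practical choice.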
  • FIG. 13 is a diagram showing an example of the hardware configuration of a computer that realizes the same functions as the data generating device of the embodiment.
  • computer 200 has a CPU 201 that executes various types of arithmetic processing, an input device 202 that accepts data input from the user, and a display 203.
  • Computer 200 also has a communication device 204 that transmits and receives data to and from camera 15, external devices, etc., via a wired or wireless network, and an interface device 205.
  • Computer 200 also has a RAM 206 that temporarily stores various types of information, and a hard disk device 207. Each of devices 201 to 207 is connected to a bus 208.
  • the hard disk device 207 has a data expansion program 207a, an image generation program 207b, an integration program 207c, and a learning program 207d.
  • the CPU 201 reads out each of the programs 207a to 207d and loads them into the RAM 206.
  • the data extension program 207a functions as a data extension process 206a.
  • the image generation program 207b functions as an image generation process 206b.
  • the integration program 207c functions as an integration process 206c.
  • the learning program 207d functions as a learning process 206d.
  • the processing of the data extension process 206a corresponds to the processing of the data extension unit 151.
  • the processing of the image generation process 206b corresponds to the processing of the image generation unit 152.
  • the processing of the integration process 206c corresponds to the processing of the integration unit 153.
  • the processing of the learning process 206d corresponds to the processing of the learning unit 154.
  • each of the programs 207a to 207d does not necessarily have to be stored in the hard disk device 207 from the beginning.
  • each program may be stored in a "portable physical medium" such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card that is inserted into the computer 200. Then, the computer 200 may read and execute each of the programs 207a to 207d.
  • Reference signs list: Training object model; 100 Data generating device; 110 Communication unit; 120 Input unit; 130 Display unit; 140 Storage unit; 141 Teacher data set; 150 Control unit; 151 Data expansion unit; 152 Image generating unit; 153 Integration unit; 154 Learning unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
PCT/JP2022/039766 2022-10-25 2022-10-25 Data generation method, data generation program, and data generation device Ceased WO2024089772A1 (ja)

Priority Applications (5)

Application Number Priority Date Filing Date Title
PCT/JP2022/039766 WO2024089772A1 (ja) 2022-10-25 2022-10-25 Data generation method, data generation program, and data generation device
JP2024552557A JP7794329B2 (ja) 2022-10-25 2022-10-25 Data generation method, data generation program, and data generation device
EP22963424.1A EP4610896A4 (en) 2022-10-25 2022-10-25 DATA GENERATION METHOD, DATA GENERATION PROGRAM, AND DATA GENERATION DEVICE
CN202280099673.XA CN119790410A (zh) 2022-10-25 2022-10-25 Data generation method, data generation program, and data generation device
US19/054,422 US20250191155A1 (en) 2022-10-25 2025-02-14 Data generation method, non-transitory computer-readable recording medium storing data generation program, and data generation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/039766 WO2024089772A1 (ja) 2022-10-25 2022-10-25 Data generation method, data generation program, and data generation device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US19/054,422 Continuation US20250191155A1 (en) 2022-10-25 2025-02-14 Data generation method, non-transitory computer-readable recording medium storing data generation program, and data generation device

Publications (1)

Publication Number Publication Date
WO2024089772A1 true WO2024089772A1 (ja) 2024-05-02

Family

ID=90830350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/039766 Ceased WO2024089772A1 (ja) 2022-10-25 2022-10-25 Data generation method, data generation program, and data generation device

Country Status (5)

Country Link
US (1) US20250191155A1
EP (1) EP4610896A4
JP (1) JP7794329B2
CN (1) CN119790410A
WO (1) WO2024089772A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
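JP2019212106A (ja) 2018-06-06 2019-12-12 Nippon Telegraph and Telephone Corporation Region extraction model learning device, region extraction model learning method, and program
JP2020034998A (ja) * 2018-08-27 2020-03-05 Nippon Telegraph and Telephone Corporation Augmentation device, augmentation method, and augmentation program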
US20220156511A1 (en) * 2020-11-16 2022-05-19 Waymo Llc Rare pose data generation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489683B1 (en) 2018-12-17 2019-11-26 Bodygram, Inc. Methods and systems for automatic generation of massive training data sets from 3D models for training deep learning networks
CN113554736B (zh) 2021-09-22 2021-12-21 成都市谛视科技有限公司 Skeletal animation vertex correction method, model learning method, apparatus, and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
G. PAVLAKOS ET AL.: "Expressive Body Capture: 3D Hands, Face, and Body from a Single Image", CVPR 201
GONG ET AL.: "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR, 2021
PENG XI; TANG ZHIQIANG; YANG FEI; FERIS ROGERIO S.; METAXAS DIMITRIS: "Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, IEEE, 18 June 2018 (2018-06-18), pages 2226 - 2234, XP033476188, DOI: 10.1109/CVPR.2018.00237 *
S. LI ET AL.: "Cascaded Deep Monocular 3D Human Pose Estimation with Evolutionary Training Data", CVPR, 2020
See also references of EP4610896A4
Y. GE ET AL.: "Neural-Sim: Learning to Generate Training Data with NeRF", ECCV, 2022

Also Published As

Publication number Publication date
US20250191155A1 (en) 2025-06-12
EP4610896A4 (en) 2025-09-10
EP4610896A1 (en) 2025-09-03
JPWO2024089772A1 2024-05-02
CN119790410A (zh) 2025-04-08
JP7794329B2 (ja) 2026-01-06

Similar Documents

Publication Publication Date Title
Mao et al. Multi-level motion attention for human motion prediction
CN110827383B (zh) Pose simulation method and apparatus for a three-dimensional model, storage medium, and electronic device
Liu et al. Realdex: Towards human-like grasping for robotic dexterous hand
Zurdo et al. Animating wrinkles by example on non-skinned cloth
Ye et al. Synthesis of detailed hand manipulations using contact sampling
Green et al. Quantifying and recognizing human movement patterns from monocular video images-part i: a new framework for modeling human motion
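CN112990154B (zh) Data processing method, computer device, and readable storage medium
CN110232727A (zh) Intelligent algorithm for evaluating continuous posture actions
CN104463788A (zh) Human motion interpolation method based on motion capture data
Heap et al. Extending the point distribution model using polar coordinates
CN111515959B (zh) Control method and system for a programmable puppet performance robot, and robot
KR20200021702A (ko) Motion retargeting device and motion retargeting method
JP7794329B2 (ja) Data generation method, data generation program, and data generation device
JPH0620055A (ja) Image signal processing method and device therefor
CN120259502A (zh) Animation retargeting method, system, device, medium, and product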
Shi Stage performance characteristics of minority dance based on human motion recognition
Meng Analysis and evaluation on the harmfulness of sports dance based on intelligent computing
Men et al. Retrieval of spatial–temporal motion topics from 3D skeleton data
CN114973396A (zh) Image processing method and apparatus, terminal device, and computer-readable storage medium
Zeng et al. An evaluation approach of multi-person movement synchronization level using OpenPose
Hu et al. Real Garment Benchmark (RGBench): A Comprehensive Benchmark for Robotic Garment Manipulation featuring a High-Fidelity Scalable Simulator
CN119600692B (zh) Machine-learning-based motion capture method and system for animated characters
Wu et al. Video driven adaptive grasp planning of virtual hand using deep reinforcement learning
Huang et al. Multi-stream adaptive GCN with dropgraph for skeleton-based action recognition
CN111612874B (zh) 一种用于绘制复杂模型的3d画板

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22963424; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2024552557; Country of ref document: JP)
WWE Wipo information: entry into national phase (Ref document number: 202280099673.X; Country of ref document: CN)
WWP Wipo information: published in national office (Ref document number: 202280099673.X; Country of ref document: CN)
WWE Wipo information: entry into national phase (Ref document number: 2022963424; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2022963424; Country of ref document: EP; Effective date: 20250526)
WWP Wipo information: published in national office (Ref document number: 2022963424; Country of ref document: EP)