WO2021155653A1 - Method for tracking a human hand-object interaction process based on collaborative differential evolution filtering


Info

Publication number
WO2021155653A1
Authority
WO
WIPO (PCT)
Prior art keywords
hand
observation
posture
human hand
human
Prior art date
Application number
PCT/CN2020/101671
Other languages
English (en)
Chinese (zh)
Inventor
李东年
郭阳
陈成军
赵正旭
温晋杰
张庆海
Original Assignee
青岛理工大学
Priority date
Filing date
Publication date
Application filed by 青岛理工大学
Publication of WO2021155653A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present disclosure relates to the technical field of three-dimensional human hand tracking, and in particular to a method for tracking a human hand-object interaction process based on collaborative differential evolution filtering.
  • Computer vision-based 3D human hand tracking can be applied in fields such as robot teaching and learning, motion capture, human-computer interaction, gesture recognition, etc.
  • the tracking of the human hand-object interaction process is hampered by many complex factors.
  • the human hand has multiple degrees of freedom, the problem is essentially a high-dimensional space problem;
  • the human hand causes frequent occlusion during its interaction with the object, including mutual occlusion between the hand and the manipulated object and self-occlusion of the hand itself;
  • in addition, the useful information carried by the object context can aid the recognition and estimation of human hand motion.
  • vision-based hand-object tracking methods are generally divided into two categories: appearance-based methods and model-based methods.
  • the appearance-based method builds a mapping through learning, and maps the image feature space to the human hand-object state space, thereby directly estimating the human hand state and the object state from the image features.
  • This type of method does not need to be initialized, and the tracking speed is fast, but its accuracy is affected by the training samples.
  • One prior method simultaneously recognizes the human hand movement and the manipulated object, expressing the time-varying relationship between hand movement and object through a conditional random field model, but it does not give detailed information about the hand movement posture.
  • Romero et al. proposed a real-time appearance-based non-parametric method to reconstruct the three-dimensional pose of the human hand interacting with the object.
  • the method uses a histogram of oriented gradients (HOG) to describe the characteristics of the human hand, and performs a nearest-neighbor search in a large template database to find the hand pose that best matches the input image.
  • this method cannot precisely track the hand movement in a high-dimensional space.
  • Gupta et al. proposed a Bayesian method to integrate multiple perception tasks in the process of human-object interaction, seeking a consistent semantic expression by imposing spatial constraints on the perception elements. This method can recognize objects and the corresponding actions even when the appearance is not sufficiently discriminative, and can also recognize human actions from static images without using any motion information. However, it does not give detailed information about the posture of the human body.
  • Model-based methods use pre-established human hand models and object models to generate hand-object posture hypotheses.
  • the features extracted from the model are compared with those extracted from visual observations, and the similarity between the two is evaluated.
  • a group of human hand-object states with the best similarity are searched in the model state space.
  • This type of method can use more prior information (such as human hand shape, joint constraints, etc.), but its tracking process needs to be initialized, and it faces a difficult problem of searching in a high-dimensional space.
  • Hamer et al. activated an independent local tracker for each part of the multi-joint human hand, used a pairwise Markov random field to connect adjacent hand parts, and used belief propagation (BP) to find the optimal hand configuration.
  • Oikonomidis et al. proposed a model-based method to track the movement of the human hand and the manipulated object simultaneously. The method establishes a three-dimensional model and a motion model for both the hand and the object, regards the tracking problem as a sequential optimization problem, and searches for the hand pose parameters and object pose parameters that minimize the matching error with the input image; the system uses a multi-view RGB image sequence as input.
  • Kyriazis et al. used a depth camera to obtain observation input, and proposed a method of searching only the hand posture parameters.
  • the state of the object is derived from the state of the hand and a force model between the hand and the object; however, the hand-object interaction force involves many factors and is difficult to model accurately.
  • in model-based methods, there are fidelity problems in the three-dimensional modeling of human hands and objects. When the particle filter framework is used to track the motion of hands or human bodies, the extreme sparsity of particle sampling in high-dimensional space makes it difficult for a limited number of particles to effectively express the true posterior distribution of the hand state, which can easily lead to tracking failure.
  • the present disclosure proposes a method for tracking the human hand-object interaction process based on collaborative differential evolution filtering, which uses a model-based method to simultaneously track human hands and objects during the interaction process and integrates the differential evolution algorithm into the particle filter framework;
  • two coordinated particle filter trackers track the movement of the human hand and the object respectively, and differential evolution is used to optimize the matching error under the current observation, driving the particles toward high-likelihood regions and improving the particle filter sample distribution, so that robust tracking of hand-object movement can be achieved with a small number of particles.
  • the present disclosure provides a method for tracking the human hand-object interaction process based on collaborative differential evolution filtering, including:
  • extracting the foreground area corresponding to the human hand and the object in the image to be measured, and generating an observation depth map and a corresponding observation silhouette map;
  • obtaining the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, composing the hand-object posture vector from the two postures, and generating the corresponding rendered depth map;
  • taking the image to be measured as the observation input, and constructing a matching error function between the observation input and the hand-object posture vector by calculating the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered silhouette map;
  • using the collaborative differential evolution filtering algorithm to optimize the postures of the human hand and the object respectively by calculating the matching error function, and obtaining the motion tracking results of the hand and the object during the hand-object interaction process.
  • the present disclosure provides a human-hand-object interaction process tracking system based on cooperative differential evolution filtering, including:
  • the image processing module to be measured is configured to extract the foreground area corresponding to the human hand and the object in the image to be measured, and generate an observation depth map and a corresponding observation silhouette map;
  • the hand-object motion posture module is configured to obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model respectively, compose the hand-object posture vector from the two postures, and generate the corresponding rendered depth map;
  • the matching error function building module is configured to take the image to be measured as the observation input, and to construct a matching error function between the observation input and the hand-object posture vector by calculating the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered silhouette map;
  • the tracking module is configured to use the cooperative differential evolution filtering algorithm to optimize the posture of the human hand and the object by calculating the matching error function, and obtain the motion tracking result of the human hand and the object during the hand-object interaction process.
  • the present disclosure provides an electronic device including a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the above method for tracking the human hand-object interaction process based on collaborative differential evolution filtering are completed.
  • the present disclosure provides a computer-readable storage medium for storing computer instructions that, when executed by a processor, complete the steps of the method for tracking the human hand-object interaction process based on collaborative differential evolution filtering.
  • a model-based method is used to simultaneously track human hands and objects in the process of hand-object interaction; the differential evolution algorithm is integrated into the particle filter framework, and a new improved particle filter algorithm, collaborative differential evolution filtering, is proposed to track hand-object motion.
  • two coordinated particle filter trackers track the movement of the human hand and the object respectively, and differential evolution is used to optimize the matching error under the current observation, driving the particles toward the high-likelihood region and improving the particle filter sample distribution, so as to achieve robust tracking of hand and object motion with a small number of particles.
  • FIG. 1 is a schematic diagram of a method for tracking a human hand-object interaction process based on cooperative differential evolution filtering according to Embodiment 1 of the disclosure;
  • FIG. 2 is a schematic diagram of a kinematics model of a human hand provided in Embodiment 1 of the disclosure;
  • Fig. 3(a) is a schematic diagram of a human hand-spherical body model provided in Embodiment 1 of the present disclosure
  • Fig. 3(b) is a schematic diagram of a human hand-cylinder model provided in Embodiment 1 of the present disclosure;
  • FIG. 4 is a flow chart of human hand-object tracking provided by Embodiment 1 of the present disclosure;
  • FIGS. 5(a)-(c) are diagrams of the tracking results of the interaction process between the human hand and the sphere provided in Embodiment 1 of the present disclosure;
  • FIGS. 6(a)-(c) are diagrams of the tracking results of the interaction process between the human hand and the cylinder provided in Embodiment 1 of the present disclosure.
  • this embodiment provides a method for tracking human hand-object interaction process based on cooperative differential evolution filtering, including:
  • S1: Extract the foreground area corresponding to the human hand and the object in the image to be measured, and generate the observation depth map and the corresponding observation silhouette map; obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, compose the hand-object posture vector from the two postures, and generate the corresponding rendered depth map;
  • S2: Take the image to be measured as the observation input, calculate the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered silhouette map, and construct the matching error function between the observation input and the hand-object posture vector;
  • S3: Use the collaborative differential evolution filtering algorithm to optimize the postures of the human hand and the object respectively by calculating the matching error function, and obtain the motion tracking results of the hand and the object during the hand-object interaction process.
  • this embodiment uses a method based on the human hand-object kinematics model to track the interaction process between the human hand and the object, establishes a three-dimensional model and a motion model for the human hand and the object, and simultaneously tracks the motion of the human hand and the object in the three-dimensional space.
  • the human hand 3D model is used to generate the human hand posture hypothesis
  • the object 3D model is used to generate the object posture hypothesis
  • the matching error between the model feature group and the observation feature group obtained from the input image is calculated
  • the tracking problem is regarded as a sequential optimization problem: the state parameters that minimize the matching error are searched for in the state space of the human hand and the object, i.e., the optimal solution corresponding to the current frame of the input image.
  • Figure 2 shows the hand kinematics model.
  • the hand motion state x_h contains 29 degree-of-freedom variables in total: global palm motion with 6 degrees of freedom, local finger motion with 20 degrees of freedom, and 3 degrees of freedom for the wrist joint.
  • the CMC joints of each finger are fixed, and the palm is modeled as a rigid body; its motion corresponds to the 6 global degrees of freedom (3 translation and 3 rotation) of the human hand. The motion of the 5 fingers corresponds to 20 local degrees of freedom, each finger being modeled with 4 degrees of freedom: the MCP joint of each finger except the thumb, and the TM joint of the thumb, each contain 2 degrees of freedom (1 flexion-extension and 1 abduction-adduction), while the PIP and DIP joints of each finger and the MCP and IP joints of the thumb contain only 1 flexion-extension degree of freedom. The wrist joint contains 1 flexion-extension degree of freedom, 1 abduction-adduction degree of freedom, and 1 scale-transformation degree of freedom.
  • the object motion state x o contains the 6-degree-of-freedom pose state (3 translations and 3 rotations) of the object in the three-dimensional space.
  • This embodiment limits the finger joint and wrist joint angles of the human hand within certain ranges based on human anatomical factors; applying these motion constraints not only ensures that the solutions obtained by the pose estimation process are valid, but also greatly compresses the search range of the hand state space and reduces the search difficulty.
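To make the state layout concrete, the 29-DOF hand state and 6-DOF object state described above can be sketched as a single pose vector with anatomical clamping. The numeric joint limits below are purely illustrative assumptions; the text states only that the limits come from human anatomy, without giving values.

```python
import numpy as np

# Illustrative sketch of the hand-object pose vector x_ho: 29 hand DOF
# (6 global palm + 20 finger + 3 wrist) followed by 6 object DOF
# (3 translation + 3 rotation). All numeric limits are assumptions.
HAND_DOF, OBJECT_DOF = 29, 6

# Per-DOF bounds (radians for angles): global palm motion unconstrained,
# finger and wrist angles limited, wrist scale factor kept near 1.
lower = np.concatenate([np.full(6, -np.inf), np.full(20, -0.35), [-1.2, -0.5, 0.8]])
upper = np.concatenate([np.full(6,  np.inf), np.full(20,  1.60), [ 1.2,  0.5, 1.2]])

def clamp_hand_pose(x_h):
    """Enforce the anatomical joint-angle constraints by clamping each DOF."""
    return np.clip(x_h, lower, upper)

def make_pose_vector(x_h, x_o):
    """Compose the hand-object pose vector x_ho from hand and object states."""
    return np.concatenate([clamp_hand_pose(x_h), x_o])
```

Clamping invalid hypotheses back into the feasible range is one simple way to realize such motion constraints; the embodiment may enforce its joint limits differently.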
  • PTC Pro/Engineer and Multigen-Paradigm Creator are used to establish a unified three-dimensional model for the human hand and the manipulated object with parameterized geometric primitives, and a tree-like hierarchical organization structure is established for the human hand-object model in the Creator. Add local coordinate system and DOF (Degree of Freedom) motion nodes.
  • the three-dimensional human hand model established in this embodiment includes a part of the human forearm, so that the established three-dimensional model can describe the forearm pixels connected to the human hand pixels in the segmented depth image.
  • the wrist joint has a scale-transformation degree of freedom, enabling telescopic adjustment of the forearm model.
  • This embodiment considers the interaction process between the human hand and the following two types of objects: a sphere and a cylinder.
  • Figure 3(a) shows the three-dimensional model of the human hand and the sphere
  • Figure 3(b) shows the three-dimensional model of the human hand and the cylinder.
  • The modeling method used is also suitable for tracking the interaction process between human hands and objects of other shapes.
  • when constructing the matching error function and the observation likelihood function, this embodiment combines two types of feature information: the depth feature and the silhouette feature. Taking the depth image obtained by the Kinect depth camera as the observation input z, the foreground area corresponding to the human hand and the manipulated object is extracted through simple depth threshold segmentation to generate the observation depth map z_d(z), and the observation silhouette map z_s(z) is generated from z_d(z);
  • the corresponding rendered depth map r_d(x_ho) is generated from the hand-object posture vector x_ho, and the rendered silhouette map r_s(x_ho) is generated from r_d(x_ho);
  • z_s(z) and r_s(x_ho) are both binary images, which take the value 1 in the foreground area corresponding to the human hand and the manipulated object and 0 in the background.
  • the matching error function is used to express the matching degree between the observation z and the human hand-object pose vector x ho .
  • a small matching error means a high matching degree.
  • the matching error function is defined as: E(z, x_ho) = λ_d E_d + λ_s E_s + λ_p E_p
  • E(z, x_ho) consists of three parts: the depth feature term E_d, the silhouette feature term E_s, and the penalty term E_p; λ_d, λ_s and λ_p are constant weighting factors for each part.
  • E_d measures the depth deviation between the observation depth map z_d(z) and the rendered depth map r_d(x_ho) corresponding to the posture vector x_ho; it is defined as follows:
  • the depth deviation (measured in mm) is calculated and accumulated pixel by pixel over the entire feature map, and the accumulated sum is normalized by dividing by the total pixel area of the human hand and the manipulated object. A few large depth deviations would cause large changes in the function value and thereby degrade the performance of the search method; for this reason, a maximum depth deviation constant T_d is introduced, limiting the depth deviation at each pixel to the range [0, T_d].
  • E_s describes the matching degree of the silhouette features by calculating the size of the non-overlapping area between the observation silhouette map z_s(z) and the rendered silhouette map r_s(x_ho); it is defined as follows:
  • the first part of the above formula calculates the pixel area belonging to the observation silhouette region z_s(z) but not to the rendered silhouette region r_s(x_ho), and the second part calculates the pixel area belonging to r_s(x_ho) but not to z_s(z); the two parts are normalized separately.
  • applying the regional feature term E_s has a smoothing effect on the objective function, reducing the local minima around the global minimum so that the optimization process converges better to the true global minimum, and enhancing the robustness of the optimization process.
  • in the penalty term E_p, J represents the three pairs of adjacent fingers excluding the thumb, and each term represents the deviation between the abduction-adduction angles of the MCP joints of a pair of adjacent fingers in the hand posture hypothesis x_h.
  • the observation likelihood function is monotonically decreasing in the matching error function E(z, x_ho).
  • the observation likelihood function is defined as follows:
  • σ_e is a constant normalization factor, and its value is determined by the observation noise.
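The matching error described above can be sketched as follows. The truncation by T_d, the normalization by foreground area, and the two-way silhouette non-overlap follow the text; the exact exponential form of the likelihood, the weighted-sum structure, and all constants (λ weights, T_d, σ_e) are assumptions for illustration.

```python
import numpy as np

def depth_term(z_d, r_d, fg_mask, T_d=100.0):
    """E_d: truncated per-pixel depth deviation (mm), normalized by foreground area."""
    dev = np.minimum(np.abs(z_d - r_d), T_d)       # clip deviation to [0, T_d]
    area = max(int(fg_mask.sum()), 1)              # hand + object pixel area
    return float(dev[fg_mask].sum() / area)

def silhouette_term(z_s, r_s):
    """E_s: normalized non-overlap between observed and rendered silhouettes."""
    only_obs = np.logical_and(z_s, ~r_s).sum() / max(int(z_s.sum()), 1)
    only_ren = np.logical_and(r_s, ~z_s).sum() / max(int(r_s.sum()), 1)
    return float(only_obs + only_ren)

def matching_error(z_d, z_s, r_d, r_s, E_p=0.0,
                   lam_d=1.0, lam_s=1.0, lam_p=1.0, T_d=100.0):
    """E(z, x_ho) = lam_d*E_d + lam_s*E_s + lam_p*E_p (weights are assumptions)."""
    fg = np.logical_or(z_s, r_s)
    return (lam_d * depth_term(z_d, r_d, fg, T_d)
            + lam_s * silhouette_term(z_s, r_s)
            + lam_p * E_p)

def likelihood(E, sigma_e=10.0):
    """Assumed exponential form: monotonically decreasing in the matching error."""
    return float(np.exp(-E / sigma_e))
```

Identical observation and rendering give zero error and maximal likelihood; any depth or silhouette mismatch raises the error and lowers the likelihood.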
  • the present embodiment uses the cooperative differential evolution filter algorithm to optimize the poses of the human hand and the object by calculating the matching error function.
  • This embodiment integrates the differential evolution algorithm into the particle filter framework and proposes a new tracking algorithm, the collaborative differential evolution filter, to track hand-object motion in the high-dimensional space.
  • the algorithm uses two cooperating particle filter trackers to track the human hand and the object respectively, and uses differential evolution to optimize the matching error under the current observation, thereby improving the particle filter sample distribution.
  • the differential evolution algorithm is an efficient swarm intelligence optimization algorithm that can effectively solve optimization problems with nonlinear and non-differentiable objective functions.
  • differential evolution searches for the global optimal solution in a continuous space through the iterative evolution of a population of N D-dimensional vectors.
  • the evolution of the population is carried out through the three basic operations of mutation, crossover, and selection; mutation and crossover operations are used to generate new candidate individuals, and the selection operation is used to determine whether the newly generated candidate individuals can survive in the next generation.
  • for each individual i, differential evolution randomly selects 3 distinct individuals from the previous generation and combines them to generate a mutant individual, v_i = x_r1 + F (x_r2 - x_r3);
  • the individual indices r_1, r_2 and r_3 are randomly selected in the range [1, 2, ..., N], mutually different and different from i; F is the scaling factor of the difference vector, controlling the convergence speed of the search process.
  • the scale factor F of the standard differential evolution algorithm is a constant.
  • rand_j ~ U(0,1) is a random number uniformly distributed on the interval [0,1]; the crossover parameter CR determines the probability that each element of the candidate individual is inherited from the mutant individual, and CR is taken as 0.9.
  • j_rand is a random index selected in the range [1, 2, ..., D] to ensure that the candidate individual obtains at least one element from the mutant individual.
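The mutation, crossover, and selection operations described above correspond to the standard DE/rand/1/bin scheme, which can be sketched as follows. CR = 0.9 follows the text; the value of F and the test objective are assumptions.

```python
import numpy as np

def differential_evolution(f, pop, n_iter=60, F=0.5, CR=0.9, rng=None):
    """Minimal sketch of standard differential evolution (DE/rand/1/bin)."""
    if rng is None:
        rng = np.random.default_rng(0)
    pop = pop.copy()
    N, D = pop.shape
    cost = np.array([f(x) for x in pop])
    for _ in range(n_iter):
        for i in range(N):
            # Mutation: combine three distinct individuals r1, r2, r3 != i.
            r1, r2, r3 = rng.choice([j for j in range(N) if j != i],
                                    size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # Crossover: inherit each element from v with probability CR;
            # index j_rand guarantees at least one element comes from v.
            j_rand = rng.integers(D)
            mask = (rng.random(D) < CR) | (np.arange(D) == j_rand)
            u = np.where(mask, v, pop[i])
            # Selection: the candidate survives only if it does not worsen
            # the objective value.
            cu = f(u)
            if cu <= cost[i]:
                pop[i], cost[i] = u, cu
    return pop[np.argmin(cost)], float(cost.min())
```

With a population of 32 and 60 iterations (the settings reported later in the experiments), this sketch reliably minimizes a simple low-dimensional test objective.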
  • in this embodiment, a differential evolution population is allocated to each of the human hand and the manipulated object to perform pose optimization;
  • the hand motion posture x_h and the object motion posture x_o of the current frame are optimized respectively, and these two populations are denoted as population h and population o.
  • Particle filtering is a robust motion tracking framework; through the propagation of multiple samples in time, it is able to express multimodal distributions.
  • the basic idea is to approximate the posterior probability distribution p(x_{t-1} | z_{1:t-1}) by a set of weighted samples, propagate the samples through the state transition model, and update their weights using the observation likelihood.
  • One of the main problems of the standard particle filter algorithm is that it uses the state transition prior p(x_t | x_{t-1}) as the importance distribution, which does not take the latest observation into account, so many particles may fall into low-likelihood regions of the state space.
  • the standard particle filter algorithm therefore needs to collect a large number of samples; if the sample set is too small, sample impoverishment occurs, reducing the estimation accuracy and possibly even leading to divergence of the sample set and failure of the estimation.
  • Differential evolution filtering integrates the differential evolution algorithm into the particle filter framework: after predicting the new particle positions, the matching error function under the latest observation z_t is used as the objective function, and the differential evolution algorithm is run to iteratively evolve the particles, moving them to regions of greater observation likelihood in the state space.
  • the optimization process on the particle positions can be regarded as an importance sampling process, and the new particle swarm generated after optimization can be regarded as sampled from an importance distribution approximating the optimal one, p(x_t | x_{t-1}, z_t);
  • the particle filter sample distribution is thus improved and the convergence of the particle set is accelerated, so that a small number of particles suffices for robust tracking of hand-object motion.
  • differential evolution filtering defines the state transition prior p(x_t | x_{t-1}) used for particle prediction.
  • Resampling: resample according to the weights of the particle set to obtain a new set of equal-weight particles.
  • State estimation: output the system state estimate based on the maximum a posteriori criterion.
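One time step of the differential evolution filter described above (predict, DE-optimize under the latest observation, weight, resample, estimate) can be sketched as follows. The Gaussian random-walk prediction, the exponential weighting, and all constants are assumptions; `de_optimize` stands in for a run of the differential evolution optimizer on the matching error.

```python
import numpy as np

def de_filter_step(particles, error_fn, de_optimize,
                   sigma_e=10.0, step_sigma=0.05, rng=None):
    """One sketched time step of differential evolution filtering."""
    if rng is None:
        rng = np.random.default_rng(0)
    # 1) Prediction: propagate particles with an assumed Gaussian
    #    random-walk transition prior p(x_t | x_{t-1}).
    pred = particles + rng.normal(0.0, step_sigma, particles.shape)
    # 2) DE optimization under the latest observation: move particles
    #    toward high-likelihood regions of the state space.
    pred = de_optimize(error_fn, pred)
    # 3) Weighting: likelihood decreases monotonically in the matching error.
    w = np.exp(-np.array([error_fn(x) for x in pred]) / sigma_e)
    w = w / w.sum()
    # 4) Resampling: draw an equal-weight particle set from the weights.
    idx = rng.choice(len(pred), size=len(pred), p=w)
    # 5) State estimate: maximum a posteriori particle.
    return pred[idx], pred[np.argmax(w)]
```

The DE optimization in step 2 is what distinguishes this filter from a standard particle filter: it acts as an importance sampling step that concentrates the particle set before weighting and resampling.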
  • two coordinated differential evolution filter trackers are used to track the motion and posture of a human hand and an object respectively, and a collaborative differential evolution filter algorithm is proposed.
  • a differential evolution filter tracker is assigned to each of the human hand and the manipulated object, tracking the hand motion posture x_h and the object motion posture x_o respectively.
  • the two trackers are not independent of each other, but constantly exchange information during the tracking process.
  • while the hand tracker iteratively optimizes the hand motion posture x_h for the current frame, the posture x_o of the manipulated object is regarded as static; at the beginning of the optimization process, x_o is set to the previous frame's tracking result produced by the corresponding object tracker.
  • conversely, the object tracker regards the hand posture x_h as static while iteratively optimizing the object motion posture x_o for the current frame; at the beginning of the optimization process, x_h is set to the previous frame's tracking result produced by the hand tracker.
  • after each tracker obtains the posture tracking result of the current frame, it immediately passes the result to the other tracker, where the corresponding posture value remains static during the other tracker's iterative optimization of the next frame.
  • This collaborative tracking scheme not only models occlusion by considering the human hand and the manipulated object jointly, but also decomposes the joint pose space through the use of multiple trackers, breaking the high-dimensional problem into several problems of relatively low dimensionality, which reduces the difficulty of the optimization search and lowers the computational cost.
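The alternating scheme above can be sketched as a per-frame function: each tracker optimizes its own posture with the other posture frozen at the previous frame's result, then the results are exchanged. The `optimize` callable stands in for one differential-evolution-filter run; all names are illustrative.

```python
import numpy as np

def track_frame(x_h_prev, x_o_prev, error_fn, optimize):
    """One frame of the collaborative two-tracker scheme (sketch).

    error_fn(x_h, x_o) is the joint matching error; optimize(f, x0) stands
    in for one differential-evolution-filter optimization run.
    """
    # Hand tracker: x_o is held static at the previous object result.
    x_h = optimize(lambda xh: error_fn(xh, x_o_prev), x_h_prev)
    # Object tracker: x_h is held static at the previous hand result.
    x_o = optimize(lambda xo: error_fn(x_h_prev, xo), x_o_prev)
    # The two results are exchanged and frozen for the next frame.
    return x_h, x_o
```

Because each call optimizes only x_h (29 dimensions) or x_o (6 dimensions) rather than the joint 35-dimensional vector, the search at every frame happens in two lower-dimensional spaces, which is the dimensionality decomposition the text describes.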
  • the depth image obtained by the Kinect depth camera is used as the observation input, and a human hand-object tracking prototype system is developed based on 3D graphics rendering technology; the pre-configured 3D hand-object model is loaded into the rendering engine OpenSceneGraph (OSG).
  • during the tracking process, the osgSim::DOFTransform class is used to control the movement of the human hand and the object, and OSG off-screen rendering technology is used to render the depth image of the hand-object model for comparison with the observation image.
  • the matching error value and observation likelihood value of each particle are calculated from this comparison, and the collaborative differential evolution filtering algorithm searches the state space of the human hand and the object for the state parameters that minimize the matching error.
  • OSG is an open-source cross-platform graphics engine based on OpenGL. It uses a tree structure (the scene node tree) to organize spatial data, and achieves high-performance 3D graphics rendering through a variety of scene culling techniques, render state sorting, and multi-threaded rendering mechanisms. The rendering process of each frame in OSG can be broken down into three stages: update traversal, cull traversal, and draw traversal.
  • a multi-threaded mode is used to render the scene: a thread is created for each camera and its corresponding graphics device, with the culling operation performed in the camera thread and the drawing operation performed in the graphics device thread. This multi-threaded mode starts the next frame's scene update and culling operations before the graphics device thread finishes its drawing work, thereby improving the operating efficiency of the system and making full use of its computing power.
  • based on the proposed collaborative differential evolution filtering algorithm, this embodiment develops a hand-object tracking prototype system using OSG and off-screen rendering technology, creating a virtual camera to render the depth image corresponding to each hand-object posture hypothesis for matching error calculation.
  • This camera has a scene model node as a child node and is simultaneously bound to an image buffer object.
  • the scene model node contains the three-dimensional models of the human hand and the object, and the buffer object can be bound to the camera through a frame buffer object (FBO).
  • during rendering, the virtual camera renders the content of its scene model child node into the buffer object bound to it.
  • the system of this embodiment uses a collaborative differential evolution filtering algorithm to iteratively calculate new human hand-object posture parameters.
  • the system creates a node callback object (osg::NodeCallback) for the scene model node, which is used to update the posture parameters of the human hand and object model during the update phase of each frame of OSG.
  • the system also creates a drawing callback object (osg::Camera::DrawCallback) for the camera.
  • in this draw callback object, the system calculates the matching error between the rendered depth image and the observed depth image.
  • each frame starts a thread for each camera and its associated graphics device; the update phase of the next frame may begin before the drawing of the current frame has finished.
  • the system creates an event object for the camera, and uses Win32 API's SetEvent() function and WaitForSingleObject() function to synchronize and communicate between threads.
  • the corresponding event object is set to the signaled state through the SetEvent() function to notify the main thread; the main thread performs the next calculation operation after receiving the event signal.
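The event-based handshake between the graphics thread and the main thread can be illustrated with a minimal sketch. The original system uses the Win32 SetEvent()/WaitForSingleObject() pair; the sketch below substitutes Python's threading.Event as a stand-in, and the toy per-frame computation is an assumption, not the patent's rendering work.

```python
import threading

# Stand-in for the Win32 event pattern: the graphics thread signals completion
# (SetEvent) and the main thread blocks until the signal (WaitForSingleObject).
draw_done = threading.Event()
results = []

def graphics_thread(frame):
    # ... render the posture hypothesis and compute the matching error ...
    results.append(frame * frame)   # toy "matching error" for this frame
    draw_done.set()                 # SetEvent(): notify the main thread

for frame in range(3):
    draw_done.clear()
    t = threading.Thread(target=graphics_thread, args=(frame,))
    t.start()
    draw_done.wait()                # WaitForSingleObject(): block until signaled
    t.join()
    # the main thread now performs the next calculation with results[-1]
print(results)  # → [0, 1, 4]
```

Because the main thread waits on the event before starting the next frame, the per-frame results arrive in order, mirroring the frame-synchronized update/draw cycle described above.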
  • this embodiment conducts experiments on real sequences to verify the effectiveness of the proposed human hand-object motion tracking method.
  • the collaborative differential evolution filtering algorithm proposed in this embodiment uses 32 particles for the human hand posture tracker and 8 particles for the object posture tracker; for each input frame, both trackers run 60 iterations of the DE algorithm.
  • the tracking experiment in this embodiment runs on a PC with a 4-core Core i5 2.9 GHz CPU, 4.0 GB memory and Nvidia GeForce GTX 950M GPU, and it takes an average of 5 seconds to track one frame of image.
  • the real sequence is used to evaluate the tracking algorithm, and the depth image sequence captured by the Microsoft Kinect 1.0 Beta 2 SDK is used as the observation input, the image resolution is 640 ⁇ 480, and the frame rate is 30 frames/s.
  • the experiment is divided into two groups.
  • the first experiment tracks the movement process of the human hand grasping the sphere.
  • Figures 5(a)-(c) show the tracking results of this embodiment on some frames of the real sequence of the interaction process between the human hand and the sphere;
  • Figure 5(a) is the RGB image captured by the Kinect RGB camera
  • Figure 5(b) is the depth image captured by the Kinect depth camera and subjected to simple depth segmentation
  • Figure 5(c) is the result of tracking the depth image sequence using the collaborative differential evolution filtering algorithm;
  • Figures 6(a)-(c) show the tracking results of this embodiment on some frames of the real sequence of the interaction process between the human hand and the cylinder;
  • Figure 6(a) is the RGB image captured by the Kinect RGB camera, and
  • Figure 6(b) is the depth image captured by the Kinect depth camera after simple depth segmentation.
  • Figure 6(c) is the result of tracking the depth image sequence using the collaborative differential evolution filtering algorithm. It can be seen from the experimental results that the collaborative differential evolution filtering algorithm can effectively track the interaction process between human hands and objects.
  • a human-hand-object interaction process tracking system based on cooperative differential evolution filtering including:
  • the image processing module to be measured is configured to extract the foreground area corresponding to the human hand and the object in the image to be measured, and generate an observation depth map and a corresponding observation silhouette map;
  • the hand-object movement posture module is configured to obtain the hand movement posture and the object movement posture based on the constructed hand kinematics model and object kinematics model, respectively; the hand movement posture and the object movement posture form the hand-object posture vector and generate the corresponding rendered depth map;
  • the matching error function building module is configured to take the image to be measured as the observation input and, by calculating the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered depth map, construct a matching error function between the observation input and the human hand-object posture vector;
  • the tracking module is configured to use the cooperative differential evolution filtering algorithm to optimize the posture of the human hand and the object by calculating the matching error function, and obtain the motion tracking result of the human hand and the object during the hand-object interaction process.
  • an electronic device including a memory, a processor, and computer instructions stored on the memory and running on the processor; when executed by the processor, the computer instructions complete the steps of the described method for tracking a human hand-object interaction process based on cooperative differential evolution filtering.
  • a computer-readable storage medium is used to store computer instructions that, when executed by a processor, complete the steps described in a method for tracking a human hand-object interaction process based on collaborative differential evolution filtering.
  • the differential evolution algorithm is integrated into the particle filter framework, and two coordinated particle filter trackers are used to separately track the human hand and the object.
  • for both the hand and the object, differential evolution is used to optimize the matching error under the current observation, driving the particles toward high-likelihood regions, improving the particle filter's sample distribution, and achieving robust tracking of human hand and object motion with a small number of particles.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Genetics & Genomics (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a human hand-object interaction process tracking method based on collaborative differential evolution filtering. The method comprises: extracting a foreground area corresponding to a human hand and an object in an image to be measured, and generating an observation depth map and a corresponding observation silhouette map; obtaining a human hand movement posture and an object movement posture respectively on the basis of a constructed human hand kinematics model and an object kinematics model, the human hand movement posture and the object movement posture forming a human hand-object posture vector, and generating a corresponding rendered depth map; taking the image to be measured as an observation input, constructing a matching error function between the observation input and the human hand-object posture vector; and using a collaborative differential evolution filtering algorithm to apply posture optimization to the human hand and the object respectively by calculating the matching error function, so as to obtain motion tracking of the human hand and the object during the hand-object interaction process. Robust tracking of human hand-object motion is achieved using a small number of particles.
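The matching error described in the abstract combines a depth-feature term (agreement of depth values where the observed and rendered silhouettes overlap) with a silhouette-feature term. The following Python sketch illustrates one plausible form on toy 2-D depth maps; the IoU-based silhouette term, the weighting `lam`, and the penalty for empty overlap are assumptions, not the patent's exact formula.

```python
def matching_error(obs_depth, ren_depth, lam=0.5, big=1e6):
    """Toy matching error between an observed depth map and a rendered depth
    map (nested lists). Pixels with depth 0 are background, i.e. outside
    the silhouette."""
    h, w = len(obs_depth), len(obs_depth[0])
    depth_term, overlap, union = 0.0, 0, 0
    for y in range(h):
        for x in range(w):
            o, r = obs_depth[y][x], ren_depth[y][x]
            in_o, in_r = o > 0, r > 0
            if in_o or in_r:
                union += 1
            if in_o and in_r:              # silhouette overlap
                overlap += 1
                depth_term += abs(o - r)   # depth-feature mismatch
    # silhouette term: 1 minus the IoU of observed and rendered silhouettes
    silhouette_term = 1.0 - (overlap / union if union else 1.0)
    depth_avg = depth_term / overlap if overlap else big
    return depth_avg + lam * silhouette_term
```

A rendered hypothesis identical to the observation yields error 0; a hypothesis whose silhouette only partially covers the observed hand-object region is penalized by both terms, which is what lets the filter rank posture hypotheses.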
PCT/CN2020/101671 2020-02-06 2020-07-13 Procédé de suivi de processus d'interaction main humaine-objet basé sur un filtrage d'évolution différentielle collaborative WO2021155653A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010081555.9A CN111311648A (zh) 2020-02-06 2020-02-06 基于协作差分进化滤波的人手-物体交互过程跟踪方法
CN202010081555.9 2020-02-06

Publications (1)

Publication Number Publication Date
WO2021155653A1 true WO2021155653A1 (fr) 2021-08-12

Family

ID=71156439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/101671 WO2021155653A1 (fr) 2020-02-06 2020-07-13 Procédé de suivi de processus d'interaction main humaine-objet basé sur un filtrage d'évolution différentielle collaborative

Country Status (2)

Country Link
CN (1) CN111311648A (fr)
WO (1) WO2021155653A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311648A (zh) * 2020-02-06 2020-06-19 青岛理工大学 基于协作差分进化滤波的人手-物体交互过程跟踪方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102148921A (zh) * 2011-05-04 2011-08-10 中国科学院自动化研究所 基于动态群组划分的多目标跟踪方法
US20120113223A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation User Interaction in Augmented Reality
CN110007754A (zh) * 2019-03-06 2019-07-12 清华大学 手与物体交互过程的实时重建方法及装置
CN111311648A (zh) * 2020-02-06 2020-06-19 青岛理工大学 基于协作差分进化滤波的人手-物体交互过程跟踪方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120113223A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation User Interaction in Augmented Reality
CN102148921A (zh) * 2011-05-04 2011-08-10 中国科学院自动化研究所 基于动态群组划分的多目标跟踪方法
CN110007754A (zh) * 2019-03-06 2019-07-12 清华大学 手与物体交互过程的实时重建方法及装置
CN111311648A (zh) * 2020-02-06 2020-06-19 青岛理工大学 基于协作差分进化滤波的人手-物体交互过程跟踪方法

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
I. OIKONOMIDIS ; N. KYRIAZIS ; A. A. ARGYROS: "Tracking the articulated motion of two strongly interacting hands", COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012 IEEE CONFERENCE ON, IEEE, 16 June 2012 (2012-06-16), pages 1862 - 1869, XP032232284, ISBN: 978-1-4673-1226-4, DOI: 10.1109/CVPR.2012.6247885 *
LI DONGNIAN, ZHOU YIQI: "Combining Differential Evolution with Particle Filtering for Articulated Hand Tracking from Single Depth Images", INTERNATIONAL JOURNAL OF SIGNAL PROCESSING, IMAGE PROCESSING AND PATTERN RECOGNITION, vol. 8, no. 4, 30 April 2015 (2015-04-30), pages 237 - 248, XP055833579, ISSN: 2005-4254, DOI: 10.14257/ijsip.2015.8.4.21 *
LI DONGNIAN; CHEN CHENGJUN: "Tracking a hand in interaction with an object based on single depth images", MULTIMEDIA TOOLS AND APPLICATIONS., KLUWER ACADEMIC PUBLISHERS, BOSTON., US, vol. 78, no. 6, 30 July 2018 (2018-07-30), US, pages 6745 - 6762, XP036755923, ISSN: 1380-7501, DOI: 10.1007/s11042-018-6452-0 *
LI, DONGNIAN: "Research on 3D Hand Motion Tracking Based on Depth Images", DOCTORAL DISSERTATIONS, 31 May 2015 (2015-05-31), pages 1 - 124, XP009529615, ISSN: 1674-022X *
WANG PEICHONG, HE YI-CHAO, QIAN XU: "Cooperation Differential Evolution Algorithm with Double Populations and Two Evolutionary Models", COMPUTER ENGINEERING AND APPLICATIONS, HUABEI JISUAN JISHU YANJIUSUO, CN, vol. 44, no. 25, 1 January 2008 (2008-01-01), CN, pages 60 - 64, XP055833938, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.2008.25.019 *

Also Published As

Publication number Publication date
CN111311648A (zh) 2020-06-19

Similar Documents

Publication Publication Date Title
Bandini et al. Analysis of the hands in egocentric vision: A survey
Palafox et al. Npms: Neural parametric models for 3d deformable shapes
Oberweger et al. Deepprior++: Improving fast and accurate 3d hand pose estimation
Stenger et al. Model-based hand tracking using a hierarchical bayesian filter
Wang et al. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation
Dockstader et al. Multiple camera tracking of interacting and occluded human motion
Gall et al. Optimization and filtering for human motion capture: A multi-layer framework
Liang et al. Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization
Lei et al. Cadex: Learning canonical deformation coordinate space for dynamic surface representation via neural homeomorphism
Pons-Moll et al. Model-based pose estimation
Li et al. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation
Zhou et al. Toch: Spatio-temporal object correspondence to hand for motion refinement
Wu et al. HPGCN: Hierarchical poselet-guided graph convolutional network for 3D pose estimation
Huang et al. Tracking-by-detection of 3d human shapes: from surfaces to volumes
JP2008140101A (ja) 無制約、リアルタイム、かつマーカ不使用の手トラッキング装置
Baradel et al. Posebert: A generic transformer module for temporal 3d human modeling
Dani et al. 3dposelite: a compact 3d pose estimation using node embeddings
Lifkooee et al. Real-time avatar pose transfer and motion generation using locally encoded laplacian offsets
Ikram et al. Real time hand gesture recognition using leap motion controller based on CNN-SVM architechture
Zangeneh et al. A probabilistic framework for visual localization in ambiguous scenes
WO2021155653A1 (fr) Procédé de suivi de processus d'interaction main humaine-objet basé sur un filtrage d'évolution différentielle collaborative
CN112199994A (zh) 一种实时检测rgb视频中的3d手与未知物体交互的方法和装置
Li et al. 3D hand reconstruction from a single image based on biomechanical constraints
Ni et al. A hybrid framework for 3-D human motion tracking
Yu et al. Multi-activity 3D human motion recognition and tracking in composite motion model with synthesized transition bridges

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20917332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20917332

Country of ref document: EP

Kind code of ref document: A1