WO2021155653A1 - Human hand-object interaction process tracking method based on collaborative differential evolution filtering - Google Patents


Info

Publication number
WO2021155653A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2020/101671
Other languages
French (fr)
Chinese (zh)
Inventor
李东年 (Li Dongnian)
郭阳 (Guo Yang)
陈成军 (Chen Chengjun)
赵正旭 (Zhao Zhengxu)
温晋杰 (Wen Jinjie)
张庆海 (Zhang Qinghai)
Original Assignee
青岛理工大学 (Qingdao University of Technology)
Application filed by 青岛理工大学 (Qingdao University of Technology)
Publication of WO2021155653A1

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00: Image analysis
            • G06T 7/20: Analysis of motion
              • G06T 7/277: Analysis of motion involving stochastic approaches, e.g. using Kalman filters
              • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
                • G06T 7/251: Analysis of motion using feature-based methods involving models
          • G06T 2207/00: Indexing scheme for image analysis or image enhancement
            • G06T 2207/20: Special algorithmic details
              • G06T 2207/20076: Probabilistic image processing
            • G06T 2207/30: Subject of image; Context of image processing
              • G06T 2207/30196: Human being; Person
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/12: Computing arrangements based on biological models using genetic models
              • G06N 3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Definitions

  • the present disclosure relates to the technical field of three-dimensional human hand tracking, and in particular to a method for tracking a human hand-object interaction process based on collaborative differential evolution filtering.
  • Computer vision-based 3D human hand tracking can be applied in fields such as robot teaching and learning, motion capture, human-computer interaction, gesture recognition, etc.
  • However, tracking the human hand-object interaction process is complicated by several factors. First, the human hand has many degrees of freedom, so the problem is essentially a high-dimensional search problem. Second, frequent occlusion occurs during the interaction, including mutual occlusion between the hand and the manipulated object as well as self-occlusion of the hand itself. On the other hand, the useful contextual information carried by the object can aid the recognition and estimation of hand motion.
  • vision-based hand-object tracking methods are generally divided into two categories: appearance-based methods and model-based methods.
  • the appearance-based method builds a mapping through learning, and maps the image feature space to the human hand-object state space, thereby directly estimating the human hand state and the object state from the image features.
  • This type of method does not need to be initialized, and the tracking speed is fast, but its accuracy is affected by the training samples.
  • One prior approach simultaneously recognizes the hand movement and the manipulated object, expressing the time-varying relationship between them through a conditional random field model; however, it does not provide detailed information about the hand movement posture.
  • Romero et al. proposed a real-time appearance-based non-parametric method to reconstruct the three-dimensional pose of the human hand interacting with the object.
  • The method uses histograms of oriented gradients (HOG) to describe hand appearance and performs a nearest-neighbor search in a large template database to find the hand pose that best matches the input image.
  • However, this method cannot precisely track hand movement in the high-dimensional pose space.
  • Gupta et al. proposed a Bayesian method to integrate multiple perception tasks in the process of human-object interaction, seeking a consistent semantic representation by imposing spatial constraints on the perceived elements. This method can recognize objects and the corresponding actions when appearance alone is not sufficiently discriminative, and can also recognize human actions from static images without using any motion information. However, it does not provide detailed information about the posture of the human body.
  • Model-based methods use pre-established human hand models and object models to generate hand-object posture hypotheses.
  • the features extracted from the model are compared with those extracted from the visual observations, and the similarity between the two is evaluated; the group of hand-object states with the best similarity is then searched for in the model state space.
  • This type of method can use more prior information (such as human hand shape, joint constraints, etc.), but its tracking process needs to be initialized, and it faces a difficult problem of searching in a high-dimensional space.
  • Hamer et al. activated an independent local tracker for each part of the articulated hand, used a pairwise Markov random field to connect adjacent hand parts, and used belief propagation (BP) to find the optimal hand configuration.
  • Oikonomidis et al. proposed a model-based method to track the movement of the human hand and the manipulated object simultaneously. The method builds three-dimensional models and motion models for both the hand and the object, treats the tracking problem as a sequential optimization problem, and searches for the hand and object pose parameters that minimize the matching error with the input image; the system uses a multi-view RGB image sequence as input.
  • Kyriazis et al. used a depth camera to obtain the observation input and proposed a method that searches only the hand posture parameters; the state of the object is then derived from the state of the hand and a force model of the hand-object interaction. However, this interaction involves many factors and is difficult to model accurately.
  • Model-based methods also face fineness problems in the three-dimensional modeling of hands and objects. When the particle filter framework is used to track hand or body motion, the extreme sparsity of particle sampling in high-dimensional space makes it difficult for a limited number of particles to effectively represent the true posterior distribution of the hand state, which easily leads to tracking failure.
  • To address these problems, the present disclosure proposes a method for tracking the human hand-object interaction process based on cooperative differential evolution filtering. A model-based approach simultaneously tracks the hand and the object during the interaction, integrating the differential evolution algorithm into the particle filter framework. Two coordinated particle filter trackers track the motion of the hand and the object respectively, and differential evolution optimizes the matching error under the current observation to drive the particles toward high-likelihood regions and improve the particle filter sample distribution, so that robust tracking of hand-object motion can be achieved with a small number of particles.
  • the present disclosure provides a method for tracking the human hand-object interaction process based on cooperative differential evolution filtering, including: extracting the foreground regions corresponding to the hand and the object in the image to be measured, and generating an observation depth map and a corresponding observation silhouette map; obtaining the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, forming the hand-object posture vector from them, and generating the corresponding rendered depth map; constructing a matching error function between the observation input and the hand-object posture vector; and using the cooperative differential evolution filtering algorithm to optimize the hand and object postures by minimizing the matching error function, obtaining the motion tracking results of the hand and the object during the interaction.
  • the present disclosure provides a human-hand-object interaction process tracking system based on cooperative differential evolution filtering, including:
  • the image processing module is configured to extract the foreground regions corresponding to the hand and the object in the image to be measured, and to generate an observation depth map and a corresponding observation silhouette map;
  • the hand-object motion posture module is configured to obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model respectively, to form the hand-object posture vector from them, and to generate the corresponding rendered depth map;
  • the matching error function building module is configured to take the image to be measured as the observation input and to construct a matching error function between the observation input and the hand-object posture vector by calculating the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered silhouette map;
  • the tracking module is configured to use the cooperative differential evolution filtering algorithm to optimize the hand and object postures by minimizing the matching error function, and to obtain the motion tracking results of the hand and the object during the interaction.
  • the present disclosure provides an electronic device including a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the method for tracking the human hand-object interaction process based on cooperative differential evolution filtering are completed.
  • the present disclosure provides a computer-readable storage medium for storing computer instructions that, when executed by a processor, complete the steps of the method for tracking the human hand-object interaction process based on cooperative differential evolution filtering.
  • A model-based method is used to simultaneously track the hand and the object during the interaction process, and the differential evolution algorithm is integrated into the particle filter framework, yielding a new improved particle filter algorithm, cooperative differential evolution filtering, for tracking the hand-object motion.
  • Two coordinated particle filter trackers track the movement of the hand and the object respectively, and differential evolution optimizes the matching error under the current observation to drive the particles toward high-likelihood regions and improve the particle filter sample distribution, achieving robust tracking of hand and object motion with a small number of particles.
  • FIG. 1 is a schematic diagram of a method for tracking a human hand-object interaction process based on cooperative differential evolution filtering according to Embodiment 1 of the disclosure;
  • FIG. 2 is a schematic diagram of a kinematics model of a human hand provided in Embodiment 1 of the disclosure;
  • Fig. 3(a) is a schematic diagram of the human hand-sphere model provided in Embodiment 1 of the present disclosure;
  • Fig. 3(b) is a schematic diagram of the human hand-cylinder model provided in Embodiment 1 of the present disclosure;
  • FIG. 4 is a flow chart of human hand-object tracking provided by Embodiment 1 of the present disclosure;
  • FIGS. 5(a)-(c) are diagrams of the tracking results of the interaction process between the human hand and the sphere provided in Embodiment 1 of the present disclosure;
  • FIGS. 6(a)-(c) are diagrams of the tracking results of the interaction process between the human hand and the cylinder provided in Embodiment 1 of the present disclosure.
  • this embodiment provides a method for tracking human hand-object interaction process based on cooperative differential evolution filtering, including:
  • S1: extract the foreground regions corresponding to the hand and the object in the image to be measured, and generate the observation depth map and the corresponding observation silhouette map; obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, form the hand-object posture vector from them, and generate the corresponding rendered depth map;
  • S2: take the image to be measured as the observation input, calculate the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered silhouette map, and construct the matching error function between the observation input and the hand-object posture vector;
  • S3: use the cooperative differential evolution filtering algorithm to optimize the hand and object postures by minimizing the matching error function, and obtain the motion tracking results of the hand and the object during the interaction.
  • this embodiment uses a method based on the human hand-object kinematics model to track the interaction process between the human hand and the object, establishes a three-dimensional model and a motion model for the human hand and the object, and simultaneously tracks the motion of the human hand and the object in the three-dimensional space.
  • the human hand 3D model is used to generate the human hand posture hypothesis
  • the object 3D model is used to generate the object posture hypothesis
  • the matching error between the model feature set and the observation feature set obtained from the input image is calculated, and the tracking problem is treated as a sequential optimization problem: the state parameters that minimize the matching error are searched for in the joint state space of the hand and the object, giving the optimal solution for the current frame of the input image.
  • Figure 2 shows the hand kinematics model.
  • the hand motion state x_h contains 29 degrees of freedom in total: global palm motion with 6 degrees of freedom, local finger motion with 20 degrees of freedom, and 3 degrees of freedom for the wrist joint.
  • the CMC joints of the fingers are fixed and the palm is modeled as a rigid body; its motion corresponds to the 6 global degrees of freedom of the hand (3 translations and 3 rotations). The motion of the 5 fingers corresponds to 20 local degrees of freedom, each finger being modeled with 4 degrees of freedom: the MCP joint of each finger other than the thumb and the TM joint of the thumb each have 2 degrees of freedom (1 flexion-extension and 1 abduction-adduction), while the PIP and DIP joints of each finger and the MCP and IP joints of the thumb each have a single flexion-extension degree of freedom. The wrist joint has one flexion-extension degree of freedom, one abduction-adduction degree of freedom, and one scale-transformation degree of freedom.
  • the object motion state x o contains the 6-degree-of-freedom pose state (3 translations and 3 rotations) of the object in the three-dimensional space.
  • This embodiment limits the angles of the finger joints and the wrist joint within certain ranges based on human anatomy. Applying these motion constraints not only ensures that the solutions obtained by the posture estimation process are valid, but also greatly compresses the search range of the hand state space and reduces the search difficulty.
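As a minimal illustration of the joint-limit constraints described above (not part of the disclosed embodiment), the sketch below clamps a 29-dimensional hand state vector to per-dimension bounds; the specific limit values are hypothetical placeholders standing in for the anatomical ranges:

```python
import math

# Hypothetical 29-DOF layout: dims 0-5 global palm pose (unbounded here),
# dims 6-25 finger angles in [0, pi/2], dims 26-28 wrist in [-pi/4, pi/4].
HAND_DOF = 29
LOWER = [-math.inf] * 6 + [0.0] * 20 + [-math.pi / 4] * 3
UPPER = [math.inf] * 6 + [math.pi / 2] * 20 + [math.pi / 4] * 3

def clamp_hand_state(x_h):
    """Project a hand pose hypothesis back into the constrained state space."""
    assert len(x_h) == HAND_DOF
    return [min(max(v, lo), hi) for v, lo, hi in zip(x_h, LOWER, UPPER)]
```

Restricting every posture hypothesis in this way keeps the search inside the anatomically valid region of the state space.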
  • PTC Pro/Engineer and MultiGen-Paradigm Creator are used to build unified three-dimensional models of the hand and the manipulated object from parameterized geometric primitives; a tree-like hierarchical structure is established for the hand-object model in Creator, with local coordinate systems and DOF (degree of freedom) motion nodes added.
  • the three-dimensional hand model established in this embodiment includes part of the human forearm, so that the model can describe the forearm pixels connected to the hand pixels in the segmented depth image; the scale-transformation degree of freedom of the wrist joint allows the forearm model to be stretched accordingly.
  • This embodiment considers the interaction between the human hand and two types of objects: a sphere and a cylinder.
  • Figure 3(a) shows the three-dimensional model of the human hand and the sphere, and Figure 3(b) shows the three-dimensional model of the human hand and the cylinder.
  • The modeling method used is also suitable for tracking the interaction between human hands and objects of other shapes.
  • When constructing the matching error function and the observation likelihood function, this embodiment combines two types of feature information: depth features and silhouette features. Taking the depth image obtained by the Kinect depth camera as the observation input z, the foreground regions corresponding to the hand and the manipulated object are extracted by simple depth-threshold segmentation to generate the observation depth map z_d(z), from which the observation silhouette map z_s(z) is generated.
  • For a hand-object posture hypothesis x_ho, the corresponding rendered depth map r_d(x_ho) is generated, and from it the rendered silhouette map r_s(x_ho).
  • z_s(z) and r_s(x_ho) are both binary images, whose value is 1 in the foreground region corresponding to the hand and the manipulated object and 0 in the background.
  • the matching error function is used to express the matching degree between the observation z and the human hand-object pose vector x ho .
  • a small matching error means a high matching degree.
  • the matching error function is defined as:
    E(z, x_ho) = λ_d E_d + λ_s E_s + λ_p E_p
  • E(z, x_ho) consists of three parts: the depth feature term E_d, the silhouette feature term E_s, and a penalty term E_p; λ_d, λ_s, and λ_p are constant weighting factors for the three parts.
  • E_d measures the depth deviation between the observed depth map z_d(z) and the rendered depth map r_d(x_ho) corresponding to the posture vector x_ho, defined as:
    E_d = (1 / |A|) Σ_{p∈A} min( |z_d(z)(p) − r_d(x_ho)(p)|, T_d )
    where A is the set of pixels covered by the hand and the manipulated object.
  • the depth deviation (measured in mm) is computed and accumulated pixel by pixel over the whole feature map, and the accumulated sum is normalized by dividing by the total pixel area of the hand and the manipulated object. A few large depth deviations would otherwise cause large changes in the function value and degrade the search; for this reason the maximum depth-deviation constant T_d is introduced, limiting the per-pixel deviation to the range [0, T_d].
  • E_s describes the silhouette matching degree by measuring the non-overlapping area between the observation silhouette map z_s(z) and the rendered silhouette map r_s(x_ho), defined as:
    E_s = |z_s \ r_s| / |z_s| + |r_s \ z_s| / |r_s|
  • the first part counts the pixel area belonging to the observation silhouette z_s(z) but not to the rendered silhouette r_s(x_ho); the second part counts the pixel area belonging to r_s(x_ho) but not to z_s(z); the two parts are normalized separately.
  • the silhouette feature term E_s smooths the objective function and reduces the local minima around the global minimum, so that the optimization process converges better to the true global minimum and its robustness is enhanced.
  • In the penalty term, J denotes the three pairs of adjacent fingers excluding the thumb, and for each pair the deviation between the abduction-adduction angles of their MCP joints in the hand posture hypothesis x_h is accumulated, penalizing invalid finger configurations.
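The depth and silhouette terms described above can be sketched directly from their definitions. The following is an illustrative stand-in (not the patent's implementation): depth maps are 2-D lists of millimetre values, silhouettes are 2-D lists of 0/1, and the weights and T_d are hypothetical placeholder values (the penalty term E_p is omitted for brevity):

```python
T_d = 100.0                    # max per-pixel depth deviation (mm)
LAMBDA_D, LAMBDA_S = 1.0, 1.0  # hypothetical weighting factors

def depth_term(z_d, r_d, z_s, r_s):
    """E_d: clamped per-pixel depth deviation over the union foreground,
    normalised by the foreground pixel area."""
    dev, area = 0.0, 0
    for i in range(len(z_d)):
        for j in range(len(z_d[0])):
            if z_s[i][j] or r_s[i][j]:
                dev += min(abs(z_d[i][j] - r_d[i][j]), T_d)
                area += 1
    return dev / max(area, 1)

def silhouette_term(z_s, r_s):
    """E_s: non-overlap between observed and rendered silhouettes,
    each part normalised by its own silhouette area."""
    zs_only = sum(z and not r for zr, rr in zip(z_s, r_s) for z, r in zip(zr, rr))
    rs_only = sum(r and not z for zr, rr in zip(z_s, r_s) for z, r in zip(zr, rr))
    zs_area = sum(map(sum, z_s))
    rs_area = sum(map(sum, r_s))
    return zs_only / max(zs_area, 1) + rs_only / max(rs_area, 1)

def matching_error(z_d, r_d, z_s, r_s):
    """Weighted combination of the depth and silhouette terms."""
    return (LAMBDA_D * depth_term(z_d, r_d, z_s, r_s)
            + LAMBDA_S * silhouette_term(z_s, r_s))
```

Clamping each per-pixel deviation to T_d keeps a few outlier pixels from dominating the objective, as the text notes.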
  • the observation likelihood function is monotonically decreasing in the matching error E(z, x_ho) and is defined as:
    p(z | x_ho) ∝ exp( −E(z, x_ho) / σ_e )
  • σ_e is a constant normalization factor whose value is determined by the observation noise.
  • the present embodiment uses the cooperative differential evolution filtering algorithm to optimize the poses of the hand and the object by minimizing the matching error function.
  • This embodiment integrates the differential evolution algorithm into the particle filter framework and proposes a new tracking algorithm, cooperative differential evolution filtering, to track the hand-object motion in high-dimensional space: two cooperating particle filter trackers track the hand and the object respectively, and differential evolution optimizes the matching error under the current observation to improve the particle filter sample distribution.
  • Differential evolution is an efficient swarm-intelligence optimization algorithm that can effectively solve optimization problems with nonlinear, non-differentiable objective functions.
  • Differential evolution searches for the global optimum in a continuous space through the iterative evolution of a population of N D-dimensional vectors.
  • the evolution of the population is carried out through three basic operations: mutation, crossover, and selection. Mutation and crossover generate new candidate individuals, and selection determines whether a newly generated candidate survives into the next generation.
  • For each individual i, differential evolution randomly selects three distinct individuals from the previous generation and combines them to generate a mutant individual:
    v_i = x_{r1} + F (x_{r2} − x_{r3})
    where the indices r_1, r_2, r_3 are randomly chosen from [1, 2, ..., N], mutually different and different from i, and F is the scaling factor of the difference vector, which controls the convergence speed of the search; in the standard differential evolution algorithm F is a constant.
  • the crossover operation then forms a candidate individual element by element:
    u_{i,j} = v_{i,j} if rand_j ≤ CR or j = j_rand, and x_{i,j} otherwise,
    where rand_j ~ U(0,1) is a uniform random number in [0,1], the crossover parameter CR (taken as 0.9) determines the probability that each element of the candidate is inherited from the mutant, and j_rand is a random index in [1, 2, ..., D] ensuring that the candidate obtains at least one element from the mutant.
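The mutation, crossover, and selection steps above can be sketched as standard DE/rand/1/bin on a toy objective (a sketch for illustration, not the patent's optimizer; the sphere function, population size, and generation count are assumptions, while F and CR follow the values in the text):

```python
import random

def differential_evolution(objective, dim, pop_size=20, F=0.5, CR=0.9,
                           generations=200, bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals r1, r2, r3 != i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(dim)]
            # Binomial crossover: inherit at least one element from the mutant.
            j_rand = rng.randrange(dim)
            u = [v[j] if (rng.random() < CR or j == j_rand) else pop[i][j]
                 for j in range(dim)]
            # Selection: the candidate survives only if it does not worsen
            # the objective value.
            if objective(u) <= objective(pop[i]):
                pop[i] = u
    return min(pop, key=objective)

best = differential_evolution(lambda x: sum(v * v for v in x), dim=5)
```

The same three operations drive the pose optimization in the filtering scheme, with the matching error E(z, x_ho) as the objective.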
  • In this embodiment, a differential evolution population is allocated to each of the human hand and the manipulated object to perform pose optimization: the two populations optimize the hand motion posture x_h and the object motion posture x_o of the current frame respectively, and are denoted population h and population o.
  • Particle filtering is a robust motion tracking framework; by propagating multiple samples over time, it can represent multimodal distributions.
  • its basic idea is to start from samples of the posterior probability distribution p(x_{t−1} | z_{1:t−1}) at the previous time step and, through prediction and weight updating, approximate the current posterior p(x_t | z_{1:t}) with a set of weighted particles.
  • One of the main problems of the standard particle filter algorithm is that it uses the state-transition prior p(x_t | x_{t−1}) as the importance distribution, without taking the latest observation into account.
  • As a result, the standard particle filter needs a large number of samples: if the sample set is too small, sample impoverishment occurs, the estimation accuracy drops, and the sample set may even diverge, causing the estimation to fail.
  • Differential evolution filtering integrates the differential evolution algorithm into the particle filter framework: after predicting the new particle positions, the matching error function under the latest observation z_t is used as the objective function, and the differential evolution algorithm iteratively evolves the particles, moving them toward regions of higher observation likelihood in the state space.
  • this optimization of the particle positions can be regarded as an importance sampling process, and the new particle swarm generated by it can be regarded as drawn from an approximation of the optimal importance distribution p(x_t | x_{t−1}, z_t); the particle filter sample distribution is thereby improved and the convergence of the particle set accelerated, so that robust tracking of hand-object motion is achieved with a small number of particles.
  • In differential evolution filtering, the new particle positions are first predicted from the state-transition prior p(x_t | x_{t−1});
  • Resampling: resample according to the particle weights to obtain a new set of equally weighted particles.
  • State estimation: output the system state estimate based on the maximum a posteriori criterion.
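One step of the predict / DE-refine / weight / resample / estimate cycle described above can be illustrated on a toy 1-D problem (an illustration of the scheme, not the patent's implementation; the squared-deviation matching error, noise scales, and iteration counts are all assumptions):

```python
import math
import random

def de_filter_step(particles, observation, sigma_e=1.0, F=0.5, CR=0.9,
                   de_iters=10, rng=random.Random(1)):
    # Stand-in matching error E(z, x): squared deviation from the observation.
    def error(x):
        return (x - observation) ** 2
    # Prediction: diffuse particles with a Gaussian state-transition prior.
    particles = [x + rng.gauss(0.0, 0.5) for x in particles]
    n = len(particles)
    # Refinement: DE iterations move particles toward high-likelihood regions.
    for _ in range(de_iters):
        for i in range(n):
            r1, r2, r3 = rng.sample([k for k in range(n) if k != i], 3)
            v = particles[r1] + F * (particles[r2] - particles[r3])
            u = v if rng.random() < CR else particles[i]  # 1-D crossover
            if error(u) <= error(particles[i]):           # greedy selection
                particles[i] = u
    # Weighting by the observation likelihood exp(-E / sigma_e), then resampling.
    weights = [math.exp(-error(x) / sigma_e) for x in particles]
    total = sum(weights)
    particles = rng.choices(particles, weights=[w / total for w in weights], k=n)
    # State estimate: the maximum a posteriori particle (minimum error here).
    return particles, min(particles, key=error)

particles0 = [0.5 * i for i in range(20)]   # initial spread over [0, 9.5]
particles1, estimate = de_filter_step(particles0, observation=3.0)
```

After the DE refinement, the particle set concentrates near the high-likelihood region around the observation, which is the effect the text attributes to the improved sample distribution.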
  • Two coordinated differential evolution filter trackers are used to track the motion postures of the human hand and the object respectively, giving the cooperative differential evolution filtering algorithm: a differential evolution filter tracker is assigned to each of the human hand and the manipulated object, tracking the hand motion posture x_h and the object motion posture x_o.
  • the two trackers are not independent of each other, but constantly exchange information during tracking.
  • While the hand tracker iteratively optimizes the hand motion posture x_h of the current frame, the posture x_o of the manipulated object is regarded as static and is set, at the start of the optimization, to the object tracker's result for the previous frame; likewise, while the object tracker iteratively optimizes the object motion posture x_o of the current frame, the hand posture x_h is regarded as static and is set to the hand tracker's result for the previous frame.
  • As soon as a tracker obtains the posture tracking result of the current frame, it passes the result to the other tracker, where the corresponding posture value remains static during that tracker's iterative optimization of the next frame.
  • This cooperative tracking scheme not only models the occlusion between the hand and the manipulated object, but also decomposes the joint pose space by using multiple trackers, splitting the high-dimensional problem into several problems of relatively low dimension, which reduces the difficulty of the optimization search and the computational cost.
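The alternation described above can be sketched as block-coordinate optimization of a coupled objective (purely illustrative: the quadratic objective, its constants, and the simple 1-D line search stand in for the matching error and the DE-filter trackers):

```python
def joint_error(x_h, x_o):
    # Toy coupled matching error; the cross term mimics the hand-object
    # dependence (e.g. mutual occlusion) that links the two trackers.
    return (x_h - 1.0) ** 2 + (x_o - 2.0) ** 2 + 0.1 * (x_h - x_o) ** 2

def optimize_1d(f, x0, step=0.01, iters=2000):
    """Hypothetical stand-in for one tracker's per-frame optimization."""
    x = x0
    for _ in range(iters):
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
    return x

x_h, x_o = 0.0, 0.0
for _ in range(3):
    # Hand tracker: optimize x_h while x_o is held static ...
    x_h = optimize_1d(lambda h: joint_error(h, x_o), x_h)
    # ... then object tracker: optimize x_o while x_h is held static.
    x_o = optimize_1d(lambda o: joint_error(x_h, o), x_o)
```

Each pass lowers the joint error while searching only a low-dimensional subspace, which is the dimensionality-reduction benefit the scheme claims.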
  • the depth image obtained by the Kinect depth camera is used as the observation input, and a hand-object tracking prototype system is developed based on 3D graphics rendering: the pre-built 3D hand-object model is loaded into the 3D rendering engine OpenSceneGraph (OSG). During tracking, the osgSim::DOFTransform class controls the motion of the hand and the object, and OSG off-screen rendering renders the depth image of the hand-object model, which is compared with the observation image to compute the matching error value and observation likelihood value of each particle; the cooperative differential evolution filtering algorithm then searches the state spaces of the hand and the object for the state parameters that minimize the matching error.
  • OSG is an open-source, cross-platform graphics engine based on OpenGL. It organizes spatial data with a tree structure (the scene node tree) and achieves high-performance 3D rendering through a variety of scene culling techniques, render-state sorting, and a multi-threaded rendering mechanism. The rendering of each OSG frame can be broken down into three stages: update traversal, cull traversal, and draw traversal.
  • a multi-threaded mode is used to render the scene: a thread is created for each camera and its corresponding graphics device, with the culling operation performed in the camera thread and the drawing operation in the graphics-device thread. This multi-threaded mode starts the next frame's scene update and cull operations before the drawing work of the graphics-device thread has finished, improving the operating efficiency of the system and making full use of its computing power.
  • Based on the proposed cooperative differential evolution filtering algorithm, this embodiment develops a hand-object tracking prototype system using OSG and off-screen rendering: a virtual camera renders the depth image corresponding to each hand-object posture hypothesis for matching-error computation.
  • This camera has the scene model node as a child node and is bound to a graphics buffer object; the scene model node contains the three-dimensional models of the hand and the object, and the buffer object is bound to the camera through a frame buffer object (FBO).
  • During rendering, the virtual camera renders the content of its scene-model child nodes into the bound buffer object.
  • the system of this embodiment uses a collaborative differential evolution filtering algorithm to iteratively calculate new human hand-object posture parameters.
  • the system creates a node callback object (osg::NodeCallback) for the scene model node, which is used to update the posture parameters of the human hand and object model during the update phase of each frame of OSG.
  • the system also creates a drawing callback object (osg::Camera::DrawCallback) for the camera.
  • the system calculates the rendered depth image in this callback object The matching error with the observed depth image.
  • each frame will start a thread for each camera and its associated graphics device.
  • the update phase of the next frame will begin.
  • the system creates an event object for the camera, and uses Win32 API's SetEvent() function and WaitForSingleObject() function to synchronize and communicate between threads.
  • When the matching-error calculation finishes, the corresponding event object is set to the signaled state through the SetEvent() function to notify the main thread; the main thread performs the next calculation operation after receiving the event signal.
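The SetEvent()/WaitForSingleObject() handshake described above can be sketched in a platform-neutral way. In this illustrative sketch, Python's threading.Event stands in for the Win32 event object, and the names render_done and draw_callback as well as the placeholder error value are hypothetical:

```python
import threading

render_done = threading.Event()   # stand-in for the Win32 event object
result = {}

def draw_callback():
    # Hypothetical stand-in for the osg::Camera::DrawCallback body:
    # compute the matching error for the rendered frame, then signal
    # the waiting main thread (analogous to SetEvent()).
    result["error"] = 0.42        # placeholder matching-error value
    render_done.set()

def graphics_thread():
    # ... culling and drawing work would happen here ...
    draw_callback()

t = threading.Thread(target=graphics_thread)
t.start()
render_done.wait()                # analogous to WaitForSingleObject()
t.join()
# The main thread now continues with the next computation using result["error"].
```

A real OSG implementation would instead perform this signaling from within the drawing callback on the graphics-device thread.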
  • This embodiment conducts experiments on real sequences to verify the effectiveness of the proposed human hand-object motion tracking method.
  • The collaborative differential evolution filtering algorithm proposed in this embodiment uses 32 particles for the hand posture tracker and 8 particles for the object posture tracker; for each frame of image input, both trackers run 60 iterations of the DE optimization.
  • the tracking experiment in this embodiment runs on a PC with a 4-core Core i5 2.9 GHz CPU, 4.0 GB memory and Nvidia GeForce GTX 950M GPU, and it takes an average of 5 seconds to track one frame of image.
  • Real sequences are used to evaluate the tracking algorithm: depth image sequences captured with the Microsoft Kinect 1.0 Beta 2 SDK serve as the observation input, with an image resolution of 640×480 and a frame rate of 30 frames/s.
  • the experiment is divided into two groups.
  • the first experiment tracks the movement process of the human hand grasping the sphere.
  • Figures 5(a)-(c) show the tracking results of this embodiment on some frames of the real sequence of the interaction process between the human hand and the sphere;
  • Figure 5(a) is the RGB image captured by the Kinect RGB camera
  • Figure 5(b) is the depth image captured by the Kinect depth camera and subjected to simple depth segmentation
  • Figure 5(c) is the result of tracking the depth image sequence using the collaborative differential evolution filtering algorithm;
  • Figures 6(a)-(c) show the tracking results of this embodiment on some frames of the real sequence of the interaction process between the human hand and the cylinder;
  • Figure 6(a) is the RGB image captured by the Kinect RGB camera, and
  • Figure 6(b) is the depth image captured by the Kinect depth camera and subjected to simple depth segmentation.
  • Figure 6(c) is the result of tracking the depth image sequence using the collaborative differential evolution filtering algorithm. It can be seen from the experimental results that the collaborative differential evolution filtering algorithm can effectively track the interaction process between human hands and objects.
  • A human hand-object interaction process tracking system based on collaborative differential evolution filtering includes:
  • an image processing module, configured to extract the foreground area corresponding to the human hand and the object in the image to be measured, and to generate an observation depth map and a corresponding observation silhouette map;
  • a hand-object motion posture module, configured to obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, respectively, the hand motion posture and the object motion posture forming the hand-object posture vector, and to generate the corresponding rendered depth map;
  • a matching error function building module, configured to take the image to be measured as the observation input and, with the goal of computing the depth-feature matching degree between the observation depth map and the rendered depth map and the silhouette-feature matching degree between the observation silhouette map and the rendered depth map, to construct a matching error function between the observation input and the hand-object posture vector;
  • a tracking module, configured to use the collaborative differential evolution filtering algorithm to optimize the postures of the hand and the object separately by computing the matching error function, and to obtain the motion tracking results of the hand and the object during the hand-object interaction process.
  • An electronic device including a memory, a processor, and computer instructions stored on the memory and run on the processor; when the computer instructions are executed by the processor, the steps of the human hand-object interaction process tracking method based on collaborative differential evolution filtering are completed.
  • A computer-readable storage medium for storing computer instructions that, when executed by a processor, complete the steps of the human hand-object interaction process tracking method based on collaborative differential evolution filtering.
  • the differential evolution algorithm is integrated into the particle filter framework, and two coordinated particle filter trackers are used to separately track the human hand and the object.
  • Differential evolution is used to optimize the matching error under the current observation, driving the particles toward high-likelihood regions, improving the particle filter's sample distribution, and achieving robust tracking of human hand and object motion with a small number of particles.
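The DE-driven particle update at the heart of this scheme can be illustrated with a minimal, self-contained sketch: one DE/rand/1/bin generation with greedy selection, repeated for 60 iterations on a toy one-dimensional error surface. The population size, the F and CR constants, and the quadratic error function are illustrative assumptions, not the patented matching error:

```python
import random

def de_step(particles, error, F=0.5, CR=0.9):
    """One DE/rand/1/bin generation over a 1-D particle population."""
    new = []
    for i, x in enumerate(particles):
        # pick three distinct particles other than x
        a, b, c = random.sample([p for j, p in enumerate(particles) if j != i], 3)
        mutant = a + F * (b - c)
        trial = mutant if random.random() < CR else x
        # greedy selection: a particle moves only if its error decreases
        new.append(trial if error(trial) < error(x) else x)
    return new

random.seed(0)
error = lambda x: (x - 3.0) ** 2        # toy matching-error surface, minimum at 3
particles = [random.uniform(-10, 10) for _ in range(8)]
for _ in range(60):                     # 60 DE iterations, as in the embodiment
    particles = de_step(particles, error)
best = min(particles, key=error)
# the particle set concentrates near the low-error region around x = 3
```

Because selection is greedy against the current observation's error, each particle can only improve, which is what drives a small particle set toward the high-likelihood (low-error) region.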

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a human hand-object interaction process tracking method based on collaborative differential evolution filtering. The method comprises: extracting a foreground area corresponding to a human hand and an object in an image to be detected, and generating an observation depth map and a corresponding observation silhouette map; respectively obtaining a human hand motion posture and an object motion posture on the basis of a constructed human hand kinematic model and an object kinematic model, wherein the human hand motion posture and the object motion posture form a human hand-object posture vector, and generating a corresponding rendering depth map; by means of taking the image to be detected as an observation input, constructing a matching error function of the observation input and the human hand-object posture vector; and using a collaborative differential evolution filtering algorithm to respectively perform posture optimization on the human hand and the object by means of calculating the matching error function, so as to obtain motion tracking of the human hand and the object during the human hand-object interaction process. The robust tracking of human hand-object motion is performed by using a small number of particles.

Description

Human hand-object interaction process tracking method based on collaborative differential evolution filtering

Technical field
The present disclosure relates to the technical field of three-dimensional human hand tracking, and in particular to a method for tracking a human hand-object interaction process based on collaborative differential evolution filtering.
Background art
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
Computer-vision-based three-dimensional human hand tracking can be applied in fields such as robot teaching by demonstration, motion capture, human-computer interaction, and gesture recognition. However, tracking the human hand-object interaction process is hindered by several complicating factors. First, because the human hand has many degrees of freedom, the problem is essentially a high-dimensional search problem. Second, the hand causes frequent occlusions while interacting with an object, including mutual occlusion between the hand and the manipulated object as well as self-occlusion of the hand. In addition, the useful information carried by the object context can facilitate the recognition and estimation of hand motion.
At present, vision-based hand-object tracking methods generally fall into two categories: appearance-based methods and model-based methods.
Appearance-based methods learn a mapping from the image feature space to the hand-object state space, so that the hand state and the object state are estimated directly from image features. Such methods require no initialization and track quickly, but their accuracy depends on the training samples.

Figure PCTCN2020101671-appb-000001

et al. proposed a method that simultaneously recognizes hand actions and the manipulated object, expressing the time-varying relationship between hand actions and objects with a conditional random field model, but the method does not provide detailed information about the hand motion posture. Romero et al. proposed a real-time, appearance-based non-parametric method for reconstructing the three-dimensional pose of a hand interacting with an object; it describes hand features with a histogram of oriented gradients (HOG) and performs a nearest-neighbor search in a large template database to find the hand pose that best matches the input image, but being appearance-based it cannot finely track hand motion in a high-dimensional space. Gupta et al. proposed a Bayesian method to integrate multiple perception tasks in the human-object interaction process, seeking a consistent semantic interpretation by imposing spatial constraints on the perceived elements. This method can recognize objects and the corresponding actions when appearance alone is not sufficiently discriminative, and can recognize human actions from static images without any motion information, but it does not provide detailed information about the human body posture. Yao et al. used a new random-field model to jointly model objects and human poses, estimating the connectivity among objects, human poses, and body parts with a structure-learning method and computing the model parameters with a new max-margin algorithm. In this scheme, object detection provides strong prior knowledge for human pose estimation, and the estimated human pose allows the system to detect objects interacting with the human body more accurately; however, the method estimates human pose only in two dimensions.
Model-based methods use pre-established hand and object models to generate hand-object posture hypotheses, compare features extracted from the models with features extracted from visual observations to evaluate their similarity, and search the model state space with some optimization method for the hand-object state with the best similarity. Such methods can exploit more prior information (such as hand shape and joint constraints), but their tracking process must be initialized, and they face a difficult search in a high-dimensional space. Hamer et al. started an independent local tracker for each part of the articulated hand, connected adjacent hand parts with a pairwise Markov random field, and used belief propagation (BP) to find the optimal hand state configuration, but the method does not model the manipulated object. Oikonomidis et al. proposed a model-based method to track the motion of the hand and the manipulated object simultaneously: three-dimensional models and motion models are built for both the hand and the object, the tracking problem is treated as a sequential optimization problem, and the hand posture parameters and object pose parameters with the smallest matching error against the input images are searched for; the system takes multi-view RGB image sequences as input. Kyriazis et al. used a depth camera to obtain the observation input and proposed a method that searches only the hand posture parameters, deriving the object state from the hand state and a hand-object force model; however, interactions between objects in the real world involve many factors and are difficult to model precisely.
In summary, the inventors found that the prior art has at least the following problems. Appearance-based methods cannot provide detailed information about the hand motion posture, cannot finely track hand motion in a high-dimensional space, and are limited to two-dimensional estimation. Model-based methods face fineness problems in the three-dimensional modeling of hands and objects; moreover, when a particle filter framework is used to track hand or human body motion, particle sampling in the high-dimensional space is extremely sparse, so it is difficult to effectively express the true posterior distribution of the hand state with a limited number of particles, which easily leads to tracking failure.
Summary of the invention
To solve the above problems, the present disclosure proposes a human hand-object interaction process tracking method based on collaborative differential evolution filtering. A model-based method is used to track the hand and the object simultaneously during the hand-object interaction process. The differential evolution algorithm is integrated into the particle filter framework, and two mutually collaborating particle filter trackers track the motion of the hand and the object respectively. Differential evolution optimizes the matching error under the current observation to drive the particles toward high-likelihood regions and improve the particle filter's sample distribution, so that robust tracking of hand-object motion can be achieved with a small number of particles.
To achieve the above objectives, the present disclosure adopts the following technical solutions:
In a first aspect, the present disclosure provides a human hand-object interaction process tracking method based on collaborative differential evolution filtering, including:
extracting the foreground area corresponding to the human hand and the object in the image to be measured, and generating an observation depth map and a corresponding observation silhouette map;
obtaining the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, respectively, the hand motion posture and the object motion posture forming the hand-object posture vector, and generating the corresponding rendered depth map;
taking the image to be measured as the observation input and, with the goal of computing the depth-feature matching degree between the observation depth map and the rendered depth map and the silhouette-feature matching degree between the observation silhouette map and the rendered depth map, constructing a matching error function between the observation input and the hand-object posture vector;
using the collaborative differential evolution filtering algorithm to optimize the postures of the hand and the object separately by computing the matching error function, and obtaining the motion tracking results of the hand and the object during the hand-object interaction process.
In a second aspect, the present disclosure provides a human hand-object interaction process tracking system based on collaborative differential evolution filtering, including:
an image processing module, configured to extract the foreground area corresponding to the human hand and the object in the image to be measured, and to generate an observation depth map and a corresponding observation silhouette map;
a hand-object motion posture module, configured to obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, respectively, the hand motion posture and the object motion posture forming the hand-object posture vector, and to generate the corresponding rendered depth map;
a matching error function building module, configured to take the image to be measured as the observation input and, with the goal of computing the depth-feature matching degree between the observation depth map and the rendered depth map and the silhouette-feature matching degree between the observation silhouette map and the rendered depth map, to construct a matching error function between the observation input and the hand-object posture vector;
a tracking module, configured to use the collaborative differential evolution filtering algorithm to optimize the postures of the hand and the object separately by computing the matching error function, and to obtain the motion tracking results of the hand and the object during the hand-object interaction process.
In a third aspect, the present disclosure provides an electronic device including a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are executed by the processor, the steps of the human hand-object interaction process tracking method based on collaborative differential evolution filtering are completed.
In a fourth aspect, the present disclosure provides a computer-readable storage medium for storing computer instructions that, when executed by a processor, complete the steps of the human hand-object interaction process tracking method based on collaborative differential evolution filtering.
Compared with the prior art, the beneficial effects of the present disclosure are:
A model-based method is used to track the hand and the object simultaneously during the hand-object interaction process. The differential evolution algorithm is integrated into the particle filter framework, and a new improved particle filter algorithm, collaborative differential evolution filtering, is proposed to track hand-object motion. Two mutually collaborating particle filter trackers track the motion of the hand and the object respectively, and differential evolution optimizes the matching error under the current observation to drive the particles toward high-likelihood regions and improve the particle filter's sample distribution, achieving robust tracking of hand and object motion with a small number of particles.
Description of the drawings
The drawings of the specification, which form a part of the present disclosure, are used to provide a further understanding of the present disclosure. The schematic embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure.
Figure 1 is a schematic diagram of the human hand-object interaction process tracking method based on collaborative differential evolution filtering provided in Embodiment 1 of the present disclosure;
Figure 2 is a schematic diagram of the human hand kinematics model provided in Embodiment 1 of the present disclosure;
Figure 3(a) is a schematic diagram of the hand-sphere model provided in Embodiment 1 of the present disclosure;
Figure 3(b) is a schematic diagram of the hand-cylinder model provided in Embodiment 1 of the present disclosure;
Figure 4 is a flow chart of the hand-object tracking provided in Embodiment 1 of the present disclosure;
Figures 5(a)-(c) are diagrams of the tracking results of the interaction process between the human hand and the sphere provided in Embodiment 1 of the present disclosure;
Figures 6(a)-(c) are diagrams of the tracking results of the interaction process between the human hand and the cylinder provided in Embodiment 1 of the present disclosure.
Detailed description
The present disclosure will be further described below in conjunction with the drawings and embodiments.
It should be pointed out that the following detailed descriptions are all illustrative and are intended to provide further explanation of the present disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the technical field to which the present disclosure belongs.
It should be noted that the terms used here are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the present disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Embodiment 1
As shown in Figure 1, this embodiment provides a human hand-object interaction process tracking method based on collaborative differential evolution filtering, including:
S1: Extract the foreground area corresponding to the human hand and the object in the image to be measured, and generate an observation depth map and a corresponding observation silhouette map; obtain the hand motion posture and the object motion posture based on the constructed hand kinematics model and object kinematics model, respectively, the two postures forming the hand-object posture vector, and generate the corresponding rendered depth map.
S2: Take the image to be measured as the observation input and, with the goal of computing the depth-feature matching degree between the observation depth map and the rendered depth map and the silhouette-feature matching degree between the observation silhouette map and the rendered depth map, construct a matching error function between the observation input and the hand-object posture vector.
S3: Use the collaborative differential evolution filtering algorithm to optimize the postures of the hand and the object separately by computing the matching error function, and obtain the motion tracking of the hand and the object during the hand-object interaction process.
In step S1, this embodiment uses a method based on a hand-object kinematics model to track the interaction process between the hand and the object: three-dimensional models and motion models are established for the hand and the object, and the motion of the hand and the object in three-dimensional space is tracked simultaneously.
During tracking, the three-dimensional hand model is used to generate hand posture hypotheses and the three-dimensional object model is used to generate object posture hypotheses; the matching error between the model feature set and the observation feature set obtained from the input image is computed, and the tracking problem is treated as a sequential optimization problem: the state parameters that minimize the matching error are searched for in the state space of the hand and the object, yielding the optimal solution corresponding to the current input frame.
In this embodiment, the hand motion state and the object motion state form the hand-object posture vector x_h-o = (x_h, x_o). Figure 2 shows the hand kinematics model. The hand motion state x_h contains 29 degree-of-freedom (DOF) variables in total, including the 6-DOF global motion of the palm, 20 DOF of local finger motion, and 3 DOF of the wrist joint. The CMC joint of each finger is fixed, and the palm is modeled as a rigid body whose motion corresponds to the 6 global DOF of the hand (3 translations and 3 rotations). The motion of the 5 fingers corresponds to 20 local DOF, each finger being modeled with 4 DOF: the MCP joint of each finger other than the thumb and the TM joint of the thumb each contain 2 DOF (1 flexion-extension and 1 abduction-adduction), while the PIP and DIP joints of each finger and the MCP and IP joints of the thumb each contain only 1 flexion-extension DOF. The wrist joint contains 1 flexion-extension DOF, 1 abduction-adduction DOF, and 1 scale-transformation DOF.
The object motion state x_o contains the 6-DOF pose of the object in three-dimensional space (3 translations and 3 rotations).
Based on human anatomical factors, this embodiment limits the values of the finger joint angles and the wrist joint angles of the hand to certain ranges. Applying these motion constraints not only ensures that the solutions obtained by the posture estimation process are valid, but also greatly compresses the search range of the hand state space and reduces the search difficulty.
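As an illustrative sketch (not the patented model), the 29-DOF hand state and its anatomical joint-limit constraints can be represented as a parameter vector whose finger-joint components are clamped to fixed ranges; the vector layout and the numeric limits below are assumptions for demonstration only:

```python
import numpy as np

# Illustrative layout of the hand-object state vector x_h-o = (x_h, x_o):
# x_h: 6 global DOF + 20 finger DOF + 3 wrist DOF = 29; x_o: 6 object DOF.
HAND_DIM, OBJ_DIM = 29, 6

# Hypothetical anatomical limits (radians) for the 20 finger-joint angles;
# real limits would come from anatomy tables, these values are assumptions.
FINGER_LOW, FINGER_HIGH = np.full(20, -0.26), np.full(20, 1.57)

def clamp_hand_pose(x_h):
    """Clamp the 20 local finger-joint angles to their anatomical ranges,
    leaving the 6 global and 3 wrist components untouched."""
    x = np.asarray(x_h, dtype=float).copy()
    x[6:26] = np.clip(x[6:26], FINGER_LOW, FINGER_HIGH)
    return x

pose = np.zeros(HAND_DIM)
pose[6] = 3.0                  # an out-of-range flexion angle
clamped = clamp_hand_pose(pose)
# clamped[6] is pulled back to 1.57, the assumed upper limit
```

Clamping every candidate pose this way both guarantees anatomically valid solutions and shrinks the effective search space, as described above.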
In this embodiment, PTC Pro/Engineer and Multigen-Paradigm Creator are used to build a unified three-dimensional model for the hand and the manipulated object from parameterized geometric primitives; in Creator, a tree-like hierarchical structure is established for the hand-object model, and local coordinate systems and DOF (Degree of Freedom) motion nodes are added to it.
In addition, the three-dimensional hand model established in this embodiment includes a part of the human forearm, so that the model can describe the forearm pixels connected to the hand pixels in the segmented depth image; the wrist joint has a scale-transformation DOF, allowing the forearm model to be stretched or shortened.
This embodiment addresses the interaction process between the hand and the following two types of objects: spheres and cylinders. Figure 3(a) shows the three-dimensional model of the hand and the sphere, and Figure 3(b) shows the three-dimensional model of the hand and the cylinder. The method used is equally applicable to tracking the interaction process between the hand and objects of other shapes.
In step S2, when constructing the matching error function and the observation likelihood function, this embodiment combines two types of feature information: depth features and silhouette features. The depth image acquired by the Kinect depth camera is taken as the observation input z; the foreground area corresponding to the hand and the manipulated object is extracted by simple depth-threshold segmentation to generate the observation depth map z_d(z), and the observation silhouette map z_s(z) is generated from the observed depth map z_d(z).
For each hand-object pose vector x_h-o = (x_h, x_o), given the depth-camera calibration information, the corresponding rendered depth map r_d(x_h-o) is generated by graphics rendering, and the rendered silhouette map r_s(x_h-o) is generated from r_d(x_h-o). Both z_s(z) and r_s(x_h-o) are binary maps that take the value 1 in the foreground region corresponding to the hand and the manipulated object and 0 in the background.
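As a minimal illustration of the silhouette-map generation (toy array values; the actual system works on 640×480 Kinect depth images), a binary silhouette can be derived from a segmented depth map as follows:

```python
# Hypothetical sketch: derive a binary silhouette map from a depth map.
# A depth value of 0 marks background (no measurement after segmentation);
# any positive depth in the segmented foreground maps to silhouette value 1.

def silhouette_from_depth(depth_map):
    """Return a binary map: 1 where the depth map has foreground, 0 elsewhere."""
    return [[1 if d > 0 else 0 for d in row] for row in depth_map]

# Toy 3x4 "observed" depth map in millimetres; 0 = background.
z_d = [
    [0,   0, 812,   0],
    [0, 805, 810, 820],
    [0,   0, 815,   0],
]
z_s = silhouette_from_depth(z_d)
```

The same helper applies unchanged to a rendered depth map r_d(x_h-o) to obtain r_s(x_h-o).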
The matching error function expresses the degree of match between the observation z and the hand-object pose vector x_h-o; a small matching error means a high degree of match. In this embodiment the matching error function is defined as:

E(z, x_h-o) = λ_d·E_d(z, x_h-o) + λ_s·E_s(z, x_h-o) + λ_p·E_p(x_h)      (1)

In the above formula, E(z, x_h-o) consists of three parts: a depth feature term E_d, a silhouette feature term E_s and a penalty term E_p; λ_d, λ_s and λ_p are constant weight factors for the respective parts.
The terms of the matching error function are defined as follows.
S2.1: E_d measures the depth deviation between the observed depth map z_d(z) and the rendered depth map r_d(x_h-o) corresponding to the pose vector x_h-o. It is defined as:

E_d(z, x_h-o) = (1/A) · Σ_p min(|z_d(z)(p) − r_d(x_h-o)(p)|, T_d)      (2)

The depth deviation (measured in mm) is computed and accumulated pixel by pixel over the whole feature map, and the accumulated sum is normalized by dividing by the total area A of the pixel regions of the hand and the manipulated object. A few large depth deviations could otherwise cause large changes in the function value and degrade the performance of the search; for this reason the maximum depth deviation constant T_d is introduced, limiting the per-pixel depth deviation to the range [0, T_d].
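A sketch of the depth term in Python. The exact pixel domain of the sum is not spelled out in the text, so summing over pixels that are foreground in either map, and the illustrative value of T_d, are assumptions of this sketch:

```python
def depth_term(z_d, r_d, T_d=100.0):
    """Clamped per-pixel depth deviation, normalized by foreground pixel area.

    Assumption: the sum runs over pixels that are foreground (depth > 0) in
    either the observed or the rendered map; each deviation is clamped to
    [0, T_d] before accumulation.
    """
    total, area = 0.0, 0
    for zrow, rrow in zip(z_d, r_d):
        for z, r in zip(zrow, rrow):
            if z > 0 or r > 0:                   # foreground in either map
                total += min(abs(z - r), T_d)    # clamp deviation to [0, T_d]
                area += 1
    return total / area if area else 0.0
```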
S2.2: E_s describes the silhouette feature matching degree by computing the size of the non-overlapping regions between the observed silhouette map z_s(z) and the rendered silhouette map r_s(x_h-o). It is defined as:

E_s(z, x_h-o) = |z_s(z) \ r_s(x_h-o)| / |z_s(z)| + |r_s(x_h-o) \ z_s(z)| / |r_s(x_h-o)|      (3)

The first part counts the pixel area that belongs to the observed silhouette region z_s(z) but not to the rendered silhouette region r_s(x_h-o); the second part counts the pixel area that belongs to r_s(x_h-o) but not to z_s(z). The two parts are normalized separately. The silhouette term E_s smooths the objective function and reduces the local minima around the global minimum, so that the optimization converges more reliably to the true global minimum and becomes more robust.
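The silhouette term can be sketched as follows. The text states that the two parts are normalized separately; dividing each part by the corresponding silhouette's own pixel area is an assumption of this sketch:

```python
def silhouette_term(z_s, r_s):
    """Two-part non-overlap measure between observed and rendered silhouettes.

    Assumption: each part is divided by the area of its own silhouette
    (the text only says the parts are "normalized separately").
    """
    z_only = r_only = z_area = r_area = 0
    for zrow, rrow in zip(z_s, r_s):
        for z, r in zip(zrow, rrow):
            z_area += z
            r_area += r
            if z == 1 and r == 0:     # observed but not rendered
                z_only += 1
            elif r == 1 and z == 0:   # rendered but not observed
                r_only += 1
    part1 = z_only / z_area if z_area else 0.0
    part2 = r_only / r_area if r_area else 0.0
    return part1 + part2
```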
S2.3: To penalize the interpenetration of adjacent fingers, the matching error function E(z, x_h-o) adds a prior part, namely the third term E_p(x_h), defined as:

E_p(x_h) = Σ_{j∈J} −min(Δφ_j(x_h), 0)      (4)

where J denotes the three pairs of adjacent fingers other than the thumb, and Δφ_j(x_h) denotes the difference between the abduction-adduction angles of the MCP joints of the j-th pair of fingers in the hand pose hypothesis x_h.
S2.4: The observation likelihood function is a monotonically decreasing function of the matching error E(z, x_h-o) and is defined as:

p(z | x_h-o) ∝ exp(−λ_e · E(z, x_h-o))      (5)

where λ_e is a constant normalization factor whose value is determined by the observation noise.
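Combining the three terms into formula (1) and mapping the result to an observation likelihood per formula (5) might look like this. The weight values λ_d, λ_s, λ_p and λ_e below are placeholders, not the tuned constants of the embodiment:

```python
import math

def matching_error(E_d, E_s, E_p, lam_d=1.0, lam_s=1.0, lam_p=1.0):
    """Weighted sum of the depth, silhouette and penalty terms (formula (1)).
    The weight values are illustrative placeholders."""
    return lam_d * E_d + lam_s * E_s + lam_p * E_p

def observation_likelihood(E, lam_e=0.01):
    """Monotonically decreasing map from matching error to likelihood
    (formula (5)); lam_e = 0.01 is an arbitrary choice for illustration."""
    return math.exp(-lam_e * E)
```

A smaller matching error always yields a larger likelihood, which is the only property the particle weighting later relies on.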
In step S3, this embodiment uses the collaborative differential evolution filtering algorithm to optimize the poses of the hand and the object separately by evaluating the matching error function. The differential evolution algorithm is integrated into the particle filter framework, yielding a new tracking algorithm, the collaborative differential evolution filtering algorithm, for tracking hand-object motion in a high-dimensional space. The algorithm uses two cooperating particle filter trackers to track the motion of the hand and of the object respectively, and uses differential evolution to optimize the matching error under the current observation, thereby improving the particle filter sample distribution.
Specifically, differential evolution is an efficient swarm intelligence optimization algorithm that can effectively solve optimization problems with nonlinear, non-differentiable objective functions. After initialization, differential evolution searches a continuous space for the global optimum through the iterative evolution of a population of N D-dimensional vectors {x_i^g | i = 1, …, N}. The population evolves through three basic operations: mutation, crossover and selection. Mutation and crossover generate new candidate individuals, while selection decides whether a newly generated candidate survives into the next generation.
In the mutation operation, for each individual index i in the population, differential evolution randomly selects three distinct individuals from the previous generation and combines them to generate a mutant individual:

v_i^g = x_r1^(g−1) + F · (x_r2^(g−1) − x_r3^(g−1))      (6)

where the individual indices r1, r2 and r3 are chosen at random from [1, 2, …, N], pairwise distinct and different from i, and F is the scale factor of the difference vector (x_r2^(g−1) − x_r3^(g−1)), which controls the convergence speed of the search.
The scale factor F of the standard differential evolution algorithm is a constant. To improve the convergence of the algorithm, this embodiment applies a "jitter" factor with σ = 1.0 to adjust F in each dimension, so that F = F_C · N(0, 1), where F_C is a constant and N(0, 1) is a Gaussian random number with zero mean and unit variance. In this embodiment, F_C is set to 0.5.
Then, through the crossover operation, the mutant individual v_i^g is combined with the old individual x_i^(g−1) to generate a candidate individual u_i^g:

u_i,j^g = v_i,j^g  if rand_j ≤ CR or j = j_rand;  u_i,j^g = x_i,j^(g−1)  otherwise      (7)

where rand_j ~ U(0, 1) is a random number uniformly distributed on the interval [0, 1]; the crossover parameter CR determines the probability that each element of the candidate individual is inherited from the mutant individual, and is set to 0.9 in this embodiment; j_rand is a random index chosen from [1, 2, …, D] that guarantees the candidate individual inherits at least one element from the mutant individual.
After the mutation and crossover operations, a one-to-one greedy selection operation is performed:

x_i^g = u_i^g  if f(u_i^g) is better than f(x_i^(g−1));  x_i^g = x_i^(g−1)  otherwise      (8)

where f(·) is the objective function. The newly generated candidate u_i^g is compared with the old individual x_i^(g−1) to determine which of the two is kept for the next generation: if the candidate has a better objective function value than the old individual, it replaces the old individual in the next generation; otherwise, the old individual is retained.
The basic steps of the differential evolution algorithm can be summarized as follows:

1) Initialization: randomly initialize the population {x_i^0}; evaluate each individual with the objective function and record the corresponding objective values; copy the individual with the best objective value into the population's global best b_0 and record its objective value.

2) Mutation: perform the mutation operation (6) on the individuals of the population to generate the mutant individuals v_i^g.

3) Crossover: perform the crossover operation (7) on each old individual x_i^(g−1) and its corresponding mutant individual v_i^g to generate a candidate individual u_i^g.

4) Evaluation: evaluate each generated candidate individual u_i^g with the objective function and record its objective value.

5) Selection: perform the selection operation (8) between each old individual x_i^(g−1) and its corresponding candidate individual u_i^g to determine which of the two is retained in the new population.

6) Global-best update: compare the objective values of all new individuals x_i^g with that of the global best b_g to produce the new global best b_(g+1).

7) Termination test: if the termination condition is met, output the global best b_(g+1) and its objective value and exit the algorithm; otherwise, return to step 2.
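The seven steps above can be sketched as a self-contained DE optimizer. F_C = 0.5, CR = 0.9 and the per-dimension jittered scale factor follow the text; the toy sphere objective, population size, generation count and bounds are illustrative choices:

```python
import random

def differential_evolution(objective, D, N=30, G=100, F_C=0.5, CR=0.9,
                           bounds=(-5.0, 5.0), seed=0):
    """DE/rand/1/bin sketch following the seven steps above.

    The scale factor is jittered per dimension as F = F_C * N(0, 1), as in
    the embodiment; all other settings here are illustrative.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # 1) Initialization: random population, evaluate, record global best.
    pop = [[rng.uniform(lo, hi) for _ in range(D)] for _ in range(N)]
    fit = [objective(x) for x in pop]
    best_i = min(range(N), key=lambda i: fit[i])
    b, b_val = pop[best_i][:], fit[best_i]
    for _ in range(G):
        for i in range(N):
            # 2) Mutation: combine three distinct individuals r1, r2, r3 != i.
            r1, r2, r3 = rng.sample([j for j in range(N) if j != i], 3)
            v = [pop[r1][j] + F_C * rng.gauss(0.0, 1.0) * (pop[r2][j] - pop[r3][j])
                 for j in range(D)]
            # 3) Crossover: inherit at least one element from the mutant (j_rand).
            j_rand = rng.randrange(D)
            u = [v[j] if (rng.random() < CR or j == j_rand) else pop[i][j]
                 for j in range(D)]
            # 4)-5) Evaluate and apply one-to-one greedy selection.
            u_val = objective(u)
            if u_val < fit[i]:
                pop[i], fit[i] = u, u_val
                # 6) Update the global best.
                if u_val < b_val:
                    b, b_val = u[:], u_val
    # 7) Termination: return the global best and its objective value.
    return b, b_val

sphere = lambda x: sum(xi * xi for xi in x)   # toy objective, minimum at 0
best_vec, best_val = differential_evolution(sphere, D=5)
```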
This embodiment assigns one differential evolution population each to the hand and to the manipulated object for pose optimization, optimizing the hand motion pose x_h and the object motion pose x_o of the current frame separately; the two populations are denoted population h and population o.
When population h iteratively optimizes the hand motion pose x_h of the current frame, the pose x_o of the manipulated object is treated as static; at the start of the optimization, x_o is set to population o's optimization result for the previous frame. Likewise, when population o iteratively optimizes the object motion pose x_o of the current frame, the hand pose x_h is treated as static; at the start of the optimization, x_h is set to population h's optimization result for the previous frame.
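The alternating scheme can be illustrated on a toy joint error function. For brevity, a greedy random-perturbation search stands in for each DE population; the error function and all numeric values are made up:

```python
import random

def optimize_component(error, x_init, frozen, part, steps=200, sigma=0.1, seed=1):
    """Refine one pose component while the other is held static.

    A greedy random-perturbation search stands in for a full DE population
    purely to keep this sketch short; 'part' selects which argument of the
    joint error function is being optimized.
    """
    rng = random.Random(seed)
    x = list(x_init)
    cost = (lambda v: error(v, frozen)) if part == "hand" else (lambda v: error(frozen, v))
    best = cost(x)
    for _ in range(steps):
        cand = [xi + rng.gauss(0.0, sigma) for xi in x]
        c = cost(cand)
        if c < best:              # keep only improving moves (greedy selection)
            x, best = cand, c
    return x

# Toy joint matching error with its minimum at x_h = (1, 2), x_o = (3,).
def joint_error(x_h, x_o):
    return (x_h[0] - 1.0) ** 2 + (x_h[1] - 2.0) ** 2 + (x_o[0] - 3.0) ** 2

x_h_prev, x_o_prev = [0.0, 0.0], [0.0]        # previous-frame results
# Population h: optimize the hand pose; the object pose stays at its
# previous-frame value throughout the iteration.
x_h_new = optimize_component(joint_error, x_h_prev, x_o_prev, "hand")
# Population o: optimize the object pose; the hand pose stays at the value
# determined by population h's previous-frame optimization.
x_o_new = optimize_component(joint_error, x_o_prev, x_h_prev, "object", seed=2)
```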
Particle filtering is a robust motion tracking framework; by propagating multiple samples through time, it can represent multimodal distributions. Its basic idea is as follows: given samples {x_(t−1)^i, w_(t−1)^i} of the posterior distribution p(x_(t−1) | z_(1:t−1)) of the system state at time t−1, use the prediction model p(x_t | x_(t−1)) and the observation model p(z_t | x_t) to find a set of samples {x_t^i, w_t^i} that approximates the posterior distribution p(x_t | z_(1:t)) of the system state at time t. Here the superscript i is the particle index; x_t is the system state vector at time t, which in this embodiment represents the hand-object pose x_h-o,t at time t; w_t is the weight associated with x_t; and z_(1:t) denotes the observations accumulated by the system from time 1 to time t.
A main problem of the standard particle filter algorithm is that it uses the state transition prior p(x_t | x_(t−1)), which ignores the latest observation z_t, as the importance density function, so the importance sampling of the particles is suboptimal. During tracking, the standard particle filter must collect a large number of samples to approximate the true posterior probability density of the system state. Too small a sample set causes sample impoverishment, reduces the estimation accuracy, and can even lead to divergence of the sample set and estimation failure.
Differential evolution filtering integrates the differential evolution algorithm into the particle filter framework. After predicting the new particle positions, it takes the matching error function under the latest observation z_t as the objective function and runs the differential evolution algorithm to iteratively evolve the particles, moving them to regions of the state space with greater observation likelihood. The optimization of the particle positions can be viewed as an importance sampling process, and the new particle population produced by the optimization can be viewed as an approximation of the optimal importance distribution p(x_t | x_(t−1), z_t). The optimization by differential evolution improves the particle filter sample distribution and accelerates the convergence of the particle set, so that robust tracking of hand-object motion can be achieved with a small number of particles.
As shown in formula (9), differential evolution filtering defines the transition prior p(x_t | x_(t−1)) as a first-order motion model used to propagate the particles through the time series:

x_t^i = x_(t−1)^(i,G) + n_t^i      (9)

where x_(t−1)^(i,G) is the final position to which particle i converged after the differential evolution at time t−1 was iterated for a fixed number of generations G, and n_t^i is zero-mean multivariate Gaussian noise with covariance matrix Σ, whose diagonal elements are determined by the maximum inter-frame angle or displacement difference of the sequence to be tracked. The resulting new particle set {x_t^i} is used to initialize the differential evolution population at time t.
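A sketch of this first-order propagation; the particle values and per-dimension noise scales below are made up for illustration:

```python
import random

def propagate(particles_final, sigma, rng):
    """First-order motion model (formula (9)): each new particle is the
    previous frame's converged particle plus zero-mean Gaussian noise.
    'sigma' holds the per-dimension standard deviations, i.e. the square
    roots of the diagonal elements of the covariance matrix."""
    return [[x + rng.gauss(0.0, s) for x, s in zip(p, sigma)]
            for p in particles_final]

rng = random.Random(0)
# Three converged 4-DOF particles from time t-1 (made-up values).
prev = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
# Per-dimension noise scale, e.g. from the maximum inter-frame motion.
sigma = [0.05, 0.05, 0.02, 0.02]
new_particles = propagate(prev, sigma, rng)
```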
The differential evolution filtering algorithm is summarized as follows:

For t > 0:

1) Resampling: resample the particle set {x_(t−1)^i, w_(t−1)^i} according to the weights to obtain a new, equally weighted particle set {x_(t−1)^i, 1/N}.

2) Prediction: according to formula (9), predict each particle's position at time t from its position at time t−1 to obtain a new particle set {x_t^i}.

3) Optimization: taking the matching error function under the latest observation z_t as the objective function, run the differential evolution algorithm to optimize the particle set {x_t^i}.

4) Weight update: update the particle weights with the observation likelihood, w_t^i ∝ p(z_t | x_t^i), to obtain the weighted particle set {x_t^i, w_t^i}, and normalize the weights so that Σ_i w_t^i = 1.

5) State estimation: output the system state estimate according to the maximum a posteriori criterion.
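One frame of the five steps above, sketched on a one-dimensional toy state. A quadratic stands in for the matching error under z_t, and all parameter values are illustrative:

```python
import math
import random

def de_filter_step(particles, weights, error_fn, rng, sigma=0.3, G=20,
                   F_C=0.5, lam_e=1.0):
    """One frame of the differential evolution filter on a 1D toy state."""
    N = len(particles)
    # 1) Resampling: draw N particles proportionally to their weights.
    cdf, acc = [], 0.0
    for w in weights:
        acc += w
        cdf.append(acc)
    resampled = [next(p for p, c in zip(particles, cdf) if c >= rng.random() * acc)
                 for _ in range(N)]
    # 2) Prediction: first-order motion model with zero-mean Gaussian noise.
    pop = [p + rng.gauss(0.0, sigma) for p in resampled]
    fit = [error_fn(x) for x in pop]
    # 3) Optimization: DE with jittered scale factor F = F_C * N(0, 1); in one
    #    dimension the j_rand rule makes the candidate equal to the mutant.
    for _ in range(G):
        for i in range(N):
            r1, r2, r3 = rng.sample([j for j in range(N) if j != i], 3)
            cand = pop[r1] + F_C * rng.gauss(0.0, 1.0) * (pop[r2] - pop[r3])
            c = error_fn(cand)
            if c < fit[i]:                      # one-to-one greedy selection
                pop[i], fit[i] = cand, c
    # 4) Weight update from the observation likelihood, then normalization.
    w = [math.exp(-lam_e * e) for e in fit]
    total = sum(w)
    w = [wi / total for wi in w]
    # 5) State estimation by the maximum a posteriori criterion.
    est = pop[max(range(N), key=lambda i: w[i])]
    return pop, w, est

rng = random.Random(42)
error = lambda x: (x - 2.0) ** 2        # toy matching error, minimum at x = 2
parts = [rng.uniform(-5.0, 5.0) for _ in range(16)]
wts = [1.0 / 16] * 16
parts, wts, est = de_filter_step(parts, wts, error, rng)
```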
In this embodiment, two cooperating differential evolution filter trackers are used to track the motion poses of the hand and of the object respectively; this constitutes the proposed collaborative differential evolution filtering algorithm. One differential evolution filter tracker is assigned to the hand and one to the manipulated object, tracking the hand motion pose x_h and the object motion pose x_o respectively. The two trackers are not independent of each other; they continually exchange information during tracking. When the hand tracker iteratively optimizes the hand motion pose x_h of the current frame, it treats the pose x_o of the manipulated object as static; at the start of the optimization, x_o is set to the object tracker's result for the previous frame. Conversely, when the object tracker iteratively optimizes the object motion pose x_o of the current frame, it treats the hand pose x_h as static; at the start of the optimization, x_h is set to the hand tracker's result for the previous frame.
As soon as each tracker obtains its pose tracking result for the current frame, it passes the result to the other tracker, and the corresponding pose value remains static during the other tracker's iterative optimization for the next frame. This collaborative tracking scheme not only models occlusion by considering the hand and the manipulated object jointly, but also decomposes the joint pose space through the use of multiple trackers, splitting a high-dimensional problem into several relatively low-dimensional ones, which reduces the difficulty of the optimization search and the computational cost.
Experimental verification: taking the depth images acquired by a Kinect depth camera as the observation input, a hand-object tracking prototype system was developed on the basis of three-dimensional graphics rendering. The pre-built, state-configurable three-dimensional hand-object model is loaded into the three-dimensional graphics rendering engine OpenSceneGraph (OSG). During tracking, the motion of the hand and the object is controlled through the osgSim::DOFTransform class, and OSG off-screen rendering is used to render the depth image of the hand-object model. This rendered image is compared with the observed image to compute the matching error value and observation likelihood value of each particle, and the collaborative differential evolution filtering algorithm searches the state spaces of the hand and the object for the state parameters that minimize the matching error.
OSG is an open-source, cross-platform graphics engine based on OpenGL. It organizes spatial data in a tree structure (the scene node tree) and achieves high-performance three-dimensional graphics rendering through various scene culling techniques, render state sorting, multi-threaded rendering and other mechanisms. The rendering of each OSG frame can be decomposed into three stages: the update traversal, the cull traversal and the draw traversal. By default, OSG renders the scene in multi-threaded mode, creating one thread for each camera and one for its associated graphics device; the cull operation is executed in the camera thread and the draw operation in the graphics device thread. In this multi-threaded mode, the scene update and cull operations of a new frame begin before the draw work of the graphics device thread has finished, which improves the system's operating efficiency and makes the most of its computing power.
As shown in Figure 4, on the basis of the proposed collaborative differential evolution filtering algorithm, this embodiment develops a hand-object tracking prototype system using OSG and off-screen rendering, and creates a virtual camera for rendering the depth image corresponding to each hand-object pose hypothesis for the matching error computation.
This camera has a scene model node as a child node and is simultaneously bound to a device buffer object. The scene model node contains the three-dimensional models of the hand and the object, and the device buffer object is bound to the camera through a frame buffer object (FBO). During the rendering of each OSG frame, the virtual camera renders the content of its scene model child node into its bound buffer object.
The system of this embodiment iteratively computes new hand-object pose parameters through the collaborative differential evolution filtering algorithm. The system creates a node callback object (osg::NodeCallback) for the scene model node, which updates the pose parameters of the hand and object models during the update stage of each OSG frame. The system also creates a draw callback object (osg::Camera::DrawCallback) for the camera; after the camera has rendered the updated three-dimensional hand and object models into the buffer object, the system computes, inside this callback, the matching error between the rendered depth image and the observed depth image. Since OSG runs in multi-threaded mode by default, each frame starts one thread for each camera and one for its associated graphics device, and the update stage of the next frame begins before the draw stage of the previous frame has finished. To avoid data conflicts between threads, the system creates an event object for the camera and uses the Win32 API functions SetEvent() and WaitForSingleObject() for inter-thread synchronization and communication. When the matching error computation in the graphics device thread is complete, the corresponding event object is set to the signaled state through SetEvent() to notify the main thread; only after receiving this event signal does the main thread proceed to the next computation step.
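The Win32 event synchronization described above can be illustrated with Python's cross-platform threading.Event, whose set()/wait() calls play the roles of SetEvent()/WaitForSingleObject(). This is a sketch of the handshake only, not the prototype system's actual C++ code, and the matching-error value is made up:

```python
import threading

# The "graphics device thread" computes a matching error during the draw
# stage, signals an event, and the main thread waits on that event before
# continuing with the next computation step.

error_ready = threading.Event()
result = {}

def graphics_device_thread():
    # Stand-in for rendering the hand-object model and comparing depth maps.
    result["matching_error"] = 12.5   # hypothetical value
    error_ready.set()                 # corresponds to SetEvent()

worker = threading.Thread(target=graphics_device_thread)
worker.start()
error_ready.wait(timeout=5.0)         # corresponds to WaitForSingleObject()
worker.join()
```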
Experiments on real sequences verify the effectiveness of the proposed hand-object motion tracking method. In all experiments with the proposed collaborative differential evolution filtering algorithm, the hand pose tracker uses 32 particles and the object pose tracker uses 8 particles; for each frame of image input, both trackers run 60 iterations of DE optimization. The tracking experiments were run on a PC with a 4-core Core i5 2.9 GHz CPU, 4.0 GB of memory and an Nvidia GeForce GTX 950M GPU, taking an average of 5 s to track each frame.
Real sequences are used to evaluate the tracking algorithm, with depth image sequences captured with the Microsoft Kinect 1.0 Beta2 SDK as the observation input; the image resolution is 640×480 and the frame rate is 30 frames/s.
The experiments are divided into two groups. The first group tracks a hand grasping a sphere; Figures 5(a)-(c) show the tracking results of this embodiment on selected frames of a real hand-sphere interaction sequence. Figure 5(a) shows the RGB images captured by the Kinect RGB camera; Figure 5(b) shows the depth images captured by the Kinect depth camera after simple depth segmentation; Figure 5(c) shows the results of tracking the depth image sequence with the collaborative differential evolution filtering algorithm.
The second group of experiments tracks a hand grasping a cylinder. Figures 6(a)-(c) show the tracking results of this embodiment on selected frames of a real hand-cylinder interaction sequence: Figure 6(a) shows the RGB images captured by the Kinect RGB camera; Figure 6(b) shows the depth images captured by the Kinect depth camera after simple depth segmentation; Figure 6(c) shows the results of tracking the depth image sequence with the collaborative differential evolution filtering algorithm. The experimental results show that the collaborative differential evolution filtering algorithm can effectively track the interaction process between the hand and the object.
Other embodiments further provide:
A human hand-object interaction process tracking system based on collaborative differential evolution filtering, comprising:
an image processing module, configured to extract the foreground regions corresponding to the hand and the object in the image to be measured and to generate an observed depth map and a corresponding observed silhouette map;
a hand-object motion pose module, configured to obtain the hand motion pose and the object motion pose from the constructed hand kinematic model and object kinematic model respectively, the hand motion pose and the object motion pose forming a hand-object pose vector from which a corresponding rendered depth map is generated;
a matching error function construction module, configured to take the image to be measured as the observation input and to construct a matching error function between the observation input and the hand-object pose vector, with the goal of computing the depth feature matching degree between the observed depth map and the rendered depth map, and the silhouette feature matching degree between the observed silhouette map and the rendered silhouette map;
a tracking module, configured to use the collaborative differential evolution filtering algorithm to optimize the poses of the hand and the object separately by computing the matching error function, obtaining motion tracking results for the hand and the object during the hand-object interaction.
一种电子设备，包括存储器和处理器以及存储在存储器上并在处理器上运行的计算机指令，所述计算机指令被处理器运行时，完成一种基于协作差分进化滤波的人手-物体交互过程跟踪方法所述的步骤。An electronic device, comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the human hand-object interaction process tracking method based on collaborative differential evolution filtering are performed.
一种计算机可读存储介质,用于存储计算机指令,所述计算机指令被处理器执行时,完成一种基于协作差分进化滤波的人手-物体交互过程跟踪方法所述的步骤。A computer-readable storage medium is used to store computer instructions that, when executed by a processor, complete the steps described in a method for tracking a human hand-object interaction process based on collaborative differential evolution filtering.
在以上实施例中，能够实现对人手-物体交互过程中的人手和物体进行同时跟踪，将差分进化算法集成到粒子滤波框架之中，采用两个互相协作的粒子滤波跟踪器来分别对人手和物体进行运动跟踪，利用差分进化对当前观测下的匹配误差的优化来驱动粒子向高似然概率区域运动，改善粒子滤波样本分布，实现能够采用少量粒子对人手和物体运动的鲁棒跟踪。The above embodiments achieve simultaneous tracking of the human hand and the object during hand-object interaction. The differential evolution algorithm is integrated into the particle filter framework, and two cooperating particle filter trackers track the motion of the hand and of the object respectively. Differential evolution optimizes the matching error under the current observation to drive the particles toward high-likelihood regions, which improves the particle filter sample distribution and achieves robust tracking of hand and object motion with a small number of particles.
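The optimization step described above, driving particles toward high-likelihood pose regions, can be sketched as one generation of a standard DE/rand/1/bin update applied to the particle set. This is only an illustrative sketch: the publication does not disclose the exact DE variant, and the function names, the greedy selection rule, and the parameter values F and CR are assumptions.

```python
import numpy as np

def de_step(pop, fitness, error_fn, F=0.5, CR=0.9, rng=None):
    """One DE/rand/1/bin generation: mutate, crossover, greedy select.

    pop      : (N, D) array of pose particles
    fitness  : (N,) current matching-error values (lower is better)
    error_fn : maps a pose vector to its matching error under the
               current observation
    """
    rng = rng or np.random.default_rng(0)
    n, d = pop.shape
    new_pop = pop.copy()
    new_fit = fitness.copy()
    for i in range(n):
        # pick three distinct particles other than i
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        # binomial crossover with one guaranteed mutant component
        mask = rng.random(d) < CR
        mask[rng.integers(d)] = True
        trial = np.where(mask, mutant, pop[i])
        # greedy selection: keep the trial only if it lowers the error
        e = error_fn(trial)
        if e <= fitness[i]:
            new_pop[i], new_fit[i] = trial, e
    return new_pop, new_fit
```

Repeating this generation under the matching error of the current observation moves surviving particles toward low-error (high-likelihood) pose regions before the particle weights are recomputed.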
以上仅为本公开的优选实施例而已,并不用于限制本公开,对于本领域的技术人员来说,本公开可以有各种更改和变化。凡在本公开的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本公开的保护范围之内。The above are only preferred embodiments of the present disclosure and are not used to limit the present disclosure. For those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure shall be included in the protection scope of the present disclosure.
上述虽然结合附图对本公开的具体实施方式进行了描述,但并非对本公开保护范围的限制,所属领域技术人员应该明白,在本公开的技术方案的基础上,本领域技术人员不需要付出创造性劳动即可做出的各种修改或变形仍在本公开的保护范围以内。Although the specific embodiments of the present disclosure are described above in conjunction with the accompanying drawings, they do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that on the basis of the technical solutions of the present disclosure, those skilled in the art do not need to make creative efforts. Various modifications or deformations that can be made are still within the protection scope of the present disclosure.

Claims (10)

  1. 一种基于协作差分进化滤波的人手-物体交互过程跟踪方法,其特征在于,包括:A tracking method for human-hand-object interaction process based on cooperative differential evolution filtering, which is characterized in that it includes:
    提取待测图像中人手和物体对应的前景区域,生成观测深度图及对应的观测剪影图;Extract the foreground area corresponding to the human hand and the object in the image to be measured, and generate the observation depth map and the corresponding observation silhouette map;
    基于构建的人手运动学模型和物体运动学模型分别得到人手运动姿态和物体运动姿态，人手运动姿态和物体运动姿态组成人手-物体姿态向量并生成对应的渲染深度图；obtaining the hand motion posture and the object motion posture from the constructed hand kinematic model and object kinematic model respectively, the hand motion posture and the object motion posture forming the hand-object pose vector, from which the corresponding rendered depth map is generated;
    以待测图像作为观测输入，以计算得到观测深度图与渲染深度图之间的深度特征匹配度以及观测剪影图和渲染深度图的剪影特征匹配度为目标，构建观测输入与人手-物体姿态向量的匹配误差函数；taking the image to be measured as the observation input, and constructing a matching error function between the observation input and the hand-object pose vector, with the goal of computing the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered depth map;
    采用协作差分进化滤波算法通过计算匹配误差函数,分别对人手和物体进行姿态优化,得到人手-物体交互过程中人手和物体的运动跟踪结果。The collaborative differential evolution filtering algorithm is used to calculate the matching error function to optimize the posture of the human hand and the object respectively, and obtain the motion tracking results of the human hand and the object during the hand-object interaction process.
  2. 如权利要求1所述的基于协作差分进化滤波的人手-物体交互过程跟踪方法，其特征在于，所述匹配误差函数中深度特征项E_d定义为计算观测深度图与渲染深度图之间的深度偏差：The human hand-object interaction process tracking method based on collaborative differential evolution filtering according to claim 1, wherein the depth feature term E_d in the matching error function is defined as the depth deviation between the observation depth map and the rendered depth map:
    Figure PCTCN2020101671-appb-100001
    其中，x_h-o为人手-物体姿态向量，z为观测输入，z_d(z)为观测深度图，z_s(z)为观测剪影图，r_s(x_h-o)为渲染剪影图，r_d(x_h-o)为渲染深度图，T_d为最大深度偏差常量。where x_h-o is the hand-object pose vector, z is the observation input, z_d(z) is the observation depth map, z_s(z) is the observation silhouette map, r_s(x_h-o) is the rendered silhouette map, r_d(x_h-o) is the rendered depth map, and T_d is the maximum depth deviation constant.
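The exact formula for E_d is published only as an image placeholder above. A common form of such a depth term, given here strictly as a hedged sketch, averages the per-pixel depth deviation, clamped at T_d, over the region where the observation and rendered silhouettes overlap; the normalization and the clamping rule are assumptions.

```python
import numpy as np

def depth_term(z_d, z_s, r_d, r_s, T_d=100.0):
    """Hedged sketch of a depth feature term E_d.

    z_d, r_d : observation / rendered depth maps (float arrays)
    z_s, r_s : observation / rendered silhouettes (boolean arrays)
    T_d      : maximum depth deviation constant, bounding the
               influence of outlier pixels
    """
    overlap = z_s & r_s                        # pixels covered by both silhouettes
    if not overlap.any():
        return float(T_d)                      # no overlap: worst-case deviation
    diff = np.minimum(np.abs(z_d - r_d), T_d)  # clamp large per-pixel deviations
    return float(diff[overlap].mean())         # average over the overlap region
```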
  3. 如权利要求1所述的基于协作差分进化滤波的人手-物体交互过程跟踪方法，其特征在于，所述匹配误差函数中剪影特征项E_s定义为通过计算观测剪影图和渲染深度图之间不重叠区域的大小描述剪影特征匹配度：The human hand-object interaction process tracking method based on collaborative differential evolution filtering according to claim 1, wherein the silhouette feature term E_s in the matching error function describes the silhouette matching degree by computing the size of the non-overlapping region between the observation silhouette map and the rendered depth map:
    Figure PCTCN2020101671-appb-100002
    其中，x_h-o为人手-物体姿态向量，z为观测输入，z_s(z)为观测剪影图，r_s(x_h-o)为渲染剪影图。where x_h-o is the hand-object pose vector, z is the observation input, z_s(z) is the observation silhouette map, and r_s(x_h-o) is the rendered silhouette map.
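The published formula for E_s is again an image placeholder. One plausible reading of "the size of the non-overlapping region" is the symmetric difference of the two silhouettes; the normalization by the union area in this sketch is an assumption.

```python
import numpy as np

def silhouette_term(z_s, r_s):
    """Hedged sketch of a silhouette feature term E_s.

    z_s, r_s : observation / rendered silhouettes (boolean arrays).
    Returns the area covered by exactly one silhouette (their symmetric
    difference), normalised by the area of their union.
    """
    union = z_s | r_s
    if not union.any():
        return 0.0                    # both silhouettes empty: no mismatch
    non_overlap = z_s ^ r_s           # pixels belonging to one silhouette only
    return float(non_overlap.sum() / union.sum())
```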
  4. 如权利要求1所述的基于协作差分进化滤波的人手-物体交互过程跟踪方法，其特征在于，所述匹配误差函数中增加惩罚项E_p(x_h)，其定义如下：The human hand-object interaction process tracking method based on collaborative differential evolution filtering according to claim 1, wherein a penalty term E_p(x_h) is added to the matching error function, defined as follows:
    Figure PCTCN2020101671-appb-100003
    其中，x_h为人手运动姿态，J表示除拇指外的三对相邻手指，
    Figure PCTCN2020101671-appb-100004
    表示在人手运动姿态x_h中某对手指MCP关节外展内收角度之间的偏差，p为观测似然函数。where x_h is the hand motion posture, J denotes the three pairs of adjacent fingers other than the thumb,
    Figure PCTCN2020101671-appb-100004
    denotes the deviation between the MCP-joint abduction/adduction angles of a pair of adjacent fingers in the hand motion posture x_h, and p is the observation likelihood function.
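The penalty formula itself is published only as an image placeholder. A hedged sketch of one common choice for such a term charges a cost only when the MCP abduction/adduction angles of a pair of adjacent fingers cross, i.e. an inter-penetrating finger configuration; the angle-ordering convention, the sign convention, and the units are all assumptions.

```python
def penalty_term(mcp_abduction, pairs=((0, 1), (1, 2), (2, 3))):
    """Hedged sketch of a penalty term E_p over adjacent-finger pairs.

    mcp_abduction : abduction/adduction angle of each non-thumb finger's
                    MCP joint, ordered index, middle, ring, little
                    (degrees, decreasing order assumed for a valid pose)
    pairs         : the three pairs of adjacent non-thumb fingers
    """
    total = 0.0
    for a, b in pairs:
        diff = mcp_abduction[a] - mcp_abduction[b]
        total += -min(diff, 0.0)   # zero cost while the ordering is valid
    return total
```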
  5. 如权利要求4所述的基于协作差分进化滤波的人手-物体交互过程跟踪方法，其特征在于，所述观测似然函数与匹配误差函数E(z,x_h-o)之间呈单调递减关系，观测似然函数定义如下：The human hand-object interaction process tracking method based on collaborative differential evolution filtering according to claim 4, wherein the observation likelihood function is monotonically decreasing in the matching error function E(z, x_h-o), and is defined as follows:
    p(z|x_h-o) ∝ exp(-λ_e·E(z, x_h-o))
    其中，λ_e为常数规范化因子，其取值由观测噪声决定，x_h-o为人手-物体姿态向量。where λ_e is a constant normalization factor whose value is determined by the observation noise, and x_h-o is the hand-object pose vector.
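The monotonically decreasing likelihood above converts matching errors directly into normalized particle weights; a minimal sketch, in which the value of λ_e is an assumption:

```python
import numpy as np

def observation_likelihood(errors, lam_e=0.01):
    """p(z|x) ∝ exp(-λ_e · E(z, x)): monotonically decreasing in the error.

    errors : matching-error values E(z, x) of the particles
    Returns particle weights normalised to sum to one.
    """
    w = np.exp(-lam_e * np.asarray(errors, dtype=float))
    return w / w.sum()
```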
  6. 如权利要求1所述的基于协作差分进化滤波的人手-物体交互过程跟踪方法，其特征在于，采用协作差分进化滤波算法为人手和物体分别分配差分进化种群，对人手运动姿态x_h和物体运动姿态x_o进行优化，将两个差分进化种群记为种群h和种群o；The human hand-object interaction process tracking method based on collaborative differential evolution filtering according to claim 1, wherein the collaborative differential evolution filtering algorithm assigns separate differential evolution populations to the hand and to the object to optimize the hand motion posture x_h and the object motion posture x_o, the two populations being denoted population h and population o;
    种群h对人手运动姿态x_h进行迭代优化时，将物体运动姿态x_o视作静态，物体运动姿态x_o在优化过程开始时由种群o对上一帧的优化结果来确定；when population h iteratively optimizes the hand motion posture x_h, the object motion posture x_o is treated as static and is set, at the start of the optimization, to the result obtained by population o for the previous frame;
    种群o对物体运动姿态x_o进行迭代优化时，将人手运动姿态x_h视作静态，人手运动姿态x_h在优化过程开始时由种群h对上一帧的优化结果来确定。when population o iteratively optimizes the object motion posture x_o, the hand motion posture x_h is treated as static and is set, at the start of the optimization, to the result obtained by population h for the previous frame.
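The alternation in claim 6 can be sketched over a frame sequence as follows. Here `optimize` stands in for one full DE run of a population and is an assumed interface, not an API disclosed in the publication; note that within one frame each population holds the other party's previous-frame result static.

```python
def cooperative_track(frames, optimize, x_h, x_o):
    """Hedged sketch of the two-population cooperation over a sequence.

    optimize(pose, fixed, frame, target) refines `pose` for `target`
    ('h' for the hand population, 'o' for the object population) while
    `fixed`, the other party's previous-frame result, stays static.
    """
    results = []
    for frame in frames:
        # population h refines the hand; the object pose is held static
        x_h_new = optimize(x_h, x_o, frame, target="h")
        # population o refines the object; the hand pose (previous-frame
        # result, not x_h_new) is held static during this run
        x_o_new = optimize(x_o, x_h, frame, target="o")
        x_h, x_o = x_h_new, x_o_new      # both feed the next frame
        results.append((x_h, x_o))
    return results
```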
  7. 如权利要求1所述的基于协作差分进化滤波的人手-物体交互过程跟踪方法，其特征在于，所述匹配误差函数E(z,x_h-o)为：The human hand-object interaction process tracking method based on collaborative differential evolution filtering according to claim 1, wherein the matching error function E(z, x_h-o) is:
    E(z, x_h-o) = λ_d·E_d(z, x_h-o) + λ_s·E_s(z, x_h-o) + λ_p·E_p(x_h)
    其中，E_d为深度特征项，E_s为剪影特征项，E_p为惩罚项，x_h-o为人手-物体姿态向量，z为观测输入，x_h为人手运动姿态，λ_d、λ_s和λ_p为权重因子；where E_d is the depth feature term, E_s is the silhouette feature term, E_p is the penalty term, x_h-o is the hand-object pose vector, z is the observation input, x_h is the hand motion posture, and λ_d, λ_s and λ_p are weighting factors;
    所述采用协作差分进化滤波算法包括：根据粒子权重对粒子集进行重采样，得到等权粒子集；The collaborative differential evolution filtering algorithm comprises: resampling the particle set according to the particle weights to obtain an equal-weight particle set;
    由粒子在t–1时刻的位置预测其在t时刻的位置,得到新的粒子集;Predict the position of the particle at time t from the position of the particle at time t-1, and obtain a new particle set;
    以最新观测输入下的匹配误差函数为目标函数,采用差分进化算法对新的粒子集进行优化;Take the matching error function under the latest observation input as the objective function, and use the differential evolution algorithm to optimize the new particle set;
    利用观测似然函数更新粒子权重，得到加权粒子集，并对粒子权重进行归一化；以最大后验准则输出人手-物体交互过程的状态估计值。update the particle weights with the observation likelihood function to obtain a weighted particle set and normalize the particle weights; output the state estimate of the hand-object interaction process under the maximum a posteriori (MAP) criterion.
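The steps of claim 7 can be sketched as one filtering cycle of a single tracker (hand or object). All names and the diffusion noise model are assumptions, and the DE refinement of step 3 is left as a placeholder pointing back to the claim.

```python
import numpy as np

def pf_step(particles, weights, error_fn, lam_e=0.01, sigma=0.05, rng=None):
    """Hedged sketch of one cycle of the DE-filtering tracker.

    1. resample by weight            -> equal-weight particle set
    2. diffuse (motion prediction)   -> proposal for time t
    3. (DE refinement would go here) -> particles pushed to low error
    4. reweight via exp(-λ_e · E)    -> normalised weights
    5. MAP output                    -> best particle as the estimate
    """
    rng = rng or np.random.default_rng(0)
    n = len(particles)
    # 1. resample according to the particle weights
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # 2. predict: random diffusion around the previous state
    particles = particles + rng.normal(scale=sigma, size=particles.shape)
    # 3. DE optimisation of the particle set would be applied here
    # 4. update weights from the matching error under the latest observation
    errors = np.array([error_fn(p) for p in particles])
    weights = np.exp(-lam_e * errors)
    weights /= weights.sum()
    # 5. maximum a posteriori estimate
    return particles, weights, particles[np.argmax(weights)]
```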
  8. 一种基于协作差分进化滤波的人手-物体交互过程跟踪系统,其特征在于,包括:A human-hand-object interaction process tracking system based on cooperative differential evolution filtering is characterized in that it includes:
    待测图像处理模块，被配置为提取待测图像中人手和物体对应的前景区域，生成观测深度图及对应的观测剪影图；a module for processing the image to be measured, configured to extract the foreground regions corresponding to the human hand and the object in the image to be measured, and to generate an observation depth map and a corresponding observation silhouette map;
    人手-物体运动姿态模块，被配置为基于构建的人手运动学模型和物体运动学模型分别得到人手运动姿态和物体运动姿态，人手运动姿态和物体运动姿态组成人手-物体姿态向量并生成对应的渲染深度图；a hand-object motion posture module, configured to obtain the hand motion posture and the object motion posture from the constructed hand kinematic model and object kinematic model respectively; the hand motion posture and the object motion posture form the hand-object pose vector, from which the corresponding rendered depth map is generated;
    匹配误差函数构建模块，被配置为以待测图像作为观测输入，以计算得到观测深度图与渲染深度图之间的深度特征匹配度以及观测剪影图和渲染深度图的剪影特征匹配度为目标，构建观测输入与人手-物体姿态向量的匹配误差函数；a matching error function construction module, configured to take the image to be measured as the observation input and to construct a matching error function between the observation input and the hand-object pose vector, with the goal of computing the depth feature matching degree between the observation depth map and the rendered depth map and the silhouette feature matching degree between the observation silhouette map and the rendered depth map;
    跟踪模块,被配置为采用协作差分进化滤波算法通过计算匹配误差函数,分别对人手和物体进行姿态优化,得到人手-物体交互过程中人手和物体的运动跟踪结果。The tracking module is configured to use the cooperative differential evolution filtering algorithm to optimize the posture of the human hand and the object by calculating the matching error function, and obtain the motion tracking result of the human hand and the object during the hand-object interaction process.
  9. 一种电子设备，其特征在于，包括存储器和处理器以及存储在存储器上并在处理器上运行的计算机指令，所述计算机指令被处理器运行时，完成权利要求1-7任一项方法所述的步骤。An electronic device, comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the method according to any one of claims 1-7 are performed.
  10. 一种计算机可读存储介质，其特征在于，用于存储计算机指令，所述计算机指令被处理器执行时，完成权利要求1-7任一项方法所述的步骤。A computer-readable storage medium for storing computer instructions, wherein when the computer instructions are executed by a processor, the steps of the method according to any one of claims 1-7 are performed.
PCT/CN2020/101671 2020-02-06 2020-07-13 Human hand-object interaction process tracking method based on collaborative differential evolution filtering WO2021155653A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010081555.9 2020-02-06
CN202010081555.9A CN111311648A (en) 2020-02-06 2020-02-06 Method for tracking human hand-object interaction process based on collaborative differential evolution filtering

Publications (1)

Publication Number Publication Date
WO2021155653A1 true WO2021155653A1 (en) 2021-08-12

Family

ID=71156439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/101671 WO2021155653A1 (en) 2020-02-06 2020-07-13 Human hand-object interaction process tracking method based on collaborative differential evolution filtering

Country Status (2)

Country Link
CN (1) CN111311648A (en)
WO (1) WO2021155653A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311648A (en) * 2020-02-06 2020-06-19 青岛理工大学 Method for tracking human hand-object interaction process based on collaborative differential evolution filtering


Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US20120113223A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation User Interaction in Augmented Reality
CN102148921A (en) * 2011-05-04 2011-08-10 中国科学院自动化研究所 Multi-target tracking method based on dynamic group division
CN110007754A (en) * 2019-03-06 2019-07-12 清华大学 The real-time reconstruction method and device of hand and object interactive process
CN111311648A (en) * 2020-02-06 2020-06-19 青岛理工大学 Method for tracking human hand-object interaction process based on collaborative differential evolution filtering

Non-Patent Citations (5)

Title
I. OIKONOMIDIS ; N. KYRIAZIS ; A. A. ARGYROS: "Tracking the articulated motion of two strongly interacting hands", COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012 IEEE CONFERENCE ON, IEEE, 16 June 2012 (2012-06-16), pages 1862 - 1869, XP032232284, ISBN: 978-1-4673-1226-4, DOI: 10.1109/CVPR.2012.6247885 *
LI DONGNIAN, ZHOU YIQI: "Combining Differential Evolution with Particle Filtering for Articulated Hand Tracking from Single Depth Images", INTERNATIONAL JOURNAL OF SIGNAL PROCESSING, IMAGE PROCESSING AND PATTERN RECOGNITION, vol. 8, no. 4, 30 April 2015 (2015-04-30), pages 237 - 248, XP055833579, ISSN: 2005-4254, DOI: 10.14257/ijsip.2015.8.4.21 *
LI DONGNIAN; CHEN CHENGJUN: "Tracking a hand in interaction with an object based on single depth images", MULTIMEDIA TOOLS AND APPLICATIONS., KLUWER ACADEMIC PUBLISHERS, BOSTON., US, vol. 78, no. 6, 30 July 2018 (2018-07-30), US, pages 6745 - 6762, XP036755923, ISSN: 1380-7501, DOI: 10.1007/s11042-018-6452-0 *
LI, DONGNIAN: "Research on 3D Hand Motion Tracking Based on Depth Images", DOCTORAL DISSERTATIONS, 31 May 2015 (2015-05-31), pages 1 - 124, XP009529615, ISSN: 1674-022X *
WANG PEICHONG, HE YI-CHAO, QIAN XU: "Cooperation Differential Evolution Algorithm with Double Populations and Two Evolutionary Models", COMPUTER ENGINEERING AND APPLICATIONS, HUABEI JISUAN JISHU YANJIUSUO, CN, vol. 44, no. 25, 1 January 2008 (2008-01-01), CN, pages 60 - 64, XP055833938, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.2008.25.019 *

Also Published As

Publication number Publication date
CN111311648A (en) 2020-06-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20917332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20917332

Country of ref document: EP

Kind code of ref document: A1