CN114882113B - Five-finger mechanical dexterous hand grabbing and transferring method based on shape correspondence of similar objects - Google Patents
- Publication number: CN114882113B (application CN202210560549.0A)
- Authority
- CN
- China
- Prior art keywords
- shape
- grabbing
- point
- module
- hand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a five-finger mechanical dexterous hand grabbing and transferring method based on the shape correspondence of similar objects. The method uses the correspondence between the shapes of objects of the same category as a bridge and transfers the grabbing mode of a mechanical dexterous hand from a source object to a target object of the same category. On the one hand, the grabbing pose of every object in a category can be estimated by annotating only a single instance of that category, which greatly reduces the annotation burden. On the other hand, the grabbing annotation of that single instance can be completed quickly by hand, and the transferred grabbing poses are guaranteed to conform to human grabbing habits.
Description
Technical Field
The invention relates to a grabbing pose estimation method for a five-finger mechanical dexterous hand, and in particular to a grabbing and transferring method based on the shape correspondence of similar objects: it estimates the shape correspondence between objects of the same category and, on that basis, transfers the grabbing pose of a five-finger mechanical dexterous hand across different instances of the category.
Background
With the development of robot technology, intelligent service robots are increasingly used in fields such as catering, medical services and life care. However, human activity scenes are complex and varied and contain a wide variety of objects, so an existing two-finger gripper with only a clamping function can hardly complete complex service tasks. In recent years, humanoid dexterous hands, represented by the Shadow Hand, have offered the flexibility of human hands, can adapt to the diversity of manipulated objects, and provide the hardware basis for humanoid manipulation by service robots. However, for a multi-finger dexterous hand, directly estimating its grabbing pose on an object is very difficult because of its high-dimensional kinematic structure and the diversity of target shapes in a scene.
To address this problem, many studies realize imitation learning of dexterous grabbing by tracking the position, angle, velocity and acceleration of human hand joints while a person grasps an object and mapping this information directly to the mechanical dexterous hand. However, because object shapes and grabbing modes are so diverse, such methods are difficult to extend to unknown environments and objects, and autonomous operation of the robot is hard to realize. To achieve autonomous grabbing, many studies use the force-closure criterion at the contact points between the dexterous hand and the target to solve for grabbing positions and poses by optimization. These methods, however, can only guarantee that the dexterous hand grasps the object stably; they cannot guarantee that the grabbing pose conforms to human grabbing habits. Furthermore, such methods require an accurate 3D model, which is difficult to obtain in a real grabbing scenario. In recent years, the development of neural networks and deep learning has provided new ideas for dexterous-hand grabbing pose estimation. Owing to the strong feature-extraction capability of deep learning, some methods directly use deep neural networks to fit the mapping between the shape of an input object and the output grabbing pose.
These methods rely on a large number of annotated grabbing poses. For a dexterous hand with many degrees of freedom, manually annotating grabbing poses on a large number of objects is a huge and tedious project, while grabbing poses generated with the help of simulation software take only grabbing stability as the criterion; a deep neural network trained on such a dataset therefore cannot guarantee that the estimated grabbing poses conform to human grabbing habits.
Disclosure of Invention
Although objects in real scenes differ in shape, objects of the same class tend to be grasped at similar contact positions and in similar ways because of their similar functional structures and geometries. Based on this observation, the invention provides a five-finger mechanical dexterous hand grabbing pose migration method for different instances of similar objects: the shape correspondence between objects of the same category serves as a bridge, and the grabbing mode of the mechanical dexterous hand is migrated from one object (the source object) to other objects of the same category (the target objects). In addition, the invention treats grabbing migration as an optimization problem, designs a contact point migration objective function and an anti-collision objective function, and gradually optimizes the grabbing pose through a differentiable dexterous-hand forward kinematics module.
In order to achieve the above object, the present invention adopts the following technical solution:
The five-finger mechanical dexterous hand grabbing and transferring method based on the shape correspondence of similar objects comprises the following steps:
Step 1, preprocessing a data set;
The dataset is made up of 3D model data for a plurality of object classes; each class contains objects of different shapes, stored as triangular meshes. The collected 3D model dataset is divided into a training set and a testing set for network training and testing, respectively. For each 3D model, the preprocessing includes: rendering the 3D model into point clouds under different view angles, and generating a signed distance field (SDF) over sampling points covering the object space.
Step 2, building an intra-class shape correspondence estimation network;
The intra-class shape correspondence estimation network comprises three sub-modules: a shape coding module, a shape deformation module and a shape reconstruction module. The shape coding module encodes the input single-view point cloud into a shape implicit vector. The shape deformation module deforms the object-space sampling points of the input model to the corresponding points in the intra-class shape template space, conditioned on the shape implicit vector; through the intra-class shape template, dense shape correspondence can be established between differently shaped objects of the same class. The shape reconstruction module decodes each deformed sampling point into its SDF value, and the complete three-dimensional shape of the object can be reconstructed from the sampling points covering the space and their SDF values. All three sub-modules are fully connected networks.
Step 3, training the intra-class shape correspondence estimation network in the step 2;
The training process has two steps. First, the shape deformation module and the shape reconstruction module are trained jointly: the network takes the shape implicit vector and the object-space sampling points as input, outputs the deformation of each sampling point to its corresponding point in the template space as an intermediate result, feeds the deformed sampling points to the shape reconstruction module, and finally outputs the SDF value corresponding to each original sampling point. The shape implicit vectors are randomly initialized and updated together with the network parameters. Second, the shape coding module is trained: its input is a single-view point cloud of a 3D model, and its target output is the shape implicit vector of that model obtained in the first step.
Step 4, marking the grabbing pose of the dexterous hand on the source object;
For each category in the 3D model dataset, one object is selected from the training set as the source object, and several grabbing poses of the mechanical dexterous hand are manually annotated on it; each pose comprises the six-degree-of-freedom pose of the wrist and the rotation angles of the finger joints. For each grabbing pose g_i annotated on the source object, the point cloud of the dexterous hand under g_i is first obtained through forward kinematics, and the SDF values of its points with respect to the source object are computed with the intra-class shape correspondence estimation network of step 2. Points whose absolute SDF value is below a set threshold are marked as the contact point set on the dexterous hand, denoted P_c^H; points whose SDF value is above a set threshold are marked as the non-contact point set on the dexterous hand, denoted P_nc^H. At the same time, the point set touched by the dexterous hand on the source object is marked as P_c^S; P_c^H and P_c^S are in one-to-one correspondence. P_c^H, P_nc^H and P_c^S are then used for the grabbing pose migration in step 5.
Step 5, transferring the grabbing pose marked on the source object to the target object;
Grabbing pose migration transfers the grabbing annotations of step 4 on the source object to other objects of the same class but different shapes. The input is a single-view point cloud of such an object, acquired by a depth sensor. First, the input single-view point cloud is reconstructed by the intra-class shape correspondence estimation network of step 2, the correspondence between the reconstructed object and the source object is established, and on this basis the contact point set of the dexterous hand on the source object is transferred to the reconstructed target object. Then the grabbing pose migration is realized by iterative optimization, using a differentiable dexterous-hand forward kinematics module and two objective functions.
The differentiable dexterous-hand forward kinematics module and the two objective functions are as follows:
(1) The differentiable dexterous-hand forward kinematics module takes a grabbing pose of the dexterous hand as input and outputs the positions of all parts of the hand in the wrist coordinate system. Its main purpose is to back-propagate the gradients of the objective functions during grabbing pose migration so as to gradually optimize the grabbing pose. The module can be written as
P_H^{g_i} = FK(P_H, g_i)
where P_H denotes the dexterous-hand point cloud in its initial state and P_H^{g_i} denotes the dexterous-hand point cloud under pose g_i.
(2) The contact point migration objective function L_transfer is defined as
L_transfer = Σ_{j=1}^{N} ‖ FK(P_c^H, g)_j − p_{c,j}^T ‖_2
where P_c^T is obtained by migrating the source-object contact set P_c^S of step 4 to the target object through the dense shape correspondence between the source and target objects, the dense correspondence being established by the shape deformation module of step 2, and P_c^H is the contact point set on the dexterous hand described in step 4. The point sets P_c^H and P_c^T each contain N points; FK(P_c^H, g)_j and p_{c,j}^T denote their respective j-th points. The contact point migration objective adjusts the rotation angles of the finger joints so that the dexterous hand approaches the contact point set on the surface of the target object; this makes the hand conform to the target object's shape and improves the grabbing success rate.
(3) The anti-collision objective function L_collision is defined as
L_collision = Σ_{j=1}^{M} max(−SDF(FK(P_nc^H, g)_j), 0)
where FK(P_nc^H, g) is the non-contact point set P_nc^H of step 4 passed through the forward kinematics module; it contains M points, and FK(P_nc^H, g)_j denotes its j-th point. SDF(·) evaluates the SDF value of an input point with respect to the target object using the intra-class shape correspondence estimation network of step 2. The anti-collision objective adjusts the translational component of the wrist and the rotation angles of the finger joints, effectively preventing the dexterous hand from colliding with or penetrating the target object.
In summary, the grabbing pose migration can be defined as the optimization problem
g* = argmin_g ( λ_transfer · L_transfer + λ_collision · L_collision )
where λ_transfer and λ_collision are the weights of the two objective functions.
In the optimization process, in order to let the contact point migration objective and the anti-collision objective each play their role fully, the invention adopts the following strategy: first, the translational component of the wrist is adjusted using the anti-collision objective alone, to ensure that the dexterous hand as a whole is in a reasonable grabbing position; then the rotation angles of the finger joints are adjusted using both objectives together, so that the fingers fit the surface of the target object.
The beneficial effects of the invention are as follows:
The invention designs a five-finger mechanical dexterous hand grabbing and transferring method based on the shape correspondence of similar objects. The method uses the correspondence between the shapes of objects of the same category as a bridge and transfers the grabbing mode of a mechanical dexterous hand from a source object to a target object of the same category. On the one hand, the grabbing pose of every object in a category can be estimated by annotating only a single instance of that category, which greatly reduces the annotation burden. On the other hand, the grabbing annotation of that single instance can be completed quickly by hand, and the transferred grabbing poses are guaranteed to conform to human grabbing habits.
Drawings
FIG. 1 is a flow chart of the method steps of the present invention.
FIG. 2 is a diagram of a network structure for intra-class correspondence estimation in accordance with the present invention.
Fig. 3 is a general diagram of the grab migration method of the present invention.
Fig. 4 (a) and (b) are a grab label display diagram and a grab migration result display diagram, respectively, in the embodiment of the present invention.
Detailed Description
The following describes embodiments of the invention in more detail, taking the mug and bottle categories as examples, with reference to the accompanying drawings and the technical solution.
As shown in fig. 1, the five-finger mechanical dexterous hand grabbing and transferring method based on the shape correspondence of the similar objects comprises the following steps:
And step 1, preprocessing a data set. The object 3D model dataset employs ShapeNetCore mug and bottle categories of datasets, each category containing object 3D models of different shapes, the same category model being placed in the same orientation and normalized to unit space. The model is stored in the form of a triangular mesh. The collected 3D model dataset is divided into training and testing sets in a 6:1 ratio for network training and testing, respectively. For each 3D model, the preprocessing process includes:
(1) Rendering point clouds of the 3D model under 100 different viewing angles;
(2) A signed distance field (SDF) is generated over 50000 sampling points covering the object space; each sampling point corresponds to an SDF value, which is the closest distance from the point to the object surface: positive when the point is outside the object, negative when it is inside, and zero when it lies on the surface.
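As a minimal illustration of this sign convention, the sketch below uses an analytic sphere as a stand-in for a mesh-derived SDF (the actual preprocessing computes SDF values from the triangular meshes, not from an analytic shape):

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=0.5):
    """Signed distance to a sphere: positive outside, negative inside,
    zero on the surface -- the same sign convention the preprocessing uses."""
    return np.linalg.norm(points - center, axis=-1) - radius

# Sample points and inspect their signs.
pts = np.array([[0.0, 0.0, 0.0],   # inside  -> negative SDF
                [0.5, 0.0, 0.0],   # on the surface -> zero
                [1.0, 0.0, 0.0]])  # outside -> positive
sdf = sphere_sdf(pts)
```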
And 2, building an intra-class shape correspondence estimation network. As shown in fig. 2, the intra-class shape correspondence estimation network includes three sub-modules: a shape coding module, a shape deformation module and a shape reconstruction module. The shape coding module encodes the input single-view point cloud into a shape implicit vector. Specifically, for a single-view point cloud of n points, the points are first fed to a weight-shared multi-layer perceptron (MLP) that extracts a feature for each point; the features of the n points are then fused by a max-pooling layer, and the fused feature is passed through another MLP to output a 128-dimensional shape implicit vector. The shape deformation module deforms the object-space sampling points of the input model to the corresponding points in the intra-class shape template space according to the shape implicit vector; through the shape template, dense correspondence can be established between differently shaped objects of the same class. Specifically, m points are sampled in the normalized space and fed, together with the 128-dimensional shape implicit vector output by the shape coding module, into an MLP that outputs the deformation from each sampling point to its corresponding template point, thereby locating the corresponding points in the template space. The shape reconstruction module decodes each deformed sampling point into its SDF value through another MLP, and the complete three-dimensional shape of the reconstructed object is then extracted with the Marching Cubes algorithm.
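The max-pooling feature fusion in the shape coding module can be sketched as follows with randomly initialized toy weights. This is a PointNet-style illustration, not the trained module; the hidden layer sizes are assumptions, only the 128-dimensional output comes from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Tiny MLP: ReLU on every layer except the last."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]

# Weight-shared per-point MLP: 3 -> 64 -> 128 features (sizes assumed).
w_point = [rng.normal(size=(3, 64)), rng.normal(size=(64, 128))]
# Post-fusion MLP producing the 128-dimensional shape implicit vector.
w_head = [rng.normal(size=(128, 128))]

def encode(point_cloud):
    feats = mlp(point_cloud, w_point)   # (n, 128) per-point features
    fused = feats.max(axis=0)           # max-pooling fusion over the n points
    return mlp(fused, w_head)           # 128-d shape implicit vector

cloud = rng.normal(size=(256, 3))       # a single-view point cloud of n=256 points
latent = encode(cloud)
```

Because the fusion is a per-dimension maximum, the encoding is invariant to the ordering of the input points, which is why this architecture suits unordered point clouds.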
And 3, training the intra-class shape correspondence estimation network of step 2. Training has two steps. First, the shape deformation module and the shape reconstruction module are trained jointly: the network input is the shape implicit vector and the sampling points obtained by the preprocessing of step 1; the deformation of each sampling point to its corresponding template-space point is output as an intermediate result, the deformed sampling points are fed to the shape reconstruction module, and the SDF value corresponding to each original sampling point is finally output. The shape implicit vectors are randomly initialized from the normal distribution N(0, 0.01) and updated together with the network parameters. The loss functions used are: the SDF value constraint L_sdf, the normal constraint L_normal, the shape implicit vector regularization term L_code, the normal consistency constraint under deformation L_deform_normal, and the smoothness constraint L_deform_smooth.
In these losses, Ω denotes the 3D space containing the object and S the object's surface; s_pred and s_gt denote the predicted and ground-truth SDF values; λ is a penalty coefficient with λ > 1; n_pred is the surface normal obtained by differentiating the predicted SDF with respect to the sampling-point coordinates, and n_gt is the ground-truth normal; α denotes the shape implicit vector; the normal at a deformed sampling point is likewise obtained by differentiating the predicted SDF with respect to its coordinates; and δ_pred = {Δx, Δy, Δz} is the deformation predicted for sampling point p.
In the second step, the shape coding module is trained: its input is the single-view point cloud obtained by the preprocessing of step 1, and its target output is the shape implicit vector of the 3D model obtained in the first training step. The loss function is the shape implicit vector loss
L_α = |α_pred − α_gt|
where α_pred and α_gt denote the predicted and ground-truth values of the shape implicit vector.
Both steps are trained on an NVIDIA GTX 1080 GPU with the Adam optimizer and an initial learning rate of 1e-4. After the first step converges, the training result is evaluated by the Chamfer distance between the reconstructed three-dimensional shape and the ground-truth shape, and the network parameters with the smallest Chamfer distance are kept as the final model of the shape deformation and shape reconstruction modules. After the second step converges, the result is evaluated by the L1 distance between the predicted and ground-truth shape implicit vectors, and the network parameters with the smallest L1 distance are kept as the final model of the shape coding module.
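The Chamfer distance used for checkpoint selection can be computed, for modest point counts, with a brute-force pairwise sketch like the one below (the patent does not specify an implementation, so this symmetric mean-nearest-neighbour form is an assumption):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (n,3) and b (m,3):
    mean nearest-neighbour distance in both directions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (n, m) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```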
And 4, annotating the grabbing poses of the dexterous hand on the source object. For each category in the 3D model dataset, one object is selected from the training set as the source object, and several grabbing poses of a Shadow Hand dexterous hand are manually annotated on it; each pose comprises the six-degree-of-freedom pose [R, t] of the wrist and the rotation angles θ of the 22 finger joints. Each annotation is then verified to grasp successfully in a simulation environment; the annotations are shown in fig. 4 (a), and the simulation environment used in this embodiment is Isaac Gym. In addition, for each grabbing pose g_i annotated on the source object, the point cloud of the dexterous hand under g_i is first obtained through forward kinematics, and the SDF values of its points with respect to the source object are computed with the network of step 2 (the shape implicit vector of the source object is known). Points whose absolute SDF value is below the threshold 5e-3 are marked as the contact point set on the dexterous hand, denoted P_c^H; points whose SDF value exceeds the threshold 1e-2 are marked as the non-contact point set, denoted P_nc^H. At the same time, the point set touched by the dexterous hand on the source object is marked as P_c^S; P_c^H and P_c^S are in one-to-one correspondence. P_c^H, P_nc^H and P_c^S are then used for the grabbing pose migration in step 5.
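The SDF-threshold labeling of hand points might look like the following sketch, using the thresholds 5e-3 and 1e-2 given in the text (the function name and signature are illustrative, not from the patent):

```python
import numpy as np

def label_hand_points(hand_points, sdf_values, contact_thresh=5e-3, free_thresh=1e-2):
    """Split the dexterous-hand point cloud into contact / non-contact sets
    by thresholding each point's SDF w.r.t. the source object."""
    contact = hand_points[np.abs(sdf_values) < contact_thresh]      # near the surface
    non_contact = hand_points[sdf_values > free_thresh]             # clearly off the surface
    return contact, non_contact
```

Points whose SDF falls between the two thresholds belong to neither set, which avoids ambiguous labels near the contact boundary.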
And 5, transferring the grabbing pose labels of the source objects to the target objects.
Grabbing pose migration transfers the grabbing annotations of step 4 on the source object to other objects of the same class but different shapes. The input is a single-view point cloud of such an object, acquired by a depth sensor. First, the object is reconstructed from the input single-view point cloud by the network of step 2, the correspondence between the reconstructed object and the source object is established, and on this basis the contact point set on the source object is transferred to the reconstructed target object. Then the grabbing pose migration is realized by iterative optimization, using a differentiable dexterous-hand forward kinematics module and two objective functions, as shown in fig. 3. The differentiable dexterous-hand forward kinematics module and the two objective functions are as follows:
(1) The differentiable dexterous-hand forward kinematics module takes a grabbing pose of the dexterous hand as input and outputs the positions of all parts of the hand in the wrist coordinate system. Its main purpose is to back-propagate the gradients of the objective functions during grabbing pose migration so as to gradually optimize the grabbing pose. The module can be written as
P_H^{g_i} = FK(P_H, g_i)
where P_H denotes the dexterous-hand point cloud in its initial state and P_H^{g_i} denotes the dexterous-hand point cloud under grabbing pose g_i, as shown in fig. 3. The forward kinematics module is implemented with the PyTorch framework.
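A heavily reduced stand-in for the forward kinematics module is sketched below: it applies only the wrist rotation R and translation t to the hand point cloud, whereas the real module also chains the 22 finger-joint rotations and is differentiable via PyTorch autograd:

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def forward_kinematics(hand_points, R, t):
    """Minimal FK sketch: only the wrist 6-DoF transform is modelled here;
    the full module would also chain per-finger joint rotations."""
    return hand_points @ R.T + t
```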
(2) The contact point migration objective function L_transfer is defined as
L_transfer = Σ_{j=1}^{N} ‖ FK(P_c^H, g)_j − p_{c,j}^T ‖_2
where P_c^T is obtained by migrating the source-object contact set P_c^S of step 4 to the target object through the dense shape correspondence between the source and target objects, established by the shape deformation module of step 2, and P_c^H is the contact point set on the dexterous hand described in step 4. P_c^H and P_c^T each contain N points; FK(P_c^H, g)_j and p_{c,j}^T denote their respective j-th points. The contact point migration objective adjusts the rotation angles of the finger joints so that the dexterous hand approaches the contact point set on the surface of the target object; this makes the hand fit the target object's shape and improves the grabbing success rate.
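With the two one-to-one paired contact point sets as arrays, the contact point migration objective reduces to an average pairwise distance. The original formula is an image and is lost, so the mean form below is an assumption; the sketch also assumes the pairing has already been established:

```python
import numpy as np

def transfer_loss(hand_contact_points, target_contact_points):
    """Mean distance between the hand's contact points (after FK) and the
    contact set migrated onto the target object, paired one-to-one."""
    diffs = hand_contact_points - target_contact_points      # (N, 3) paired residuals
    return np.linalg.norm(diffs, axis=1).mean()
```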
(3) The anti-collision objective function L_collision is defined as
L_collision = Σ_{j=1}^{M} max(−SDF(FK(P_nc^H, g)_j), 0)
where FK(P_nc^H, g) is the non-contact point set P_nc^H of step 4 passed through the forward kinematics module; it contains M points, and FK(P_nc^H, g)_j denotes its j-th point. SDF(·) evaluates the SDF value of an input point with respect to the target object using the network of step 2. The anti-collision objective adjusts the translational component of the wrist and the rotation angles of the finger joints, effectively preventing the dexterous hand from colliding with or penetrating the target object.
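The anti-collision term can be sketched as a hinge on the SDF values of the non-contact hand points: only points with negative SDF (inside the target object) contribute. The exact formula in the patent is an image and is not reproduced here, so this is an assumed, typical formulation:

```python
import numpy as np

def collision_loss(sdf_values):
    """Penalise hand points penetrating the target object: a point with
    negative SDF contributes its penetration depth, outside points contribute 0."""
    return np.maximum(-sdf_values, 0.0).sum()
```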
In summary, the above grabbing pose migration can be defined as the following optimization problem:

min_{g_i} λ_transfer · L_transfer + λ_collision · L_collision

where λ_transfer and λ_collision are the weights of the objective functions. In the optimization process, in order to fully exploit the respective roles of the contact point migration objective function and the anti-collision objective function, the invention designs an optimization strategy: first, the translational component t of the wrist is adjusted using the anti-collision objective function alone, to ensure that the Shadow Hand as a whole is in a reasonable grabbing position; then, the rotation angles θ of the finger joints in the grabbing pose are adjusted using the two objective functions together, so that the fingers fit the surface of the target object. The optimization uses the Adam optimizer with a learning rate of 1e-3 and 200 optimization iterations.
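The two-stage strategy (wrist translation first under the anti-collision term, then finger joints under both terms, with Adam at lr = 1e-3 for 200 iterations) can be sketched as follows. The quadratic objectives are toy placeholders for L_collision and L_transfer, not the SDF-based terms themselves:

```python
import torch

t = torch.zeros(3, requires_grad=True)      # wrist translation component
theta = torch.zeros(5, requires_grad=True)  # finger joint rotation angles
t_goal = torch.ones(3)                      # stand-in "collision-free" wrist target
theta_goal = 0.5 * torch.ones(5)            # stand-in "surface-fitting" joint target

def l_collision():
    return ((t - t_goal) ** 2).sum()          # placeholder for the SDF penalty

def l_transfer():
    return ((theta - theta_goal) ** 2).sum()  # placeholder for contact matching

lam_transfer, lam_collision = 1.0, 1.0        # objective weights (illustrative)

# Stage 1: anti-collision term alone adjusts only the wrist translation t.
opt = torch.optim.Adam([t], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    l_collision().backward()
    opt.step()

# Stage 2: both terms together adjust only the finger joint angles theta.
opt = torch.optim.Adam([theta], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    (lam_transfer * l_transfer() + lam_collision * l_collision()).backward()
    opt.step()
```

Restricting each Adam instance to a subset of the pose parameters is what enforces the staging: stage 1 never moves the fingers, and stage 2 never moves the wrist.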
Step 6, testing the grabbing success rate of the migrated grabbing poses in a simulation environment.
This embodiment uses the Isaac Gym simulation environment to test grabbing; the environment can test multiple objects in parallel. The experimental setup is as follows. In the first stage, the simulation environment is set to zero gravity, the 3D model of the target object is suspended, and the dexterous hand is placed according to the grabbing pose; penetration detection is performed at this point, and if the dexterous hand penetrates the object to a large extent, the object is knocked away by the hand and the grab fails. After 3 seconds the second stage begins: gravity is restored to test the stability of the grab, and a weakly held object falls. The number of successful grabs is counted after 10 seconds of gravity. The invention judges whether a grab is successful according to the change of the object pose before and after grabbing: a grab is considered successful only when the position change of the object before and after being grabbed is less than Δt_max and the angle change is less than Δθ_max. The position change Δt_obj and the angle change Δθ_obj of the object are calculated as follows:

Δt_obj = || t_1 - t_0 ||_2,  Δθ_obj = arccos( (tr(R_0^T R_1) - 1) / 2 )

where R_0 and t_0 represent the rotation matrix and translation vector of the initial pose of the object, R_1 and t_1 represent those of the final pose, and tr(·) represents the trace of a matrix.
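The success criterion can be computed directly from the two poses; a sketch with an object that slid 2 cm and rotated 10 degrees while being grabbed:

```python
import numpy as np

def pose_change(R0, t0, R1, t1):
    """Position change (Euclidean norm) and angle change (degrees),
    the angle recovered from the trace of the relative rotation R0^T R1."""
    dt = float(np.linalg.norm(t1 - t0))
    cos_dth = (np.trace(R0.T @ R1) - 1.0) / 2.0
    dth = float(np.degrees(np.arccos(np.clip(cos_dth, -1.0, 1.0))))
    return dt, dth

# Object rotated 10 degrees about z and slid 2 cm along x:
a = np.radians(10.0)
R1 = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
dt, dth = pose_change(np.eye(3), np.zeros(3), R1, np.array([0.02, 0.0, 0.0]))
# dt ~ 0.02 m (within the 3 cm bound); dth ~ 10 degrees, so the grab counts
# as successful under delta_theta_max = 15 or 25 degrees but not 5 degrees.
```

The `np.clip` guards against floating-point values of the trace expression falling marginally outside [-1, 1], which would make `arccos` return NaN.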
Table 1 shows the quantitative evaluation results of the contact point migration objective function and the anti-collision objective function. The invention reports the grabbing success rate when the maximum position deviation Δt_max of the target object is 3 cm and the maximum angle deviation Δθ_max is 5°, 15° and 25°, respectively. The strategy that uses no objective function refers to copying the grab annotated on the source object directly onto the target object. Compared with direct copying, optimizing with either objective function alone effectively improves the grabbing success rate, and the improvement is particularly evident with the contact point migration objective function. When both objective functions are used together for optimization, the grabbing success rate reaches its best. Fig. 4 (b) shows some successful grabs with object pose deviation less than 5° and 3 cm.
TABLE 1
Claims (1)
1. A five-finger mechanical dexterous hand grabbing and transferring method based on the shape correspondence of similar objects, characterized by comprising the following steps:
Step1, preprocessing a data set;
The data set consists of 3D model data of a plurality of object categories; each category comprises objects with different shapes, stored in triangular mesh form; the collected 3D model data set is divided into a training set and a test set, used for network training and testing respectively; for each 3D model, the preprocessing comprises: rendering the 3D model into point clouds under different view angles; and generating spatial sampling points covering the model together with their signed distance field (SDF) values;
step 2, building an intra-class shape correspondence estimation network;
The intra-class shape correspondence estimation network comprises three sub-modules: a shape encoding module, a shape deformation module and a shape reconstruction module; the shape encoding module encodes the input single-view point cloud into a shape implicit vector; the shape deformation module deforms the object-space sampling points of the input model to corresponding points in an intra-class shape template space according to the shape implicit vector, and on this basis the dense shape correspondence between similar objects with different shapes is established through the intra-class shape template; the shape reconstruction module decodes the deformed sampling points into their SDF values, and reconstructs the complete three-dimensional shape of the object using the spatial sampling points and their SDF values; all three sub-modules are fully connected networks;
step 3, training the intra-class shape correspondence estimation network in the step 2;
the training process is divided into two steps: in the first step, the shape deformation module and the shape reconstruction module are trained jointly; the network input is a shape implicit vector and object-space sampling points, the intermediate output is the deformation of the sampling points to corresponding points in the template space, the deformed sampling points are input into the shape reconstruction module, and the final output is the SDF value corresponding to each original sampling point; the shape implicit vector is randomly initialized and updated together with the network parameters; in the second step, the shape encoding module is trained; its input is a single-view point cloud of a 3D model, and its output is the shape implicit vector of that 3D model obtained in the first training step;
step 4, marking the grabbing pose of the dexterous hand on the source object;
For each category in the 3D model data set, an object is selected from the training set as the source object, and a plurality of grabbing poses of the mechanical dexterous hand are manually annotated on the source object, each comprising the six-degree-of-freedom pose of the wrist and the rotation angles of the finger joints; for each grabbing pose g_i annotated on the source object, the point cloud of the dexterous hand under g_i is first obtained through forward kinematics, its SDF values relative to the source object are calculated using the intra-class shape correspondence estimation network of step 2, the points whose SDF absolute value is smaller than a set threshold are marked as the point set P_i^C of hand points contacting the object, and the points whose SDF value is larger than the set threshold are marked as the point set P̄_i of hand points not contacting the object; meanwhile, the point set contacted by the dexterous hand on the source object is marked as P_i^O, and the points in P_i^C and P_i^O correspond one to one;
Step 5, transferring the grabbing pose marked on the source object to the target object;
The grabbing pose migration migrates the grabbing annotations on the source object in step 4 to other objects of the same category with different shapes; the input is a single-view point cloud of such an object, acquired by a depth sensor; firstly, the input single-view point cloud is reconstructed by the intra-class shape correspondence estimation network of step 2, the correspondence between the reconstructed object and the source object is established, and on this basis the point set contacted by the dexterous hand on the source object is migrated to the reconstructed target object; then, the grabbing pose migration is realized by an iterative optimization method using the differentiable dexterous hand forward kinematics module and two objective functions;
The forward kinematics module of the micro smart hand and the two objective functions are specifically as follows:
(1) The differentiable dexterous hand forward kinematics module takes the grabbing pose of the dexterous hand as input and outputs the positions of all parts of the dexterous hand in the wrist coordinate system; the module is used to back-propagate the gradient of the objective functions during grabbing pose migration so as to optimize the grabbing pose step by step, and is described as follows:

P_i^H = FK(g_i, P^H)

where P^H represents the dexterous hand point cloud in the initial state, and P_i^H represents the dexterous hand point cloud under the pose g_i;
(2) The contact point migration objective function L_transfer is defined as follows:

L_transfer = (1/N) Σ_{j=1}^{N} || p_j - q_j ||^2

where Q_i^O is obtained by migrating P_i^O described in step 4 to the target object through the dense shape correspondence between the source object and the target object, the correspondence being established by the shape deformation module described in step 2; P_i^C is the set of points on the dexterous hand contacting the object described in step 4; the point sets Q_i^O and P_i^C each contain N points, and q_j and p_j respectively represent the j-th point of Q_i^O and the j-th point of P_i^C after the forward kinematics module;
(3) The anti-collision objective function L_collision is defined as follows:

L_collision = Σ_{j=1}^{M} max( -SDF(p̄_j), 0 )

where P̄_i is the position of the point set of the dexterous hand not contacting the object in step 4 after passing through the forward kinematics module; the point set P̄_i contains M points, and p̄_j represents its j-th point; SDF(·) refers to the SDF value of the input point relative to the target object, solved using the intra-class shape correspondence estimation network described in step 2;
In summary, the grabbing pose migration is defined as the following optimization problem:

min_{g_i} λ_transfer · L_transfer + λ_collision · L_collision

where λ_transfer and λ_collision are the weights of the objective functions;
In the optimization process, the anti-collision objective function is first used alone to adjust the translational component of the wrist of the dexterous hand, ensuring that the hand as a whole is in a reasonable grabbing position; then the rotation angles of the finger joints are adjusted using the two objective functions together, so that the fingers fit the surface of the target object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210560549.0A CN114882113B (en) | 2022-05-23 | 2022-05-23 | Five-finger mechanical dexterous hand grabbing and transferring method based on shape correspondence of similar objects |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114882113A CN114882113A (en) | 2022-08-09 |
CN114882113B true CN114882113B (en) | 2024-09-27 |
Family
ID=82677711
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116330290B (en) * | 2023-04-10 | 2023-08-18 | 大连理工大学 | Multi-agent deep reinforcement learning-based five-finger smart robot control method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108638054A (en) * | 2018-04-08 | 2018-10-12 | 河南科技学院 | A kind of intelligence explosive-removal robot five-needle pines blister rust control method |
CN113119073A (en) * | 2021-04-16 | 2021-07-16 | 中国科学技术大学 | Mechanical arm system based on computer vision and machine learning and oriented to 3C assembly scene |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10311375B2 (en) * | 2014-10-16 | 2019-06-04 | Nanyang Technological University | Systems and methods for classifying electrical signals |
CN112287939B (en) * | 2020-10-29 | 2024-05-31 | 平安科技(深圳)有限公司 | Three-dimensional point cloud semantic segmentation method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||