CN111124107A - Hand and object complex interaction scene reconstruction method and device

Hand and object complex interaction scene reconstruction method and device

Info

Publication number
CN111124107A
CN111124107A (application CN201911113777.8A)
Authority
CN
China
Prior art keywords
data
hand
segmentation
prediction
rgbd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911113777.8A
Other languages
Chinese (zh)
Inventor
徐枫
张浩
薄子豪
杨东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201911113777.8A priority Critical patent/CN111124107A/en
Publication of CN111124107A publication Critical patent/CN111124107A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for reconstructing complex hand-object interaction scenes, wherein the method comprises the following steps: collecting an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain an RGBD image; feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data; feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and fusing the segmented depth data and color data of the different objects into an object model to obtain a final object model. For complex interactions between human hands and objects, the method can reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.

Description

Hand and object complex interaction scene reconstruction method and device
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, in particular to a method and a device for reconstructing a complex interaction scene of a hand and an object.
Background
In daily life, using the hands to interact with different objects in the environment is one of the most common human activities. Reconstructing the interaction process between the human hand and objects is of great value for AR/VR, human-computer interaction, and intelligent robotics.
The hand-object interaction process contains rich information. In many applications, such as AR and intelligent robotics, the interaction process between a hand and an object must be reconstructed to obtain its three-dimensional information. In reality, the interaction process between hands and objects is complex. This complexity manifests as complex hand motions, complex interacting objects, and strong ambiguity caused by mutual occlusion between the hands and the objects during interaction.
Reconstruction with a single RGBD camera has the advantage of a simple system, but a single camera also restricts observation to a single viewpoint, which limits the amount of effective information that can be obtained and increases the difficulty of reconstruction. In summary, reconstructing the interaction process from a single RGBD camera is a very meaningful and at the same time very challenging task.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the present invention is to provide a method for reconstructing complex hand-object interaction scenes, which, for a complex interaction process between a human hand and an object, can reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
Another object of the present invention is to provide a hand and object complex interaction scene reconstruction apparatus.
In order to achieve the above object, an embodiment of the present invention provides a method for reconstructing a complex interaction scene between a hand and an object, including the following steps: collecting an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain an RGBD image; feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data; feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and fusing the segmented depth data and color data of the different objects into an object model to obtain a final object model.
According to the method for reconstructing a complex hand-object interaction scene of the embodiment of the present invention, reconstruction is performed with a single RGBD camera, so the system is simple; not only the motion of the human hand but also the geometry and motion of the objects are reconstructed, so the reconstructed information is complete; complex interactions involving both hands and multiple objects can be handled; and by combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme for the complex interaction process, complete three-dimensional information of the interaction process is finally obtained. For complex interactions between human hands and objects, the method can thus reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
In addition, the hand and object complex interaction scene reconstruction method according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, collecting the RGBD sequence includes: aligning the RGB information and the D information.
Further, in an embodiment of the present invention, feeding the RGBD image into the gesture prediction neural network for prediction includes: using the open-source library OpenPose, or training on the basis of that library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
Further, in an embodiment of the present invention, feeding the RGBD image into the segmentation recognition neural network includes: using Mask-RCNN, or training on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
Further, in an embodiment of the present invention, fusing the segmented depth data and color data of the different objects into the object model includes: performing a unified optimization motion solution according to the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects, so as to obtain a final object surface reconstruction result.
In order to achieve the above object, an embodiment of another aspect of the present invention provides a device for reconstructing a complex interaction scene between a hand and an object, including: the acquisition module is used for acquiring an RGBD sequence of a hand-object interaction scene by using a single RGBD camera to obtain an RGBD image; the prediction module is used for sending the RGBD image into a gesture prediction neural network for prediction to obtain left-hand gesture prediction data and right-hand gesture prediction data; the segmentation module is used for sending the RGBD image into a segmentation recognition neural network to obtain left hand data, right hand data and segmentation data of different objects; and the fusion module is used for fusing the depth data and the color data of the different objects obtained by segmentation into the object model to obtain the final object model.
The device for reconstructing a complex hand-object interaction scene provided by the embodiment of the present invention performs reconstruction with a single RGBD camera, so the system is simple; not only the motion of the human hand but also the geometry and motion of the objects are reconstructed, so the reconstructed information is complete; complex interactions involving both hands and multiple objects can be handled; and by combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme for the complex interaction process, complete three-dimensional information of the interaction process is finally obtained. For complex interactions between human hands and objects, the device can thus reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
In addition, the hand and object complex interaction scene reconstruction device according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the acquisition module is further configured to align the RGB information and the D information.
Further, in an embodiment of the present invention, the prediction module is further configured to use the open-source library OpenPose, or to train on the basis of that library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
Further, in an embodiment of the present invention, the segmentation module is further configured to use Mask-RCNN, or to train on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
Further, in an embodiment of the present invention, the fusion module is further configured to perform a unified optimization motion solution according to left-hand pose prediction data, right-hand pose prediction data, left-hand data, right-hand data, and segmentation data of different objects, so as to obtain the final object surface reconstruction result.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a method for reconstructing a complex interaction scene between a hand and an object according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for reconstructing a complex interaction scene of a hand and an object according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a hand-object complex interaction scene reconstruction apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The method and the device for reconstructing a complex interaction scene between a hand and an object according to an embodiment of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for reconstructing a complex interaction scene between a hand and an object according to an embodiment of the present invention.
As shown in FIG. 1, the method for reconstructing the complex interaction scene of the hand and the object comprises the following steps:
In step S101, an RGBD sequence of a hand-object interaction scene is acquired with a single RGBD camera, so as to obtain an RGBD image.
In one embodiment of the present invention, acquiring the RGBD sequence includes aligning the RGB information and the D information.
It is understood that the single RGBD camera may be, for example, a Realsense SR300 camera; many types of single RGBD cameras exist, and this one is named only as an example to avoid redundancy, without limitation. As shown in fig. 2, an embodiment of the present invention may acquire the RGBD sequence of the hand-object interaction scene with a Realsense SR300 camera. It should be noted that, since the RGB information and the D information originate from two different sensors, the RGB and D data need to be aligned.
It should be noted that, in the embodiment of the present invention, an RGBD image with a resolution of 640 × 480 is used, but of course, images with other resolutions may also be used, and are not limited in particular.
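By way of a non-limiting illustration only, the capture and alignment of step S101 could be implemented with an off-the-shelf RGBD SDK. The following sketch assumes the pyrealsense2 library together with the Realsense SR300 camera and the 640 × 480 resolution mentioned above; the stream settings are assumptions, not features of the embodiment:

```python
# Illustrative sketch only: capture one aligned RGB-D frame with pyrealsense2.
# The SDK choice and stream settings are assumptions, not part of the patent.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Depth and color come from two different sensors, so depth is reprojected
# into the color frame to align the D information with the RGB information.
align = rs.align(rs.stream.color)

try:
    frames = pipeline.wait_for_frames()
    aligned = align.process(frames)
    depth = np.asanyarray(aligned.get_depth_frame().get_data())  # uint16 depth
    color = np.asanyarray(aligned.get_color_frame().get_data())  # uint8 BGR
finally:
    pipeline.stop()
```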
In step S102, the RGBD image is sent to the gesture prediction neural network for prediction, so as to obtain left-hand pose prediction data and right-hand pose prediction data.
It can be understood that, as shown in fig. 2, the RGBD image is input into the gesture prediction neural network for prediction, so as to obtain the left-hand and right-hand pose prediction data.
Further, in an embodiment of the present invention, feeding the RGBD image into the gesture prediction neural network for prediction includes: using the open-source library OpenPose, or training on the basis of that library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
It can be understood that, in order to estimate hand pose information from the input RGBD image, the open-source library OpenPose may be used directly, or further training may be performed on its basis to adapt it to the input of the system and thereby obtain more accurate hand pose predictions.
Specifically, the embodiment of the present invention uses a deep neural network to train a network capable of predicting hand poses during interaction. Existing research has shown that deep neural networks perform well in human body and hand pose estimation and can produce feasible solutions close to the ground truth. The hand pose estimates obtained by the deep neural network provide a good initial solution for the hand parts visible to the single camera and a reasonable solution for the invisible parts.
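As a non-limiting sketch of what the OpenPose-based prediction step could look like through the library's Python bindings (the model path, parameters, and post-processing below are assumptions, and the patent further contemplates additional training on this basis):

```python
# Illustrative sketch: per-frame hand keypoint prediction with the OpenPose
# Python bindings (pyopenpose). Paths and parameters are assumptions.
import cv2
import pyopenpose as op

params = {"model_folder": "openpose/models/", "hand": True}
wrapper = op.WrapperPython()
wrapper.configure(params)
wrapper.start()

datum = op.Datum()
datum.cvInputData = cv2.imread("frame_color.png")  # aligned RGB frame
wrapper.emplaceAndPop(op.VectorDatum([datum]))

# handKeypoints is a pair [left, right]; each entry has shape
# (num_people, 21, 3) holding (x, y, confidence) per hand joint.
left_hand_pred, right_hand_pred = datum.handKeypoints
```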
In step S103, the RGBD image is sent to a segmentation recognition neural network, so as to obtain left-hand data, right-hand data, and segmentation data of different objects.
It can be understood that, as shown in fig. 2, the RGBD image is fed into the segmentation recognition neural network, and left-hand data, right-hand data, and segmentation data of different objects are obtained.
Further, in an embodiment of the present invention, feeding the RGBD image into the segmentation recognition neural network includes: using Mask-RCNN, or training on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
It can be understood that, in order to obtain instance segmentation data from the input RGBD image, Mask-RCNN may be used directly, or further training may be performed on its basis for better results. The focus of this step is the acquisition of training data.
Specifically, the embodiment of the present invention uses a neural network to train a network capable of performing instance segmentation on the data acquired by the single RGBD camera. With this segmentation recognition network, the collected left-hand and right-hand data and the data of the multiple objects in the interaction can be separated, providing guidance both for the unified energy optimization that solves the motion of the human hands and the multiple objects and for the reconstruction of the multiple object surfaces.
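As a non-limiting sketch of the instance segmentation step, a COCO-pretrained Mask R-CNN from torchvision could stand in for the fine-tuned network described above; the score threshold and class handling below are assumptions:

```python
# Illustrative sketch: instance segmentation with torchvision's Mask R-CNN.
# A COCO-pretrained model stands in for the network the patent fine-tunes on
# its own left-hand / right-hand / object training data.
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for an aligned 640x480 RGB frame
with torch.no_grad():
    pred = model([image])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'

keep = pred["scores"] > 0.5             # assumed confidence threshold
masks = pred["masks"][keep, 0] > 0.5    # (N, H, W) boolean instance masks
labels = pred["labels"][keep]           # instance class ids
```

After fine-tuning on the training data discussed above, the label set would distinguish the left hand, the right hand, and the individual object instances.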
In step S104, the depth data and color data of the different objects obtained by the segmentation are fused into the object model to obtain a final object model.
In one embodiment of the present invention, fusing the segmented depth data and color data of the different objects into the object model includes: performing a unified optimization motion solution according to the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects, so as to obtain a final object surface reconstruction result.
It can be understood that, as shown in fig. 2, the embodiment of the present invention feeds the predicted left-hand and right-hand poses, the segmented left-hand and right-hand depth data, and the depth and color data of the objects into a unified energy optimization framework for solving, so as to obtain the accurate hand poses and the motion of the multiple objects. On the basis of the accurate object motion, the depth data and color data of the different objects obtained by instance segmentation are fused into the object models, finally yielding complete object models.
Specifically, on the basis of the rough pose estimates of the hands in interaction and the preliminary data segmentation, unified energy optimization is used to obtain the accurate hand poses and the motion of each object. The segmented data of each object are then fused using the solved motion information. The pose motion of the human hands and the complete surfaces of the multiple objects in the interaction are finally reconstructed.
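The patent does not name a specific fusion data structure; as one plausible, non-limiting reading, each object could be represented by a truncated signed distance (TSDF) volume that is updated with that object's masked depth pixels once its motion has been solved. The following sketch, in which all names, the truncation value, and the averaging rule are assumptions, illustrates one such update:

```python
# Illustrative TSDF-style sketch: fuse one object's masked depth pixels into
# a signed-distance volume using the per-object motion from the unified
# optimization. Volume layout, truncation, and weighting are assumptions.
import numpy as np

def fuse_depth_into_volume(tsdf, weight, depth, mask, K, T_wc,
                           voxel_size=0.004, origin=np.zeros(3), trunc=0.02):
    """tsdf, weight: C-contiguous (X, Y, Z) arrays holding the running signed
    distance average and the per-voxel integration weights (updated in place).
    depth: (H, W) metric depth; mask: (H, W) bool for one object instance.
    K: 3x3 intrinsics; T_wc: 4x4 world-to-camera pose of this object."""
    X, Y, Z = tsdf.shape
    idx = np.stack(np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z),
                               indexing="ij"), axis=-1).reshape(-1, 3)
    pts_w = origin + (idx + 0.5) * voxel_size            # voxel centers, world
    pts_c = (T_wc[:3, :3] @ pts_w.T + T_wc[:3, 3:4]).T   # camera frame
    z = pts_c[:, 2]
    uv = (K @ pts_c.T).T
    H, W = depth.shape
    valid = z > 1e-6
    u = np.zeros(z.shape, dtype=int)
    v = np.zeros(z.shape, dtype=int)
    u[valid] = np.round(uv[valid, 0] / z[valid]).astype(int)
    v[valid] = np.round(uv[valid, 1] / z[valid]).astype(int)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)
    valid[valid] &= mask[v[valid], u[valid]]             # this object's pixels
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    sdf = d - z                                          # + in front of surface
    upd = valid & (d > 0) & (sdf > -trunc)               # truncated update band
    new = np.clip(sdf / trunc, -1.0, 1.0)
    t, w = tsdf.reshape(-1), weight.reshape(-1)          # views into volumes
    t[upd] = (t[upd] * w[upd] + new[upd]) / (w[upd] + 1.0)
    w[upd] += 1.0
```

A full implementation would additionally accumulate per-voxel color and extract the final surface, for example with marching cubes; both are omitted here for brevity.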
In summary, the method for reconstructing a complex hand-object interaction scene provided by the embodiment of the present invention performs reconstruction with a single RGBD camera, so the system is simple; not only the motion of the human hand but also the geometry and motion of the objects are reconstructed, so the reconstructed information is complete; complex interactions involving both hands and multiple objects can be handled; and by combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme for the complex interaction process, complete three-dimensional information of the interaction process is finally obtained. For complex interactions between human hands and objects, the method can thus reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
The hand and object complex interaction scene reconstruction device proposed by the embodiment of the invention is described next with reference to the attached drawings.
Fig. 3 is a schematic structural diagram of a hand-object complex interaction scene reconstruction apparatus according to an embodiment of the present invention.
As shown in fig. 3, the hand and object complex interaction scene reconstruction apparatus 10 includes: an acquisition module 100, a prediction module 200, a segmentation module 300, and a fusion module 400.
The acquisition module 100 is configured to acquire an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain an RGBD image. The prediction module 200 is configured to send the RGBD image to the gesture prediction neural network for prediction, so as to obtain left-hand pose prediction data and right-hand pose prediction data. The segmentation module 300 is configured to send the RGBD image to the segmentation recognition neural network, so as to obtain left-hand data, right-hand data, and segmentation data of different objects. The fusion module 400 is configured to fuse the segmented depth data and color data of the different objects into an object model to obtain a final object model. For complex interactions between human hands and objects, the device 10 provided by the embodiment of the present invention can reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
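Purely as a structural, non-limiting illustration of how the four modules could cooperate per frame (the class, method names, and data flow below are placeholders rather than a disclosed implementation):

```python
# Structural sketch of apparatus 10: four cooperating modules. The method
# names and per-frame flow are illustrative placeholders.
class HandObjectSceneReconstructor:
    def __init__(self, acquisition, prediction, segmentation, fusion):
        self.acquisition = acquisition    # module 100: capture + RGB-D alignment
        self.prediction = prediction      # module 200: left/right hand pose network
        self.segmentation = segmentation  # module 300: instance segmentation network
        self.fusion = fusion              # module 400: unified optimization + fusion

    def process_frame(self):
        color, depth = self.acquisition.next_frame()
        left_pose, right_pose = self.prediction.predict(color, depth)
        seg = self.segmentation.segment(color, depth)
        # Unified energy optimization and volumetric fusion produce the
        # updated hand poses and object models for this frame.
        return self.fusion.fuse(left_pose, right_pose, seg, depth, color)
```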
Further, in an embodiment of the present invention, the acquisition module 100 is further configured to align the RGB information and the D information.
Further, in an embodiment of the present invention, the prediction module 200 is further configured to use the open-source library OpenPose, or to train on the basis of that library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
Further, in an embodiment of the present invention, the segmentation module 300 is further configured to use Mask-RCNN, or to train on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
Further, in an embodiment of the present invention, the fusion module 400 is further configured to perform a unified optimization motion solution according to the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects, so as to obtain a final object surface reconstruction result.
It should be noted that the explanation of the foregoing embodiment of the method for reconstructing a complex interaction scene between a hand and an object also applies to the device for reconstructing a complex interaction scene between a hand and an object of this embodiment, and is not repeated here.
According to the device for reconstructing a complex hand-object interaction scene provided by the embodiment of the present invention, reconstruction is performed with a single RGBD camera, so the system is simple; not only the motion of the human hand but also the geometry and motion of the objects are reconstructed, so the reconstructed information is complete; complex interactions involving both hands and multiple objects can be handled; and by combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme for the complex interaction process, complete three-dimensional information of the interaction process is finally obtained. For complex interactions between human hands and objects, the device can thus reconstruct the three-dimensional information of the interaction process from a sequence captured by a single RGBD camera, recover the pose motion of the human hand and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for reconstructing a complex hand-object interaction scene, comprising the following steps:
collecting an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain an RGBD image;
feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data;
feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and
fusing the segmented depth data and color data of the different objects into an object model to obtain a final object model.
2. The method according to claim 1, wherein collecting the RGBD sequence comprises:
aligning the RGB information and the D information.
3. The method according to claim 1, wherein feeding the RGBD image into the gesture prediction neural network for prediction comprises:
using the open-source library OpenPose, or training on the basis of the open-source library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
4. The method according to claim 1, wherein feeding the RGBD image into the segmentation recognition neural network comprises:
using Mask-RCNN, or training on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
5. The method according to claim 1, wherein fusing the segmented depth data and color data of the different objects into the object model comprises:
performing a unified optimization motion solution according to the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects to obtain a final object surface reconstruction result.
6. A device for reconstructing a complex hand-object interaction scene, comprising:
an acquisition module configured to collect an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain an RGBD image;
a prediction module configured to feed the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data;
a segmentation module configured to feed the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and
a fusion module configured to fuse the segmented depth data and color data of the different objects into an object model to obtain a final object model.
7. The device according to claim 6, wherein the acquisition module is further configured to align the RGB information and the D information.
8. The device according to claim 6, wherein the prediction module is further configured to use the open-source library OpenPose, or to train on the basis of the open-source library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
9. The device according to claim 6, wherein the segmentation module is further configured to use Mask-RCNN, or to train on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
10. The device according to claim 6, wherein the fusion module is further configured to perform a unified optimization motion solution according to the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects to obtain the final object surface reconstruction result.
CN201911113777.8A 2019-11-14 2019-11-14 Hand and object complex interaction scene reconstruction method and device Pending CN111124107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911113777.8A CN111124107A (en) 2019-11-14 2019-11-14 Hand and object complex interaction scene reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911113777.8A CN111124107A (en) 2019-11-14 2019-11-14 Hand and object complex interaction scene reconstruction method and device

Publications (1)

Publication Number Publication Date
CN111124107A 2020-05-08

Family

ID=70495647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911113777.8A Pending CN111124107A (en) 2019-11-14 2019-11-14 Hand and object complex interaction scene reconstruction method and device

Country Status (1)

Country Link
CN (1) CN111124107A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150009415A1 (en) * 2013-07-04 2015-01-08 Canon Kabushiki Kaisha Projected user interface system for multiple users
CN107688391A (en) * 2017-09-01 2018-02-13 广州大学 A kind of gesture identification method and device based on monocular vision
CN109272513A (en) * 2018-09-30 2019-01-25 清华大学 Hand and object interactive segmentation method and device based on depth camera
CN109614882A (en) * 2018-11-19 2019-04-12 浙江大学 A violent behavior detection system and method based on human body pose estimation
CN109658412A (en) * 2018-11-30 2019-04-19 湖南视比特机器人有限公司 It is a kind of towards de-stacking sorting packing case quickly identify dividing method
CN110007754A (en) * 2019-03-06 2019-07-12 清华大学 The real-time reconstruction method and device of hand and object interactive process
CN110197156A (en) * 2019-05-30 2019-09-03 清华大学 Manpower movement and the shape similarity metric method and device of single image based on deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11335007B2 (en) * 2020-05-29 2022-05-17 Zebra Technologies Corporation Method to generate neural network training image annotations
CN112720504A (en) * 2021-01-20 2021-04-30 清华大学 Method and device for controlling learning of hand and object interactive motion from RGBD video

Similar Documents

Publication Publication Date Title
Kwon et al. H2o: Two hands manipulating objects for first person interaction recognition
Li et al. Connecting touch and vision via cross-modal prediction
Zhang et al. Adafuse: Adaptive multiview fusion for accurate human pose estimation in the wild
Zheng et al. Gimo: Gaze-informed human motion prediction in context
Jiang et al. Scaling up dynamic human-scene interaction modeling
Karunratanakul et al. A skeleton-driven neural occupancy representation for articulated hands
Sengan et al. Cost-effective and efficient 3D human model creation and re-identification application for human digital twins
Mei et al. Waymo open dataset: Panoramic video panoptic segmentation
Nazir et al. SemAttNet: Toward attention-based semantic aware guided depth completion
Chen et al. Mvhm: A large-scale multi-view hand mesh benchmark for accurate 3d hand pose estimation
Wang et al. Dual transfer learning for event-based end-task prediction via pluggable event to image translation
Reimat et al. Cwipc-sxr: Point cloud dynamic human dataset for social xr
Zanfir et al. Hum3dil: Semi-supervised multi-modal 3d humanpose estimation for autonomous driving
Zhou et al. Hemlets posh: Learning part-centric heatmap triplets for 3d human pose and shape estimation
Krejov et al. Combining discriminative and model based approaches for hand pose estimation
Li et al. MannequinChallenge: Learning the depths of moving people by watching frozen people
Jinka et al. Sharp: Shape-aware reconstruction of people in loose clothing
CN110007754B (en) Real-time reconstruction method and device for hand-object interaction process
Shimada et al. Decaf: Monocular deformation capture for face and hand interactions
CN111124107A (en) Hand and object complex interaction scene reconstruction method and device
Liu et al. Deep learning for 3d human pose estimation and mesh recovery: A survey
Choi et al. Handnerf: Learning to reconstruct hand-object interaction scene from a single rgb image
Kini et al. 3dmodt: Attention-guided affinities for joint detection & tracking in 3d point clouds
CN114973355A (en) Human face and mouth reconstruction method and device
KR101225644B1 (en) method for object recognition and pose estimation at robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200508)