CN111124107A - Hand and object complex interaction scene reconstruction method and device - Google Patents
Hand and object complex interaction scene reconstruction method and device
- Publication number
- CN111124107A (application CN201911113777.8A)
- Authority
- CN
- China
- Prior art keywords
- hand
- data
- prediction
- segmentation
- rgbd
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- User Interface Of Digital Computer (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and a device for reconstructing complex hand-object interaction scenes. The method comprises the following steps: acquiring an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain RGBD images; feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data; feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and fusing the segmented depth and color data of the different objects into an object model to obtain a final object model. For complex interaction between human hands and objects, the method can reconstruct three-dimensional information of the interaction process from a sequence acquired by a single RGBD camera, recovering both the gesture motion of the hands and the surfaces of the objects, and effectively addresses the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
Description
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, and in particular to a method and a device for reconstructing complex hand-object interaction scenes.
Background
In daily life, using the hands to interact with different objects in the environment is one of the most common human activities. Reconstructing the interaction process between human hands and objects is therefore of great value for AR/VR, human-computer interaction, and intelligent robotics.
The process of hand-object interaction contains rich information. Many applications, such as AR and intelligent robotics, require reconstructing the hand-object interaction process to obtain its three-dimensional information. In reality, this interaction process is complex: the hand motions are intricate, the interacting objects are varied, and mutual occlusion between hands and objects during interaction causes strong ambiguity.
Reconstruction using a single RGBD camera has the advantage of a simple system, but a single camera also means a single viewpoint, which limits the amount of effective information that can be obtained and increases the difficulty of reconstruction. In summary, reconstructing the interaction process from a single RGBD camera is a very meaningful and at the same time very challenging task.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the present invention is to provide a method for reconstructing complex hand-object interaction scenes, which can reconstruct three-dimensional information of a complex interaction process between human hands and objects from a sequence acquired by a single RGBD camera, recover both the gesture motion of the hands and the surfaces of the objects, and effectively address the high complexity and strong ambiguity involved in reconstructing complex interaction processes from a single RGBD camera.
Another object of the present invention is to provide a hand and object complex interaction scene reconstruction apparatus.
In order to achieve the above object, an embodiment of one aspect of the present invention provides a method for reconstructing complex hand-object interaction scenes, including the following steps: acquiring an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain RGBD images; feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data; feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and fusing the segmented depth and color data of the different objects into an object model to obtain a final object model.
According to the method for reconstructing complex hand-object interaction scenes of the embodiment of the present invention, reconstruction is performed from a single RGBD camera, so the system is simple; not only the motion of the human hands but also the geometry and motion of the objects are reconstructed, so the recovered information is complete; and complex interaction cases involving two hands and multiple objects can be handled. By combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme, complete three-dimensional information of the interaction process is finally obtained. Thus, for a complex interaction process between human hands and objects, three-dimensional information of the interaction can be reconstructed from a sequence acquired by a single RGBD camera, the gesture motion of the hands and the surfaces of the objects can be obtained, and the high complexity and strong ambiguity of reconstructing complex interaction processes from a single RGBD camera are effectively addressed.
In addition, the hand and object complex interaction scene reconstruction method according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, acquiring the RGBD sequence includes: aligning the RGB information and the D (depth) information.
Further, in an embodiment of the present invention, feeding the RGBD image into the gesture prediction neural network for prediction includes: using the open-source library OpenPose, or training on the basis of OpenPose, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
Further, in an embodiment of the present invention, feeding the RGBD image into the segmentation recognition neural network includes: using Mask-RCNN, or training on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
Further, in an embodiment of the present invention, fusing the segmented depth and color data of the different objects into the object model includes: performing a unified optimized motion solve based on the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects to obtain the final object surface reconstruction result.
In order to achieve the above object, an embodiment of another aspect of the present invention provides a device for reconstructing complex hand-object interaction scenes, including: an acquisition module for acquiring an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain RGBD images; a prediction module for feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data; a segmentation module for feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects; and a fusion module for fusing the segmented depth and color data of the different objects into an object model to obtain a final object model.
The device for reconstructing complex hand-object interaction scenes of the embodiment of the present invention performs reconstruction from a single RGBD camera, so the system is simple; not only the motion of the human hands but also the geometry and motion of the objects are reconstructed, so the recovered information is complete; and complex interaction cases involving two hands and multiple objects can be handled. By combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme, complete three-dimensional information of the interaction process is finally obtained. Thus, for a complex interaction process between human hands and objects, three-dimensional information of the interaction can be reconstructed from a sequence acquired by a single RGBD camera, the gesture motion of the hands and the surfaces of the objects can be obtained, and the high complexity and strong ambiguity of reconstructing complex interaction processes from a single RGBD camera are effectively addressed.
In addition, the hand and object complex interaction scene reconstruction device according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the acquisition module is further configured to align the RGB information and the D information.
Further, in an embodiment of the present invention, the prediction module is further configured to use the open-source library OpenPose, or to train on the basis of OpenPose, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
Further, in an embodiment of the present invention, the segmentation module is further configured to use Mask-RCNN, or to train on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
Further, in an embodiment of the present invention, the fusion module is further configured to perform a unified optimized motion solve based on the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects, so as to obtain the final object surface reconstruction result.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a method for reconstructing a complex interaction scene between a hand and an object according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a hand and object complex interaction scene reconstruction method according to a specific embodiment of the present invention; and
FIG. 3 is a schematic structural diagram of a hand and object complex interaction scene reconstruction device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the present invention, and are not to be construed as limiting the present invention.
The method and the device for reconstructing a complex interaction scene between a hand and an object according to an embodiment of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for reconstructing a complex interaction scene between a hand and an object according to an embodiment of the present invention.
As shown in FIG. 1, the method for reconstructing the complex interaction scene of the hand and the object comprises the following steps:
In step S101, an RGBD sequence of a scene where a hand and an object interact with each other is acquired by using a single RGBD camera, so as to obtain an RGBD image.
In one embodiment of the present invention, acquiring the RGBD sequence includes: aligning the RGB information and the D information.
It is understood that the single RGBD camera may be, for example, a RealSense SR300 camera; many types of single RGBD camera exist, and this one is mentioned only as an example to avoid redundancy, without limitation. For example, as shown in FIG. 2, an embodiment of the present invention may acquire an RGBD sequence of a hand-object interaction scene using a RealSense SR300 camera. It should be noted that, since the RGB information and the D information originate from two separate sensors, the RGB-D information needs to be aligned.
It should be noted that, in the embodiment of the present invention, an RGBD image with a resolution of 640 × 480 is used, but of course, images with other resolutions may also be used, and are not limited in particular.
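Purely as an illustration (the patent text itself contains no code), the following sketch shows how such an aligned 640 × 480 RGBD sequence could be captured with the pyrealsense2 library; the frame rate, pixel formats, and sequence length are assumptions.

```python
import numpy as np
import pyrealsense2 as rs  # Intel RealSense SDK Python bindings

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# RGB and depth come from two separate sensors, so each depth frame is
# reprojected into the color camera's frame to align the two modalities.
align = rs.align(rs.stream.color)

try:
    for _ in range(300):  # capture a short interaction sequence
        frames = pipeline.wait_for_frames()
        aligned = align.process(frames)
        depth = np.asanyarray(aligned.get_depth_frame().get_data())
        color = np.asanyarray(aligned.get_color_frame().get_data())
        # depth and color are now pixel-aligned 640 x 480 arrays that
        # together form one RGBD image of the acquired sequence
finally:
    pipeline.stop()
```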
In step S102, the RGBD image is fed into a gesture prediction neural network for prediction, so as to obtain left-hand pose prediction data and right-hand pose prediction data.
It can be understood that, as shown in FIG. 2, the RGBD image is input into the gesture prediction neural network for prediction to obtain the left-hand and right-hand pose prediction data.
Further, in an embodiment of the present invention, feeding the RGBD image into the gesture prediction neural network for prediction includes: using the open-source library OpenPose, or training on the basis of OpenPose, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
It can be understood that, in order to estimate hand pose information from the input RGBD image, the open-source library OpenPose may be used directly, or further training may be performed on its basis to adapt it to the input of this system and obtain more accurate hand pose predictions.
Specifically, the embodiment of the present invention uses deep neural networks to train a network capable of predicting hand poses during interaction. Existing research has shown that deep neural networks perform well in human body pose and hand pose estimation and can produce feasible solutions close to the ground truth. The hand pose estimates obtained from the deep neural network provide a good initial solution for the hand parts visible to the single camera and a reasonable solution for the invisible hand parts.
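As a hedged sketch of this step (the paths and parameter values are placeholders, and the binding API differs slightly across OpenPose versions), hand keypoints for both hands could be obtained from the color image through OpenPose's Python bindings as follows:

```python
import cv2
from openpose import pyopenpose as op  # OpenPose Python bindings

# "models/" and "frame.png" are placeholder paths (assumptions).
params = {"model_folder": "models/", "hand": True}
wrapper = op.WrapperPython()
wrapper.configure(params)
wrapper.start()

datum = op.Datum()
datum.cvInputData = cv2.imread("frame.png")  # color part of the RGBD image
wrapper.emplaceAndPop(op.VectorDatum([datum]))  # older versions take [datum]

# handKeypoints is [left, right]; each entry has shape
# (num_people, 21, 3) holding (x, y, confidence) per hand joint.
left_pred, right_pred = datum.handKeypoints
```

These 2D keypoints with confidences are what the pipeline treats as left-hand and right-hand pose prediction data; a network retrained on interaction data, as described above, would replace the stock models.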
In step S103, the RGBD image is sent to a segmentation recognition neural network, so as to obtain left-hand data, right-hand data, and segmentation data of different objects.
It can be understood that, as shown in FIG. 2, the RGBD image is fed into the segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects.
Further, in an embodiment of the present invention, feeding the RGBD image into the segmentation recognition neural network includes: using Mask-RCNN, or training on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
It will be appreciated that, in order to obtain instance segmentation data from the input RGBD image, Mask-RCNN may be used directly, or further training may be performed on its basis to obtain better results. The focus of this step is the acquisition of training data.
Specifically, the embodiment of the present invention uses a neural network to train a network capable of performing instance segmentation on the data acquired by the single RGBD camera. With this segmentation recognition network, the captured left-hand and right-hand data and the data of the multiple objects in the interaction process can be separated, which guides both the unified energy optimization that solves the motion of the human hands and the multiple objects and the reconstruction of the surfaces of the multiple objects.
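For illustration, the sketch below uses the COCO-pretrained Mask R-CNN from torchvision as a stand-in for the segmentation recognition network; in the described system the network would be trained (or fine-tuned) so that the left hand, the right hand, and each object are separate instance categories. The score threshold is an assumption.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()  # COCO weights

def segment_instances(rgb, score_thresh=0.5):
    """rgb: HxWx3 uint8 array -> per-instance class labels and masks."""
    x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([x])[0]
    keep = out["scores"] > score_thresh
    masks = out["masks"][keep, 0] > 0.5  # (N, H, W) boolean instance masks
    return out["labels"][keep], masks
```

Applying each returned mask to the aligned depth image separates the left-hand, right-hand, and per-object depth data used by the subsequent optimization and fusion steps.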
In step S104, the depth data and color data of the different objects obtained by the segmentation are fused into the object model to obtain a final object model.
In one embodiment of the present invention, fusing the segmented depth and color data of the different objects into the object model includes: performing a unified optimized motion solve based on the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects to obtain the final object surface reconstruction result.
It can be understood that, as shown in FIG. 2, the embodiment of the present invention feeds the predicted left-hand and right-hand poses, the segmented left-hand and right-hand depth data, and the depth and color data of the objects into a unified energy optimization framework, which solves for the accurate poses of the hands and the motion of the multiple objects. On the basis of the accurate object motion, the depth and color data of the different objects obtained by instance segmentation are fused into the object models, finally yielding complete object models.
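The patent does not publish its energy terms, so the following is only a hedged illustration of the kind of sub-problem such a framework solves: fitting the rigid 6-DoF motion of one object to its segmented depth points with SciPy, assuming point correspondences are given. In the actual method, analogous terms for both hands (anchored by the network's pose predictions) and for all objects are coupled and solved jointly.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def solve_rigid_motion(model_pts, observed_pts, xi0=None):
    """Find xi = (rotation vector, translation) minimizing point residuals
    between the transformed model points (Nx3) and the segmented depth
    points (Nx3); correspondences are assumed to be known here."""
    xi0 = np.zeros(6) if xi0 is None else xi0

    def residuals(xi):
        R = Rotation.from_rotvec(xi[:3]).as_matrix()
        return (model_pts @ R.T + xi[3:] - observed_pts).ravel()

    return least_squares(residuals, xi0).x  # solved 6-DoF motion
```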
Specifically, starting from the rough pose estimates and the preliminary data segmentation, the accurate poses of the hands and the motion of each object are obtained by unified energy optimization. The solved motion information is then used to fuse the segmented data of each object. The gesture motion of the human hands and the complete surfaces of the multiple objects in the interaction are finally reconstructed.
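Once the motion of an object is solved, its masked depth pixels can be integrated into a per-object volume. The sketch below shows a standard KinectFusion-style TSDF update as one plausible realization; the patent does not prescribe this exact representation, and all symbols (K, T_cam_obj, volume layout) are assumptions. depth_im is expected to have pixels outside the object's instance mask set to zero.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, vol_origin, voxel_size,
                   depth_im, K, T_cam_obj, trunc=0.02):
    """Fuse one masked depth image (meters) into an object's TSDF volume,
    given camera intrinsics K and the solved object-to-camera pose."""
    nx, ny, nz = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    # voxel centers in the object's canonical frame, then in camera frame
    pts = vol_origin + voxel_size * np.stack([ii, jj, kk], -1).reshape(-1, 3)
    cam = pts @ T_cam_obj[:3, :3].T + T_cam_obj[:3, 3]
    z = cam[:, 2]
    ok = z > 1e-6
    u = np.zeros(z.shape, dtype=int)
    v = np.zeros(z.shape, dtype=int)
    u[ok] = np.round(cam[ok, 0] / z[ok] * K[0, 0] + K[0, 2]).astype(int)
    v[ok] = np.round(cam[ok, 1] / z[ok] * K[1, 1] + K[1, 2]).astype(int)
    ok &= (u >= 0) & (u < depth_im.shape[1]) & (v >= 0) & (v < depth_im.shape[0])
    d = np.zeros_like(z)
    d[ok] = depth_im[v[ok], u[ok]]
    sdf = d - z                       # signed distance along the viewing ray
    upd = ok & (d > 0) & (sdf > -trunc)
    t, w = tsdf.reshape(-1), weight.reshape(-1)  # views into the volumes
    t[upd] = (w[upd] * t[upd] + np.minimum(1.0, sdf[upd] / trunc)) / (w[upd] + 1)
    w[upd] += 1
```

Running this update per object over the whole sequence, and extracting a mesh from each TSDF (for example with marching cubes), yields the complete per-object surfaces described above.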
In summary, according to the method for reconstructing complex hand-object interaction scenes of the embodiment of the present invention, reconstruction is performed from a single RGBD camera, so the system is simple; not only the motion of the human hands but also the geometry and motion of the objects are reconstructed, so the recovered information is complete; and complex interaction cases involving two hands and multiple objects can be handled. By combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme, complete three-dimensional information of the interaction process is finally obtained. Thus, for a complex interaction process between human hands and objects, three-dimensional information of the interaction can be reconstructed from a sequence acquired by a single RGBD camera, the gesture motion of the hands and the surfaces of the objects can be obtained, and the high complexity and strong ambiguity of reconstructing complex interaction processes from a single RGBD camera are effectively addressed.
The hand and object complex interaction scene reconstruction device proposed by the embodiment of the invention is described next with reference to the attached drawings.
Fig. 3 is a schematic structural diagram of a hand-object complex interaction scene reconstruction apparatus according to an embodiment of the present invention.
As shown in fig. 3, the hand and object complex interaction scene reconstruction apparatus 10 includes: an acquisition module 100, a prediction module 200, a segmentation module 300, and a fusion module 400.
The acquisition module 100 is configured to acquire an RGBD sequence of a hand-object interaction scene with a single RGBD camera to obtain RGBD images. The prediction module 200 is configured to feed the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data. The segmentation module 300 is configured to feed the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data, and segmentation data of different objects. The fusion module 400 is configured to fuse the segmented depth and color data of the different objects into an object model to obtain a final object model. For a complex interaction process between human hands and objects, the device 10 of the embodiment of the present invention can reconstruct three-dimensional information of the interaction process from a sequence acquired by a single RGBD camera, obtain the gesture motion of the hands and the surfaces of the objects, and effectively address the high complexity and strong ambiguity of reconstructing complex interaction processes from a single RGBD camera.
Further, in an embodiment of the present invention, the acquisition module 100 is further configured to align the RGB information and the D information.
Further, in an embodiment of the present invention, the prediction module 200 is further configured to use the open-source library OpenPose, or to train on the basis of OpenPose, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
Further, in an embodiment of the present invention, the segmentation module 300 is further configured to use Mask-RCNN, or to train on the basis of Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
Further, in an embodiment of the present invention, the fusion module 400 is further configured to perform a unified optimized motion solve based on the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data, and the segmentation data of the different objects, so as to obtain the final object surface reconstruction result.
It should be noted that the foregoing explanation of the embodiments of the method for reconstructing complex hand-object interaction scenes also applies to the device for reconstructing complex hand-object interaction scenes of this embodiment, and is not repeated here.
According to the device for reconstructing complex hand-object interaction scenes of the embodiment of the present invention, reconstruction is performed from a single RGBD camera, so the system is simple; not only the motion of the human hands but also the geometry and motion of the objects are reconstructed, so the recovered information is complete; and complex interaction cases involving two hands and multiple objects can be handled. By combining human hand pose estimation, object recognition and segmentation, unified energy optimization, and multi-object reconstruction into one reconstruction scheme, complete three-dimensional information of the interaction process is finally obtained. Thus, for a complex interaction process between human hands and objects, three-dimensional information of the interaction can be reconstructed from a sequence acquired by a single RGBD camera, the gesture motion of the hands and the surfaces of the objects can be obtained, and the high complexity and strong ambiguity of reconstructing complex interaction processes from a single RGBD camera are effectively addressed.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples, and features of different embodiments or examples, described in this specification, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in the flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes alternate implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example, an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention; variations, modifications, substitutions, and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A hand and object complex interaction scene reconstruction method is characterized by comprising the following steps:
collecting an RGBD sequence of a hand-object interaction scene by using a single RGBD camera to obtain an RGBD image;
feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data;
sending the RGBD image into a segmentation recognition neural network to obtain left hand data, right hand data and segmentation data of different objects; and
fusing the depth data and the color data of the different objects obtained by segmentation into an object model to obtain a final object model.
2. The method of claim 1, wherein the acquiring the RGBD sequence comprises:
the RGB information and the D information are aligned.
3. The method of claim 1, wherein the RGBD image is fed into a gesture prediction neural network for prediction, and comprises:
using an open-source library OpenPose, or training on the basis of the open-source library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
4. The method of claim 1, wherein the feeding the RGBD image into a segmentation recognition neural network comprises:
using Mask-RCNN, or training on the basis of the Mask-RCNN, to obtain the left-hand data, the right-hand data and the segmentation data of the different objects.
5. The method of claim 1, wherein fusing the segmented depth data and color data of different objects into an object model comprises:
performing a unified optimized motion solve according to the left-hand pose prediction data, the right-hand pose prediction data, the left-hand data, the right-hand data and the segmentation data of the different objects to obtain a final object surface reconstruction result.
6. A hand and object complex interaction scene reconstruction device is characterized by comprising:
an acquisition module for acquiring an RGBD sequence of a hand-object interaction scene by using a single RGBD camera to obtain an RGBD image;
a prediction module for feeding the RGBD image into a gesture prediction neural network for prediction to obtain left-hand pose prediction data and right-hand pose prediction data;
a segmentation module for feeding the RGBD image into a segmentation recognition neural network to obtain left-hand data, right-hand data and segmentation data of different objects; and
a fusion module for fusing the depth data and the color data of the different objects obtained by segmentation into an object model to obtain a final object model.
7. The apparatus of claim 6, wherein the capture module is further configured to align the RGB information and the D information.
8. The apparatus of claim 6, wherein the prediction module is further configured to use an open source library OpenPose, or train on the basis of the open source library, to obtain the left-hand pose prediction data and the right-hand pose prediction data.
9. The apparatus of claim 6, wherein the segmentation module is further configured to use Mask-RCNN, or to train on the basis of the Mask-RCNN, to obtain the left-hand data, the right-hand data, and the segmentation data of the different objects.
10. The apparatus according to claim 6, wherein the fusion module is further configured to perform unified optimization motion solution according to left-hand pose prediction data, right-hand pose prediction data, left-hand data, right-hand data, and segmentation data of different objects, so as to obtain the final object surface reconstruction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911113777.8A CN111124107A (en) | 2019-11-14 | 2019-11-14 | Hand and object complex interaction scene reconstruction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911113777.8A CN111124107A (en) | 2019-11-14 | 2019-11-14 | Hand and object complex interaction scene reconstruction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111124107A (en) | 2020-05-08
Family
ID=70495647
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911113777.8A Pending CN111124107A (en) | 2019-11-14 | 2019-11-14 | Hand and object complex interaction scene reconstruction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111124107A (en) |
- 2019-11-14: Application CN201911113777.8A filed in China; published as CN111124107A (en); status Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150009415A1 (en) * | 2013-07-04 | 2015-01-08 | Canon Kabushiki Kaisha | Projected user interface system for multiple users |
CN107688391A (en) * | 2017-09-01 | 2018-02-13 | 广州大学 | A kind of gesture identification method and device based on monocular vision |
CN109272513A (en) * | 2018-09-30 | 2019-01-25 | 清华大学 | Hand and object interactive segmentation method and device based on depth camera |
CN109614882A (en) * | 2018-11-19 | 2019-04-12 | 浙江大学 | A kind of act of violence detection system and method based on human body attitude estimation |
CN109658412A (en) * | 2018-11-30 | 2019-04-19 | 湖南视比特机器人有限公司 | It is a kind of towards de-stacking sorting packing case quickly identify dividing method |
CN110007754A (en) * | 2019-03-06 | 2019-07-12 | 清华大学 | The real-time reconstruction method and device of hand and object interactive process |
CN110197156A (en) * | 2019-05-30 | 2019-09-03 | 清华大学 | Manpower movement and the shape similarity metric method and device of single image based on deep learning |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11335007B2 (en) * | 2020-05-29 | 2022-05-17 | Zebra Technologies Corporation | Method to generate neural network training image annotations |
CN112720504A (en) * | 2021-01-20 | 2021-04-30 | 清华大学 | Method and device for controlling learning of hand and object interactive motion from RGBD video |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200508 |