WO2023082922A1 - Method, apparatus and storage medium for object localization under discontinuous observation - Google Patents

Method, apparatus and storage medium for object localization under discontinuous observation

Info

Publication number
WO2023082922A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
observation
interruption
present disclosure
similarity
Application number
PCT/CN2022/124913
Other languages
English (en)
French (fr)
Inventor
黎意枫
李广林
孔涛
Original Assignee
北京有竹居网络技术有限公司
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023082922A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/14: Transformations for image registration, e.g. adjusting or mapping for alignment of images
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10028: Range image; Depth image; 3D point clouds

Definitions

  • the present disclosure relates to computer vision, including object localization in computer vision.
  • Dynamic object reconstruction and localization is a key task in computer vision and robotics, and its applications can range from autonomous navigation and augmented reality to robotic grasping and manipulation.
  • related techniques either rely on a computer-aided design (CAD) model of the object or require continuous observation to handle the reconstruction and localization of dynamic objects.
  • however, conventional approaches often ignore that everyday objects come in all shapes and sizes, and that CAD models may be unknown or not readily available.
  • in practice, the obtained object models may be fragmented due to limited viewing angles or occlusion between objects, and sensors may be unable to observe multiple objects continuously. During an interruption/loss of observation, the layout of objects may change dramatically, which adversely affects object observation, reconstruction, and localization.
  • according to some embodiments, a method for locating objects in a discontinuous observation scene is provided, comprising the following steps: acquiring an object model based on a reference image obtained when observation is resumed after an interruption of observation; and associating objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
  • according to other embodiments, an object association apparatus for a discontinuous observation scene is provided, including: a model acquisition unit configured to acquire an object model based on a reference image obtained when observation is resumed after an interruption of observation; and an association unit configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
  • according to some embodiments, an electronic device is provided, including: a memory; and a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, the method of any embodiment described in the present disclosure.
  • a computer-readable storage medium on which a computer program is stored, and the program, when executed by a processor, causes the method of any embodiment described in the present disclosure to be implemented.
  • a computer program product comprising instructions which, when executed by a processor, cause the method of any one of the embodiments described in the present disclosure to be implemented.
  • a computer program whose program code, when executed by a computer, causes the method of any embodiment described in the present disclosure to be implemented.
  • Figure 1 schematically shows a discontinuous observation scenario.
  • FIGS. 2A and 2B show a method for locating objects in a discontinuous observation scene according to embodiments of the present disclosure.
  • FIG. 2C shows a schematic diagram of an object matching process according to an embodiment of the present disclosure.
  • Fig. 3 shows an example of object association according to an embodiment of the present disclosure.
  • Fig. 4 shows an object localization device in a discontinuous observation scene according to an embodiment of the present disclosure.
  • Figure 5 shows a block diagram of some embodiments of an electronic device of the present disclosure.
  • FIG. 6 shows a block diagram of other embodiments of the electronic device of the present disclosure.
  • the term "including" and its variants as used in the present disclosure are open terms meaning that at least the following elements/features are included without excluding other elements/features, i.e., "including but not limited to". Likewise, the term "comprising" and its variants are open terms, i.e., "comprising but not limited to"; thus "including" is synonymous with "comprising".
  • the term “based on” means “based at least in part on”.
  • references throughout this specification to "one embodiment," "some embodiments," or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments.”
  • appearances of the phrases "in one embodiment," "in some embodiments," or "in an embodiment" in various places throughout the specification are not necessarily all referring to the same embodiment, although they may.
  • Dynamic object reconstruction and localization are crucial for robots to understand their surroundings and manipulate objects in the environment during robotic manipulation.
  • reconstruction can help to model partially observed or occluded objects.
  • accurate pose estimation can improve the completeness and accuracy of object reconstruction.
  • FIG. 1 shows a scenario where discontinuous observations occur over time, where the layout of objects may change dramatically during the interruption/loss of observations.
  • (a) and (b) in Figure 1 show the object layout before and after observation interruption/loss, respectively.
  • compared with (a), the objects shown in (b) are completely scrambled when observation resumes after the interruption. This adversely affects object observation, reconstruction, and localization. How to associate objects across discontinuous observations under such circumstances, and accurately locate them in the new scene, is quite a challenging problem.
  • therefore, in the present disclosure, object localization in a discontinuous observation scene is performed by acquiring models of the objects before and after the interruption of observation.
  • the solution of the present disclosure can acquire an object model after observation resumes when discontinuous observation occurs, and associate the objects before and after the interruption based on the acquired object model and an object reconstruction model built from information obtained before the interruption, so that object localization can further be performed accurately.
  • after object association, object pose estimation can further be performed for object alignment, which facilitates subsequent processing of the objects. For example, dynamic object reconstruction and pose estimation tasks can be handled robustly without CAD models or continuous observation, generating explicit point cloud models suitable for robotic object grasping.
  • FIG. 2A illustrates a method for object localization in a discontinuous observation scene according to some embodiments of the present disclosure.
  • in method 200, in step S201, an object model is acquired based on a reference image obtained when observation is resumed after an interruption of observation; and in step S202, objects before and after the observation interruption are associated based on the acquired object model and an object reconstruction model.
  • the object model acquired based on the reference image is a model of the object in the observed scene after the observation recovery/scene change.
  • the object model obtained based on the reference image can take various appropriate forms, and can contain/indicate/describe various attribute information of objects in the observed scene, including texture, structure, pose, color, etc.
  • the object model is a point cloud model of the object, which may be in any suitable form.
  • in some embodiments, the reference images used to generate the object model may be a predetermined number of images observed after observation resumes.
  • preferably, so that objects before and after the interruption can be associated as soon as possible, the object model can be generated from the first predetermined number of consecutive images captured after observation resumes; this predetermined number should be as small as possible so that object association is achieved quickly and efficiently.
  • in particular, the reference image is an image obtained from a single viewpoint when observation resumes, such as the initial image obtained upon resuming observation, e.g., the first frame.
  • a reference image may be referred to as a query image.
  • a 2.5D instance point cloud may be obtained from a reference image as an object point cloud model, particularly a 2.5D instance point cloud from a starting image.
  • a 2.5D instance point cloud is back-projected from a depth image under the guidance of an instance segmentation network.
  • in embodiments of the present disclosure, due to possible object arrangement, occlusion, etc., the reference image observed after observation resumes, such as a single-view image, may not fully reflect all objects in the scene, and may even reflect only part of an object; the acquired object point cloud model is therefore essentially a point cloud model of an incomplete object or of part of an object, e.g., a partial point cloud observed from a single viewpoint, i.e., an incomplete point cloud model.
  • the object point cloud model acquired from the reference image is also referred to as "partial object point cloud model" or "incomplete object point cloud model", and these expressions are synonymous in the context of the present disclosure.
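As a minimal sketch of this back-projection step, assuming a standard pinhole camera model; the function and argument names below are illustrative, not taken from the disclosure:

```python
import numpy as np

def backproject_instance(depth, mask, K):
    """Back-project masked depth pixels into a 2.5D instance point cloud.

    depth : (H, W) depth image in metres
    mask  : (H, W) boolean instance mask from a segmentation network
    K     : (3, 3) camera intrinsic matrix
    """
    v, u = np.nonzero(mask & (depth > 0))   # pixel coordinates inside the instance
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]         # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * z / K[1, 1]         # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) points in the camera frame
```

Each instance mask yields one such partial point cloud, which then serves as the object point cloud model used for association.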
  • the reconstructed model of an object may refer to the model of the object in the scene before the observation interruption/scene change, which can be processed together with the object model obtained after the interruption/scene change to associate the objects before and after the interruption/scene change.
  • the reconstructed model of the object is obtained by reconstructing the model of the object based on continuous images of the object.
  • the consecutive images of the object are a predetermined number of consecutive observed images before observation is interrupted.
  • the reconstructed model of the object may be predetermined and stored in a suitable storage device.
  • in particular, model reconstruction is performed while the object is continuously observed, and the result is stored in a suitable storage device.
  • model reconstruction may be performed periodically, for example during continuous observation of objects.
  • the model reconstruction may also be performed continuously, for example every time a predetermined number of images have been observed in succession. In this way, the pre-stored reconstructed object model can be invoked directly when the object localization/association operation starts in a discontinuous observation scene.
  • in other embodiments, object reconstruction may be performed first when the localization/association operation starts, for example by obtaining the reconstructed model of the object from a predetermined number of consecutive images observed before the interruption, so that object localization is then performed based on this reconstructed model.
  • the reconstructed model of the object can be any suitable type of object model; in particular, it may be a surfel-based model, a point cloud model, or another suitable object model. Similarly, it can contain/indicate/describe various attribute information of objects in the observed scene, including texture, structure, pose, color, etc.
  • the continuous images include continuous RGB images and depth images of the object, also referred to as RGB-D images; preferably, the model reconstructed from the continuous images is a surfel-based model. Compared with a single-view point cloud model, a surfel-based model captures more comprehensive object attribute information and builds a more complete model, thereby reducing or even eliminating the geometric and density errors of the object.
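For illustration only, a surfel in such a model typically bundles the attributes below; this is a generic sketch of the surfel representation used by systems like MaskFusion, and the field names are assumptions rather than the disclosure's definitions:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    """One surface element of a surfel-based object model."""
    position: np.ndarray    # (3,) 3D location of the element
    normal: np.ndarray      # (3,) local surface normal
    color: np.ndarray       # (3,) RGB appearance
    radius: float           # spatial extent of the element
    confidence: float       # accumulated observation confidence
```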
  • the model reconstruction of the object can be performed by various methods.
  • as an example, object-level model construction with a given RGB-D image as input can be achieved by introducing a learning-based instance segmentation method into SLAM to obtain the reconstructed model of the object, e.g., SLAM++, Fusion++, Co-Fusion, MaskFusion, and MID-Fusion in the related art.
  • in some embodiments, preferably, surfels are used to represent the object model during construction based on MaskFusion; the surfel object model obtained in this way reflects object features more accurately and comprehensively than a point cloud model, so that the disclosed scheme operates more efficiently than when a point cloud model is applied.
  • the association between objects before and after the interruption of observation may refer to matching the objects before and after the interruption, i.e., finding the correspondence between them, especially a one-to-one correspondence, so that objects observed before the interruption can be associated with objects re-observed after the interruption, facilitating subsequent operations.
  • in other tasks, typically multi-object tracking (MOT), object association across different frames has been studied. MOT focuses on tracking dynamic objects across consecutive frames; most MOT methods rely on continuity assumptions (such as GIoU or Bayesian filtering) to perform data association, and fail when observations are discontinuous.
  • embodiments of the present disclosure instead use the object models before and after the observation interruption/loss in a discontinuous observation scene to perform association, so that object association can be realized efficiently.
  • in some embodiments, as shown in FIG. 2B, step S202 of associating objects before and after the observation interruption based on the acquired object model and the object reconstruction model further includes: in step S2021, determining the similarity between the object point cloud model and the object reconstruction model based on object information; and in step S2022, associating objects before and after the interruption based on the similarity.
  • the object information may be various attribute information characterizing the object, such as attribute information obtainable from observation results; for example, it may include at least one of object geometric features, object texture features, and object color features.
  • object information may be extracted from an observed object image, an object model obtained from an object image, or the like.
  • in some embodiments, the object information includes both geometric features and color features, and the similarity between the object point cloud model and the object reconstruction model is determined based on both.
  • in particular, in visual perception, similar objects tend to be similar in both structure and texture. Since some objects have similar shapes or similar textures, it is difficult to distinguish them by geometric information alone or by color information alone; the present disclosure therefore proposes using both the geometric features and the color features of objects as object information to determine model similarity.
  • as an example, both the geometric features and the color features of an object can be extracted from the colored point cloud of the object obtained from the object observation image.
  • correlating objects before and after interruption of observation based on similarity further includes: determining a one-to-one correspondence between objects before and after interruption of observation based on similarity so as to associate objects before and after interruption of observation.
  • in some embodiments, the one-to-one correspondence of objects before and after the interruption is determined based on the maximum total similarity between the acquired object model and the reconstructed model of the object.
  • both the acquired object model and the object reconstruction model may contain parameter information of multiple objects, and the object parameters corresponding to the maximum similarity/maximum matching state indicate the correspondence between the objects.
  • various appropriate algorithms may be used to determine the maximum matching condition between the acquired object model and the reconstructed object model to determine the correspondence between the objects.
  • FIG. 2C shows a schematic diagram of an object matching process according to an embodiment of the present disclosure. First, hybrid features composed of geometric and color components are extracted from the object reconstruction models built before the interruption and from the object models acquired after observation resumes. The similarity of the two model sets is then estimated based on these features. Finally, a suitable matching algorithm, such as the Sinkhorn algorithm, is used to find the one-to-one correspondence with the maximum total similarity between the two sets.
  • in some embodiments, the method 200 further includes step S203: aligning the associated objects before and after the observation interruption. After the models and the 2.5D instance point clouds have been associated, coarse position information of the object models in the new scene is available, i.e., the position of each observed object in the scene observed after the interruption; however, the pose of an object relative to the camera may have changed considerably, so the objects need to be aligned with the new scene to better estimate their poses. Object pose estimation aims at estimating the orientation and transformation of an object, which is crucial for robotic manipulation.
  • object pose estimation can be implemented in various appropriate ways; for example, LatentFusion, a framework for 6D pose estimation of unseen objects, proposes solving 6D pose estimation by reconstructing a latent 3D representation of the object from sparse reference views.
  • object pose estimation can also be implemented in other ways, which will not be described in detail here.
  • in some embodiments, after the object association before and after the interruption has been obtained, the poses of the associated/corresponding objects before and after the interruption are aligned.
  • object alignment can be achieved by various suitable methods. In some embodiments, a spatial transformation is used to align the pose of the object after the interruption with that of the object before the interruption. As an example, the acquired object model, such as the object point cloud model, can be aligned with the object reconstruction model, such as the object surfel model, through a specific transformation. Object alignment can be implemented by various appropriate methods/algorithms, which will not be described in detail here.
  • thus, through the object association and alignment of the present disclosure, the object point cloud model obtained from a single reference image (or a small number of them) can be further refined based on the object reconstruction model after observation resumes, so that the object model obtained after recovery is more complete and better reflects the state of the observed objects. For example, compared with a single view, the solution according to the present disclosure can better recover models of objects that are partially or fully occluded due to scene changes, observation interruptions, and the like.
  • it should be noted that the alignment operation is not essential to the scheme of the present disclosure; that is, even without alignment, the scheme can still accurately and efficiently determine the association between objects before and after the interruption by using the object models, thereby efficiently realizing object localization in discontinuous observation scenes.
  • Fig. 3 schematically shows the three building blocks of an illustrative example of the present disclosure: object model reconstruction, object association, and object alignment. In object model reconstruction, continuous frames are taken as input and the object model is reconstructed based on a SLAM system. In object association, object-level data association is performed between discontinuous frames, i.e., between previous observation frames and new observation frames. In object alignment, the object models are aligned with the new observations via a point cloud registration network.
  • MaskFusion represents the object model in terms of surfels and performs camera tracking by aligning a given RGB-D image with a projection of the reconstructed model.
  • Mask R-CNN is used to obtain instance masks, and an object-level model is fused for each instance mask.
  • the present disclosure further trains a class-agnostic segmentation network, which can be combined with MaskFusion to perform dynamic object reconstruction over a wider range of categories.
  • for object association, two sets of colored point clouds are used as input: the object reconstruction models M_m and the 2.5D instance point clouds P_n extracted from the reference image by back-projection; association is achieved using their respective geometric features and color features.
  • color features capture the color distribution and help distinguish different objects through statistical histograms. Specifically, a 3D histogram of size (32, 32, 32) is used to compute the RGB distribution; the three channels of the histogram represent red, green, and blue, in the same way as an RGB image. The (256, 256, 256) color space of the image is scaled down to the smaller (32, 32, 32) space to improve efficiency and robustness to lighting changes across scenes.
  • here i, j, and k denote the element coordinates along the three channels of the 3D histogram; if the object model contains x points whose quantized (R, G, B) values fall into bin (i, j, k), the corresponding histogram element takes the value x.
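A minimal sketch of such a histogram, assuming per-point RGB values in [0, 255]; numpy's histogramdd performs the binning described above:

```python
import numpy as np

def rgb_histogram(colors, bins=32):
    """3D RGB histogram of a colored point cloud.

    colors : (N, 3) array of per-point RGB values in [0, 255].
    Returns a (bins, bins, bins) count array; quantizing 256 levels
    into 32 bins improves efficiency and robustness to lighting.
    """
    hist, _ = np.histogramdd(colors, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist
```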
  • Feature extraction can be performed using various appropriate methods, such as various methods known in the art, which will not be described in detail here.
  • as an example, the multi-scale grouping (MSG) classification version of PointNet++ can be used as the geometric feature extractor.
  • based on these features, one-to-one matches are found between the m reconstructed object models M_m and the n 2.5D instance point clouds P_n in the reference image.
  • the weighted sum S of S_geo and S_rgb is used to evaluate the similarity between these two sets, e.g., S = S_geo + λ·S_rgb, where the geometric feature vectors are passed through an L2 normalization function before comparison, the 3D color-feature histogram is converted into a vector by a flattening function, and the weight λ can be any suitable value, such as 0.1.
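As an illustrative sketch of this similarity, assuming cosine similarity between L2-normalized feature vectors and a weight λ = 0.1; the disclosure does not pin down these details, so treat them as assumptions:

```python
import numpy as np

def similarity_matrix(geo_m, geo_n, hist_m, hist_n, lam=0.1):
    """Pairwise similarity between m reconstructed models and n instance clouds.

    geo_m/geo_n   : (m, d) / (n, d) geometric feature vectors
    hist_m/hist_n : (m, 32, 32, 32) / (n, 32, 32, 32) color histograms
    """
    def l2norm(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    g_m, g_n = l2norm(geo_m), l2norm(geo_n)
    h_m = l2norm(hist_m.reshape(len(hist_m), -1))   # flatten 3D histograms
    h_n = l2norm(hist_n.reshape(len(hist_n), -1))
    s_geo = g_m @ g_n.T                             # (m, n) cosine similarities
    s_rgb = h_m @ h_n.T
    return s_geo + lam * s_rgb
```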
  • the goal of the one-to-one matching problem is to find the correspondence with the largest total similarity between the two sets. Specifically, one needs to find the maximum-weight matching between the m object models and the n 2.5D instance point clouds; in this example it is expressed as an optimal transport model. Note that n and m may not be equal when some objects disappear or new objects appear; to handle these cases, slack variables are introduced in the formulation to account for objects with no correspondence. Taking the case m ≤ n as an example, an n × n distance matrix D is defined from the pairwise similarity-derived costs, padded with slack rows for objects that have no correspondence.
  • the transport matrix is T, where T_ij is the matching probability of M_i and P_j.
  • the matching problem can then be formulated as finding the transport matrix T that minimizes the total transport cost ⟨T, D⟩ subject to row- and column-sum constraints, which can be solved efficiently with, e.g., the Sinkhorn algorithm.
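A compact, runnable sketch of Sinkhorn matching under this formulation, assuming uniform marginals, an entropic regularization weight of 0.1, and a fixed iteration count (all assumptions, not values from the disclosure):

```python
import numpy as np

def sinkhorn_match(S, n_iters=50, eps=0.1):
    """One-to-one matching from a similarity matrix via entropic optimal transport.

    S : (n, n) similarity matrix; when m < n, pad the missing rows with a
        constant slack similarity so unmatched instances can be absorbed.
    Returns a transport matrix T whose entry T[i, j] is the matching
    probability of model i and instance j; argmax per row gives matches.
    """
    K = np.exp(S / eps)                     # Gibbs kernel from similarities
    u, v = np.ones(S.shape[0]), np.ones(S.shape[1])
    for _ in range(n_iters):                # alternate row/column scaling
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)

# Example: matches = sinkhorn_match(similarity_matrix(...)).argmax(axis=1)
```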
  • point cloud registration refers to the problem of finding a rigid transformation that aligns two given point clouds, such as the reconstructed model and the object model obtained after observation resumes. The goal is to find the transformation T ∈ SE(3) that aligns the two point clouds.
  • RPMNet, a point cloud registration network that achieves state-of-the-art performance on partial, noisy, and unseen point cloud registration tasks, can be used for each pre-matched pair of point clouds.
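For a runnable illustration of this registration step, the sketch below uses classical point-to-point ICP from Open3D as a stand-in for a learned network such as RPMNet; the interface, threshold, and identity initialization are assumptions:

```python
import numpy as np
import open3d as o3d

def register(source_pts, target_pts, threshold=0.05):
    """Estimate the rigid transform T in SE(3) aligning source to target."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 homogeneous transformation matrix
```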
  • further processing may be performed to optimize the alignment process before feeding the point cloud into the network.
  • first, outliers are removed using a radius filter with a radius of 0.005 that requires at least 16 neighbors.
  • the point cloud is then down-sampled with a voxel size of 0.01.
  • the point cloud is scaled to the unit sphere and translated to the origin.
  • finally, several hypotheses with initial angles of 90°, 180°, and 270° about each axis are generated, which further reduces sensitivity to the initial angle between the two point clouds. These operations can be performed with the Open3D library, though any other suitable library can be used.
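A hedged sketch of this preprocessing pipeline; the Open3D calls match the parameters above, while the exact call sequence and the SciPy-based hypothesis generator are assumptions:

```python
import numpy as np
import open3d as o3d
from scipy.spatial.transform import Rotation

def preprocess(points):
    """Filter, down-sample, and normalize a point cloud before registration."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd, _ = pcd.remove_radius_outlier(nb_points=16, radius=0.005)  # radius filter
    pcd = pcd.voxel_down_sample(voxel_size=0.01)                    # down-sampling
    pts = np.asarray(pcd.points)
    pts -= pts.mean(axis=0)                    # translate centroid to the origin
    pts /= np.linalg.norm(pts, axis=1).max()   # scale into the unit sphere
    return pts

def rotation_hypotheses():
    """Initial-angle hypotheses of 90/180/270 degrees about each axis."""
    return [Rotation.from_euler(axis, deg, degrees=True).as_matrix()
            for axis in "xyz" for deg in (90, 180, 270)]
```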
  • Fig. 4 shows an object association device in a discontinuous observation scene according to some embodiments of the present disclosure.
  • the apparatus 400 includes: a model acquisition unit 401 configured to acquire an object model based on a reference image obtained when observation is resumed after an interruption of observation; and an association unit 402 configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
  • the association unit 402 further includes a similarity determination unit 4021 configured to determine the similarity between the object point cloud model and the object reconstruction model based on object information; the association unit then associates objects before and after the interruption based on the similarity.
  • in some embodiments, the object information includes both geometric features and color features, and the similarity determination unit is configured to determine the similarity between the object point cloud model and the object reconstruction model based on both.
  • the association unit 402 is further configured to determine a one-to-one correspondence between the objects before and after the interruption based on the similarity, so that the objects before and after the interruption are associated.
  • in some embodiments, the one-to-one correspondence between the objects before and after the interruption is determined based on the maximum total similarity between the multiple pieces of object information and the multiple object models.
  • determining the one-to-one correspondence between objects may be implemented by a matching unit; that is, the association unit may include a matching unit configured to determine the one-to-one correspondence between objects before and after the interruption based on the similarity.
  • alternatively, the operations of the similarity determination unit and the matching unit may all be implemented by the association unit itself.
  • in some embodiments, the association apparatus 400 may further include an alignment unit 403 configured to align the associated objects before and after the interruption. In particular, the alignment unit is configured to align, through a spatial transformation, the pose of the object after the interruption with that of the object before the interruption.
  • the association device may further include a model reconstruction unit 404 configured to perform model reconstruction on the object based on the continuous images of the object.
  • the consecutive images of the object are a predetermined number of consecutive observed images before observation is interrupted.
  • the reconstructed model of the object is any one selected from the group consisting of a surfel-based model and a point cloud model.
  • in some embodiments, the continuous images include continuous RGB images and depth images of the object, and the model reconstructed from them is a surfel-based model. It should be noted that the model reconstruction unit need not be included in the association apparatus; it may instead be invoked by the apparatus to perform model reconstruction during operation.
  • the model reconstruction unit 404 is shown with dashed lines to indicate that it can also be located outside the apparatus 400; even in this case, the apparatus 400 can still achieve the advantageous effects of the present disclosure described above.
  • each of the above units may be implemented as an independent physical entity, or may also be implemented by a single entity (for example, a processor (CPU or DSP, etc.), an integrated circuit, etc.).
  • the above-mentioned units are shown with dotted lines in the drawings to indicate that these units may not actually exist as separate entities; the operations/functions they realize may be realized by the processing circuit itself.
  • in some embodiments, the device may further include a memory that stores various information generated in operation by the device or the units it includes, programs and data for operation, data to be transmitted by a communication unit, and the like.
  • the memory can be volatile memory and/or non-volatile memory.
  • memory may include, but is not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read only memory (ROM), flash memory.
  • the device may also include a communication unit, which may be used to communicate with other devices.
  • the communication unit may be implemented in an appropriate manner known in the art, for example including communication components such as antenna arrays and/or radio frequency links, various types of interfaces, communication units and the like. It will not be described in detail here.
  • the device may also include other components not shown, such as a radio frequency link, a baseband processing unit, a network interface, a processor, a controller, and the like. It will not be described in detail here.
  • the scheme according to the present disclosure, alone or combined with any existing object observation scheme, can form an object observation system that observes objects, e.g., continuously, and performs object localization under discontinuous observation according to the present disclosure when observation is resumed after an interruption/loss.
  • in such a system, continuous RGB-D video frames may be used as input to reconstruct models of the objects in the scene as they are observed, which may be performed by a dynamic object reconstruction module.
  • when observation resumes, the association module obtains the newly observed 2.5D instance point clouds and then evaluates the similarity between the reconstructed object models and the 2.5D instance point clouds in the new observation to find the one-to-one correspondence between objects before and after the observation interruption/scene change.
  • the system according to the present disclosure is also well applicable to various application tasks, especially robotic grasping tasks.
  • a UR5 robotic arm with a robotiq-2f-85 gripper and a Realsense D435i RGB-D camera on the wrist is used as the hardware platform.
  • the system according to the present disclosure can assist the robotic grasping task in the following steps: a) the robotic arm scans cluttered objects placed on a desktop, and during scanning the unknown object models are reconstructed by the system according to the present disclosure; b) the robotic arm acquires a query image from a single viewpoint and uses the system according to the present disclosure to align the reconstructed object models with the objects in the query image, obtaining aligned object point clouds.
  • the aligned object point cloud output by the system according to the present disclosure is more complete; for example, an object that is fully or partially occluded can still be properly localized, so that the grasp pose generation module can generate grasps even for occluded parts of objects or for occluded objects.
  • in summary, this disclosure mainly considers the new task of dynamic object reconstruction and localization without continuous observation or known CAD model priors, and proposes a new system performing dynamic object-level reconstruction, multi-object association, and alignment.
  • systems according to the present disclosure are general and achieve state-of-the-art performance in various tasks such as model-free object pose estimation, model-alignment-based single-view object completion, and dynamic multi-object robotic grasping.
  • FIG. 5 shows a block diagram of some embodiments of an electronic device of the present disclosure.
  • the electronic device 5 can be various types of devices, such as, but not limited to, mobile terminals like mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the electronic device 5 may include a display panel for displaying data and/or execution results utilized in the solutions according to the present disclosure.
  • the display panel can be in various shapes, such as a rectangular panel, an oval panel, or a polygonal panel, and the like.
  • the display panel can be not only a flat panel, but also a curved panel, or even a spherical panel.
  • the electronic device 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51.
  • the components of the electronic device 5 shown in FIG. 5 are exemplary rather than limiting, and the electronic device 5 may have other components according to actual application requirements.
  • Processor 52 may control other components in electronic device 5 to perform desired functions.
  • memory 51 is used to store one or more computer readable instructions.
  • the processor 52 is used to execute the computer-readable instructions; when executed by the processor 52, the computer-readable instructions implement the method according to any of the foregoing embodiments.
  • for the specific implementation of each step of the method and related explanations, reference may be made to the above-mentioned embodiments; repeated descriptions are omitted here.
  • processor 52 and the memory 51 may communicate with each other directly or indirectly.
  • processor 52 and memory 51 may communicate via a network.
  • a network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the processor 52 and the memory 51 may also communicate with each other through the system bus, which is not limited in the present disclosure.
  • the processor 52 can be embodied as various suitable processors or processing devices, such as a central processing unit (CPU), a graphics processing unit (GPU), or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the central processing unit (CPU) may be an X86 or ARM architecture or the like.
  • memory 51 may include any combination of various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the memory 51 may include, for example, a system memory that stores an operating system, application programs, a boot loader, a database, and other programs. Various application programs, various data, and the like can also be stored in the storage medium.
  • FIG. 6 is a block diagram illustrating an example structure of a computer system employable as an electronic device according to an embodiment of the present disclosure.
  • a central processing unit (CPU) 601 executes various processes according to programs stored in a read only memory (ROM) 602 or loaded from a storage section 608 to a random access memory (RAM) 603 .
  • the central processing unit is only exemplary, and it may also be other types of processors, such as the various processors mentioned above.
  • the ROM 602, RAM 603, and storage section 608 may be various forms of computer-readable storage media, as described below. It should be noted that although the ROM 602, RAM 603, and storage section 608 are shown separately in FIG. 6, one or more of them may be combined or located in the same or different memory or storage modules.
  • the CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604.
  • the input/output interface 605 is also connected to the bus 604 .
  • the following components are connected to the input/output interface 605: an input section 606, such as a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; an output section 607, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage section 608 including a hard disk, a magnetic tape, etc.; and a communication section 609 including a network interface card such as a LAN card or a modem.
  • the communication section 609 allows communication processing to be performed via a network such as the Internet. It is easy to understand that although it is shown in FIG. 6 that each device or module in the electronic device 600 communicates through the bus 604, they may also communicate through a network or other methods, wherein the network may include a wireless network, a wired network , and/or any combination of wireless and wired networks.
  • a driver 610 is also connected to the input/output interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read therefrom is installed into the storage section 608 as necessary.
  • the programs constituting the software can be installed from a network such as the Internet or a storage medium such as the removable medium 611 .
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program including program code for performing a method according to an embodiment of the present disclosure.
  • the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602.
  • the computer program is executed by the CPU 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • a computer-readable medium may be a tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • a computer-readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein.
  • Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • a computer program including: instructions, which when executed by a processor cause the processor to execute the method of any one of the above embodiments.
  • instructions may be embodied as computer program code.
  • the computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; such programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via an Internet connection through an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • modules, components or units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of a module, component or unit does not constitute a limitation on the module, component or unit itself under certain circumstances.
  • exemplary hardware logic components that can be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
  • according to some embodiments of the present disclosure, a method for locating objects in a discontinuous observation scene is provided, comprising the following steps: acquiring an object model based on a reference image obtained when observation is resumed after an interruption of observation; and associating objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
  • the reconstructed model of the object is obtained by reconstructing the model of the object based on continuous images of the object.
  • the consecutive images of the object are a predetermined number of consecutive observed images before observation is interrupted.
  • the reconstructed model of the object is any one selected from the group consisting of a surfel-based model and a point cloud model.
  • the continuous images include continuous RGB images and depth images of the object, and the model reconstructed based on the continuous images is a surfel-based model.
  • the object model obtained based on the reference image is a point cloud model of the object.
  • the reference image is the initial image obtained when the observation is resumed, and the 2.5D instance point cloud is obtained from the initial image as the object point cloud model.
  • in some embodiments, associating objects before and after the observation interruption based on the acquired object model and the object reconstruction model further includes: determining the similarity between the object point cloud model and the object reconstruction model based on object information, and associating the objects before and after the interruption based on the similarity.
  • the object information includes at least one of object geometric features, object texture features, and object color features. In some embodiments, the object information includes both geometric features and color features, and the similarity between the object point cloud model and the object reconstruction model is determined based on both the geometric features and the color features.
  • correlating objects before and after interruption of observation based on similarity further includes: determining a one-to-one correspondence between objects before and after interruption of observation based on similarity so as to associate objects before and after interruption of observation.
  • in some embodiments, the one-to-one correspondence between the objects before and after the interruption is determined based on the maximum total similarity between the multiple pieces of object information and the multiple object models.
  • the method further comprises: aligning objects before and after the associated interruption of observation.
  • in some embodiments, a spatial transformation is used to align the pose of the object after the interruption with that of the object before the interruption.
  • according to other embodiments, an object association apparatus for a discontinuous observation scene is provided, including: a model acquisition unit configured to acquire an object model based on a reference image obtained when observation is resumed after an interruption of observation; and an association unit configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
  • the association unit further includes a similarity determination unit configured to determine the similarity between the object point cloud model and the object reconstruction model based on object information; the association unit then associates objects before and after the interruption based on the similarity.
  • in some embodiments, the object information includes both geometric features and color features, and the similarity determination unit is configured to determine the similarity between the object point cloud model and the object reconstruction model based on both.
  • the association unit is further configured to determine a one-to-one correspondence between the objects before and after the interruption based on the similarity, so that the objects before and after the interruption are associated.
  • in some embodiments, the one-to-one correspondence between the objects before and after the interruption is determined based on the maximum total similarity between the multiple pieces of object information and the multiple object models.
  • in some embodiments, the association apparatus may further include an alignment unit configured to align the associated objects before and after the interruption; in particular, the alignment unit is configured to align, through a spatial transformation, the pose of the object after the interruption with that of the object before the interruption.
  • the association device may further include a model reconstruction unit configured to perform model reconstruction on the object based on the continuous images of the object.
  • the consecutive images of the object are a predetermined number of consecutive observed images before observation is interrupted.
  • the reconstructed model of the object is any one selected from the group consisting of a surfel-based model and a point cloud model.
  • the continuous images include continuous RGB images and depth images of the object, and the model reconstructed based on the continuous images is a surfel-based model.
  • according to some embodiments, an electronic device is provided, including: a memory; and a processor coupled to the memory, the memory storing instructions that, when executed by the processor, cause the electronic device to execute the method of any embodiment described in the present disclosure.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method of any embodiment described in the present disclosure is implemented.
  • a computer program including: instructions, which when executed by a processor cause the processor to perform the method of any embodiment described in the present disclosure.
  • a computer program product comprising instructions which, when executed by a processor, implement the method of any one of the embodiments described in the present disclosure.
  • a computer program whose program code, when executed by a computer, causes the method of any embodiment described in the present disclosure to be implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method, apparatus, and storage medium for object localization under discontinuous observation. A method for locating objects in a discontinuous observation scene is proposed, comprising: acquiring an object model based on a reference image obtained when observation is resumed after an interruption of observation, the object model being a model of objects in the scene after observation resumes; and associating objects before and after the observation interruption based on the acquired object model and an object reconstruction model, the object reconstruction model being a model of objects in the scene before the interruption.

Description

Method, apparatus, and storage medium for object localization under discontinuous observation
Cross-reference to related applications
This application is based on, and claims priority to, Chinese application No. 202111349093.5 filed on November 15, 2021, the disclosure of which is incorporated herein by reference in its entirety.
Technical field
The present disclosure relates to computer vision, including object localization in computer vision.
Background
Dynamic object reconstruction and localization is a key task in computer vision and robotics, with applications ranging from autonomous navigation and augmented reality to robotic grasping and manipulation. Related techniques either rely on a computer-aided design (CAD) model of the object or require continuous observation to handle the reconstruction and localization of dynamic objects. However, conventional methods often ignore that everyday objects come in all shapes and sizes and that CAD models may be unknown or not readily available. In practice, the obtained object models may be fragmented due to limited viewing angles or occlusion between objects, and sensors may be unable to observe multiple objects continuously. During an interruption/loss of observation, the layout of objects may change dramatically. This adversely affects object observation, reconstruction, and localization.
Summary
This summary is provided to introduce, in brief form, concepts that are described in detail in the detailed description below. It is not intended to identify key or essential features of the claimed technical solution, nor to limit the scope of the claimed technical solution.
According to some embodiments of the present disclosure, a method for locating objects in a discontinuous observation scene is provided, comprising the following steps: acquiring an object model based on a reference image obtained when observation is resumed after an interruption of observation; and associating objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
According to other embodiments of the present disclosure, an object association apparatus for a discontinuous observation scene is provided, comprising: a model acquisition unit configured to acquire an object model based on a reference image obtained when observation is resumed after an interruption of observation; and an association unit configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
According to some embodiments of the present disclosure, an electronic device is provided, comprising: a memory; and a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, the method of any embodiment described in the present disclosure.
According to some embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program causes the method of any embodiment described in the present disclosure to be implemented.
According to some embodiments of the present disclosure, a computer program product is provided, comprising instructions which, when executed by a processor, cause the method of any embodiment described in the present disclosure to be implemented.
According to some embodiments of the present disclosure, a computer program is provided, whose program code, when executed by a computer, causes the method of any embodiment described in the present disclosure to be implemented.
Other features, aspects, and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
Preferred embodiments of the present disclosure are described below with reference to the accompanying drawings. The drawings are provided for a further understanding of the present disclosure; together with the following detailed description, they are incorporated in and form part of this specification and serve to explain the present disclosure. It should be understood that the drawings described below relate only to some embodiments of the present disclosure and do not limit it. In the drawings:
Fig. 1 schematically shows a discontinuous observation scenario.
Figs. 2A and 2B show an object localization method for a discontinuous observation scene according to embodiments of the present disclosure, and Fig. 2C shows a schematic diagram of an object matching process according to an embodiment of the present disclosure.
Fig. 3 shows an example of object association according to an embodiment of the present disclosure.
Fig. 4 shows an object localization apparatus for a discontinuous observation scene according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of some embodiments of an electronic device of the present disclosure.
Fig. 6 shows a block diagram of other embodiments of an electronic device of the present disclosure.
It should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not necessarily drawn to scale. The same or similar reference numerals are used throughout the drawings to denote the same or similar components; once an item is defined in one drawing, it may not be discussed further in subsequent drawings.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of the embodiments is merely illustrative and in no way limits the present disclosure or its application or uses. It should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here.
It should be understood that the steps recited in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit the illustrated steps; the scope of the present disclosure is not limited in this respect. Unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments should be interpreted as merely exemplary and do not limit the scope of the present disclosure.
The term "including" and its variants as used in the present disclosure are open terms meaning that at least the following elements/features are included without excluding other elements/features, i.e., "including but not limited to". Similarly, the term "comprising" and its variants are open terms, i.e., "comprising but not limited to"; thus "including" is synonymous with "comprising". The term "based on" means "based at least in part on".
References throughout this specification to "one embodiment", "some embodiments", or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. For example, the term "one embodiment" means "at least one embodiment"; "another embodiment" means "at least one additional embodiment"; "some embodiments" means "at least some embodiments". Moreover, appearances of the phrases "in one embodiment", "in some embodiments", or "in an embodiment" in various places throughout the specification are not necessarily all referring to the same embodiment, although they may.
It should be noted that the concepts "first", "second", etc. mentioned in the present disclosure are only used to distinguish different devices, modules, or units, and do not limit the order or interdependence of the functions they perform. Unless otherwise specified, "first", "second", etc. are not intended to imply that the objects so described must follow a given order in time, space, ranking, or any other manner.
It should be noted that the modifiers "a/an" and "a plurality of" in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the data, messages, or information exchanged in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit their scope.
Dynamic object reconstruction and localization are crucial for a robot to understand its surroundings and manipulate objects in the environment during robotic operation. On the one hand, reconstruction can help complete the modeling of partially observed or occluded objects; on the other hand, accurate pose estimation can improve the completeness and accuracy of object reconstruction.
Some current work achieves dynamic object reconstruction by introducing an additional segmentation network module into a simultaneous localization and mapping (SLAM) system to distinguish objects of interest. These works either assume that the object categories are known or require continuous observation; in real robotic operation, however, these assumptions may not hold. Current object pose estimation methods mainly rely on known computer-aided design (CAD) models or require substantial cost to scan objects to obtain high-quality models. Moreover, these methods may need to train new weights for each object or category, which limits generalizability and is clearly unsuitable for real-world scenarios.
Furthermore, in practice, interruptions/losses of observation lead to discontinuous observation, including, but not limited to, scene changes, object occlusion, and objects entering or leaving the scene, any of which may change the observed scene. During the interruption/loss of observation, the layout of objects may change dramatically. Fig. 1 shows a scene with discontinuous observation over time, where the object layout may change dramatically during the interruption/loss: (a) and (b) in Fig. 1 show the object layout before and after the interruption/loss, respectively. Compared with (a), the objects in (b) are completely scrambled when observation resumes after the interruption. This adversely affects object observation, reconstruction, and localization. How to associate objects across discontinuous observations and accurately locate them in the new scene under such circumstances is quite a challenging problem.
In view of this, we propose an improved scheme capable of dynamic object localization without continuous observation and without known computer-aided design models.
In practical application scenarios, such as robotic manipulation and grasping tasks, continuous observation of the objects in a scene cannot be guaranteed due to limited viewing angles or mutual occlusion between objects. Under discontinuous observation, the spatial and motion continuity of objects cannot be guaranteed, but most rigid object models do not change in texture or structure. Rigid object models are therefore essential for many applications and serve as the indispensable link between different observations. Accordingly, the solution of the present disclosure performs object localization in discontinuous observation scenes by acquiring models of the objects before and after the interruption of observation. In particular, when discontinuous observation occurs, the solution acquires an object model after observation resumes and associates the objects before and after the interruption based on the acquired object model and an object reconstruction model built from information obtained before the interruption, so that object localization can further be performed accurately.
In addition, in the solution of the present disclosure, after object association, object pose estimation can further be performed for object alignment, which facilitates subsequent processing of the objects. For example, dynamic object reconstruction and pose estimation tasks can be handled robustly without CAD models or continuous observation, generating explicit point cloud models suitable for robotic grasping.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, in particular object localization involving object association and alignment in discontinuous observation scenes.
Fig. 2A shows an object localization method for a discontinuous observation scene according to some embodiments of the present disclosure. In method 200, in step S201, an object model is acquired based on a reference image obtained when observation is resumed after an interruption of observation; and in step S202, objects before and after the observation interruption are associated based on the acquired object model and an object reconstruction model.
According to some embodiments of the present disclosure, the object model acquired based on the reference image is a model of the objects in the observed scene after observation resumes/the scene changes. In some embodiments, this object model can take various appropriate forms and can contain/indicate/describe various attribute information of objects in the observed scene, including texture, structure, pose, color, and so on. In some embodiments, the object model is an object point cloud model, which may be in any suitable form.
在一些实施例中,用于生成物体模型的基准图像可以是在观测恢复之后观测到的预定数量的图像。优选地,为了尽快地实现观测恢复之后的物体定位,将观测中断前后的物体之间尽快关联,可以将观测恢复后的前预定数量的连续图像来生成物体模型,该预定数量应尽量小以旨在快速且高效地实现物体关联。特别地,该基准图像是在恢复观测时单视角获得的图像,诸如可以是恢复观测时获得的起始图像,例如第一帧图像。在一些实例中,基准图像可被称为查询图像。
在一些实施例中,可以从基准图像获取2.5D实例点云作为物体点云模型,特别地从起始图像获取2.5D实例点云。作为示例,在实例分割网络的指导下,从深度图像反投影得到2.5D实例点云。特别地,在本公开的实施例中,由于可能的物体布置、遮挡等,在恢复观测后的观测到的基准图像,例如单视角图像,可能无法完整地反映出场景中的所有物体,甚至只能反映出物体的一部分,因而所获取的物体点云模型实质上是不完整物体或者部分物体的点云模型,例如可属于在单视角下观测到的部分点云,是不完整的点云模型。因此,在本文中,从该基准图像获取的物体点云模型也指的是“部分物体点云模型”或“不完整物体点云模型”,这些表述在本公开上下文中是同义的。
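As an illustrative sketch only (not part of the claimed solution), the back-projection step described above can be expressed as follows, assuming a pinhole camera with intrinsics fx, fy, cx, cy and a boolean instance mask from the segmentation network; the function and parameter names are assumptions for illustration:

```python
import numpy as np

def backproject_instance(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into a 2.5D instance point cloud.

    depth: (H, W) depth map in meters; mask: (H, W) boolean instance mask
    from the segmentation network; fx, fy, cx, cy: pinhole intrinsics.
    Returns an (N, 3) array of points in the camera frame.
    """
    v, u = np.nonzero(mask & (depth > 0))   # pixel coordinates inside the instance
    z = depth[v, u]
    x = (u - cx) * z / fx                   # invert the pinhole projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```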
According to some embodiments of the present disclosure, the object reconstruction model may refer to a model of an object in the scene before the observation interruption/scene change, which can be processed cooperatively with the object model acquired after the observation interruption/scene change to associate objects before and after the interruption/change. In some embodiments, the object reconstruction model is obtained by performing model reconstruction on the object based on consecutive images of the object. In some embodiments, the consecutive images of the object are a predetermined number of consecutive observation images before the observation interruption.
In some embodiments, the object reconstruction model may be predetermined and stored in an appropriate storage device. In particular, model reconstruction is performed as the object is continuously observed, and the result is stored in an appropriate storage device. For example, model reconstruction may be performed periodically during continuous object observation. Alternatively, model reconstruction may be performed continually, e.g., every time a predetermined number of images have been consecutively observed. In this way, the pre-stored reconstructed object model can be directly retrieved when the object localization/association operation in the discontinuous observation scenario starts. In other embodiments, when the object localization/association operation in the discontinuous observation scenario starts, object reconstruction may first be performed, e.g., the object reconstruction model is acquired based on a predetermined number of consecutive observation images before the observation interruption, and object localization is then performed based on this reconstruction model.
In some embodiments, the object reconstruction model is an object model of any appropriate type; in particular, it may be a surfel-based model, a point cloud model, or another suitable object model. Similarly, it can also contain/indicate/describe various attribute information of the object in the observed scene, including texture, structure, pose, color, and so on. In some embodiments, the consecutive images include consecutive RGB images and depth images of the object, also referred to as RGB-D images, and preferably the model reconstructed based on the consecutive images is a surfel-based model. Compared with a single-view point cloud model, a surfel-based model can capture more comprehensive object attribute information and build a more complete model, thereby reducing or even eliminating geometric and density errors of the object.
In the solution of the present disclosure, object model reconstruction may be performed using various methods. As an example, object-level model construction given RGB-D images as input may be achieved by introducing a learning-based instance segmentation method into SLAM to obtain the object reconstruction model, e.g., SLAM++, Fusion++, Co-Fusion, MaskFusion, MID-Fusion and the like in the related art. In some embodiments, preferably, based on MaskFusion, surfels are used to represent the object model during construction; the resulting surfel object model can reflect object features more accurately and comprehensively than a point cloud model, making the solution of the present disclosure more efficient in operation than when a point cloud model is applied.
According to some embodiments of the present disclosure, associating objects before and after the observation interruption may refer to matching objects before and after the interruption, i.e., finding the correspondence, especially a one-to-one correspondence, between objects before and after the interruption, so that objects before the interruption can be associated with objects re-observed after recovery, facilitating subsequent operations. Among different tasks, object association across different frames is usually studied in multi-object tracking (MOT). MOT focuses on tracking dynamic objects across consecutive frames. Most MOT methods rely on continuity assumptions (such as GIoU or Bayesian filtering) to perform data association, but fail when observation is discontinuous. Embodiments of the present disclosure exploit the object models before and after the observation interruption/loss in the discontinuous observation scenario to achieve association, enabling efficient object association.
In some embodiments, in step S202, associating objects before and after the observation interruption based on the acquired object model and the object reconstruction model further includes: in step S2021, determining the similarity between the object point cloud model and the object reconstruction model based on object information; and in step S2022, associating objects before and after the observation interruption based on the similarity, as shown in FIG. 2B.
In some embodiments, the object information may be various attribute information characterizing the object, e.g., various attribute information obtainable from object observations, which may include at least one of object geometric features, object texture features and object color features. As an example, the object information may be extracted from observed object images, from object models acquired from the object images, and the like.
In some embodiments, the object information includes both geometric features and color features, and the similarity between the object point cloud model and the object reconstruction model is determined based on both. In particular, in visual perception, similar objects tend to be similar in both structure and texture. Since some objects have similar shapes or similar textures, it is difficult to distinguish them using geometric information alone or color information alone. Therefore, the present disclosure proposes to use both the geometric features and the color features of the object as object information for determining model similarity. As an example, both can be extracted from the colored point cloud of the object obtained from object observation images.
In some embodiments, associating objects before and after the observation interruption based on the similarity further includes: determining a one-to-one correspondence between objects before and after the observation interruption based on the similarity, so that the objects before and after the interruption are associated. In some embodiments, the one-to-one correspondence between objects before and after the observation interruption is determined based on the maximum total similarity between the acquired object models and the object reconstruction models. Both the acquired object model and the object reconstruction model may contain parameter information of multiple objects, and the object parameters corresponding to the maximum similarity/maximum matching condition may indicate correspondence between objects. As an example, various appropriate algorithms may be employed to determine the maximum matching between the acquired object models and the object reconstruction models so as to determine the correspondence between objects.
FIG. 2C illustrates a schematic diagram of an object matching process according to an embodiment of the present disclosure. First, hybrid features composed of geometry and color are extracted from the object reconstruction models before the observation interruption and from the object models acquired after observation is resumed, respectively. Then, the similarity between the two sets of models is estimated based on these features. Finally, we use an appropriate matching algorithm, e.g., the Sinkhorn algorithm, to find a one-to-one correspondence with maximum total similarity between the two sets.
In some embodiments, the method 200 further includes step S203: aligning the associated objects before and after the observation interruption. After the models and the 2.5D instance point clouds are associated, we obtain coarse position information of the object models in the new scene, i.e., the respective positions of the observed objects in the scene where observation is resumed after the interruption. However, the pose of an object relative to the camera may have changed greatly. Therefore, the objects need to be aligned with the new scene in order to better estimate object poses. The purpose of object pose estimation is to estimate the orientation and transformation of an object, which is crucial for robotic manipulation. Object pose estimation may be implemented in various appropriate ways, e.g., LatentFusion, a framework for 6D pose estimation of unseen objects, which proposes to address 6D pose estimation of unseen objects by reconstructing a latent 3D representation of the object using sparse reference views. Of course, object pose estimation may also be implemented in other ways, which will not be described in detail here.
In some embodiments, after the object association before and after the observation interruption has been obtained, the poses of the associated/corresponding objects before and after the interruption are aligned. Object alignment may be achieved by various appropriate methods. In some embodiments, the pose of an object after the observation interruption is aligned with that of the object before the interruption through a spatial transformation. As an example, the acquired object model, e.g., the object point cloud model, may be aligned with the object reconstruction model, e.g., the object surfel model, through a specific transformation, thereby achieving object alignment. Object alignment may be achieved by various appropriate methods/algorithms, which will not be described in detail here. Thus, through the object association and alignment according to the present disclosure, the object point cloud model acquired from a single reference image or a small number of reference images can be further refined based on the object reconstruction model after observation is resumed, so that the object model acquired after recovery is more complete and better reflects the state of the observed objects; for example, compared with a single view, the solution according to the present disclosure can better acquire models of objects that are partially or fully occluded due to scene changes, observation interruptions, and the like.
It should be noted that the alignment operation is not essential to the solution of the present disclosure; that is, even without the alignment operation, the solution of the present disclosure can still accurately and efficiently determine the association between objects before and after the observation interruption by exploiting the object models, thereby efficiently achieving object localization in discontinuous observation scenarios.
An example of object localization in a discontinuous observation scenario according to an embodiment of the present disclosure will be described below with reference to FIG. 3.
The present disclosure aims to reconstruct and localize dynamic objects without known computer-aided design models and without continuous observation, and proposes to solve this problem by exploiting object models, in particular object reconstruction models and object models acquired after recovery from an observation interruption. FIG. 3 schematically illustrates the three components of an exemplary instance of the present disclosure: object model reconstruction, object association, and object alignment. In object model reconstruction, object models are reconstructed based on a SLAM system with consecutive frames as input; in object association, object-level data association is performed between discontinuous frames, i.e., between previously observed frames and newly observed frames; in object alignment, the object models are aligned with the new observations through a point cloud registration network. The implementation of each component of the solution of the present disclosure is described in detail below.
For object model reconstruction, in this example the object models $M_m$, $m = 0 \dots N$, are reconstructed by taking as input a video clip $V_t$ containing consecutive RGB images and depth images. Specifically, to reconstruct the object models, we use the MaskFusion implementation for camera tracking and object-level construction during the video clip. MaskFusion represents object models with surfels and performs camera tracking by aligning a given RGB-D image with the projection of the reconstructed model. To reconstruct each object, it uses Mask R-CNN to obtain instance masks and fuses each instance mask into the object-level construction. Moreover, to better handle object reconstruction, the present disclosure further trains a class-agnostic segmentation network, which can be combined with MaskFusion to reconstruct dynamic objects of a much broader range of categories.
When the previous observation is lost and a new observation arrives, we need to find the correspondence between the reconstructed objects and the new scene. However, directly aligning the object models into the new scene is time-consuming and may lead to ambiguous matches. Therefore, a coarse-to-fine multi-object alignment process is proposed. In the coarse matching, we introduce an association module to estimate the similarity between the object models and the 2.5D instance point clouds in the reference image (query image), and then find matches between them. These matches provide an approximate position in the new scene for each object. Guided by an instance segmentation network, the 2.5D instance point clouds are obtained by back-projecting from the depth image.
The association operation in this example takes two sets of colored point clouds as input: the object reconstruction models $M_m$ and the 2.5D instance point clouds $P_n$ extracted from the reference image by back-projection; it extracts respective geometric features and color features from both to perform association.
1) Geometric features: the geometric feature extraction part is implemented with a PointNet++ network. Specifically, we use a function

$$f_{\text{geo}}: \mathbb{R}^{n \times 3} \to \mathbb{R}^{N}, \qquad (1)$$

which processes unordered point clouds and encodes them into fixed-length vectors, here with $N = 1024$.
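For illustration only, a much-simplified PointNet-style encoder can stand in for the PointNet++ network mentioned above: a shared per-point MLP followed by max pooling yields a fixed-length 1024-dimensional vector for an unordered point cloud. The layer sizes below are assumptions, not values from the disclosure:

```python
import torch
import torch.nn as nn

class SimplePointEncoder(nn.Module):
    """Minimal PointNet-style stand-in for the PointNet++ geometric encoder:
    a shared per-point MLP followed by max pooling into a fixed-length vector."""

    def __init__(self, out_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, n, 3) unordered point clouds -> (B, out_dim) features;
        # max pooling over the point dimension makes the output permutation-invariant
        return self.mlp(points).max(dim=1).values

f_geo = SimplePointEncoder()
feat = f_geo(torch.rand(2, 2048, 3))   # -> torch.Size([2, 1024])
```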
2) Color features: the color distribution is analyzed, and a statistical histogram helps distinguish different objects. Specifically, a three-dimensional histogram of size (32, 32, 32),

$$f_{\text{rgb}}: \mathbb{R}^{n \times 3} \to \mathbb{R}^{32 \times 32 \times 32}, \qquad (2)$$

is used to compute the RGB distribution. The three channels of the histogram represent red, green and blue in the same way as an RGB image. The (256, 256, 256) color space of the image can be scaled down to the smaller (32, 32, 32) color space to improve efficiency and robustness to illumination changes across different scenes. Each histogram element is a bin count: with $i, j, k$ denoting the three channel coordinates of a histogram element, if the object model contains $x$ color elements whose quantized (R, G, B) values fall in bin $(i, j, k)$, the corresponding histogram element value is $x$.
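A minimal sketch of such a quantized color histogram, assuming 8-bit RGB point colors (the bin arithmetic 256/32 = 8 follows from the sizes stated above):

```python
import numpy as np

def color_histogram(colors, bins=32):
    """f_rgb: quantize (n, 3) uint8 RGB colors from 256 levels down to `bins`
    levels per channel and count occurrences in a (bins, bins, bins) histogram."""
    idx = colors.astype(int) // (256 // bins)     # e.g. 256/32 = 8 levels per bin
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return hist

hist = color_histogram(np.random.randint(0, 256, (5000, 3), dtype=np.uint8))
```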
Feature extraction may be performed using various appropriate methods, e.g., methods well known in the art, which will not be described in detail here. As an example, the multi-scale grouping classification version of PointNet++ may be used in the feature extractor.
Then, based on the acquired geometric features and color features, a one-to-one matching is found between our $m$ reconstructed objects $M_m$ and the 2.5D instance point clouds $P_n$ in the reference image. We formulate the one-to-one matching problem as an optimal transport problem and determine the one-to-one matching by solving it.
Specifically, the weighted sum $S$ of $S_{\text{geo}}$ and $S_{\text{rgb}}$ is used to evaluate the similarity between the two sets. Let $x, y$ be two different point clouds, e.g., corresponding to an object reconstruction model and an object model acquired after observation is resumed, respectively:

$$S_{\text{geo}}(x, y) = \phi(f_{\text{geo}}(x))^{\top} \, \phi(f_{\text{geo}}(y)), \qquad (3)$$

$$S_{\text{rgb}}(x, y) = \phi(\beta(f_{\text{rgb}}(x)))^{\top} \, \phi(\beta(f_{\text{rgb}}(y))), \qquad (4)$$

$$S = S_{\text{geo}} + \lambda S_{\text{rgb}}, \qquad (5)$$

where $\phi$ is the $L_2$ normalization function, $\beta$ is a flattening function that converts the three-dimensional histogram of color features into a vector, and $\lambda$ may be any appropriate value, e.g., 0.1.
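A minimal sketch of the hybrid similarity of Eqs. (3)-(5), assuming the geometric feature vectors and color histograms have already been computed; here flattening ($\beta$) is realized by ravel and $L_2$ normalization by the helper below:

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    """phi: scale a vector to unit L2 norm."""
    return v / (np.linalg.norm(v) + eps)

def similarity(f_geo_x, f_geo_y, hist_x, hist_y, lam=0.1):
    """Hybrid similarity S = S_geo + lam * S_rgb of Eqs. (3)-(5): dot products
    of L2-normalized geometric features and flattened color histograms."""
    s_geo = l2_normalize(f_geo_x) @ l2_normalize(f_geo_y)
    s_rgb = l2_normalize(hist_x.ravel()) @ l2_normalize(hist_y.ravel())  # beta = ravel
    return s_geo + lam * s_rgb
```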
The goal of the one-to-one matching problem is to find the correspondence with the maximum total similarity between the two sets. Specifically, a maximum-weight matching between the $m$ object models and the $n$ 2.5D instance point clouds needs to be found. In this example it is formulated as an optimal transport model. Moreover, when some objects disappear or new objects appear, $n$ and $m$ may be unequal. To handle these cases, slack variables are introduced into the formulation to find objects with no correspondence. Taking the case $m < n$ as an example, the $n \times n$ distance matrix $D$ is defined as follows, with the last $n - m$ rows being slack rows of a constant slack cost $z$:

$$D_{ij} = \begin{cases} 1 - S(M_i, P_j), & 1 \le i \le m, \\ z, & m < i \le n. \end{cases} \qquad (6)$$

The transport matrix is $T$, where $T_{ij}$ is the matching probability of $M_i$ and $P_j$. The matching problem can be formulated as:

$$\min_{T} \ \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij} D_{ij}, \qquad (7)$$

$$\text{s.t.} \ \sum_{j=1}^{n} T_{ij} = 1, \ i = 1, \dots, n, \qquad (8)$$

$$\quad \sum_{i=1}^{n} T_{ij} = 1, \ j = 1, \dots, n, \quad T_{ij} \ge 0. \qquad (9)$$
Finally, the Sinkhorn algorithm is used to solve (7), yielding a one-to-one correspondence. When $T_{ij} > 0.5$ and neither $M_i$ nor $P_j$ is a slack variable, $(i, j)$ is considered a good match; otherwise the match is discarded.
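A minimal Sinkhorn sketch for this matching step, with unit marginals standing in for Eqs. (8)-(9); the regularization strength, slack cost and iteration count are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def sinkhorn(D, reg=0.05, iters=200):
    """Entropy-regularized Sinkhorn iterations: alternately rescale the rows
    and columns of K = exp(-D/reg) until both marginals are (close to) one."""
    K = np.exp(-D / reg)
    u = np.ones(D.shape[0])
    for _ in range(iters):
        v = 1.0 / (K.T @ u)       # enforce column constraints
        u = 1.0 / (K @ v)         # enforce row constraints
    return u[:, None] * K * v[None, :]   # transport matrix T

def match(S, m, slack_cost=0.5, thresh=0.5):
    """Pad the m x n similarity matrix S with slack rows of constant cost,
    run Sinkhorn, and keep pairs (i, j) with T_ij > thresh among real objects."""
    n = S.shape[1]
    D = np.full((n, n), slack_cost)   # slack rows handle unmatched instances
    D[:m] = 1.0 - S                   # real-object rows: distance = 1 - similarity
    T = sinkhorn(D)
    return [(i, j) for i in range(m) for j in range(n) if T[i, j] > thresh]
```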
After the models and the 2.5D instance point clouds are associated, coarse position information of the object models in the new scene is obtained. However, considering that the pose of an object relative to the camera may have changed greatly, the objects need to be aligned with the new scene to obtain precise 6-DOF (degree-of-freedom) poses. In the present disclosure, objects in the reconstructed model are represented by surfels, which are similar to point clouds in geometry and color, so the object poses can be aligned with the new scene through point cloud registration. Point cloud registration refers to the problem of finding a rigid transformation to align two given point clouds, e.g., the reconstruction model and the object model obtained after observation is resumed. Formally, given two point clouds $X$ and $Y$, the goal is to find a transformation $T \in SE(3)$ that aligns the two point clouds. In the implementation, RPMNet may be used for each pre-matched point cloud set; it is a point cloud registration network that achieves the best performance on partial, noisy and unseen point cloud registration tasks.
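The example above aligns each matched pair with RPMNet; since that network's interface is not reproduced here, the sketch below substitutes Open3D's classical point-to-point ICP to estimate the rigid transform $T \in SE(3)$, purely as a stand-in for the learned registration step:

```python
import numpy as np
import open3d as o3d

def register(source_pts, target_pts, threshold=0.05):
    """Estimate a rigid 4x4 transform aligning source to target.
    Classical point-to-point ICP stands in for the RPMNet used in the text."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation   # homogeneous matrix in SE(3)
```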
According to embodiments of the present disclosure, further processing may be performed before feeding the point clouds into the network to optimize the alignment. As an example, a filter with a radius of 0.005 and at least 16 neighboring points is used for filtering. The point clouds are then downsampled with a voxel size of 0.01. Finally, the point clouds are scaled to the unit sphere and translated to the origin. Before the reference model is passed into RPMNet, we generate several hypotheses with initial angles of 90, 180 and 270 degrees for each axis, which further reduces sensitivity to the initial angle between the two point clouds. These operations may be performed based on the Open3D library, although any other appropriate library may of course be used.
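A sketch of this preprocessing with the Open3D calls the stated parameters suggest (radius outlier removal, voxel downsampling, unit-sphere normalization); the rotation-hypothesis step is omitted here:

```python
import numpy as np
import open3d as o3d

def preprocess(pcd: o3d.geometry.PointCloud) -> o3d.geometry.PointCloud:
    """Radius outlier filtering, voxel downsampling, then scaling the cloud
    into the unit sphere centered at the origin, as described above."""
    pcd, _ = pcd.remove_radius_outlier(nb_points=16, radius=0.005)
    pcd = pcd.voxel_down_sample(voxel_size=0.01)
    pcd.translate(-pcd.get_center())                      # move centroid to origin
    radius = np.max(np.linalg.norm(np.asarray(pcd.points), axis=1))
    pcd.scale(1.0 / radius, center=np.zeros(3))           # fit inside the unit sphere
    return pcd
```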
FIG. 4 illustrates an object association apparatus in a discontinuous observation scenario according to some embodiments of the present disclosure. The apparatus 400 includes: a model acquisition unit 401 configured to acquire an object model based on a reference image obtained when observation is resumed after an observation interruption; and an association unit 402 configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
In some embodiments, the association unit 402 further includes: a similarity determination unit 4021 configured to determine the similarity between the object point cloud model and the object reconstruction model based on object information, and the association unit further associates objects before and after the observation interruption based on the similarity.
In some embodiments, the object information includes both geometric features and color features, and the similarity determination unit is configured to determine the similarity between the object point cloud model and the object reconstruction model based on both geometric features and color features.
In some embodiments, the association unit 402 is further configured to: determine a one-to-one correspondence between objects before and after the observation interruption based on the similarity so that the objects before and after the interruption are associated. In some embodiments, the one-to-one correspondence between objects before and after the observation interruption is determined based on the maximum total similarity between multiple pieces of object information and multiple object models. In particular, determining the one-to-one correspondence between objects may be implemented by a matching unit, i.e., the association unit includes a matching unit configured to determine the one-to-one correspondence between objects before and after the observation interruption based on the similarity. Of course, in implementation, the operations of the similarity determination unit and the matching unit may both be performed by the association unit itself.
In some embodiments, the association apparatus 400 may further include: an alignment unit 403 configured to align the associated objects before and after the observation interruption. In some embodiments, the alignment unit is configured to align the pose of an object after the observation interruption with that of the object before the interruption through a spatial transformation.
In some embodiments, the association apparatus may further include a model reconstruction unit 404 configured to perform model reconstruction on an object based on consecutive images of the object. In some embodiments, the consecutive images of the object are a predetermined number of consecutive observation images before the observation interruption. In some embodiments, the object reconstruction model is any one selected from the group including a surfel-based model and a point cloud model. In some embodiments, the consecutive images include consecutive RGB images and depth images of the object, and the model reconstructed based on the consecutive images is a surfel-based model. It should be noted that the model reconstruction unit may not be included in the association apparatus and may instead be invoked by the association apparatus to perform model reconstruction during operation.
It should be noted that the model reconstruction unit 404 is shown in dashed lines to indicate that it may also be located outside the apparatus 400; in such a case, the apparatus 400 can still achieve the advantageous effects of the present disclosure as described above.
It should be noted that the above units are merely logical modules divided according to the specific functions they implement and are not intended to limit the specific implementation; for example, they may be implemented in software, hardware, or a combination of software and hardware. In actual implementation, the above units may be implemented as independent physical entities, or may be implemented by a single entity (e.g., a processor (CPU or DSP, etc.), an integrated circuit, etc.). Furthermore, where the above units are shown in dashed lines in the drawings, this indicates that these units may not actually exist, and the operations/functions they implement may be carried out by the processing circuitry itself.
In addition, although not shown, the device may also include a memory, which may store various information generated in operation by the device and the units it contains, programs and data for operation, data to be sent by a communication unit, and the like. The memory may be volatile memory and/or non-volatile memory. For example, the memory may include, but is not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read-only memory (ROM) and flash memory. Of course, the memory may also be located outside the device. Optionally, although not shown, the device may also include a communication unit, which may be used to communicate with other apparatuses. In one example, the communication unit may be implemented in an appropriate manner known in the art, e.g., including communication components such as antenna arrays and/or radio-frequency links, various types of interfaces, communication units, and so on. This will not be described in detail here. Furthermore, the device may also include other components not shown, such as radio-frequency links, baseband processing units, network interfaces, processors, controllers, etc. These will not be described in detail here.
The solution according to the present disclosure, alone or in combination with any existing object observation solution, may form an object observation system, which may be used to observe objects, e.g., continuously, and to perform object localization under discontinuous observation according to the present disclosure when observation is resumed after an observation interruption/loss. Specifically, when the object observation system according to the present disclosure starts, object models in the scene may be reconstructed with consecutive RGB-D video frames as input as the objects are observed, which may be performed by a dynamic object reconstruction module. After an observation interruption/scene change, the association module acquires the newly observed 2.5D instance point clouds and then evaluates the similarity between the reconstructed object models and the 2.5D instance point clouds in the new observation to find the one-to-one correspondences across the observation interruption/scene change. These correspondences provide an approximate position in the new scene for each object. Object alignment is then performed in the alignment module, e.g., using a deep-learning-based rigid point cloud registration method that is less sensitive to initialization and more robust, to align the two sets of point cloud instance clusters. Thus, a novel dynamic object observation system can be realized for the reconstruction, association and alignment of multiple unseen objects, without additional training for new objects across different scenes. Experiments show that our system is a general and state-of-the-art system that can support various tasks, such as model-free object pose estimation, single-view object completion, and real robotic grasping.
The effectiveness of the solution of the present disclosure is further demonstrated below with experimental examples, including 6-DOF object pose estimation and robotic grasping.
Evaluations on the public YCB-Video dataset and the Model-free Object Pose Estimation Dataset (MOPED) show that the system according to the present disclosure performs excellently in 6-DOF pose estimation; compared with zero-shot methods, e.g., ZePHyR, and model-based methods, e.g., CosyPose, Pix2Pose and EPOS, the method according to the present disclosure achieves better and more accurate pose estimation.
The system according to the present disclosure also applies well to various application tasks, especially robotic grasping tasks. A UR5 robotic arm with a Robotiq 2F-85 gripper and a wrist-mounted RealSense D435i RGB-D camera is used as the hardware platform. The system according to the present disclosure can assist the robotic grasping task in the following steps: a) the robotic arm scans cluttered objects placed on a tabletop, and during scanning unknown object models are reconstructed by the system according to the present disclosure; b) the robotic arm acquires a query image from a single viewpoint, and the system according to the present disclosure aligns the reconstructed object models with the objects in the query image to obtain aligned object point clouds; c) the aligned object point clouds are fed into an off-the-shelf grasp pose generation model to generate candidate grasps. Compared with single-view point clouds, the aligned object point clouds output by the system according to the present disclosure are more complete; for example, fully or partially occluded objects can still be properly localized, so that the grasp pose generation module can generate grasps for the occluded parts of objects or for occluded objects.
Thus, the present disclosure mainly considers a brand-new task, i.e., dynamic object reconstruction and localization without continuous observation and without known CAD model priors, and proposes a new system to perform dynamic object-level reconstruction, multi-object association and alignment. The system according to the present disclosure is general and state-of-the-art in various tasks such as model-free object pose estimation, single-view object completion based on model alignment, and dynamic multi-object robotic grasping.
Some embodiments of the present disclosure further provide an electronic device operable to implement the operations/functions of the object localization/association apparatus described above. FIG. 5 is a block diagram of some embodiments of the electronic device of the present disclosure. For example, in some embodiments, the electronic device 5 may be of various types, including but not limited to mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. For example, the electronic device 5 may include a display panel for displaying the data and/or execution results utilized in the solution according to the present disclosure. For example, the display panel may have various shapes, such as a rectangular panel, an elliptical panel or a polygonal panel. In addition, the display panel may be not only a flat panel but also a curved panel, or even a spherical panel.
As shown in FIG. 5, the electronic device 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51. It should be noted that the components of the electronic device 5 shown in FIG. 5 are merely exemplary and not restrictive; the electronic device 5 may also have other components according to actual application needs. The processor 52 may control other components in the electronic device 5 to perform desired functions.
In some embodiments, the memory 51 is used to store one or more computer-readable instructions, and the processor 52 is used to run the computer-readable instructions; the computer-readable instructions, when run by the processor 52, implement the method according to any of the embodiments described above. For the specific implementation of the steps of the method and related explanations, reference may be made to the above embodiments; repeated content is not described again here.
For example, the processor 52 and the memory 51 may communicate with each other directly or indirectly. For example, the processor 52 and the memory 51 may communicate over a network. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks. The processor 52 and the memory 51 may also communicate with each other via a system bus, which is not limited in the present disclosure.
For example, the processor 52 may be embodied as various appropriate processors, processing apparatuses, etc., such as a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The central processing unit (CPU) may be of X86 or ARM architecture, etc. For example, the memory 51 may include any combination of various forms of computer-readable storage media, e.g., volatile memory and/or non-volatile memory. The memory 51 may include, for example, system memory, which stores, for example, an operating system, application programs, a boot loader, a database and other programs. Various application programs and various data may also be stored in the storage medium.
In addition, according to some embodiments of the present disclosure, where the various operations/processing according to the present disclosure are implemented by software and/or firmware, a program constituting the software may be installed from a storage medium or a network to a computer system having a dedicated hardware structure, e.g., the computer system 600 shown in FIG. 6, which, when various programs are installed, is capable of performing various functions, including the functions described above. FIG. 6 is a block diagram showing an example structure of a computer system employable in embodiments of the present disclosure.
In FIG. 6, a central processing unit (CPU) 601 performs various processing according to programs stored in a read-only memory (ROM) 602 or programs loaded from a storage section 608 into a random access memory (RAM) 603. Data required when the CPU 601 performs various processing and the like are also stored in the RAM 603 as needed. The central processing unit is merely exemplary; it may also be another type of processor, such as the various processors described above. The ROM 602, RAM 603 and storage section 608 may be various forms of computer-readable storage media, as described below. It should be noted that although the ROM 602, RAM 603 and storage device 608 are shown separately in FIG. 6, one or more of them may be combined or located in the same or different memories or storage modules.
The CPU 601, ROM 602 and RAM 603 are connected to each other via a bus 604. An input/output interface 605 is also connected to the bus 604.
The following components are connected to the input/output interface 605: an input section 606, such as a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; an output section 607, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage section 608, including a hard disk, magnetic tape, etc.; and a communication section 609, including a network interface card such as a LAN card, a modem, etc. The communication section 609 allows communication processing to be performed via a network such as the Internet. It is easy to understand that although the various devices or modules in the electronic device 600 are shown in FIG. 6 as communicating via the bus 604, they may also communicate via a network or in other ways, where the network may include a wireless network, a wired network, and/or any combination of wireless and wired networks.
A drive 610 is also connected to the input/output interface 605 as needed. A removable medium 611, such as a magnetic disk, optical disk, magneto-optical disk, semiconductor memory or the like, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
Where the above series of processing is implemented by software, a program constituting the software may be installed from a network such as the Internet or from a storage medium such as the removable medium 611.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method according to embodiments of the present disclosure. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the CPU 601, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that, in the context of the present disclosure, a computer-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device; it may also exist alone without being assembled into the electronic device.
In some embodiments, a computer program is also provided, including: instructions which, when executed by a processor, cause the processor to perform the method of any of the above embodiments. For example, the instructions may be embodied as computer program code.
In embodiments of the present disclosure, computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment or portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.
The modules, components or units described in the embodiments of the present disclosure may be implemented in software or in hardware. The names of the modules, components or units do not, in some cases, limit the modules, components or units themselves.
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that can be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
According to some embodiments of the present disclosure, a method for object localization in a discontinuous observation scenario is proposed, comprising the following steps: acquiring an object model based on a reference image obtained when observation is resumed after an observation interruption; and associating objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
In some embodiments, the object reconstruction model is obtained by performing model reconstruction on the object based on consecutive images of the object. In some embodiments, the consecutive images of the object are a predetermined number of consecutive observation images before the observation interruption.
In some embodiments, the object reconstruction model is any one selected from the group including a surfel-based model and a point cloud model.
In some embodiments, the consecutive images include consecutive RGB images and depth images of the object, and the model reconstructed based on the consecutive images is a surfel-based model.
In some embodiments, the object model acquired based on the reference image is an object point cloud model. In some embodiments, the reference image is the initial image obtained when observation is resumed, and a 2.5D instance point cloud is acquired from the initial image as the object point cloud model.
In some embodiments, associating objects before and after the observation interruption based on the acquired object model and the object reconstruction model further includes: determining the similarity between the object point cloud model and the object reconstruction model based on object information, and associating objects before and after the observation interruption based on the similarity.
In some embodiments, the object information includes at least one of object geometric features, object texture features and object color features. In some embodiments, the object information includes both geometric features and color features, and the similarity between the object point cloud model and the object reconstruction model is determined based on both geometric features and color features.
In some embodiments, associating objects before and after the observation interruption based on the similarity further includes: determining a one-to-one correspondence between objects before and after the observation interruption based on the similarity so that the objects before and after the interruption are associated. In some embodiments, the one-to-one correspondence between objects before and after the observation interruption is determined based on the maximum total similarity between multiple pieces of object information and multiple object models.
In some embodiments, the method further includes: aligning the associated objects before and after the observation interruption. In some embodiments, the pose of an object after the observation interruption is aligned with that of the object before the interruption through a spatial transformation.
According to some embodiments of the present disclosure, an apparatus for object association in a discontinuous observation scenario is provided, comprising: a model acquisition unit configured to acquire an object model based on a reference image obtained when observation is resumed after an observation interruption; and an association unit configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
In some embodiments, the association unit further includes: a similarity determination unit configured to determine the similarity between the object point cloud model and the object reconstruction model based on object information, and the association unit further associates objects before and after the observation interruption based on the similarity.
In some embodiments, the object information includes both geometric features and color features, and the similarity determination unit is configured to determine the similarity between the object point cloud model and the object reconstruction model based on both geometric features and color features.
In some embodiments, the association unit is further configured to: determine a one-to-one correspondence between objects before and after the observation interruption based on the similarity so that the objects before and after the interruption are associated. In some embodiments, the one-to-one correspondence between objects before and after the observation interruption is determined based on the maximum total similarity between multiple pieces of object information and multiple object models.
In some embodiments, the association apparatus may further include: an alignment unit configured to align the associated objects before and after the observation interruption. In some embodiments, the alignment unit is configured to align the pose of an object after the observation interruption with that of the object before the interruption through a spatial transformation.
In some embodiments, the association apparatus may further include a model reconstruction unit configured to perform model reconstruction on an object based on consecutive images of the object. In some embodiments, the consecutive images of the object are a predetermined number of consecutive observation images before the observation interruption. In some embodiments, the object reconstruction model is any one selected from the group including a surfel-based model and a point cloud model. In some embodiments, the consecutive images include consecutive RGB images and depth images of the object, and the model reconstructed based on the consecutive images is a surfel-based model.
According to further embodiments of the present disclosure, an electronic device is provided, comprising: a memory; and a processor coupled to the memory, the memory storing instructions which, when executed by the processor, cause the electronic device to perform the method of any embodiment described in the present disclosure.
According to further embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, the program, when executed by a processor, implementing the method of any embodiment described in the present disclosure.
According to further embodiments of the present disclosure, a computer program is provided, including: instructions which, when executed by a processor, cause the processor to perform the method of any embodiment described in the present disclosure.
According to some embodiments of the present disclosure, a computer program product is provided, including instructions which, when executed by a processor, implement the method of any embodiment described in the present disclosure.
According to some embodiments of the present disclosure, a computer program is provided, the program code comprised in the computer program, when executed by a computer, causing the method of any embodiment described in the present disclosure to be implemented.
The above description is merely of some embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, e.g., technical solutions formed by replacing the above features with technical features with similar functions disclosed in (but not limited to) the present disclosure.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present disclosure may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments can be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (20)

  1. A method for object localization in a discontinuous observation scenario, comprising the following steps:
    acquiring an object model based on a reference image obtained when observation is resumed after an observation interruption, wherein the object model is a model of an object in the scene after observation is resumed; and
    associating objects before and after the observation interruption based on the acquired object model and an object reconstruction model, wherein the object reconstruction model is a model of an object in the scene before the observation interruption.
  2. The method according to claim 1, wherein the object reconstruction model is obtained by performing model reconstruction on the object based on consecutive images of the object.
  3. The method according to claim 2, wherein the object reconstruction model is any one selected from the group including a surfel-based model and a point cloud model.
  4. The method according to claim 2, wherein the consecutive images include consecutive RGB images and depth images of the object, and the model reconstructed based on the consecutive images is a surfel-based model.
  5. The method according to any one of claims 2-4, wherein the consecutive images of the object are a predetermined number of consecutive observation images before the observation interruption.
  6. The method according to claim 1, wherein the object model acquired based on the reference image is an object point cloud model.
  7. The method according to claim 1, wherein the reference image is the initial image obtained when observation is resumed, and a 2.5D instance point cloud is acquired from the initial image as the object point cloud model.
  8. The method according to claim 1, wherein associating objects before and after the observation interruption based on the acquired object model and the object reconstruction model further comprises:
    determining the similarity between the object point cloud model and the object reconstruction model based on object information, and
    associating objects before and after the observation interruption based on the similarity.
  9. The method according to claim 8, wherein the object information includes at least one of object geometric features, object texture features and object color features.
  10. The method according to claim 8, wherein the object information includes both geometric features and color features, and the similarity between the object point cloud model and the object reconstruction model is determined based on both geometric features and color features.
  11. The method according to claim 8, wherein associating objects before and after the observation interruption based on the similarity further comprises:
    determining a one-to-one correspondence between objects before and after the observation interruption based on the similarity so that the objects before and after the observation interruption are associated.
  12. The method according to claim 11, wherein the one-to-one correspondence between objects before and after the observation interruption is determined based on the maximum total similarity between multiple pieces of object information and multiple object models.
  13. The method according to claim 1, further comprising:
    aligning the associated objects before and after the observation interruption.
  14. The method according to claim 13, wherein the pose of an object after the observation interruption is aligned with that of the object before the observation interruption through a spatial transformation.
  15. An apparatus for object association in a discontinuous observation scenario, comprising:
    a model acquisition unit configured to acquire an object model based on a reference image obtained when observation is resumed after an observation interruption; and
    an association unit configured to associate objects before and after the observation interruption based on the acquired object model and an object reconstruction model.
  16. The apparatus according to claim 15, wherein the association unit further comprises:
    a similarity determination unit configured to determine the similarity between the object point cloud model and the object reconstruction model based on object information, and
    the association unit further associates objects before and after the observation interruption based on the similarity.
  17. The apparatus according to claim 15, further comprising:
    an alignment unit configured to align the associated objects before and after the observation interruption.
  18. An electronic device, comprising:
    a memory; and
    a processor coupled to the memory, the memory storing instructions which, when executed by the processor, cause the electronic device to perform the method according to any one of claims 1-14.
  19. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method according to any one of claims 1-14.
  20. A computer program product comprising instructions which, when executed by a processor, cause the method according to any one of claims 1-14 to be implemented.
PCT/CN2022/124913 2021-11-15 2022-10-12 Method, apparatus and storage medium for object localization under discontinuous observation WO2023082922A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111349093.5A CN113989374A (zh) 2021-11-15 2021-11-15 Method, apparatus and storage medium for object localization under discontinuous observation
CN202111349093.5 2021-11-15

Publications (1)

Publication Number Publication Date
WO2023082922A1 true WO2023082922A1 (zh) 2023-05-19

Family

ID=79748582

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124913 WO2023082922A1 (zh) 2021-11-15 2022-10-12 Method, apparatus and storage medium for object localization under discontinuous observation

Country Status (2)

Country Link
CN (1) CN113989374A (zh)
WO (1) WO2023082922A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989374A (zh) * 2021-11-15 2022-01-28 北京有竹居网络技术有限公司 用于非连续观测情况下物体定位的方法、装置和存储介质


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190028637A1 (en) * 2017-07-20 2019-01-24 Eclo, Inc. Augmented reality for three-dimensional model reconstruction
US20190147589A1 (en) * 2017-11-10 2019-05-16 Shenzhen United Imaging Healthcare Co., Ltd. System and method for image reconstruction
CN111951158A (zh) * 2019-05-16 2020-11-17 Hangzhou Hikrobot Technology Co., Ltd. Recovery method and apparatus for interrupted stitching of aerial images captured by an unmanned aerial vehicle, and storage medium
CN112884894A (zh) * 2021-04-28 2021-06-01 Shenzhen University Scene reconstruction data acquisition method and apparatus, computer device, and storage medium
CN113436318A (zh) * 2021-06-30 2021-09-24 Beijing SenseTime Technology Development Co., Ltd. Scene reconstruction method and apparatus, electronic device, and computer storage medium
CN113989374A (zh) * 2021-11-15 2022-01-28 Beijing Youzhuju Network Technology Co., Ltd. Method, apparatus and storage medium for object localization under discontinuous observation

Also Published As

Publication number Publication date
CN113989374A (zh) 2022-01-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891721

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18569508

Country of ref document: US