WO2023179341A1 - Method for placing virtual objects in a video and related device - Google Patents

Method for placing virtual objects in a video and related device

Info

Publication number
WO2023179341A1
Authority
WO
WIPO (PCT)
Prior art keywords
plane
point
points
grid
placement
Prior art date
Application number
PCT/CN2023/079649
Other languages
English (en)
French (fr)
Inventor
郭亨凯
温佳伟
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023179341A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tesselation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular, to a method, device, electronic device, storage medium and program product for placing virtual objects in a video.
  • Augmented Reality (AR) technology is a technology that seamlessly integrates virtual information with the real world. It makes extensive use of multimedia, three-dimensional modeling, real-time tracking, intelligent interaction, sensing and other technical means: computer-generated virtual objects such as text, images, three-dimensional models, music and video are simulated and then applied to the real world, thereby "augmenting" the real world.
  • At present, three-dimensional modeling is usually realized through Simultaneous Localization and Mapping (SLAM) technology. However, the three-dimensional (3D) points obtained through SLAM are usually sparse, so many planes cannot be estimated because there are too few 3D points; in addition, many non-planar areas in actual scenes cannot be estimated through SLAM either. Since virtual objects in AR can usually only be placed on an estimated plane, these situations lead to the problem that a virtual object cannot be placed in an image or video because the plane corresponding to the virtual object cannot be found.
  • Embodiments of the present disclosure provide a method for placing virtual objects in a video, which can accurately determine the plane on which a virtual object is placed in the video and complete accurate placement of the virtual object, avoiding the problem that the virtual object cannot be placed in the image because the plane corresponding to the virtual object cannot be found.
  • The above-mentioned method of placing a virtual object in a video may include: obtaining a 3D point cloud corresponding to the video; for each image frame in the video, obtaining the 3D points in the 3D point cloud that have corresponding 2D points in the image frame; obtaining a mesh through triangulation based on the 3D points; determining the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh; and placing the virtual object at the target position in the image frame.
  • a device for placing virtual objects in a video, including:
  • a three-dimensional 3D point cloud acquisition module used to acquire the 3D point cloud corresponding to the video
  • a triangulation module configured to obtain, for each image frame in the video, 3D points in the 3D point cloud that have corresponding 2D points in the current image frame, and obtain a grid through triangulation based on the 3D points.
  • a target position determination module configured to determine the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the grid;
  • a virtual object placement module is used to place the virtual object at a target position in the image frame.
  • embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the above method is implemented.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium that stores computer instructions, and the computer instructions are used to cause the computer to perform the above method.
  • Embodiments of the present disclosure also provide a computer program product, which includes computer program instructions; when the computer program instructions are run on a computer, they cause the computer to perform the above method.
  • Through triangulation, multiple triangles with 3D points as vertices can be obtained, and each triangle determines a plane; therefore, the multiple planes contained in each image frame can be obtained from these triangles, and the target plane and the target position for placing the virtual object are then determined according to the relationship between the virtual object's placement position and these planes.
  • The above method can effectively solve the problem that virtual object placement cannot be completed because planes cannot be estimated from a small number of 3D points and some non-planar areas in the actual scene cannot be estimated.
  • Figure 1 is a schematic diagram of an application scenario of a method for placing virtual objects in a video provided by an embodiment of the present disclosure
  • Figure 2 shows the implementation process of a method of placing virtual objects in a video according to some embodiments of the present disclosure
  • Figure 3 shows an example of a mesh obtained from a finite point set through the Delaunay triangulation algorithm according to an embodiment of the present disclosure
  • Figure 4 shows the implementation process of determining the target position of the virtual object in the above image frame according to the placement position of the virtual object in the video and the above grid according to an embodiment of the present disclosure
  • Figure 5 shows a schematic diagram of the internal structure of a device for placing virtual objects in videos according to some embodiments of the present disclosure
  • Figure 6 shows a schematic diagram of the internal structure of the target position determination module according to some embodiments of the present disclosure.
  • FIG. 7 shows a more specific schematic diagram of the hardware structure of the electronic device provided by this embodiment.
  • FIG. 1 is a schematic diagram of an application scenario of a method for placing virtual objects in a video provided by an embodiment of the present disclosure.
  • the application scenario includes: terminal device 101 and augmented reality processing device 102.
  • the above-mentioned terminal device 101 and the augmented reality processing device 102 are functionally distinguished, and Figure 1 only gives an example of an application scenario.
  • the above-mentioned terminal device 101 and augmented reality processing device 102 can be two independent physical devices, or they can be integrated on a single physical device to realize interaction with the user and video processing at the same time. If the terminal device 101 and the augmented reality processing device 102 are two independent physical devices, the terminal device 101 and the augmented reality processing device 102 may be connected through a wired or wireless communication network.
  • The above-mentioned terminal device 101 includes, but is not limited to, a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a personal digital assistant (PDA), or other electronic equipment capable of implementing the above functions.
  • the above-mentioned terminal device 101 can display an interactive interface that can interact with the user through the display screen, thereby providing the user with various augmented reality applications. For example, the user can select a position to place a virtual object in each image frame of a video being played through the terminal device 101.
  • the above-mentioned augmented reality processing device 102 may be an electronic device with computing capabilities, which is used to perform augmented reality processing on image frames in the video, such as placing virtual objects at a location selected by the user, and so on.
  • Some embodiments of the present disclosure provide a method for placing virtual objects in a video, which can accurately determine the plane on which the virtual object is placed and avoid the problem that the virtual object cannot be placed in the video because the plane corresponding to the virtual object cannot be found. It should be noted that this method can be executed by the above-mentioned augmented reality processing device 102.
  • Figure 2 shows the implementation flow of the method of placing virtual objects in video according to the embodiment of the present disclosure. As shown in Figure 2, the method may include the following steps:
  • step 202 a three-dimensional (3D) point cloud corresponding to the above video is obtained.
  • In embodiments of the present disclosure, if a user wishes to place a virtual object at a certain position in each image frame of a video, the user usually selects, through the terminal device 101, the virtual object to be placed and selects a position on one image frame of the video as the placement position of the virtual object in the video. The terminal device 101 then generates a virtual object placement request carrying this information and sends it to the augmented reality processing device 102, and the augmented reality processing device 102 places the virtual object at the placement position selected by the user in the video.
  • the above-mentioned virtual object may generally refer to a material, such as a picture, a virtual object, a video, etc.
  • the placement position of the above-mentioned virtual object in the video can be mapped to a point on each image frame in the video, which can be represented by the coordinates of pixel points on the image frame.
  • Those skilled in the art can understand that, through SLAM technology, the 2D points on the two-dimensional (2D) image frames contained in a video can be mapped into three-dimensional space, thereby obtaining the 3D points corresponding to the 2D points in the image frames. Furthermore, after the mapping from 2D points to 3D points has been completed for multiple frames of 2D images, a global 3D point cloud can be obtained, which is referred to in this disclosure as the 3D point cloud corresponding to the video.
  • the 3D point cloud obtained through SLAM technology not only includes each 3D point, but also includes the corresponding relationship between these 3D points and the 2D points in each image frame of the video.
  • For example, a 3D point in the 3D point cloud can correspond to a 2D point in each of multiple image frames, and so on.
  • each 2D point in an image frame in the video can also be mapped to a 3D point in the three-dimensional space through the pose of the camera that shoots the video.
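  • As an illustration of that mapping (not part of the patent), the following minimal Python sketch back-projects a pixel to a 3D point assuming a pinhole camera with known intrinsics K, a world-to-camera pose (R, t) and a per-pixel depth value; all of these symbols are assumptions introduced here for the example.

```python
import numpy as np

def backproject_pixel(u, v, depth, K, R, t):
    """Map a 2D pixel (u, v) with a known depth to a 3D point in world coordinates.

    Assumes a pinhole camera (x_cam = depth * K^-1 @ [u, v, 1]) and a
    world-to-camera pose x_cam = R @ x_world + t; both conventions are
    assumptions, since the patent does not fix a camera model.
    """
    pixel_h = np.array([u, v, 1.0])
    x_cam = depth * (np.linalg.inv(K) @ pixel_h)  # point in camera coordinates
    x_world = R.T @ (x_cam - t)                   # invert the world-to-camera pose
    return x_world
```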
  • step 204 3D points in the above-mentioned 3D point cloud that have corresponding 2D points in the current image frame are obtained.
  • As mentioned above, each 3D point in the 3D point cloud corresponding to a video corresponds to a 2D point in at least one image frame of the video. Therefore, in embodiments of the present disclosure, for a given image frame, all 3D points that have corresponding 2D points in that image frame can be determined from the 3D point cloud based on the correspondence between the 3D points in the 3D point cloud and the 2D points in each image frame.
  • step 206 based on the above 3D points, a mesh is obtained through triangulation.
  • In some embodiments, the above-mentioned augmented reality processing device 102 can directly use the set of the above 3D points as a finite point set, and obtain the above mesh through the Delaunay triangulation algorithm based on this finite point set.
  • In other embodiments, in order to improve the accuracy of the triangulation, in the above step 206 the augmented reality processing device 102 may first determine the 2D points corresponding to the above 3D points on the current image frame; then use the set of 2D points as a finite point set; next, obtain a first mesh through the Delaunay triangulation algorithm based on this finite point set; then, based on the connection relationships between the 2D points in the first mesh and the correspondence between the 2D points and the 3D points, obtain the connection relationships between the 3D points corresponding to the first mesh; and finally, determine a second mesh based on the connection relationships between the 3D points, and take the determined second mesh as the mesh referred to in step 206.
  • That is, in the above method, a 2D mesh is obtained by performing Delaunay triangulation on the 2D points, and the resulting 2D mesh is then mapped into a 3D mesh based on the correspondence between the 2D points and the 3D points.
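  • As a sketch of the 2D-then-3D meshing just described (an illustration, not the patented implementation), the 2D points of the current frame can be triangulated with scipy.spatial.Delaunay and the resulting vertex-index triples reused for the 3D points; the array names points_2d and points_3d are hypothetical and assumed to be row-aligned.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_mesh(points_2d: np.ndarray, points_3d: np.ndarray) -> np.ndarray:
    """Triangulate the 2D projections and lift the connectivity onto the 3D points.

    points_2d: (N, 2) pixel coordinates in the current image frame.
    points_3d: (N, 3) corresponding 3D points, row-aligned with points_2d.
    Returns an (M, 3) array of vertex-index triples shared by the first (2D)
    and second (3D) mesh.
    """
    first_mesh = Delaunay(points_2d)     # Delaunay triangulation of the 2D points
    triangles = first_mesh.simplices     # each row is one triangle's vertex indices
    # The same index triples describe the second (3D) mesh, because the i-th 2D
    # point corresponds to the i-th 3D point.
    return triangles

# Hypothetical usage:
# triangles = build_mesh(points_2d, points_3d)
# triangle_corners_3d = points_3d[triangles]   # (M, 3, 3) triangle corners in 3D
```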
  • It is known that the above mesh should satisfy the following conditions: 1) except for the endpoints, the edges of the planar graph shown by the mesh do not contain any other point of the finite point set; 2) except at the endpoints, there are no intersecting edges in the planar graph; and 3) all faces in the planar graph are triangles, and the set of all triangular faces is the convex hull of the finite point set.
  • Figure 3 shows an example of a mesh obtained from a finite point set through the Delaunay triangulation algorithm according to an embodiment of the present disclosure.
  • Through the Delaunay triangulation algorithm, based on the finite point set shown in the left half of Figure 3, the mesh shown in the right half of Figure 3 can be obtained.
  • each side of the plan view shown in this grid does not contain any other points.
  • the edges of this grid do not intersect.
  • the grid shows a plan view in which all faces are triangles.
  • In embodiments of the present disclosure, multiple triangles can be obtained through triangulation, and each triangle determines a plane; therefore, the multiple planes contained in each image frame can be obtained, which effectively solves the problem of planes that cannot be estimated when plane estimation is performed based on a small number of 3D points, as well as the problem that some non-planar areas in actual scenes cannot be estimated.
  • step 208 plane estimation is performed based on the above-mentioned 3D point cloud to determine at least one first plane.
  • the above-mentioned augmented reality processing device 102 can perform plane estimation through a Random Sample Consensus (RANSAC) algorithm.
  • RANSAC is an algorithm first proposed by Fischler and Bolles in 1981. This algorithm calculates the mathematical model parameters of the data based on a sample data set containing abnormal data.
  • the RANSAC algorithm is often used to find the best matching model in computer vision matching problems.
  • the above-mentioned augmented reality processing device 102 can fit multiple first planes according to the 3D point cloud image through the RANSAC algorithm.
  • the best matching model found by the RANSAC algorithm is multiple first planes.
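  • A minimal RANSAC plane-fit sketch is given below for illustration; it is not the patent's implementation, and the iteration count and inlier threshold are arbitrary assumptions. Multiple first planes can be obtained by running it repeatedly and removing each plane's inliers before the next run.

```python
import numpy as np

def ransac_plane(points: np.ndarray, iters: int = 200, threshold: float = 0.02):
    """Fit one plane to a 3D point set with RANSAC.

    points: (N, 3) array of 3D points; threshold is the inlier distance.
    Returns (normal, d, inlier_mask) for the plane normal . x + d = 0,
    or None if fewer than three points are given.
    """
    n_pts = points.shape[0]
    if n_pts < 3:
        return None
    best = (None, None, np.zeros(n_pts, dtype=bool))
    rng = np.random.default_rng(0)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(n_pts, 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal = normal / norm
        d = -normal.dot(p0)
        dist = np.abs(points @ normal + d)   # point-to-plane distances
        inliers = dist < threshold
        if inliers.sum() > best[2].sum():
            best = (normal, d, inliers)
    return best
```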
  • the parameters of the above-mentioned plane may include: various parameters that determine the plane equation of the plane.
  • For example, each plane in 3D space can be expressed in the form Ax + By + Cz + D = 0, and the plane is determined once the four coefficients A, B, C and D are determined. Therefore, the parameters of the above plane may refer to the four coefficients A, B, C and D.
  • the above plane expression can also be expressed by the normal vector and distance. Determining the normal vector and distance of a plane can also determine a plane.
  • the parameters of the above plane can also refer to the normal vector and distance of the above plane. It should be noted that the parameters of the above-mentioned various forms of planes are essentially the same, and they can all uniquely determine a plane.
  • For example, the normal vector and distance of the plane can be determined from the four coefficients A, B, C and D; conversely, the four coefficients A, B, C and D can also be obtained from the normal vector and distance of the plane.
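  • As a small illustration of that equivalence (under one common sign convention, which is an assumption; the patent does not fix one), the coefficients A, B, C, D of Ax + By + Cz + D = 0 and the unit normal plus distance can be converted back and forth as follows.

```python
import numpy as np

def coeffs_to_normal_distance(A, B, C, D):
    """Plane Ax + By + Cz + D = 0 -> (unit normal, distance), i.e. normal . x + distance = 0."""
    n = np.array([A, B, C], dtype=float)
    norm = np.linalg.norm(n)
    return n / norm, D / norm

def normal_distance_to_coeffs(normal, distance):
    """(unit normal, distance) -> coefficients A, B, C, D of the same plane."""
    A, B, C = normal
    return A, B, C, distance
```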
  • step 210: for each triangle in the above mesh, in response to determining that the three vertices of a triangle lie on the same first plane, the normal vector of the second plane determined by that triangle is replaced with the normal vector of the first plane on which the three vertices of the triangle lie.
  • the triangle plane obtained through triangulation and the plane obtained through conventional plane estimation can be fused.
  • When it is determined that the three vertices of a triangle all lie on a determined first plane, the normal vector of that first plane is used to correct the normal vector of the plane determined by the triangle. This solves the problem of planes that cannot be estimated by conventional plane estimation methods because there are relatively few 3D points, and also solves the problem that, during plane estimation through the above triangulation, errors cause a single plane to be estimated as multiple planes so that the resulting plane appears uneven, thus making the final plane estimation result more accurate.
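  • The following sketch illustrates this fusion step under stated assumptions: triangles and points_3d come from the triangulation sketch above, and each first plane is given as a (normal, d) pair together with a boolean inlier mask over the mesh vertices; the patent does not prescribe these data structures.

```python
import numpy as np

def fuse_triangle_normals(triangles, points_3d, first_planes, inlier_masks):
    """Replace a triangle's own normal with the normal of the first plane that
    contains all three of its vertices, if such a plane exists.

    triangles:    (M, 3) vertex-index triples from the triangulation.
    points_3d:    (N, 3) mesh vertices.
    first_planes: list of (normal, d) tuples from RANSAC plane estimation.
    inlier_masks: list of (N,) boolean masks, one per first plane.
    Returns an (M, 3) array of per-triangle normals.
    """
    normals = np.zeros((len(triangles), 3))
    for i, (a, b, c) in enumerate(triangles):
        # Normal of the "second plane" determined by the triangle itself.
        n = np.cross(points_3d[b] - points_3d[a], points_3d[c] - points_3d[a])
        n = n / (np.linalg.norm(n) + 1e-12)
        for (plane_normal, _), mask in zip(first_planes, inlier_masks):
            if mask[a] and mask[b] and mask[c]:   # all three vertices lie on this first plane
                n = plane_normal                  # correct the triangle's normal
                break
        normals[i] = n
    return normals
```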
  • step 212 the target position of the virtual object in the image frame is determined based on the placement position of the virtual object in the video and the grid.
  • the placement position of the above-mentioned virtual object in the above-mentioned video actually corresponds to a point in each image frame in the video.
  • Those skilled in the art know that, after the user selects a point in one image frame of the video, the point corresponding to the selected point in each image frame of the video can be determined through plane tracking technology.
  • Based on the above, the specific implementation of determining the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh in step 212 may be as shown in Figure 4, and includes the following steps:
  • step 402 the corresponding placement point of the virtual object in the current image frame is determined based on the placement position of the virtual object in the video.
  • step 404: in response to determining that the placement point is in a triangle of the mesh, the plane determined by that triangle is taken as the target plane (a sketch of this membership check appears after this step list).
  • step 406 the target position is determined based on the placement point and the target plane.
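  • A minimal sketch of the membership test used in step 404 is shown below (an illustration only): the placement point, in pixel coordinates, is tested against the triangles of the first (2D) mesh with barycentric coordinates, and the matching triangle's 3D counterpart then supplies the target plane.

```python
import numpy as np

def point_in_triangle(p, a, b, c, eps=1e-9):
    """Return True if 2D point p lies inside (or on the border of) triangle abc."""
    v0, v1, v2 = c - a, b - a, p - a
    d00, d01, d02 = v0 @ v0, v0 @ v1, v0 @ v2
    d11, d12 = v1 @ v1, v1 @ v2
    denom = d00 * d11 - d01 * d01
    if abs(denom) < eps:                       # degenerate triangle
        return False
    u = (d11 * d02 - d01 * d12) / denom        # barycentric coordinates
    v = (d00 * d12 - d01 * d02) / denom
    return u >= -eps and v >= -eps and (u + v) <= 1 + eps
```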
  • Specifically, in embodiments of the present disclosure, determining the target position based on the placement point and the target plane in step 406 may include: first, obtaining the pose of the camera corresponding to the image frame; second, constructing, based on the camera pose and the placement point, a ray that starts from the center point of the camera and passes through the placement point; then, performing collision detection between the ray and the target plane to determine the collision position; and finally, taking the collision position as the target position.
  • The above method may further include: if no collision is detected during the collision detection between the ray and the target plane, the target position cannot be obtained, so the placement of the virtual object cannot be completed. In this case, the augmented reality processing device 102 may output information indicating that the virtual object placement has failed.
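  • For illustration, a minimal ray-plane collision test under the plane convention normal . x + d = 0 might look as follows; the camera center and ray direction are assumed to have been derived from the camera pose and the placement point, which the patent leaves to the implementation.

```python
import numpy as np

def ray_plane_hit(origin, direction, normal, d, eps=1e-9):
    """Intersect a ray with the plane normal . x + d = 0.

    origin:    camera center in world coordinates.
    direction: unit vector from the camera center through the placement point.
    Returns the 3D collision position, or None when no collision is detected.
    """
    denom = normal @ direction
    if abs(denom) < eps:                 # ray parallel to the plane: no collision
        return None
    t = -(normal @ origin + d) / denom
    if t < 0:                            # plane lies behind the camera
        return None
    return origin + t * direction        # collision position = target position
```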
  • the above-mentioned step 404 may further include: in response to determining that the above-mentioned placement point is not in any triangle of the above-mentioned grid, determining that the virtual object placement fails.
  • the above-mentioned augmented reality processing device 102 may output information that virtual object placement fails.
  • the above-mentioned augmented reality processing device 102 can send a response to the above-mentioned terminal device 101 that virtual object placement fails, and the terminal device 101 displays corresponding prompt information.
  • Further, in other embodiments of the present disclosure, the above step 404 may further include: in response to determining that the placement point is not in any triangle of the mesh, selecting, from the multiple planes determined by all triangles in the mesh, the plane closest to the placement point as the target plane.
  • The plane closest to the placement point can be determined in the following manner: first, for each triangle in the mesh, the plane determined by the triangle is taken as a reference plane, and the distance from the placement point to each reference plane is determined; then, the reference plane corresponding to the shortest distance is selected as the target plane.
  • Specifically, in the above process, determining the distance from the placement point to a reference plane may include: obtaining the pose of the camera corresponding to the current image frame; constructing, based on the camera pose and the placement point, a ray that starts from the center point of the camera and passes through the placement point; finding the intersection between the ray and the reference plane; and taking the distance from the placement point to the intersection point as the distance from the placement point to the reference plane.
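  • A sketch of this fallback selection is given below; it reuses the hypothetical ray_plane_hit helper from the sketch above and assumes the placement point has been expressed as a 3D point on the image plane so that its distance to each intersection can be measured (the patent leaves this detail implicit).

```python
import numpy as np

def nearest_reference_plane(placement_point_3d, origin, direction, planes):
    """Among the reference planes determined by all mesh triangles, pick the one
    whose ray intersection lies closest to the placement point.

    planes: list of (normal, d) reference planes.
    Returns (best_plane, hit_point), or (None, None) if no plane is intersected.
    """
    best_plane, best_hit, best_dist = None, None, np.inf
    for normal, d in planes:
        hit = ray_plane_hit(origin, direction, normal, d)
        if hit is None:
            continue
        dist = np.linalg.norm(hit - placement_point_3d)  # placement point to intersection
        if dist < best_dist:
            best_plane, best_hit, best_dist = (normal, d), hit, dist
    return best_plane, best_hit
```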
  • step 214 the virtual object is placed at the target position in the image frame.
  • It can be seen that, in embodiments of the present disclosure, multiple triangles with 3D points as vertices can be obtained through triangulation, and each triangle determines a plane; therefore, the multiple planes contained in each image frame can be obtained from these triangles, and the target plane and the target position for placing the virtual object are then determined according to the relationship between the virtual object's placement position and these planes.
  • the above method can effectively solve the problem that the plane cannot be estimated based on a small number of 3D points and some non-planar areas in the actual scene cannot be estimated, resulting in the inability to complete the placement of virtual objects.
  • the methods in the embodiments of the present disclosure can be executed by a single device, such as a computer or server.
  • the method of this embodiment can also be applied in a distributed scenario, and is completed by multiple devices cooperating with each other.
  • In such a distributed scenario, one device among the multiple devices may perform only one or more steps of the method of the embodiments of the present disclosure, and the multiple devices interact with each other to complete the described method.
  • FIG. 5 shows the internal structure of a device for placing virtual objects in a video according to an embodiment of the present disclosure.
  • the device may include: a 3D point cloud acquisition module 502, a triangulation module 504, a target position determination module 508, and a virtual object placement module 510.
  • the above-mentioned 3D point cloud acquisition module 502 is used to acquire the 3D point cloud corresponding to the above-mentioned video.
  • The above-mentioned 3D point cloud acquisition module 502 may directly obtain the 3D point cloud corresponding to the video based on SLAM technology; alternatively, the 3D point cloud acquisition module 502 may map each 2D point in an image frame of the video to a 3D point in three-dimensional space based on the pose of the camera that shot the video, thereby obtaining the 3D point cloud corresponding to the video.
  • the above-mentioned triangulation module 504 is used for each image frame in the video to obtain the 3D points in the above-mentioned 3D point cloud that have corresponding 2D points in the current image frame, and based on the above-mentioned 3D points, obtain a grid through triangulation.
  • For a given image frame, the above-mentioned triangulation module 504 can determine from the 3D point cloud all 3D points that have corresponding 2D points in that image frame, based on the correspondence between the 3D points in the 3D point cloud and the 2D points in each image frame.
  • the above-mentioned triangulation module 504 can directly use the above-mentioned set of 3D points as a finite point set, and obtain the above-mentioned grid through the Delaunay triangulation algorithm based on the limited point set.
  • the above-mentioned triangulation module 504 may include the following units:
  • 2D point determination unit used to determine the 2D point corresponding to the 3D point on the image frame
  • a triangulation unit configured to use the set of 2D points as a finite point set, and obtain the first grid through the Delaunay triangulation algorithm based on the limited point set;
  • a grid mapping unit, configured to obtain the connection relationships between the 3D points corresponding to the first grid based on the connection relationships between the 2D points in the first grid and the correspondence between the 2D points and the 3D points; and to determine the grid according to the connection relationships between the 3D points.
  • In embodiments of the present disclosure, multiple triangles can be obtained through triangulation, and each triangle determines a plane; therefore, the multiple planes contained in each image frame can be obtained, which effectively solves the problem of planes that cannot be estimated when plane estimation is performed based on a small number of 3D points, as well as the problem that some non-planar areas in actual scenes cannot be estimated.
  • the above-mentioned device for placing virtual objects in a video may also include: a plane calibration module 506 for performing plane estimation based on the above-mentioned 3D point cloud and determining at least one first plane; and for the above-mentioned For each triangle in the grid, in response to determining that the three vertices of a triangle are on the same first plane, replace the normal vector of the second plane determined by the above triangle with the normal vector of the first plane where the three vertices of the triangle are located. normal vector.
  • the above-mentioned plane calibration module 506 can perform plane estimation through the RANSAC algorithm and determine multiple first planes, that is, determine the parameters of multiple first planes and the 3D points contained on them.
  • The above-mentioned plane calibration module 506 can fuse the triangle planes obtained through triangulation with the planes obtained through conventional plane estimation. When it is determined that the three vertices of a triangle all lie on a determined first plane, the normal vector of that first plane is used to correct the normal vector of the plane determined by the triangle. This solves the problem of planes that cannot be estimated by conventional plane estimation methods because there are relatively few 3D points, and also solves the problem that, during plane estimation through the above triangulation, errors cause a single plane to be estimated as multiple planes so that the resulting plane appears uneven, thus making the final plane estimation result more accurate.
  • the target position determination module 508 is configured to determine the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the grid.
  • the above-mentioned target location determination module 508 may specifically include:
  • the placement point determination unit 602 is used to determine the corresponding placement point of the virtual object in the current image frame according to the placement position of the virtual object in the video;
  • the target plane determining unit 604 is configured to, in response to determining that the placement point is in a triangle of the grid, use the plane determined by the triangle as the target plane;
  • the target position determining unit 606 is configured to determine the target position based on the placement point and the target plane.
  • In other embodiments, the target plane determining unit may be further configured to, in response to determining that the placement point is not in any triangle of the grid, select, from the multiple planes determined by all triangles in the grid, the plane closest to the placement point as the target plane.
  • the virtual object placement module 510 is used to place the virtual object at a target position in the image frame.
  • the devices of the above embodiments are used to implement the corresponding method of placing virtual objects in videos in any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.
  • The present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the method of placing a virtual object in a video described in any of the above embodiments is implemented.
  • Figure 7 shows a more specific hardware structure diagram of an electronic device provided by this embodiment.
  • the device may include: a processor 2010, a memory 2020, an input/output interface 2030, a communication interface 2040 and a bus 2050.
  • the processor 2010, the memory 2020, the input/output interface 2030 and the communication interface 2040 implement communication connections between each other within the device through the bus 2050.
  • the processor 2010 can be a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for executing relevant programs to implement the technical solutions provided by the embodiments of this specification.
  • the memory 2020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 2020 can store operating systems and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program codes are stored in the memory 2020 and called and executed by the processor 2010.
  • the input/output interface 2030 is used to connect the input/output module to realize information input and output.
  • The input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • the communication interface 2040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 2050 includes a path that carries information between various components of the device (eg, processor 2010, memory 2020, input/output interface 2030, and communication interface 2040).
  • It should be noted that, although the above device only shows the processor 2010, the memory 2020, the input/output interface 2030, the communication interface 2040 and the bus 2050, in specific implementations the device may also include other components necessary for normal operation.
  • the above-mentioned device may only include components necessary to implement the embodiments of this specification, and does not necessarily include all components shown in the drawings.
  • the electronic devices of the above embodiments are used to implement the corresponding method of placing virtual objects in videos in any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.
  • The present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause the computer to execute the method of placing a virtual object in a video as described in any of the above embodiments.
  • the computer-readable media in this embodiment include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • the computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the task processing method described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides a method for placing a virtual object in a video, comprising: obtaining a three-dimensional (3D) point cloud corresponding to the video; for each image frame in the video, obtaining the 3D points in the 3D point cloud that have corresponding two-dimensional (2D) points in the image frame; obtaining a mesh by triangulation based on the 3D points; determining a target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh; and placing the virtual object at the target position in the image frame. Based on the above method for placing a virtual object in a video, the present disclosure further provides an apparatus for placing a virtual object in a video, an electronic device, a storage medium and a program product.

Description

Method for placing virtual objects in a video and related device
This application claims priority to the Chinese invention patent application No. CN202210306832.0, entitled "Method for placing virtual objects in a video and related device", filed on March 25, 2022.
Technical Field
The present disclosure relates to the field of computer vision technology, and in particular to a method, apparatus, electronic device, storage medium and program product for placing virtual objects in a video.
Background
Augmented Reality (AR) technology is a technology that seamlessly integrates virtual information with the real world. It makes extensive use of multimedia, three-dimensional modeling, real-time tracking, intelligent interaction, sensing and other technical means: computer-generated virtual objects such as text, images, three-dimensional models, music and video are simulated and then applied to the real world, thereby "augmenting" the real world.
At present, three-dimensional modeling can usually be realized through Simultaneous Localization and Mapping (SLAM) technology. However, since the three-dimensional (3D) points obtained through SLAM are usually sparse, many planes cannot be estimated because there are too few 3D points. In addition, many non-planar areas in actual scenes cannot be estimated through SLAM either. Since virtual objects in AR can usually only be placed on an estimated plane, these situations lead to the problem that a virtual object cannot be placed in an image or video because the plane corresponding to the virtual object cannot be found.
Summary
In view of this, embodiments of the present disclosure provide a method for placing a virtual object in a video, which can accurately determine the plane on which the virtual object is placed in the video, complete the accurate placement of the virtual object, and avoid the problem that the virtual object cannot be placed in the image because the plane corresponding to the virtual object cannot be found.
According to some embodiments of the present disclosure, the above method for placing a virtual object in a video may include: obtaining a 3D point cloud corresponding to the video; for each image frame in the video, obtaining the 3D points in the 3D point cloud that have corresponding 2D points in the image frame; obtaining a mesh through triangulation based on the 3D points; determining the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh; and placing the virtual object at the target position in the image frame.
Based on the above method for placing a virtual object in a video, embodiments of the present disclosure provide an apparatus for placing a virtual object in a video, including:
a three-dimensional (3D) point cloud acquisition module, configured to acquire the 3D point cloud corresponding to the video;
a triangulation module, configured to, for each image frame in the video, obtain the 3D points in the 3D point cloud that have corresponding 2D points in the current image frame, and obtain a mesh through triangulation based on the 3D points;
a target position determination module, configured to determine the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh; and
a virtual object placement module, configured to place the virtual object at the target position in the image frame.
In addition, embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above method when executing the program.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the above method.
Embodiments of the present disclosure also provide a computer program product, including computer program instructions that, when run on a computer, cause the computer to execute the above method.
It can be seen from the above that, with the method and apparatus for placing virtual objects in a video provided by the present disclosure, multiple triangles with 3D points as vertices can be obtained through triangulation, and each triangle determines a plane; therefore, the multiple planes contained in each image frame can be obtained from these triangles, and the target plane and the target position for placing the virtual object are then determined according to the relationship between the placement position of the virtual object and these planes. The above method can effectively solve the problem that virtual object placement cannot be completed because planes cannot be estimated from a small number of 3D points and some non-planar areas in actual scenes cannot be estimated.
Brief Description of the Drawings
In order to explain the technical solutions of the present disclosure or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1 is a schematic diagram of an application scenario of the method for placing virtual objects in a video provided by an embodiment of the present disclosure;
Figure 2 shows the implementation flow of a method for placing virtual objects in a video according to some embodiments of the present disclosure;
Figure 3 shows an example of a mesh obtained from a finite point set through the Delaunay triangulation algorithm according to an embodiment of the present disclosure;
Figure 4 shows the implementation flow of determining the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh, according to an embodiment of the present disclosure;
Figure 5 shows a schematic diagram of the internal structure of an apparatus for placing virtual objects in a video according to some embodiments of the present disclosure;
Figure 6 shows a schematic diagram of the internal structure of the target position determination module according to some embodiments of the present disclosure; and
Figure 7 shows a more specific schematic diagram of the hardware structure of the electronic device provided by this embodiment.
Detailed Description
To make the purposes, technical solutions and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure shall have the ordinary meanings understood by those with ordinary skill in the field to which the present disclosure belongs. The words "first", "second" and similar words used in the embodiments of the present disclosure do not denote any order, quantity or importance, but are only used to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", etc. are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
As mentioned above, usually only sparse 3D points can be obtained through SLAM technology. Because the 3D points obtained through SLAM are sparse, there are many planes that cannot be estimated due to the small number of 3D points. In addition, many non-planar areas in actual scenes cannot be estimated through SLAM either. These situations lead to the problem that a virtual object cannot be placed in an image because the plane corresponding to the virtual object cannot be found.
To this end, some embodiments of the present disclosure provide a method for placing virtual objects in a video. Reference is made to Figure 1, which is a schematic diagram of an application scenario of the method for placing virtual objects in a video provided by an embodiment of the present disclosure. The application scenario includes: a terminal device 101 and an augmented reality processing device 102.
In embodiments of the present disclosure, the terminal device 101 and the augmented reality processing device 102 are distinguished functionally, and Figure 1 only gives an example of an application scenario. In practical applications, the terminal device 101 and the augmented reality processing device 102 may be two independent physical devices, or they may be integrated on a single physical device to realize both interaction with the user and video processing. If the terminal device 101 and the augmented reality processing device 102 are two independent physical devices, they may be connected through a wired or wireless communication network.
In embodiments of the present disclosure, the terminal device 101 includes, but is not limited to, a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a personal digital assistant (PDA), or other electronic equipment capable of implementing the above functions. The terminal device 101 can display, through its display screen, an interactive interface for interacting with the user, thereby providing the user with various augmented reality applications. For example, the user can select, through the terminal device 101, a position in each image frame of a video being played at which to place a virtual object.
The augmented reality processing device 102 may be an electronic device with computing capabilities, which is used to perform augmented reality processing on the image frames in the video, for example, placing a virtual object at a position selected by the user, and so on.
Based on the above application scenario, some embodiments of the present disclosure provide a method for placing virtual objects in a video, which can accurately determine the plane on which the virtual object is placed and avoid the problem that the virtual object cannot be placed in the video because the plane corresponding to the virtual object cannot be found. It should be noted that this method can be executed by the above-mentioned augmented reality processing device 102.
Figure 2 shows the implementation flow of the method for placing virtual objects in a video according to an embodiment of the present disclosure. As shown in Figure 2, the method may include the following steps:
In step 202, a three-dimensional (3D) point cloud corresponding to the video is obtained.
In embodiments of the present disclosure, if a user wishes to place a virtual object at a certain position in each image frame of a video, the user usually needs to select, through the terminal device 101, the virtual object to be placed, and select a position on one image frame of the video as the placement position of the virtual object in the video. Then, the terminal device 101 generates a virtual object placement request carrying the above information and sends it to the augmented reality processing device 102, and the augmented reality processing device 102 places the virtual object at the placement position selected by the user in the video. In embodiments of the present disclosure, the virtual object may generally refer to a piece of material, for example, a picture, a virtual object or a video clip, and so on.
Specifically, in embodiments of the present disclosure, the placement position of the virtual object in the video can be mapped to a point on each image frame in the video, which can be represented by the coordinates of a pixel point on the image frame.
In embodiments of the present disclosure, in order to place a virtual object in a video, the planes in each image frame of the video need to be estimated, that is, plane estimation needs to be performed on each image frame. Plane estimation usually needs to be realized based on the 3D points corresponding to the video. Therefore, in step 202, in order to place the virtual object in the video, the 3D point cloud corresponding to the video is obtained first.
Those skilled in the art can understand that, through SLAM technology, the 2D points on the two-dimensional (2D) image frames contained in a video can be mapped into three-dimensional space, thereby obtaining the 3D points corresponding to the 2D points in the image frames. Furthermore, after the mapping from 2D points to 3D points has been completed for multiple frames of 2D images, a global 3D point cloud can be obtained, which is referred to in the present disclosure as the 3D point cloud corresponding to the video.
It can thus be seen that the 3D point cloud obtained through SLAM technology includes not only each 3D point, but also the correspondence between these 3D points and the 2D points in each image frame of the video; for example, a 3D point in the 3D point cloud may correspond to a 2D point in each of multiple image frames, and so on.
It should be noted that, in addition to SLAM technology, each 2D point in an image frame of the video can also be mapped to a 3D point in three-dimensional space through the pose of the camera that shot the video.
Further, after the 3D point cloud corresponding to the video is obtained, the following steps are performed for each image frame in the video:
In step 204, the 3D points in the 3D point cloud that have corresponding 2D points in the current image frame are obtained.
As mentioned above, each 3D point in the 3D point cloud corresponding to a video corresponds to a 2D point in at least one image frame of the video. Therefore, in embodiments of the present disclosure, for a given image frame, all 3D points that have corresponding 2D points in that image frame can be determined from the 3D point cloud based on the correspondence between the 3D points in the 3D point cloud and the 2D points in each image frame.
In step 206, a mesh is obtained through triangulation based on the above 3D points.
In some embodiments of the present disclosure, in step 206, the augmented reality processing device 102 may directly use the set of the above 3D points as a finite point set, and obtain the mesh through the Delaunay triangulation algorithm based on this finite point set.
In other embodiments of the present disclosure, in order to improve the accuracy of the triangulation, in step 206 the augmented reality processing device 102 may first determine the 2D points corresponding to the above 3D points on the current image frame; then use the set of 2D points as a finite point set; next, obtain a first mesh through the Delaunay triangulation algorithm based on this finite point set; then, based on the connection relationships between the 2D points in the first mesh and the correspondence between the 2D points and the 3D points, obtain the connection relationships between the 3D points corresponding to the first mesh; and finally, determine a second mesh based on the connection relationships between the 3D points, and take the determined second mesh as the mesh referred to in step 206. That is, in the above method, a 2D mesh is obtained by performing Delaunay triangulation on the 2D points, and the resulting 2D mesh is then mapped into a 3D mesh based on the correspondence between the 2D points and the 3D points.
It is known that the above mesh should satisfy the following conditions:
1) Except for the endpoints, the edges of the planar graph shown by the mesh do not contain any point of the finite point set.
2) Except at the endpoints, there are no intersecting edges in the planar graph shown by the mesh.
3) All faces in the planar graph shown by the mesh are triangles, and the set of all triangular faces is the convex hull of the finite point set.
Figure 3 shows an example of a mesh obtained from a finite point set through the Delaunay triangulation algorithm according to an embodiment of the present disclosure. Through the Delaunay triangulation algorithm, based on the finite point set shown in the left half of Figure 3, the mesh shown in the right half of Figure 3 can be obtained. As can be seen from Figure 3, in this mesh, apart from the endpoints, the edges of the planar graph shown by the mesh do not contain any other points; moreover, the edges of the mesh do not intersect; and finally, all faces in the planar graph shown by the mesh are triangles.
In embodiments of the present disclosure, multiple triangles can be obtained through triangulation, and each triangle determines a plane, so the multiple planes contained in each image frame can be obtained, which effectively solves the problem of planes that cannot be estimated when plane estimation is performed based on a small number of 3D points, as well as the problem that some non-planar areas in actual scenes cannot be estimated.
In addition to the above method of plane estimation through triangulation, in order to further improve the accuracy of plane estimation and avoid the problem that, during plane estimation through triangulation, errors cause a single plane to be estimated as multiple planes so that the resulting plane appears uneven, other embodiments of the present disclosure may further include the following steps:
In step 208, plane estimation is performed based on the above 3D point cloud, and at least one first plane is determined.
In embodiments of the present disclosure, the augmented reality processing device 102 may perform plane estimation through the Random Sample Consensus (RANSAC) algorithm. RANSAC is an algorithm first proposed by Fischler and Bolles in 1981; it calculates the parameters of a mathematical model of the data from a sample data set that contains abnormal data. At present, the RANSAC algorithm is commonly used in computer vision matching problems to find the best matching model. In embodiments of the present disclosure, the augmented reality processing device 102 can fit multiple first planes from the 3D point cloud through the RANSAC algorithm; in this example, the best matching model found by the RANSAC algorithm is the multiple first planes.
Through the above method, multiple first planes can be determined, that is, the parameters of the multiple first planes and the 3D points contained on them are determined. The parameters of a plane may include the parameters that determine the plane equation of the plane. For example, each plane in 3D space can be expressed in the form Ax + By + Cz + D = 0, and the plane is determined once the four coefficients A, B, C and D are determined; therefore, the parameters of the plane may refer to the four coefficients A, B, C and D. In addition, the plane can also be expressed by a normal vector and a distance, and determining the normal vector and distance of a plane also determines the plane; therefore, the parameters of the plane may also refer to the normal vector and distance of the plane. It should be noted that the parameters of the above various forms of planes are essentially the same and can all uniquely determine a plane; for example, the normal vector and distance of the plane can be determined from the four coefficients A, B, C and D, and the four coefficients A, B, C and D can also be obtained from the normal vector and distance of the plane.
In step 210, for each triangle in the above mesh, in response to determining that the three vertices of a triangle lie on the same first plane, the normal vector of the second plane determined by the triangle is replaced with the normal vector of the first plane on which the three vertices of the triangle lie.
Through the above steps 208 and 210, the triangle planes obtained through triangulation can be fused with the planes obtained through conventional plane estimation. When it is determined that the three vertices of a triangle all lie on a determined first plane, the normal vector of that first plane is used to correct the normal vector of the plane determined by the triangle. This solves the problem of planes that cannot be estimated by conventional plane estimation methods because there are relatively few 3D points, and also solves the problem that, during plane estimation through triangulation, errors cause a single plane to be estimated as multiple planes so that the resulting plane appears uneven, thus making the final plane estimation result more accurate.
In step 212, the target position of the virtual object in the image frame is determined according to the placement position of the virtual object in the video and the mesh.
As mentioned above, the placement position of the virtual object in the video actually corresponds to a point in each image frame of the video. Those skilled in the art know that, after the user selects a point in one image frame of the video, the point corresponding to the selected point in each image frame of the video can be determined through plane tracking technology. Based on the above, in embodiments of the present disclosure, the specific implementation of determining the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh in step 212 may be as shown in Figure 4, and includes the following steps:
In step 402, the placement point corresponding to the virtual object in the current image frame is determined according to the placement position of the virtual object in the video.
As mentioned above, based on plane tracking technology, the point corresponding to the placement position (that is, the point selected by the user on one image frame of the video) can be determined in each image frame of the video. For convenience of description, in embodiments of the present disclosure these points are referred to as the placement points in the image frames.
In step 404, in response to determining that the placement point is in a triangle of the mesh, the plane determined by that triangle is taken as the target plane.
In step 406, the target position is determined based on the placement point and the target plane.
Specifically, in embodiments of the present disclosure, determining the target position based on the placement point and the target plane in step 406 may include: first, obtaining the pose of the camera corresponding to the image frame; second, constructing, based on the camera pose and the placement point, a ray that starts from the center point of the camera and passes through the placement point; then, performing collision detection between the ray and the target plane to determine the collision position; and finally, taking the collision position as the target position.
The above method may further include: if no collision is detected during the collision detection between the ray and the target plane, the target position cannot be obtained, so the placement of the virtual object cannot be completed; in this case, the augmented reality processing device 102 may output information indicating that the virtual object placement has failed.
Further, in some embodiments of the present disclosure, the above step 404 may further include: in response to determining that the placement point is not in any triangle of the mesh, determining that the virtual object placement has failed. In this case, the augmented reality processing device 102 may output information indicating that the virtual object placement has failed; for example, the augmented reality processing device 102 may send a response indicating that the virtual object placement has failed to the terminal device 101, and the terminal device 101 displays the corresponding prompt information.
Further, in other embodiments of the present disclosure, the above step 404 may further include: in response to determining that the placement point is not in any triangle of the mesh, selecting, from the multiple planes determined by all triangles in the mesh, the plane closest to the placement point as the target plane.
In embodiments of the present disclosure, the plane closest to the placement point can be determined in the following manner: first, for each triangle in the mesh, the plane determined by the triangle is taken as a reference plane, and the distance from the placement point to each reference plane is determined; then, the reference plane corresponding to the shortest distance is selected as the target plane.
Specifically, in the above process, determining the distance from the placement point to a reference plane may include: obtaining the pose of the camera corresponding to the current image frame; constructing, based on the camera pose and the placement point, a ray that starts from the center point of the camera and passes through the placement point; finding the intersection between the ray and the reference plane; and taking the distance from the placement point to the intersection point as the distance from the placement point to the reference plane.
The method of finding the intersection between the ray and the reference plane can refer to the description in the preceding embodiments and will not be repeated here.
In step 214, the virtual object is placed at the target position in the image frame.
It can be seen that, in embodiments of the present disclosure, multiple triangles with 3D points as vertices can be obtained through triangulation, and each triangle determines a plane; therefore, the multiple planes contained in each image frame can be obtained from these triangles, and the target plane and the target position for placing the virtual object are then determined according to the relationship between the placement position of the virtual object and these planes. The above method can effectively solve the problem that virtual object placement cannot be completed because planes cannot be estimated from a small number of 3D points and some non-planar areas in actual scenes cannot be estimated.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices cooperating with each other. In such a distributed scenario, one of the multiple devices may perform only one or more steps of the method of the embodiments of the present disclosure, and the multiple devices interact with each other to complete the described method.
It should be noted that some embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the above embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Corresponding to the above method for placing virtual objects in a video, embodiments of the present disclosure also disclose an apparatus for placing virtual objects in a video. Figure 5 shows the internal structure of the apparatus for placing virtual objects in a video according to an embodiment of the present disclosure. As shown in Figure 5, the apparatus may include: a 3D point cloud acquisition module 502, a triangulation module 504, a target position determination module 508, and a virtual object placement module 510.
The 3D point cloud acquisition module 502 is configured to acquire the 3D point cloud corresponding to the video.
In embodiments of the present disclosure, the 3D point cloud acquisition module 502 may directly obtain the 3D point cloud corresponding to the video based on SLAM technology; alternatively, the 3D point cloud acquisition module 502 may map each 2D point in an image frame of the video to a 3D point in three-dimensional space based on the pose of the camera that shot the video, thereby obtaining the 3D point cloud corresponding to the video.
The triangulation module 504 is configured to, for each image frame in the video, obtain the 3D points in the 3D point cloud that have corresponding 2D points in the current image frame, and obtain a mesh through triangulation based on the 3D points.
In embodiments of the present disclosure, for a given image frame, the triangulation module 504 can determine from the 3D point cloud all 3D points that have corresponding 2D points in that image frame, based on the correspondence between the 3D points in the 3D point cloud and the 2D points in each image frame.
In addition, in some embodiments of the present disclosure, the triangulation module 504 may directly use the set of 3D points as a finite point set and obtain the mesh through the Delaunay triangulation algorithm based on this finite point set.
In other embodiments of the present disclosure, in order to improve the accuracy of the triangulation, the triangulation module 504 may include the following units:
a 2D point determination unit, configured to determine the 2D points corresponding to the 3D points on the image frame;
a triangulation unit, configured to use the set of 2D points as a finite point set and obtain a first mesh through the Delaunay triangulation algorithm based on the finite point set; and
a mesh mapping unit, configured to obtain the connection relationships between the 3D points corresponding to the first mesh based on the connection relationships between the 2D points in the first mesh and the correspondence between the 2D points and the 3D points, and to determine the mesh according to the connection relationships between the 3D points.
In embodiments of the present disclosure, multiple triangles can be obtained through triangulation, and each triangle determines a plane, so the multiple planes contained in each image frame can be obtained, which effectively solves the problem of planes that cannot be estimated when plane estimation is performed based on a small number of 3D points, as well as the problem that some non-planar areas in actual scenes cannot be estimated.
In addition to the above method of plane estimation through triangulation, in order to further improve the accuracy of plane estimation and avoid the problem that, during plane estimation through triangulation, errors cause a single plane to be estimated as multiple planes so that the resulting plane appears uneven, in other embodiments of the present disclosure the above apparatus for placing virtual objects in a video may further include: a plane calibration module 506, configured to perform plane estimation based on the 3D point cloud and determine at least one first plane; and, for each triangle in the mesh, in response to determining that the three vertices of a triangle lie on the same first plane, replace the normal vector of the second plane determined by the triangle with the normal vector of the first plane on which the three vertices of the triangle lie.
In embodiments of the present disclosure, the plane calibration module 506 may perform plane estimation through the RANSAC algorithm and determine multiple first planes, that is, determine the parameters of the multiple first planes and the 3D points contained on them.
Through the plane calibration module 506, the triangle planes obtained through triangulation can be fused with the planes obtained through conventional plane estimation. When it is determined that the three vertices of a triangle all lie on a determined first plane, the normal vector of that first plane is used to correct the normal vector of the plane determined by the triangle. This solves the problem of planes that cannot be estimated by conventional plane estimation methods because there are relatively few 3D points, and also solves the problem that, during plane estimation through triangulation, errors cause a single plane to be estimated as multiple planes so that the resulting plane appears uneven, thus making the final plane estimation result more accurate.
The target position determination module 508 is configured to determine the target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh.
Specifically, in some embodiments of the present disclosure, as shown in Figure 6, the target position determination module 508 may include:
a placement point determination unit 602, configured to determine the placement point corresponding to the virtual object in the current image frame according to the placement position of the virtual object in the video;
a target plane determination unit 604, configured to, in response to determining that the placement point is in a triangle of the mesh, take the plane determined by that triangle as the target plane; and
a target position determination unit 606, configured to determine the target position based on the placement point and the target plane.
In other embodiments of the present disclosure, the target plane determination unit may be further configured to, in response to determining that the placement point is not in any triangle of the mesh, select, from the multiple planes determined by all triangles in the mesh, the plane closest to the placement point as the target plane.
The virtual object placement module 510 is configured to place the virtual object at the target position in the image frame.
The specific implementation of each of the above modules can refer to the foregoing method and drawings and will not be repeated here. For convenience of description, the above apparatus is described by dividing it into various modules by function; of course, when implementing the present disclosure, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The apparatus of the above embodiments is used to implement the corresponding method for placing virtual objects in a video in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method for placing virtual objects in a video described in any of the above embodiments.
Figure 7 shows a more specific schematic diagram of the hardware structure of an electronic device provided by this embodiment. The device may include: a processor 2010, a memory 2020, an input/output interface 2030, a communication interface 2040 and a bus 2050. The processor 2010, the memory 2020, the input/output interface 2030 and the communication interface 2040 are communicatively connected to each other within the device through the bus 2050.
The processor 2010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 2020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and the like. The memory 2020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 2020 and called and executed by the processor 2010.
The input/output interface 2030 is used to connect an input/output module to realize information input and output. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide corresponding functions. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 2040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module may realize communication in a wired manner (such as USB, a network cable, etc.) or in a wireless manner (such as a mobile network, WIFI, Bluetooth, etc.).
The bus 2050 includes a path for transmitting information between the various components of the device (for example, the processor 2010, the memory 2020, the input/output interface 2030 and the communication interface 2040).
It should be noted that, although the above device only shows the processor 2010, the memory 2020, the input/output interface 2030, the communication interface 2040 and the bus 2050, in specific implementations the device may also include other components necessary for normal operation. In addition, those skilled in the art can understand that the above device may also contain only the components necessary to implement the solutions of the embodiments of this specification, and need not contain all the components shown in the figure.
The electronic device of the above embodiments is used to implement the corresponding method for placing virtual objects in a video in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause the computer to execute the method for placing virtual objects in a video as described in any of the above embodiments.
The computer-readable media of this embodiment include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the task processing method described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples; in the spirit of the present disclosure, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, in order to simplify the description and discussion, and in order not to obscure the embodiments of the present disclosure, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also takes into account the fact that details regarding the implementation of these block diagram devices are highly dependent on the platform on which the embodiments of the present disclosure are to be implemented (that is, these details should be fully within the understanding of those skilled in the art). Where specific details (for example, circuits) are set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that the embodiments of the present disclosure may be implemented without these specific details or with variations of these specific details. Accordingly, these descriptions should be regarded as illustrative rather than restrictive.
Although the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (for example, dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the present disclosure are intended to cover all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the embodiments of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (21)

  1. A method for placing a virtual object in a video, comprising:
    obtaining a three-dimensional (3D) point cloud corresponding to the video;
    for each image frame in the video, respectively performing:
    obtaining 3D points in the 3D point cloud that have corresponding two-dimensional (2D) points in the image frame;
    obtaining a mesh through triangulation based on the 3D points;
    determining a target position of the virtual object in the image frame according to a placement position of the virtual object in the video and the mesh; and
    placing the virtual object at the target position in the image frame.
  2. The method according to claim 1, further comprising:
    performing plane estimation based on the 3D point cloud to determine at least one first plane; and
    for each triangle in the mesh, in response to determining that three vertices of the triangle lie on a same first plane, replacing a normal vector of a second plane determined by the triangle with a normal vector of the first plane on which the three vertices of the triangle lie.
  3. The method according to claim 2, wherein the performing plane estimation based on the 3D point cloud comprises:
    performing plane estimation based on the 3D point cloud through a Random Sample Consensus (RANSAC) algorithm to determine the at least one first plane.
  4. The method according to claim 1, wherein the obtaining a mesh through triangulation based on the 3D points comprises:
    determining 2D points corresponding to the 3D points in the image frame;
    using a set of the 2D points as a finite point set;
    obtaining a first mesh through a Delaunay triangulation algorithm based on the finite point set;
    obtaining connection relationships between the 3D points corresponding to the first mesh according to connection relationships between the 2D points in the first mesh and a correspondence between the 2D points and the 3D points; and
    determining the mesh according to the connection relationships between the 3D points.
  5. The method according to claim 1, wherein the obtaining a mesh through triangulation based on the 3D points comprises:
    using a set of the 3D points as a finite point set; and
    obtaining the mesh through a Delaunay triangulation algorithm based on the finite point set.
  6. The method according to claim 1, wherein the determining a target position of the virtual object in the image frame according to the placement position of the virtual object in the video and the mesh comprises:
    determining a placement point corresponding to the virtual object in the image frame according to the placement position of the virtual object in the video;
    in response to determining that the placement point is in a triangle of the mesh, taking a plane determined by the triangle as a target plane; and
    determining the target position based on the placement point and the target plane.
  7. The method according to claim 6, wherein the determining the target position based on the placement point and the target plane comprises:
    obtaining a pose of a camera corresponding to the image frame;
    constructing, according to the pose of the camera and the placement point, a ray that starts from a center point of the camera and passes through the placement point;
    performing collision detection between the ray and the target plane to determine a collision position; and
    taking the collision position as the target position.
  8. The method according to claim 7, further comprising: in response to no collision position being detected, outputting information indicating that placement of the virtual object has failed.
  9. The method according to claim 6, further comprising: in response to determining that the placement point is not in any triangle of the mesh, outputting information indicating that placement of the virtual object has failed.
  10. The method according to claim 6, further comprising: in response to determining that the placement point is not in any triangle of the mesh, selecting, from a plurality of planes determined by all triangles in the mesh, a plane closest to the placement point as the target plane.
  11. The method according to claim 10, wherein the selecting the plane closest to the placement point comprises:
    for each triangle in the mesh, taking the plane determined by the triangle as a reference plane, and determining a distance from the placement point to each reference plane; and
    selecting the reference plane corresponding to the shortest distance as the target plane.
  12. The method according to claim 11, wherein the determining the distance from the placement point to a reference plane comprises:
    obtaining a pose of a camera corresponding to the image frame;
    constructing, according to the pose of the camera and the placement point, a ray that starts from a center point of the camera and passes through the placement point;
    finding an intersection between the ray and the reference plane; and
    taking a distance from the placement point to the intersection point as the distance from the placement point to the reference plane.
  13. An apparatus for placing a virtual object in a video, comprising:
    a three-dimensional (3D) point cloud acquisition module, configured to acquire a 3D point cloud corresponding to the video;
    a triangulation module, configured to, for each image frame in the video, obtain 3D points in the 3D point cloud that have corresponding two-dimensional (2D) points in the current image frame, and obtain a mesh through triangulation based on the 3D points;
    a target position determination module, configured to determine a target position of the virtual object in the image frame according to a placement position of the virtual object in the video and the mesh; and
    a virtual object placement module, configured to place the virtual object at the target position in the image frame.
  14. The apparatus for placing a virtual object in a video according to claim 13, further comprising:
    a plane calibration module, configured to perform plane estimation based on the 3D point cloud to determine at least one first plane; and, for each triangle in the mesh, in response to determining that three vertices of a triangle lie on a same first plane, replace a normal vector of a second plane determined by the triangle with a normal vector of the first plane on which the three vertices of the triangle lie.
  15. The apparatus for placing a virtual object in a video according to claim 13, wherein the triangulation module comprises:
    a 2D point determination unit, configured to determine 2D points corresponding to the 3D points on the image frame;
    a triangulation unit, configured to use a set of the 2D points as a finite point set, and obtain a first mesh through a Delaunay triangulation algorithm based on the finite point set; and
    a mesh mapping unit, configured to obtain connection relationships between the 3D points corresponding to the first mesh according to connection relationships between the 2D points in the first mesh and a correspondence between the 2D points and the 3D points, and to determine the mesh according to the connection relationships between the 3D points.
  16. The apparatus for placing a virtual object in a video according to claim 13, wherein the triangulation module uses a set of the 3D points as a finite point set and obtains the mesh through a Delaunay triangulation algorithm based on the finite point set.
  17. The apparatus for placing a virtual object in a video according to claim 13, wherein the target position determination module comprises:
    a placement point determination unit, configured to determine a placement point corresponding to the virtual object in the current image frame according to the placement position of the virtual object in the video;
    a target plane determination unit, configured to, in response to determining that the placement point is in a triangle of the mesh, take the plane determined by the triangle as a target plane; and
    a target position determination unit, configured to determine the target position based on the placement point and the target plane.
  18. The apparatus for placing a virtual object in a video according to claim 17, wherein the target plane determination unit is further configured to, in response to determining that the placement point is not in any triangle of the mesh, select, from a plurality of planes determined by all triangles in the mesh, a plane closest to the placement point as the target plane.
  19. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for placing a virtual object in a video according to any one of claims 1-12.
  20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method for placing a virtual object in a video according to any one of claims 1-12.
  21. A computer program product, comprising computer program instructions that, when run on a computer, cause the computer to execute the method for placing a virtual object in a video according to any one of claims 1-12.
PCT/CN2023/079649 2022-03-25 2023-03-03 Method for placing virtual objects in a video and related device WO2023179341A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210306832.0A CN115937299B (zh) 2022-03-25 2022-03-25 Method for placing virtual objects in a video and related device
CN202210306832.0 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023179341A1 true WO2023179341A1 (zh) 2023-09-28

Family

ID=86647831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079649 WO2023179341A1 (zh) 2022-03-25 2023-03-03 Method for placing virtual objects in a video and related device

Country Status (2)

Country Link
CN (1) CN115937299B (zh)
WO (1) WO2023179341A1 (zh)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825544B (zh) * 2015-11-25 2019-08-20 维沃移动通信有限公司 Image processing method and mobile terminal
CN110827376A (zh) * 2018-08-09 2020-02-21 北京微播视界科技有限公司 Augmented reality multi-plane model animation interaction method, apparatus, device and storage medium
CN110889890B (zh) * 2019-11-29 2023-07-28 深圳市商汤科技有限公司 Image processing method and apparatus, processor, electronic device and storage medium
CN113038264B (zh) * 2021-03-01 2023-02-24 北京字节跳动网络技术有限公司 Live video processing method, apparatus, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103985155A (zh) * 2014-05-14 2014-08-13 北京理工大学 Delaunay triangulation surface reconstruction method for scattered point clouds based on a mapping method
CN108629799A (zh) * 2017-03-24 2018-10-09 成都理想境界科技有限公司 Method and device for realizing augmented reality
CN111415420A (zh) * 2020-03-25 2020-07-14 北京迈格威科技有限公司 Spatial information determination method and apparatus, and electronic device
CN113570730A (zh) * 2021-07-29 2021-10-29 深圳市慧鲤科技有限公司 Video data collection method, video creation method and related products

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117974746A * 2024-04-01 2024-05-03 北京理工大学长三角研究院(嘉兴) Point cloud 2D depth surface triangulation composition method, apparatus, system and device

Also Published As

Publication number Publication date
CN115937299A (zh) 2023-04-07
CN115937299B (zh) 2024-01-30

Similar Documents

Publication Publication Date Title
US11842438B2 (en) Method and terminal device for determining occluded area of virtual object
CN106575160B (zh) 根据用户视点识别动作的界面提供方法及提供装置
US20170186219A1 (en) Method for 360-degree panoramic display, display module and mobile terminal
WO2018119889A1 (zh) 三维场景定位方法和装置
GB2567530A (en) Virtual reality parallax correction
JP2018511874A (ja) 3次元モデリング方法及び装置
CN111161398B (zh) 一种图像生成方法、装置、设备及存储介质
US11561651B2 (en) Virtual paintbrush implementing method and apparatus, and computer readable storage medium
WO2018090914A1 (zh) 三维视觉效果模拟方法及装置、存储介质及显示设备
JP7262530B2 (ja) 位置情報の生成方法、関連装置及びコンピュータプログラム製品
WO2019196871A1 (zh) 建模方法及相关装置
CN113034582A (zh) 位姿优化装置及方法、电子设备及计算机可读存储介质
WO2023179341A1 (zh) 在视频中放置虚拟对象的方法及相关设备
CN109816791B (zh) 用于生成信息的方法和装置
CN116128744A (zh) 消除图像畸变的方法、电子设备、存储介质及车辆
US20230326147A1 (en) Helper data for anchors in augmented reality
CN115578432A (zh) 图像处理方法、装置、电子设备及存储介质
CN112132909B (zh) 参数获取方法及装置、媒体数据处理方法和存储介质
CN114862997A (zh) 图像渲染方法和装置、介质和计算机设备
CN113596336B (zh) 图像获取方法及装置、电子设备及存储介质
WO2023179342A1 (zh) 重定位方法及相关设备
CN112837424B (zh) 图像处理方法、装置、设备和计算机可读存储介质
US11741663B2 (en) Multidimensional object view ability data generation
CN113838201B (zh) 模型适配方法、装置、电子设备及可读存储介质
CN117237500A (zh) 三维模型显示视角调整方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23773592

Country of ref document: EP

Kind code of ref document: A1