WO2021227360A1 - Interactive video projection method, device, equipment, and storage medium - Google Patents

Interactive video projection method, device, equipment, and storage medium (一种交互式视频投影方法、装置、设备及存储介质) Download PDF

Info

Publication number
WO2021227360A1
WO2021227360A1 PCT/CN2020/121664 CN2020121664W WO2021227360A1 WO 2021227360 A1 WO2021227360 A1 WO 2021227360A1 CN 2020121664 W CN2020121664 W CN 2020121664W WO 2021227360 A1 WO2021227360 A1 WO 2021227360A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
camera
feature
video frame
feature points
Prior art date
Application number
PCT/CN2020/121664
Other languages
English (en)
French (fr)
Inventor
高星
徐建明
陈奇毅
石立阳
Original Assignee
佳都新太科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 佳都新太科技股份有限公司 filed Critical 佳都新太科技股份有限公司
Publication of WO2021227360A1 publication Critical patent/WO2021227360A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • The embodiments of the present application relate to the field of image processing, and in particular to an interactive video projection method, device, equipment, and storage medium.
  • Video projection technology combines surveillance video with a three-dimensional model by projecting the surveillance video of an area of interest into the three-dimensional model of a large scene, so that a static large scene and dynamic key scenes can be combined in a blend of the virtual and the real.
  • However, each channel of video requires the staff to spend a great deal of time configuring the camera position, posture, and other information; the configuration process is cumbersome, and video projection requirements sometimes cannot be met in time.
  • The embodiments of the present application provide an interactive video projection method, device, equipment, and storage medium to meet the real-time requirements of video projection.
  • In a first aspect, an embodiment of the present application provides an interactive video projection method, including:
  • rendering a three-dimensional map based on an initial camera position and posture determined in a virtual scene to obtain a two-dimensional picture corresponding to the initial position and posture, the initial position and posture being determined based on video frames captured by the camera;
  • performing feature matching between the video frame captured by the camera and the two-dimensional picture, and determining, according to the feature matching result, the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame;
  • determining a camera pose matrix, focal length information and/or distortion parameters through a pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points;
  • setting the camera in the virtual scene according to the camera pose matrix, the focal length information and/or the distortion parameters, and adding the video frame to the rendering pipeline for video projection.
  • Further, the rendering of the three-dimensional map based on the initial camera position and posture determined in the virtual scene to obtain the two-dimensional picture corresponding to the initial position and posture, where the initial position and posture is determined based on the video frames captured by the camera, includes: determining the initial camera position and posture in the virtual scene based on the video frames captured by the camera; acquiring, according to the initial position and posture, the three-dimensional model tiles corresponding to the rendering range, the three-dimensional map being stored in the form of three-dimensional model tiles; and rendering the three-dimensional model tiles to obtain the two-dimensional picture corresponding to the initial position and posture.
  • Further, the feature matching between the video frame captured by the camera and the two-dimensional picture includes: obtaining feature points and descriptors of the video frame captured by the camera and of the two-dimensional picture based on an image feature extraction algorithm; and
  • performing feature matching on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptors.
  • Further, after the feature matching is performed on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptors, the method further includes: screening the matched feature points based on the RANSAC algorithm.
  • Further, after the feature matching between the video frame captured by the camera and the two-dimensional picture, the method further includes:
  • determining whether the feature matching is successful according to the feature matching result, and issuing a viewing angle adjustment reminder when the feature matching fails, to remind the operator to reselect the initial position and posture of the camera.
  • Further, the determining, according to the feature matching result, of the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame includes: determining, according to the feature matching result, the coordinates of the matching feature points in the two-dimensional picture that match the two-dimensional feature points on the video frame; and
  • determining, according to the correspondence between the three-dimensional map and the two-dimensional picture coordinate points, the coordinates of the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points.
  • Further, the determining of the camera pose matrix, the focal length information and/or the distortion parameters through the pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points includes: obtaining the two-dimensional feature point coordinates on the video frame and the three-dimensional feature point coordinates on the three-dimensional map; and
  • substituting the two-dimensional feature point coordinates and the three-dimensional feature point coordinates into the PnP algorithm and a nonlinear optimization algorithm to obtain the camera pose matrix, the focal length information and/or the distortion parameters.
  • Further, after the camera pose matrix, the focal length information and/or the distortion parameters are determined through the pose solving algorithm, the method further includes:
  • determining, according to the result of the pose solving algorithm, whether the camera pose is solved successfully, and issuing a viewing angle adjustment reminder when the camera pose fails to be solved, to remind the operator to reselect the initial position and posture of the camera.
  • In a second aspect, an embodiment of the present application provides an interactive video projection device, including a two-dimensional rendering module, a feature correspondence module, a pose determination module, and a video projection module, wherein:
  • the two-dimensional rendering module is configured to render the three-dimensional map based on the initial camera position and posture determined in the virtual scene to obtain a two-dimensional picture corresponding to the initial position and posture, the initial position and posture being determined based on the video frames captured by the camera;
  • the feature correspondence module is configured to perform feature matching between the video frame captured by the camera and the two-dimensional picture, and to determine, according to the feature matching result, the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame;
  • the pose determination module is configured to determine the camera pose matrix, focal length information and/or distortion parameters through a pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points; and
  • the video projection module is configured to set the camera in the virtual scene according to the camera pose matrix, the focal length information and/or the distortion parameters, and to add the video frame to the rendering pipeline for video projection.
  • Further, the two-dimensional rendering module is specifically configured to: determine the initial camera position and posture in the virtual scene based on the video frames captured by the camera; acquire, according to the initial position and posture, the three-dimensional model tiles corresponding to the rendering range, the three-dimensional map being stored in the form of three-dimensional model tiles; and render the three-dimensional model tiles to obtain the two-dimensional picture corresponding to the initial position and posture.
  • Further, when the feature correspondence module performs feature matching between the video frame captured by the camera and the two-dimensional picture, this specifically includes: obtaining feature points and descriptors of the video frame captured by the camera and of the two-dimensional picture based on an image feature extraction algorithm; and
  • performing feature matching on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptors.
  • Further, after performing feature matching on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptors, the feature correspondence module further screens the matched feature points based on the RANSAC algorithm.
  • Further, the device includes a matching error reminding module, which is configured to determine, after the feature correspondence module performs feature matching between the video frame captured by the camera and the two-dimensional picture, whether the feature matching is successful according to the feature matching result, and to issue a viewing angle adjustment reminder when the feature matching fails, to remind the operator to reselect the initial position and posture of the camera.
  • Further, when the feature correspondence module determines, according to the feature matching result, the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame, this specifically includes: determining, according to the feature matching result, the coordinates of the matching feature points in the two-dimensional picture that match the two-dimensional feature points on the video frame; and
  • determining, according to the correspondence between the three-dimensional map and the two-dimensional picture coordinate points, the coordinates of the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points.
  • Further, the pose determination module is specifically configured to: obtain the two-dimensional feature point coordinates on the video frame and the three-dimensional feature point coordinates on the three-dimensional map; and
  • substitute the two-dimensional feature point coordinates and the three-dimensional feature point coordinates into the PnP algorithm and a nonlinear optimization algorithm to obtain the camera pose matrix, the focal length information and/or the distortion parameters.
  • Further, the device includes a pose error reminding module, which is configured to determine, after the pose determination module determines the camera pose matrix, the focal length information and/or the distortion parameters through the pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points, whether the camera pose is solved successfully according to the result of the pose solving algorithm, and to issue a viewing angle adjustment reminder when the camera pose fails to be solved, to remind the operator to reselect the initial position and posture of the camera.
  • In a third aspect, an embodiment of the present application provides a computer device, including a memory and one or more processors;
  • the memory is used to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the interactive video projection method described in the first aspect.
  • In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the interactive video projection method described in the first aspect.
  • In the embodiments of the present application, the initial position and posture of the camera in the virtual scene is determined according to the video frames captured by the camera, and the three-dimensional map is rendered based on this initial position and posture to obtain a two-dimensional picture corresponding to the range captured by the camera at that position and posture; feature matching is then performed between the two-dimensional picture and the video frame captured by the camera, and after the matching is completed, the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points on the video frame are determined.
  • The camera pose matrix, focal length information and/or distortion parameters can then be determined through the pose solving algorithm, the camera in the virtual scene is set according to this information, and the video frame is added to the rendering pipeline for video projection. This achieves semi-automatic, interactive, fast video projection mapping without requiring the staff to manually configure camera parameters precisely, which improves video projection efficiency; furthermore, matching the video frame with the two-dimensional picture allows the video frame to be projected at the correct position on the three-dimensional model, effectively improving the video projection effect.
  • Fig. 1 is a flowchart of an interactive video projection method provided by an embodiment of the present application
  • FIG. 2 is a flowchart of another interactive video projection method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an interactive video projection device provided by an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Fig. 1 shows a flowchart of an interactive video projection method provided by an embodiment of the present application.
  • The interactive video projection method provided by the embodiment of the present application may be executed by an interactive video projection device, which may be realized by means of hardware and/or software and integrated in computer equipment.
  • The following description takes an interactive video projection device executing the interactive video projection method as an example.
  • Referring to Fig. 1, the interactive video projection method includes:
  • S101 Render the three-dimensional map based on the initial camera position and posture determined in the virtual scene to obtain a two-dimensional picture corresponding to the initial position and posture, the initial position and posture being determined based on the video frames captured by the camera.
  • The virtual scene can be a three-dimensional scene rendered from the three-dimensional map.
  • The displayed virtual scene picture is determined by the position, posture, and angle of view of the camera in the virtual scene, and adjustment operations (position, angle of view, focal length, etc.) can be performed through an input device (e.g., mouse, keyboard) to set the position and posture of the camera in the virtual scene, so as to determine the camera's position and posture under the target virtual scene picture.
  • the virtual scene screen can be displayed on the display device (display screen).
  • an initial position and posture confirmation button or a one-key mapping button can be set in the operation interface, and the position and posture of the camera in the virtual scene at this time can be determined in response to the confirmation operation of the button, and used as the initial position and posture of the camera.
  • The three-dimensional map is established based on the world coordinate system. Even if there is an error between the coordinate system of the three-dimensional map and the world coordinate system, the offset caused by the error stays within the error range (usually within a few meters); such an offset is not a significant problem for rendering the three-dimensional map, and the rendered two-dimensional picture can still cover the target area (the area corresponding to the video frame).
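  • As an illustration of this step, the following is a minimal sketch (not the patent's GPU renderer) of projecting the points of a 3D map into a 2D picture at the initial camera pose with a simple pinhole model, while recording which 3D map point lies behind each rendered pixel; the names `map_points`, the intrinsics arguments, and the 480x640 image size are illustrative assumptions.

```python
import numpy as np

def render_map_points(map_points, R, t, fx, fy, cx, cy, size=(480, 640)):
    """Project world-space map points (N, 3) into an image and record pixel -> 3D point."""
    h, w = size
    cam = (R @ map_points.T + t.reshape(3, 1)).T        # world -> camera coordinates
    depth = cam[:, 2]
    valid = depth > 0                                    # keep points in front of the camera
    u = fx * cam[valid, 0] / depth[valid] + cx
    v = fy * cam[valid, 1] / depth[valid] + cy
    image = np.zeros((h, w), dtype=np.uint8)
    pixel_to_3d = {}                                     # 2D picture coordinate -> 3D map point
    for x, y, p in zip(u.astype(int), v.astype(int), map_points[valid]):
        if 0 <= x < w and 0 <= y < h:
            image[y, x] = 255                            # crude point splatting instead of GPU rasterisation
            pixel_to_3d[(x, y)] = p                      # correspondence reused when looking up 3D feature points
    return image, pixel_to_3d
```

  • Recording `pixel_to_3d` here is one way of realising the correspondence between the three-dimensional map and the two-dimensional picture coordinate points referred to below.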
  • S102 Perform feature matching on the video frame captured by the camera and the two-dimensional picture, and determine the three-dimensional feature point corresponding to the two-dimensional feature point on the video frame on the three-dimensional map according to the feature matching result.
  • For example, after the rendered two-dimensional picture is obtained, feature matching is performed between the video frame captured by the camera and the two-dimensional picture: the feature points in the video frame and in the two-dimensional picture are extracted, matched according to the similarity between feature points (the feature-vector distance), and a matching result is generated.
  • The feature points on the video frame are referred to as two-dimensional feature points, and the feature points on the two-dimensional picture are referred to as matching feature points.
  • Further, when the two-dimensional picture is rendered, the correspondence between the three-dimensional map and the two-dimensional picture coordinate points can be recorded, and the coordinates of the three-dimensional feature points in the three-dimensional map corresponding to the matching feature points in the two-dimensional picture are determined from that record, thereby determining the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame.
  • In addition, the RGB image (the two-dimensional picture) and a depth map are rendered simultaneously when the two-dimensional picture is rendered; according to the depth map, the three-dimensional points corresponding to the points in the two-dimensional picture can also be calculated inversely, which likewise determines the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame.
  • S103 Based on the two-dimensional feature points and the three-dimensional feature points, determine a camera pose matrix, focal length information, and/or distortion parameters through a pose solving algorithm.
  • For example, after the position coordinates and correspondence of the two-dimensional feature points and the three-dimensional feature points are obtained, the two-dimensional feature points and the three-dimensional feature points are substituted into the pose solving algorithm to obtain the camera pose matrix, the focal length information and/or the distortion parameters.
  • The pose solving algorithm is a method for solving the motion of 3D-to-2D point pairs: it describes how to determine the pose, focal length, and distortion of the camera when n 3D space points and their projected positions in the image are known.
  • Whether the camera is distorted can be determined according to the specific type or parameters of the camera; for cameras with no distortion or only slight distortion, the distortion parameters can be set to default values (for example, set to 0, meaning the camera is assumed to be distortion-free).
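  • The following is a minimal sketch of this pose-solving step using OpenCV's PnP solver, with the distortion parameters left at a default of zero as described above for cameras with little distortion; the rough intrinsics (fx, fy, cx, cy) and at least four matched point pairs are assumed to be available.

```python
import cv2
import numpy as np

def solve_camera_pose(pts_3d, pts_2d, fx, fy, cx, cy):
    """pts_3d: (N, 3) map points; pts_2d: (N, 2) video-frame points; N >= 4."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
    dist = np.zeros(5)                                    # default: assume a distortion-free camera
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(pts_3d, np.float64), np.asarray(pts_2d, np.float64),
        K, dist, flags=cv2.SOLVEPNP_EPNP)                 # EPnP, one of the solvers named in the text
    if not ok:
        return None                                       # caller can fall back to a view-angle reminder
    R, _ = cv2.Rodrigues(rvec)                            # rotation vector -> 3x3 rotation matrix
    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = tvec.ravel()
    return pose                                           # camera pose matrix (world -> camera)
```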
  • S104 Set a camera in the virtual scene according to the camera pose matrix, the focal length information and/or the distortion parameter, and add the video frame to the rendering pipeline for video projection.
  • For example, after the pose matrix and the focal length information are obtained, they are input into the camera parameters of the virtual scene to set the position, posture, and focal length of the camera; the video frame is then added to the rendering pipeline, and the rendering pipeline performs real-time fusion projection of the video frame under the corresponding camera parameter settings.
  • During the fusion projection, the mapping between the pixels of the video frame and the three-dimensional points of the three-dimensional scene (the virtual scene) is determined, the video frame is color-texture-mapped into the three-dimensional scene according to that mapping, and the overlapping areas of the mapping are smoothed, thereby fusing the video frame into the three-dimensional scene and completing the video projection of the corrected video frame. It is understood that the video projection of the video frame can be performed with existing video projection methods, which will not be repeated here.
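  • As a hedged sketch of this projection step, the snippet below computes video-frame texture coordinates for the vertices of the three-dimensional scene by projecting them through the solved camera, which is the core of fusing the frame into the rendering pipeline; the engine-side blending and the smooth transition of overlap areas are not shown, and the argument names are illustrative.

```python
import numpy as np

def video_projection_uvs(vertices, pose, fx, fy, cx, cy, frame_w, frame_h):
    """vertices: (N, 3) world coordinates; pose: 4x4 world -> camera matrix."""
    cam = (pose[:3, :3] @ vertices.T + pose[:3, 3:4]).T
    z = cam[:, 2]
    u = (fx * cam[:, 0] / z + cx) / frame_w               # normalised texture coordinates into the video frame
    v = (fy * cam[:, 1] / z + cy) / frame_h
    visible = (z > 0) & (u >= 0) & (u <= 1) & (v >= 0) & (v <= 1)
    return np.stack([u, v], axis=1), visible              # sample the video frame only where visible is True
```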
  • In the above, the initial position and posture of the camera in the virtual scene is determined according to the video frames captured by the camera, the three-dimensional map is rendered based on this initial position and posture to obtain a two-dimensional picture corresponding to the range captured by the camera, feature matching is then performed between the two-dimensional picture and the video frame, and the subsequent pose solving and video projection proceed as described above.
  • Fig. 2 is a flowchart of another interactive video projection method provided by an embodiment of the application.
  • This interactive video projection method is a concrete embodiment of the method described above.
  • the interactive video projection method includes:
  • S201 Determine the initial position and posture of the camera in the virtual scene based on the video frames shot by the camera.
  • S202 Acquire the three-dimensional model tiles corresponding to the rendering range according to the initial position and posture, the three-dimensional map being stored in the form of three-dimensional model tiles.
  • Specifically, the three-dimensional map is stored in the form of three-dimensional model tiles. Because the data volume of the three-dimensional map is relatively large, the three-dimensional map data is sliced; each slice is called a three-dimensional model tile, and the location range corresponding to each tile is recorded.
  • The three-dimensional model tiles corresponding to the rendering range are determined and retrieved according to the initial position and posture; it is understood that the range of the three-dimensional map composed of the acquired tiles should be larger than the range of the video frames captured by the camera.
  • S203 Render the three-dimensional model tiles to obtain a two-dimensional picture corresponding to the initial position and posture.
  • the three-dimensional model tiles are rendered by the GPU visualization engine and the two-dimensional picture corresponding to the initial position and posture is obtained. It can be understood that the display range of the two-dimensional picture is larger than the display range of the corresponding video frame.
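  • A small illustrative sketch of the tile retrieval follows: only tiles whose recorded ground ranges intersect the rendering range implied by the initial position and posture are loaded and passed to the renderer. The tile-index record format (a path plus a 2D bounding box) is an assumption for illustration.

```python
def select_tiles(tile_index, xmin, ymin, xmax, ymax):
    """tile_index: iterable of (tile_path, txmin, tymin, txmax, tymax) records."""
    hits = []
    for path, txmin, tymin, txmax, tymax in tile_index:
        overlaps = not (txmax < xmin or txmin > xmax or
                        tymax < ymin or tymin > ymax)      # axis-aligned range intersection test
        if overlaps:
            hits.append(path)                              # only these tiles are rendered in S203
    return hits
```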
  • S204 Obtain the feature points and descriptors of the video frame captured by the camera and of the two-dimensional picture based on an image feature extraction algorithm.
  • Specifically, image feature extraction is performed on the video frame and the two-dimensional picture on the GPU; the image features include feature points and descriptors.
  • The image feature extraction algorithm can be the SIFT (Scale-Invariant Feature Transform) algorithm, the SURF (Speeded-Up Robust Features) algorithm, the ORB (Oriented FAST and Rotated BRIEF) algorithm, or the like, which is not limited in this embodiment.
  • The feature points of an image are its most representative points, in the sense that they carry most of the information expressed by the image: even if the image is rotated, scaled, or its brightness adjusted, these points remain stable and are not lost. Finding these points effectively identifies the image, and they can be used for meaningful work such as matching and recognition.
  • A feature consists of two parts: a key point and a descriptor.
  • The BRIEF descriptor is a binary descriptor, usually a 128-bit binary string. It is computed by randomly selecting 128 point pairs around the key point p; for the two points of each pair, if the gray value of the first point is greater than that of the second, the corresponding bit is set to 1, and otherwise to 0.
  • For example, extracting ORB features actually involves two things: extracting key points and computing descriptors. The positions of the feature points are detected with the FAST feature detector, the Harris corner detector, or algorithms such as SIFT or SURF, and the BRIEF algorithm is then used to build the feature descriptors in the neighborhood of each feature point.
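  • A minimal sketch of this extraction step with OpenCV's ORB implementation (FAST key points plus rotated-BRIEF binary descriptors), applied identically to the video frame and to the rendered two-dimensional picture; the feature budget of 2000 is an arbitrary illustrative choice.

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)                       # cap the number of key points per image

def extract_features(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors                           # OpenCV ORB descriptors are 32-byte binary strings
```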
  • S205 Perform feature matching on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptor.
  • the similarity between the two feature points is judged according to the distance of the corresponding descriptor, and the smaller the distance, the higher the similarity.
  • the distance of the descriptor can be Euclidean distance, Hamming distance, cosine distance, etc.
  • Further, the descriptors of the two-dimensional picture and the video frame are traversed on the GPU, the feature points are sorted by distance, and the matching results of the top N features are displayed under a certain degree of confidence; that is, the feature points between the two-dimensional picture and the video frame are matched according to the similarity reflected by the distance.
  • In other embodiments, after the feature matching between the video frame captured by the camera and the two-dimensional picture, whether the feature matching is successful can also be judged according to the feature matching result, and a viewing angle adjustment reminder can be issued when the feature matching fails, to remind the operator to reselect the initial position and posture of the camera. For example, whether the feature matching is successful can be judged after the feature points are extracted, or after they are actually matched: when the image feature extraction algorithm is called, the algorithm result indicates whether feature points were obtained normally, and when the feature points are matched according to the descriptor distance, the matching result indicates whether the matched feature points reach a normal number or proportion. If the matching succeeds, the next step of screening the feature points is carried out; if it fails, a viewing angle adjustment reminder is issued to remind the operator to return to step S201 and reselect the initial position and posture of the camera.
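  • A sketch of the descriptor-distance matching follows, using brute-force Hamming matching for binary descriptors and keeping only the most similar pairs; the Lowe-style ratio test and the cap of 300 matches are illustrative ways of realising the "certain degree of confidence" mentioned above.

```python
import cv2

def match_descriptors(desc_frame, desc_picture, max_matches=300, ratio=0.8):
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)                # Hamming distance suits binary descriptors
    knn = matcher.knnMatch(desc_frame, desc_picture, k=2)    # two nearest neighbours per feature
    good = [pair[0] for pair in knn
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    good.sort(key=lambda m: m.distance)                      # smaller distance = higher similarity
    return good[:max_matches]                                # keep the top-N matches
```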
  • S206 Screen the matched feature points based on the RANSAC algorithm.
  • The RANSAC (Random Sample Consensus) algorithm is used to eliminate wrongly matched points: after the feature point matching is completed, the fundamental matrix and homography matrix between the two-dimensional picture and the video frame are obtained, and the matched feature points are screened with the RANSAC algorithm based on these matrices to eliminate mismatched feature points.
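  • A sketch of the RANSAC screening: a homography between the matched points is estimated and only the inlier matches are kept, discarding mismatches; the 3-pixel reprojection threshold is an illustrative value.

```python
import cv2
import numpy as np

def ransac_filter(kp_frame, kp_picture, matches, reproj_thresh=3.0):
    if len(matches) < 4:
        return []                                            # too few matches to estimate a homography
    src = np.float32([kp_frame[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_picture[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    if mask is None:
        return []                                            # screening failed; prompt a view-angle adjustment
    return [m for m, keep in zip(matches, mask.ravel()) if keep]
```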
  • S207 Determine the coordinates of the matching feature point in the two-dimensional picture that matches the two-dimensional feature point on the video frame according to the feature matching result.
  • the feature points that match each other in the two-dimensional picture and the video frame are respectively defined as the matching feature point and the two-dimensional feature point.
  • the matching feature points in the two-dimensional picture that match the two-dimensional feature points on the video frame are determined, and the coordinates of these matching feature points are determined.
  • S208 Determine the coordinates of the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points, according to the correspondence between the three-dimensional map and the two-dimensional picture coordinate points.
  • Specifically, the correspondence between the three-dimensional map and the two-dimensional picture coordinate points can be recorded when the two-dimensional picture is rendered.
  • When the coordinates of the three-dimensional feature points corresponding to the two-dimensional feature points need to be determined, the matching feature points corresponding to the two-dimensional feature points are first determined according to the matching result, and the coordinates of the corresponding three-dimensional feature points are then obtained from the recorded correspondence between the three-dimensional map and the two-dimensional picture coordinate points.
  • In addition, the RGB image (the two-dimensional picture) and a depth map are rendered simultaneously when the two-dimensional picture is rendered; according to the depth map, the three-dimensional points corresponding to points in the two-dimensional picture can also be calculated inversely, yielding the coordinates of the three-dimensional feature points corresponding to the matching feature points on the two-dimensional picture.
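  • The depth-map alternative can be sketched as a simple back-projection: given the pixel of a matching feature point, the rendered depth at that pixel, the rendering camera's intrinsics, and its world-to-camera pose, the corresponding 3D point is recovered. All argument names are illustrative.

```python
import numpy as np

def backproject(u, v, depth_map, R, t, fx, fy, cx, cy):
    """Recover the world-space 3D point behind pixel (u, v) of the rendered picture."""
    z = depth_map[int(v), int(u)]                            # camera-space depth stored during rendering
    x_cam = (u - cx) * z / fx                                # pixel -> camera coordinates
    y_cam = (v - cy) * z / fy
    p_cam = np.array([x_cam, y_cam, z])
    return R.T @ (p_cam - t)                                 # camera -> world coordinates (3D feature point)
```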
  • S209 Obtain the two-dimensional feature point coordinates on the video frame and the three-dimensional feature point coordinates on the three-dimensional map, and substitute the two-dimensional feature point coordinates and the three-dimensional feature point coordinates into the PnP algorithm and the nonlinear optimization algorithm to obtain the camera Pose matrix, focal length information and/or distortion parameters.
  • the PnP (Perspective-n-Point) algorithm is a method for solving 3D to 2D point pair motion, which can be solved by P3P, direct linear transformation (DLT), EPnP and other algorithms.
  • Taking the P3P algorithm as an example, P3P is a 3D-2D pose solving method that requires matched 3D points and 2D image points: the 3D coordinates of the corresponding 2D points (the two-dimensional feature points of this solution) in the current camera coordinate system are first found (corresponding to the three-dimensional feature point coordinates of this solution), and the camera pose is then computed from the 3D coordinates in the world coordinate system and the 3D coordinates in the current camera coordinate system.
  • The nonlinear optimization algorithm further optimizes the 3D-to-2D reprojection error by least squares, given initial values of the camera posture and focal length; during the optimization, the algorithm makes fine adjustments to the camera posture and focal length.
  • For example, the LM (Levenberg-Marquardt) algorithm is used as the nonlinear optimization algorithm to optimize the initial values of the camera posture and focal length so as to minimize the 3D-to-2D reprojection error.
  • Specifically, the two-dimensional feature point coordinates on the video frame and the three-dimensional feature point coordinates on the three-dimensional map are obtained and substituted into the PnP algorithm and the nonlinear optimization algorithm: the PnP algorithm solves for an accurate camera pose matrix, and the nonlinear optimization algorithm then optimizes the camera parameters to obtain the focal length information and/or the distortion parameters.
  • The distortion parameters can be determined according to the specific type or parameters of the camera; for cameras with no distortion or only slight distortion, the distortion parameters can be set to default values (for example, set to 0, meaning the camera is assumed to be distortion-free), and the calculation of the distortion parameters can be omitted.
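  • A hedged sketch of this computation follows: an initial pose from OpenCV's RANSAC-based PnP solver, followed by Levenberg-Marquardt refinement of the pose and focal length by minimising the 3D-to-2D reprojection error. Using SciPy's least_squares and optimising a single shared focal length are illustrative simplifications, not the patent's prescribed implementation.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def solve_and_refine(pts_3d, pts_2d, f0, cx, cy):
    """pts_3d: (N, 3) float64, pts_2d: (N, 2) float64, f0: initial focal-length guess."""
    K0 = np.array([[f0, 0, cx], [0, f0, cy], [0, 0, 1]], dtype=np.float64)
    dist = np.zeros(5)                                       # distortion omitted, as allowed above
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts_3d, pts_2d, K0, dist)
    if not ok:
        return None                                          # pose solving failed -> view-angle reminder

    def residuals(x):
        rv, tv, f = x[:3], x[3:6], x[6]
        K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]])
        proj, _ = cv2.projectPoints(pts_3d, rv, tv, K, dist)
        return (proj.reshape(-1, 2) - pts_2d).ravel()        # 3D-to-2D reprojection error

    x0 = np.hstack([rvec.ravel(), tvec.ravel(), f0])
    res = least_squares(residuals, x0, method="lm")          # Levenberg-Marquardt refinement
    rvec, tvec, focal = res.x[:3], res.x[3:6], res.x[6]
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec, focal                                     # refined pose and focal length
```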
  • In other embodiments, after the camera pose matrix, the focal length information and/or the distortion parameters are determined through the pose solving algorithm, whether the camera pose is solved successfully can also be judged according to the result of the algorithm, and a viewing angle adjustment reminder can be issued when the solution fails, to remind the operator to reselect the initial position and posture of the camera.
  • For example, whether the camera pose is solved successfully can be judged from the solution result of the PnP algorithm: whether the obtained camera pose data is normal, or whether the deviation between the solved camera pose and the initial position and posture is within a reasonable range. If the camera pose is solved successfully, the subsequent video projection operation continues; if it fails, a viewing angle adjustment reminder is issued to remind the operator to return to step S201 and reselect the initial position and posture of the camera.
  • S210 Set a camera in the virtual scene according to the camera pose matrix, the focal length information, and/or the distortion parameter, and add the video frame to the rendering pipeline for video projection.
  • In the above, the initial position and posture of the camera in the virtual scene is determined according to the video frames captured by the camera, the three-dimensional map is rendered based on this initial position and posture to obtain a two-dimensional picture corresponding to the range captured by the camera at that position and posture, and feature matching is then performed between the two-dimensional picture and the video frame captured by the camera.
  • After the matching is completed, the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points on the video frame are determined, the camera pose matrix, focal length information and/or distortion parameters are determined through the pose solving algorithm, the camera in the virtual scene is set according to this information, and the video frame is added to the rendering pipeline for video projection. This achieves semi-automatic, interactive, fast video projection mapping without requiring the staff to manually configure camera parameters precisely, improves video projection efficiency, and, through the matching of the video frame with the two-dimensional picture, allows the video frame to be projected at the correct position on the three-dimensional model, effectively improving the video projection effect. The precise position of the camera is determined based on image feature matching, the PnP algorithm, and the nonlinear optimization algorithm, so that the video frame is projected at the correct position on the three-dimensional model.
  • The two-dimensional picture is rendered from three-dimensional model tiles, which reduces the GPU graphics-processing burden and effectively improves the real-time performance of video projection.
  • At the same time, the operator only needs to adjust the viewing angle of the virtual scene to achieve one-click mapping, without manually calculating complex camera parameters, which reduces the time spent configuring projection parameters for each video channel, improves video projection efficiency, and facilitates the large-scale deployment of video projection technology.
  • FIG. 3 is a schematic structural diagram of an interactive video projection device provided by an embodiment of the application.
  • Referring to Fig. 3, the interactive video projection device provided by this embodiment includes a two-dimensional rendering module 31, a feature correspondence module 32, a pose determination module 33, and a video projection module 34.
  • The two-dimensional rendering module 31 is configured to render the three-dimensional map based on the initial camera position and posture determined in the virtual scene to obtain a two-dimensional picture corresponding to the initial position and posture, the initial position and posture being determined based on the video frames captured by the camera; the feature correspondence module 32 is configured to perform feature matching between the video frame captured by the camera and the two-dimensional picture, and to determine, according to the feature matching result, the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame.
  • The pose determination module 33 is configured to determine the camera pose matrix, focal length information and/or distortion parameters through a pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points; the video projection module 34 is configured to set the camera in the virtual scene according to the camera pose matrix, the focal length information and/or the distortion parameters, and to add the video frame to the rendering pipeline for video projection.
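  • The wiring of the four modules can be pictured with the skeletal sketch below; the class and method names are assumptions for illustration, not the device's reference implementation.

```python
class InteractiveVideoProjector:
    """Chains the four modules of the device for one video channel."""

    def __init__(self, rendering, correspondence, pose, projection):
        self.rendering = rendering            # two-dimensional rendering module (31)
        self.correspondence = correspondence  # feature correspondence module (32)
        self.pose = pose                      # pose determination module (33)
        self.projection = projection          # video projection module (34)

    def project(self, video_frame, initial_pose):
        picture, pixel_to_3d = self.rendering.render(initial_pose)
        pts_2d, pts_3d = self.correspondence.match(video_frame, picture, pixel_to_3d)
        solution = self.pose.solve(pts_2d, pts_3d)
        if solution is None:
            return "adjust-view"              # prompt the operator to reselect the initial pose
        return self.projection.apply(video_frame, solution)
```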
  • In the above, the initial position and posture of the camera in the virtual scene is determined according to the video frames captured by the camera, the three-dimensional map is rendered based on this initial position and posture to obtain a two-dimensional picture corresponding to the range captured by the camera, feature matching is then performed between the two-dimensional picture and the video frame, and the pose solving and video projection proceed as described above.
  • In a possible embodiment, when the feature correspondence module 32 performs feature matching between the video frame captured by the camera and the two-dimensional picture, this specifically includes: obtaining feature points and descriptors of the video frame captured by the camera and of the two-dimensional picture based on an image feature extraction algorithm; and
  • performing feature matching on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptors.
  • In a possible embodiment, after performing feature matching on the feature points between the video frame and the two-dimensional picture according to the distance of the descriptors, the feature correspondence module 32 further screens the matched feature points based on the RANSAC algorithm.
  • In a possible embodiment, the device further includes a matching error reminding module, which is configured to determine, after the feature correspondence module 32 performs feature matching between the video frame captured by the camera and the two-dimensional picture, whether the feature matching is successful according to the feature matching result, and to issue a viewing angle adjustment reminder when the feature matching fails, to remind the operator to reselect the initial position and posture of the camera.
  • In a possible embodiment, when the feature correspondence module 32 determines, according to the feature matching result, the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame, this specifically includes: determining, according to the feature matching result, the coordinates of the matching feature points in the two-dimensional picture that match the two-dimensional feature points on the video frame; and
  • determining, according to the correspondence between the three-dimensional map and the two-dimensional picture coordinate points, the coordinates of the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points.
  • In a possible embodiment, the pose determination module 33 is specifically configured to: obtain the two-dimensional feature point coordinates on the video frame and the three-dimensional feature point coordinates on the three-dimensional map; and
  • substitute the two-dimensional feature point coordinates and the three-dimensional feature point coordinates into the PnP algorithm and a nonlinear optimization algorithm to obtain the camera pose matrix, the focal length information and/or the distortion parameters.
  • In a possible embodiment, the device further includes a pose error reminding module, which is configured to determine, after the pose determination module 33 determines the camera pose matrix, the focal length information and/or the distortion parameters through the pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points, whether the camera pose is solved successfully according to the result of the pose solving algorithm, and to issue a viewing angle adjustment reminder when the camera pose fails to be solved, to remind the operator to reselect the initial position and posture of the camera.
  • Fig. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • The computer device includes an input device 43, an output device 44, a memory 42, and one or more processors 41; the memory 42 is used to store one or more programs, and when the one or more programs are executed by the one or more processors 41, the one or more processors 41 implement the interactive video projection method provided in the foregoing embodiments.
  • the input device 43, the output device 44, the memory 42, and the processor 41 may be connected by a bus or other methods. In FIG. 4, the connection by a bus is taken as an example.
  • As a computer-readable storage medium, the memory 42 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the interactive video projection method described in any embodiment of this application (for example, the two-dimensional rendering module 31, the feature correspondence module 32, the pose determination module 33, and the video projection module 34 in the interactive video projection device).
  • the memory 42 may mainly include a program storage area and a data storage area.
  • the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the device, and the like.
  • the memory 42 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 42 may further include a memory remotely provided with respect to the processor 41, and these remote memories may be connected to the device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 43 can be used to receive inputted digital or character information, and generate key signal input related to user settings and function control of the device.
  • the output device 44 may include a display device such as a display screen.
  • the processor 41 executes various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 42 to realize the above-mentioned interactive video projection method.
  • The interactive video projection device and computer device provided above can be used to execute the interactive video projection method provided in any of the above embodiments, and have the corresponding functions and beneficial effects.
  • the embodiment of the present application also provides a storage medium containing computer-executable instructions, when the computer-executable instructions are executed by a computer processor, they are used to execute the interactive video projection method provided in the above-mentioned embodiments.
  • The method includes: rendering a three-dimensional map based on the initial camera position and posture determined in the virtual scene to obtain a two-dimensional picture corresponding to the initial position and posture, the initial position and posture being determined based on the video frames captured by the camera; performing feature matching between the video frame captured by the camera and the two-dimensional picture, and determining, according to the feature matching result, the three-dimensional feature points on the three-dimensional map corresponding to the two-dimensional feature points on the video frame; determining the camera pose matrix, focal length information and/or distortion parameters through a pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points; and setting the camera in the virtual scene according to the camera pose matrix, the focal length information and/or the distortion parameters, and adding the video frame to the rendering pipeline for video projection.
  • A storage medium is any of various types of memory devices or storage devices.
  • The term "storage medium" is intended to include installation media such as CD-ROMs, floppy disks, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, or Rambus RAM; non-volatile memory such as flash memory or magnetic media (e.g., hard disks or optical storage); and registers or other similar types of memory elements.
  • the storage medium may further include other types of memory or a combination thereof.
  • the storage medium may be located in the first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the Internet).
  • the second computer system can provide the program instructions to the first computer for execution.
  • storage media may include two or more storage media that may reside in different locations (for example, in different computer systems connected through a network).
  • the storage medium may store program instructions (for example, embodied as a computer program) executable by one or more processors.
  • Of course, the computer-executable instructions of the storage medium provided in the embodiments of the present application are not limited to the interactive video projection method described above, and can also execute related operations in the interactive video projection method provided in any embodiment of the present application.
  • The interactive video projection device, equipment, and storage medium provided in the above embodiments can execute the interactive video projection method provided in any embodiment of this application; for technical details not exhaustively described in the above embodiments, reference may be made to the interactive video projection method provided in any embodiment of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An interactive video projection method, device, equipment, and storage medium. The initial position and posture of the camera in the virtual scene is determined according to the video frames captured by the camera, and the three-dimensional map is rendered based on this initial position and posture to obtain a two-dimensional picture corresponding to the range captured by the camera at that position and posture; feature matching is then performed between the two-dimensional picture and the video frames captured by the camera, and after the matching is completed, the three-dimensional feature points in the three-dimensional map corresponding to the two-dimensional feature points on the video frame are determined. The camera pose matrix, focal length information and/or distortion parameters can then be determined through a pose solving algorithm, the camera in the virtual scene is set according to this information, and the video frame is added to the rendering pipeline for video projection, thereby achieving semi-automatic, interactive, fast video projection mapping without requiring the staff to manually configure camera parameters precisely, and improving video projection efficiency.

Description

一种交互式视频投影方法、装置、设备及存储介质 技术领域
本申请实施例涉及图像处理领域,尤其涉及一种交互式视频投影方法、装置、设备及存储介质。
背景技术
视频投影技术是把监控视频和三维模型相结合,将关注区域的监控视频投影到大场景的三维模型中,可以实现静态大场景与动态重点场景的虚实结合。
传统的视频投影方案都是基于固定点位的视频监控枪机,即认为摄像头的位置、姿态是固定的,在投影配置过程中,通过人工设置相机的视场角和位置姿态,使得相机在三维数字空间中的相对位置姿态和其在物理世界里的位置姿态相同,实现视频投影画面和三维模型的贴合。
然而,每一路视频都需要工作人员花费大量的时间去配置相机位置姿态等信息,配置过程繁琐,存在无法及时满足视频投影要求的情况。
发明内容
本申请实施例提供一种交互式视频投影方法、装置、设备及存储介质,以满足视频投影的实时性需求。
在第一方面,本申请实施例提供了一种交互式视频投影方法,包括:
基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定;
对摄像头拍摄的视频帧和所述二维画面进行特征匹配,并根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点;
基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数;
根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述视频帧加入渲染管线中进行视频投影。
进一步的,所述基于在虚拟场景中确定的相机初始位置姿态,对三维地图 进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定,包括:
基于摄像头拍摄的视频帧在虚拟场景中确定相机初始位置姿态;
根据所述初始位置姿态获取渲染范围对应的三维模型瓦片,所述三维地图通过三维模型瓦片的形式进行存储;
对所述三维模型瓦片进行渲染以获得与所述初始位置姿态对应的二维画面。
进一步的,所述对摄像头拍摄的视频帧和所述二维画面进行特征匹配,包括:
基于图像特征提取算法获取摄像头拍摄的视频帧和所述二维画面的特征点以及描述子;
根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配。
进一步的,所述根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配之后,还包括:
基于RANSAC算法对匹配的特征点进行筛选。
进一步的,所述对摄像头拍摄的视频帧和所述二维画面进行特征匹配之后,还包括:
根据特征匹配结果判断特征匹配是否成功,并在特征匹配失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
进一步的,所述根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点,包括:
根据特征匹配结果确定所述二维画面中与所述视频帧上的二维特征点匹配的匹配特征点的坐标;
根据三维地图和二维画面坐标点的对应关系,确定三维地图中与二维特征点对应的三维特征点的坐标。
进一步的,所述基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数,包括:
获取视频帧上的二维特征点坐标和三维地图上的三维特征点坐标;
将所述二维特征点坐标和所述三维特征点坐标代入PnP算法和非线性优化算法,以得到相机位姿矩阵、焦距信息和/或畸变参数。
进一步的,所述基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数之后,还包括:
根据位姿求解算法结果判断相机位姿求解是否成功,并在相机位姿求解失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
在第二方面,本申请实施例提供了一种交互式视频投影装置,包括二维渲染模块、特征对应模块、位姿确定模块和视频投影模块,其中:
二维渲染模块,用于基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定;
特征对应模块,用于对摄像头拍摄的视频帧和所述二维画面进行特征匹配,并根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点;
位姿确定模块,用于基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数;
视频投影模块,用于根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述视频帧加入渲染管线中进行视频投影。
进一步的,所述二维渲染模块具体用于:
基于摄像头拍摄的视频帧在虚拟场景中确定相机初始位置姿态;
根据所述初始位置姿态获取渲染范围对应的三维模型瓦片,所述三维地图通过三维模型瓦片的形式进行存储;
对所述三维模型瓦片进行渲染以获得与所述初始位置姿态对应的二维画面。
进一步的,所述特征对应模块在对摄像头拍摄的视频帧和所述二维画面进行特征匹配时,具体包括:
基于图像特征提取算法获取摄像头拍摄的视频帧和所述二维画面的特征点以及描述子;
根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配。
进一步的,所述特征对应模块在根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配之后,还基于RANSAC算法对匹配的特征点进行筛选。
进一步的,还包括匹配错误提醒模块,用于在所述特征对应模块对摄像头拍摄的视频帧和所述二维画面进行特征匹配之后,根据特征匹配结果判断特征匹配是否成功,并在特征匹配失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
进一步的,所述特征对应模块在根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点时,具体包括:
根据特征匹配结果确定所述二维画面中与所述视频帧上的二维特征点匹配的匹配特征点的坐标;
根据三维地图和二维画面坐标点的对应关系,确定三维地图中与二维特征点对应的三维特征点的坐标。
进一步的,所述位姿确定模块具体用于:
获取视频帧上的二维特征点坐标和三维地图上的三维特征点坐标;
将所述二维特征点坐标和所述三维特征点坐标代入PnP算法和非线性优化算法,以得到相机位姿矩阵、焦距信息和/或畸变参数。
进一步的,还包括位姿错误提醒模块,用于在所述位姿确定模块基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数之后,根据位姿求解算法结果判断相机位姿求解是否成功,并在相机位姿求解失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
在第三方面,本申请实施例提供了一种计算机设备,包括:存储器以及一个或多个处理器;
所述存储器,用于存储一个或多个程序;
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如第一方面所述的交互式视频投影方法。
在第四方面,本申请实施例提供了一种包含计算机可执行指令的存储介质,所述计算机可执行指令在由计算机处理器执行时用于执行如第一方面所述的交互式视频投影方法。
本申请实施例通过根据摄像头拍摄的视频帧确定相机在虚拟场景中的初始位置姿态,并基于该初始位置姿态对三维地图进行渲染,获得与初始位置姿态下的摄像头拍摄的范围对应的二维画面,然后对二维画面和摄像头拍摄的视频帧进行特征匹配,匹配完成后确定三维地图中与视频帧上的二维特征点对应的 三维特征点,通过位姿求解算法可确定相机位姿矩阵、焦距信息和/或畸变参数,根据以上信息设置虚拟场景中的相机,并将视频帧加入渲染管线中进行视频投影,从而实现半自动交互式快速视频投影贴图,无需工作人员手动精确配置相机参数,提高视频投影效率,并且通过视频帧和二维画面的匹配使得视频帧可投影在三维模型的正确位置上,有效提高视频投影效果。
附图说明
图1是本申请实施例提供的一种交互式视频投影方法的流程图;
图2是本申请实施例提供的另一种交互式视频投影方法的流程图;
图3是本申请实施例提供的一种交互式视频投影装置的结构示意图;
图4是本申请实施例提供的一种计算机设备的结构示意图。
具体实施方式
为了使本申请的目的、技术方案和优点更加清楚,下面结合附图对本申请具体实施例作进一步的详细描述。可以理解的是,此处所描述的具体实施例仅仅用于解释本申请,而非对本申请的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与本申请相关的部分而非全部内容。在更加详细地讨论示例性实施例之前应当提到的是,一些示例性实施例被描述成作为流程图描绘的处理或方法。虽然流程图将各项操作(或步骤)描述成顺序的处理,但是其中的许多操作可以被并行地、并发地或者同时实施。此外,各项操作的顺序可以被重新安排。当其操作完成时所述处理可以被终止,但是还可以具有未包括在附图中的附加步骤。所述处理可以对应于方法、函数、规程、子例程、子程序等等。
图1给出了本申请实施例提供的一种交互式视频投影方法的流程图,本申请实施例提供的交互式视频投影方法可以由交互式视频投影装置来执行,该交互式视频投影装置可以通过硬件和/或软件的方式实现,并集成在计算机设备中。
下述以交互式视频投影装置执行交互式视频投影方法为例进行描述。参考图1,该交互式视频投影方法包括:
S101:基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍 摄的视频帧确定。
其中,虚拟场景可以是基于三维地图渲染出的三维场景,基于虚拟场景中相机的位置姿态和视角可确定显示的虚拟场景画面,可通过输入装置(例如鼠标、键盘)的调整操作(位置、视角、焦距等)实现相机在虚拟场景中的位置姿态,从而确定相机在目标虚拟场景画面下的位置姿态。虚拟场景画面可在显示装置(显示屏)中进行显示。
示例性的,在接收到摄像头回传的视频流后,通过软解码或硬解码的方式对视频流进行解码,并获得视频帧。操作者可基于视频帧所显示的画面调整虚拟场景中的相机,使相机的位置姿态、视角、焦距等调整到得到的虚拟场景画面与视频帧的画面相近或者是包含内容大体重叠,并确定此时相机在虚拟场景中的位置姿态,从而确定相机在虚拟场景中的初始位置姿态。在确定位置姿态后,可在操作界面中设置初始位置姿态确定按钮或一键贴图按钮,并响应于按钮的确定操作确定此时相机在虚拟场景中的位置姿态,并作为相机的初始位置姿态。
在相机在虚拟场景中的初始位置姿态确定后,获取三维地图数据,并基于该初始位置姿态以及对应的焦距信息确定在三维地图中的相机的位置姿态以及焦距,对三维地图进行渲染,从而获取与摄像头在初始位置姿态拍摄的画面对应的二维画面。优选的,二维画面的面积应大于对应视频帧的面积,即二维画面应覆盖视频帧。
可以理解的是,三维地图是基于世界坐标系建立的,即使三维地图的坐标系与世界坐标系存在误差,但是误差造成的偏移在误差范围内(一般在几米内),这个偏移量对于渲染三维地图来说问题不大,渲染出的二维画面还是可以覆盖到目标区域(视频帧对应的区域)。
S102:对摄像头拍摄的视频帧和所述二维画面进行特征匹配,并根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点。
示例性的,在得到渲染出的二维画面后,对摄像头拍摄的视频帧和二维画面进行特征匹配,即提取出视频帧和二维画面中的特征点,并根据特征点之间的相似度(特征向量距离)进行匹配,并生成匹配结果。其中,视频帧上的特征点为二维特征点,二维画面上的特征点为匹配特征点。
进一步的,在渲染二维画面的时候,可对三维地图和二维画面坐标点之间 的对应关系进行记录,并根据该记录确定三维地图中与二维画面中的匹配特征点对应的三维特征点坐标,确定视频帧上的二维特征点在三维地图上对应的三维特征点。另外,在渲染出二维画面时同时渲染了RGB图像(二维画面)和深度图,根据深度图亦可反算出二维画面中的点对应的三维特征点,进而确定视频帧上的二维特征点在三维地图上对应的三维特征点。
S103:基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数。
示例性的,在得到二维特征点和三维特征点的位置坐标和对应关系后,将二维特征点和三维特征点代入位姿求解算法中,从而得到相机位姿矩阵、焦距信息和/或畸变参数。
其中位姿求解算法是求解3D到2D点对运动的方法,描述了当知道n个3D空间点以及它们的投影位置时,如何确定相机所在的位姿、焦距以及畸变的方法,即在已知世界坐标系下N个空间点的真实坐标以及这些空间点在图像上的投影,如何计算相机所在的位姿的解决方法。其中摄像头是否畸变可根据摄像头的具体类型或参数进行确定,对于无畸变或者畸变不严重的摄像头,畸变参数可设置为默认参数(如设置为0,默认摄像头无畸变)。
S104:根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述视频帧加入渲染管线中进行视频投影。
示例性的,在获得位姿矩阵和焦距信息后,将位姿矩阵和焦距信息输入到虚拟场景的相机参数中,对相机的位置姿态和焦距进行设置。然后将视频帧加入到渲染管线中,并由渲染管线在对应的相机参数设置下对视频帧进行实时融合投影。
在进行融合投影时,确定视频帧中的像素点与三维场景(虚拟场景)中的三维点之间的映射关系,并根据映射关系将视频帧在三维场景中进行颜色纹理映射,并对颜色纹理映射的重合区域进行平滑过渡处理,从而将视频帧融合在三维场景中,完成校正视频帧在三维场景中的视频投影。可以理解的是,对视频帧的视频投影基于现有的视频投影方法进行即可,在此不做赘述。
上述,通过根据摄像头拍摄的视频帧确定相机在虚拟场景中的初始位置姿态,并基于该初始位置姿态对三维地图进行渲染,获得与初始位置姿态下的摄像头拍摄的范围对应的二维画面,然后对二维画面和摄像头拍摄的视频帧进行特征匹配,匹配完成后确定三维地图中与视频帧上的二维特征点对应的三维特 征点,通过位姿求解算法可确定相机位姿矩阵、焦距信息和/或畸变参数,根据以上信息设置虚拟场景中的相机,并将视频帧加入渲染管线中进行视频投影,从而实现半自动交互式快速视频投影贴图,无需工作人员手动精确配置相机参数,提高视频投影效率,并且通过视频帧和二维画面的匹配使得视频帧可投影在三维模型的正确位置上,有效提高视频投影效果。
图2为本申请实施例提供的另一种交互式视频投影方法的流程图,该交互式视频投影方法是对交互式视频投影方法的具体化。参考图2,该交互式视频投影方法包括:
S201:基于摄像头拍摄的视频帧在虚拟场景中确定相机初始位置姿态。
S202:根据所述初始位置姿态获取渲染范围对应的三维模型瓦片,所述三维地图通过三维模型瓦片的形式进行存储。
具体的,三维地图通过三维模型瓦片的形式进行存储。三维地图的数据量比较大,通过对三维地图数据进行切片,每片三维地图称为三维模型瓦片,并对每个三维模型瓦片对应的位置范围进行记录。根据初始位置姿态确定并调取渲染范围所对应的三维模型瓦片。可以理解的是,由获取的三维模型瓦片组成的三维地图的范围应大于摄像头拍摄的视频帧的范围。
S203:对所述三维模型瓦片进行渲染以获得与所述初始位置姿态对应的二维画面。
具体的,在获取与渲染范围对应的三维模型瓦片后,通过GPU可视化引擎对这些三维模型瓦片进行渲染并获得与初始位置姿态对应的二维画面。可以理解的是,二维画面的显示范围大于对应视频帧的显示范围。
S204:基于图像特征提取算法获取摄像头拍摄的视频帧和所述二维画面的特征点以及描述子。
具体的,基于GPU对视频帧和二维画面进行图像特征提取,图像特征包括特征点和描述子。其中图像特征提取算法可以是SIFT(Scale Invariant Feature Transform,尺度不变特征变换)算法、SURF(Speeded Up Robust Features,加速稳健特征)算法或ORB(Oriented FAST and Rotated BRIEF)算法等,本实施例不做限定。
其中图像的特征点是图像上最具代表性的一些点,所谓最具代表性就是说这些点包含了图像表述的大部分信息。即使旋转、缩放,甚至调整图像的亮度, 这些点仍然稳定地存在,不会丢失。找出这些点,就相当于确定了这张图像,它们可以用来做匹配、识别等等有意义的工作。特征点由关键点(Key-point)和描述子(Descriptor)两部分组成。BRIEF描述子是一种二进制描述子,通常为128位的二进制串。它的计算方法是从关键点p周围随机挑选128个点对,对于每个点对中的两个点,如果前一个点的灰度值大于后一个点,则取1,反之取0。
比如,提取ORB特征其实包括了提取关键点和计算描述子两件事情,利用FAST特征点检测算法或Harris角点检测算法或SIFT、SURF等算法检测特征点的位置,接下来在特征点邻域利用BRIEF算法建立特征描述子。
S205:根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配。
具体的,在得到二维画面和视频帧上的特征点后,根据对应描述子的距离判断两个特征点之间的相似度,距离越小则相似度越高。其中描述子的距离可以是欧氏距离、汉明距离、余弦距离等。
进一步的,基于GPU遍历二维画面和视频帧的描述子,根据距离对特征点进行排序,在一定的置信度之下显示前N个特征的匹配结果,即根据距离反映的相似度将二维画面和视频帧之间的特征点进行匹配。
在其他实施例中,在对摄像头拍摄的视频帧和所述二维画面进行特征匹配之后,还可以根据特征匹配结果判断特征匹配是否成功,并在特征匹配失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
示例性的,特征匹配是否成功可以在提取特征点之后或者实在对特征点进行匹配之后进行判断。例如,调用图像特征提取算法求取特征点时,根据算法结果确定是否正常获取特征点,或者在根据描述子的距离对特征点进行匹配时,根据匹配结果判断相匹配的特征点是否达到正常数量或占比,并以此判断特征匹配是否成功。在匹配成功时,继续下一步对特征点进行筛选,若匹配失败,则进行视角调整提醒,以提醒操作者重返回至步骤S201新选择相机的初始位置姿态。
S206:基于RANSAC算法对匹配的特征点进行筛选。
其中,RANSAC(Random Sample Consensus,随机抽样一致)算法用于消除错误匹配的点。在完成特征点的匹配后,获取二维画面和视频帧的基础矩阵和单应矩阵,并基于基础矩阵和单应矩阵用RANSAC算法对匹配的特征点进行 筛选,消除匹配错误的特征点。
S207:根据特征匹配结果确定所述二维画面中与所述视频帧上的二维特征点匹配的匹配特征点的坐标。
其中,二维画面和视频帧中相互匹配的特征点分别定义为匹配特征点和二维特征点。
具体的,在完成特征点的匹配与筛选后,确定二维画面中与视频帧上的二维特征点匹配的匹配特征点,并确定这些匹配特征点的坐标。
S208:根据三维地图和二维画面坐标点的对应关系,确定三维地图中与二维特征点对应的三维特征点的坐标。
具体的,在渲染二维画面的时候,可对三维地图和二维画面坐标点之间的对应关系进行记录。在需要确定与二维特征点对应的三维特征点的坐标时,根据匹配结果确定与二维特征点对应的匹配特征点,再根据三维地图和二维画面坐标点之间的对应关系获取与匹配特征点对应的三维特征点的坐标。
另外,在渲染二维画面时同时渲染了RGB图像(二维画面)和深度图,根据深度图亦可反算出二维画面中的点对应的三维特征点,并得到与二维画面上匹配特征点对应的三维特征点的坐标。
S209:获取视频帧上的二维特征点坐标和三维地图上的三维特征点坐标,并将所述二维特征点坐标和所述三维特征点坐标代入PnP算法和非线性优化算法,以得到相机位姿矩阵、焦距信息和/或畸变参数。
其中,PnP(Perspective-n-Point)算法是求解3D到2D点对运动的方法,可以通过P3P、直接线性变换(DLT)、EPnP等算法进行求解。以P3P算法为例,P3P算法是一种由3D-2D的位姿求解方式,需要已知匹配的3D点和图像2D点,即先求出对应的2D点(相当于本方案的二维特征点)在当前相机坐标系下的3D坐标(相当于本方案的三维特征点坐标),然后根据世界坐标系下的3D坐标和当前相机坐标系下的3D坐标求解相机位姿。
非线性优化算法是在给定相机姿态焦距初始值的情况下,通过最小二乘进一步优化3D点到2D点重投影误差的方法,在优化过程中算法会进一步对相机的姿态和焦距做细微的调整。例如,通过LM(Levenberg-Marquardt)优化算法作为非线性优化算法对相机姿态焦距初始值进行优化,以得到3D点到2D点最小的重投影误差。
具体的,获取视频帧上的二维特征点坐标和三维地图上的三维特征点坐标, 将二维特征点坐标和三维特征点坐标代入PnP算法和非线性优化算法,经PnP算法求解得到准确的相机位姿矩阵,然后通过非线性优化算法对相机参数进行优化得到焦距信息和/或畸变参数。畸变参数的确定可根据摄像头的具体类型或参数进行确定,对于无畸变或者畸变不严重的摄像头,畸变参数可设置为默认参数(如设置为0,默认摄像头无畸变),并可不对畸变参数进行计算。
在其他实施例中,在基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数之后,还可以根据位姿求解算法结果判断相机位姿求解是否成功,并在相机位姿求解失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
示例性的,相机位姿求解是否成功可以根据PnP算法的求解结果进行判断。例如,根据PnP算法得到的结果判断得到的相机位姿数据是否正常或者是相机位姿和初始位置姿态的偏差是否在合理范围内,并以此判断相机位姿求解是否成功。在相机位姿求解成功时,继续下一步的视频投影操作,若相机位姿求解失败,则进行视角调整提醒,以提醒操作者重返回至步骤S201新选择相机的初始位置姿态。
S210:根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述视频帧加入渲染管线中进行视频投影。
上述,通过根据摄像头拍摄的视频帧确定相机在虚拟场景中的初始位置姿态,并基于该初始位置姿态对三维地图进行渲染,获得与初始位置姿态下的摄像头拍摄的范围对应的二维画面,然后对二维画面和摄像头拍摄的视频帧进行特征匹配,匹配完成后确定三维地图中与视频帧上的二维特征点对应的三维特征点,通过位姿求解算法可确定相机位姿矩阵、焦距信息和/或畸变参数,根据以上信息设置虚拟场景中的相机,并将视频帧加入渲染管线中进行视频投影,从而实现半自动交互式快速视频投影贴图,无需工作人员手动精确配置相机参数,提高视频投影效率,并且通过视频帧和二维画面的匹配使得视频帧可投影在三维模型的正确位置上,有效提高视频投影效果。并基于图像特征匹配、PnP算法和非线性优化算法确定相机的精确位置,使得视频帧可投影在三维模型的正确位置上,有效提高视频投影效果。通过三维模型瓦片的形式渲染出二维画面,减少GPU图形处理的负担,有效提高视频投影的实时性。同时,操作者只需要调整虚拟场景的视角即可实现一键贴图,无需人工计算复杂的相机参数,降低每路视频投影参数配置所花费的时间,提高视频投影效率,便于视频投影 技术大规模落地推广。
图3为本申请实施例提供的一种交互式视频投影装置的结构示意图。参考图3,本实施例提供的交互式视频投影装置包括二维渲染模块31、特征对应模块32、位姿确定模块33和视频投影模块34。
其中,二维渲染模块31,用于基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定;特征对应模块32,用于对摄像头拍摄的视频帧和所述二维画面进行特征匹配,并根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点;位姿确定模块33,用于基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数;视频投影模块34,用于根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述视频帧加入渲染管线中进行视频投影。
上述,通过根据摄像头拍摄的视频帧确定相机在虚拟场景中的初始位置姿态,并基于该初始位置姿态对三维地图进行渲染,获得与初始位置姿态下的摄像头拍摄的范围对应的二维画面,然后对二维画面和摄像头拍摄的视频帧进行特征匹配,匹配完成后确定三维地图中与视频帧上的二维特征点对应的三维特征点,通过位姿求解算法可确定相机位姿矩阵、焦距信息和/或畸变参数,根据以上信息设置虚拟场景中的相机,并将视频帧加入渲染管线中进行视频投影,从而实现半自动交互式快速视频投影贴图,无需工作人员手动精确配置相机参数,提高视频投影效率,并且通过视频帧和二维画面的匹配使得视频帧可投影在三维模型的正确位置上,有效提高视频投影效果。
在一个可能的实施例中,所述二维渲染模块31具体用于:
基于摄像头拍摄的视频帧在虚拟场景中确定相机初始位置姿态;
根据所述初始位置姿态获取渲染范围对应的三维模型瓦片,所述三维地图通过三维模型瓦片的形式进行存储;
对所述三维模型瓦片进行渲染以获得与所述初始位置姿态对应的二维画面。
在一个可能的实施例中,所述特征对应模块32在对摄像头拍摄的视频帧和所述二维画面进行特征匹配时,具体包括:
基于图像特征提取算法获取摄像头拍摄的视频帧和所述二维画面的特征点以及描述子;
根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配。
在一个可能的实施例中,所述特征对应模块32在根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配之后,还基于RANSAC算法对匹配的特征点进行筛选。
在一个可能的实施例中,还包括匹配错误提醒模块,用于在所述特征对应模块32对摄像头拍摄的视频帧和所述二维画面进行特征匹配之后,根据特征匹配结果判断特征匹配是否成功,并在特征匹配失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
在一个可能的实施例中,所述特征对应模块32在根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点时,具体包括:
根据特征匹配结果确定所述二维画面中与所述视频帧上的二维特征点匹配的匹配特征点的坐标;
根据三维地图和二维画面坐标点的对应关系,确定三维地图中与二维特征点对应的三维特征点的坐标。
在一个可能的实施例中,所述位姿确定模块33具体用于:
获取视频帧上的二维特征点坐标和三维地图上的三维特征点坐标;
将所述二维特征点坐标和所述三维特征点坐标代入PnP算法和非线性优化算法,以得到相机位姿矩阵、焦距信息和/或畸变参数。
在一个可能的实施例中,还包括位姿错误提醒模块,用于在所述位姿确定模块33基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数之后,根据位姿求解算法结果判断相机位姿求解是否成功,并在相机位姿求解失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
本申请实施例还提供了一种计算机设备,该计算机设备可集成本申请实施例提供的交互式视频投影装置。图4是本申请实施例提供的一种计算机设备的结构示意图。参考图4,该计算机设备包括:输入装置43、输出装置44、存储器42以及一个或多个处理器41;所述存储器42,用于存储一个或多个程序; 当所述一个或多个程序被所述一个或多个处理器41执行,使得所述一个或多个处理器41实现如上述实施例提供的交互式视频投影方法。其中输入装置43、输出装置44、存储器42和处理器41可以通过总线或者其他方式连接,图4中以通过总线连接为例。
存储器42作为一种计算设备可读存储介质,可用于存储软件程序、计算机可执行程序以及模块,如本申请任意实施例所述的交互式视频投影方法对应的程序指令/模块(例如,交互式视频投影装置中的二维渲染模块31、特征对应模块32、位姿确定模块33和视频投影模块34)。存储器42可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序;存储数据区可存储根据设备的使用所创建的数据等。此外,存储器42可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实例中,存储器42可进一步包括相对于处理器41远程设置的存储器,这些远程存储器可以通过网络连接至设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
输入装置43可用于接收输入的数字或字符信息,以及产生与设备的用户设置以及功能控制有关的键信号输入。输出装置44可包括显示屏等显示设备。
处理器41通过运行存储在存储器42中的软件程序、指令以及模块,从而执行设备的各种功能应用以及数据处理,即实现上述的交互式视频投影方法。
上述提供的交互式视频投影装置和计算机可用于执行上述任意实施例提供的交互式视频投影方法,具备相应的功能和有益效果。
本申请实施例还提供一种包含计算机可执行指令的存储介质,所述计算机可执行指令在由计算机处理器执行时用于执行如上述实施例提供的交互式视频投影方法,该交互式视频投影方法包括:基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定;对摄像头拍摄的视频帧和所述二维画面进行特征匹配,并根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点;基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数;根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述 视频帧加入渲染管线中进行视频投影。
存储介质——任何的各种类型的存储器设备或存储设备。术语“存储介质”旨在包括:安装介质,例如CD-ROM、软盘或磁带装置;计算机系统存储器或随机存取存储器,诸如DRAM、DDR RAM、SRAM、EDO RAM,兰巴斯(Rambus)RAM等;非易失性存储器,诸如闪存、磁介质(例如硬盘或光存储);寄存器或其它相似类型的存储器元件等。存储介质可以还包括其它类型的存储器或其组合。另外,存储介质可以位于程序在其中被执行的第一计算机系统中,或者可以位于不同的第二计算机系统中,第二计算机系统通过网络(诸如因特网)连接到第一计算机系统。第二计算机系统可以提供程序指令给第一计算机用于执行。术语“存储介质”可以包括可以驻留在不同位置中(例如在通过网络连接的不同计算机系统中)的两个或更多存储介质。存储介质可以存储可由一个或多个处理器执行的程序指令(例如具体实现为计算机程序)。
当然,本申请实施例所提供的一种包含计算机可执行指令的存储介质,其计算机可执行指令不限于如上所述的交互式视频投影方法,还可以执行本申请任意实施例所提供的交互式视频投影方法中的相关操作。
上述实施例中提供的交互式视频投影装置、设备及存储介质可执行本申请任意实施例所提供的交互式视频投影方法,未在上述实施例中详尽描述的技术细节,可参见本申请任意实施例所提供的交互式视频投影方法。
上述仅为本申请的较佳实施例及所运用的技术原理。本申请不限于这里所述的特定实施例,对本领域技术人员来说能够进行的各种明显变化、重新调整及替代均不会脱离本申请的保护范围。因此,虽然通过以上实施例对本申请进行了较为详细的说明,但是本申请不仅仅限于以上实施例,在不脱离本申请构思的情况下,还可以包括更多其他等效实施例,而本申请的范围由权利要求的范围决定。

Claims (11)

  1. 一种交互式视频投影方法,其特征在于,包括:
    基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定;
    对摄像头拍摄的视频帧和所述二维画面进行特征匹配,并根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点;
    基于所述二维特征点和所述三维特征点,通过位姿求解算法确定相机位姿矩阵、焦距信息和/或畸变参数;
    根据所述相机位姿矩阵、所述焦距信息和/或所述畸变参数设置虚拟场景中的相机,并将所述视频帧加入渲染管线中进行视频投影。
  2. 根据权利要求1所述的交互式视频投影方法,其特征在于,所述基于在虚拟场景中确定的相机初始位置姿态,对三维地图进行渲染以获得与所述初始位置姿态对应的二维画面,所述初始位置姿态基于摄像头拍摄的视频帧确定,包括:
    基于摄像头拍摄的视频帧在虚拟场景中确定相机初始位置姿态;
    根据所述初始位置姿态获取渲染范围对应的三维模型瓦片,所述三维地图通过三维模型瓦片的形式进行存储;
    对所述三维模型瓦片进行渲染以获得与所述初始位置姿态对应的二维画面。
  3. 根据权利要求1所述的交互式视频投影方法,其特征在于,所述对摄像头拍摄的视频帧和所述二维画面进行特征匹配,包括:
    基于图像特征提取算法获取摄像头拍摄的视频帧和所述二维画面的特征点以及描述子;
    根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配。
  4. 根据权利要求3所述的交互式视频投影方法,其特征在于,所述根据描述子的距离对所述视频帧和所述二维画面之间的特征点进行特征匹配之后,还包括:
    基于RANSAC算法对匹配的特征点进行筛选。
  5. 根据权利要求1所述的交互式视频投影方法,其特征在于,所述对摄像头拍摄的视频帧和所述二维画面进行特征匹配之后,还包括:
    根据特征匹配结果判断特征匹配是否成功,并在特征匹配失败时进行视角调整提醒,以提醒操作者重新选择相机的初始位置姿态。
  6. 根据权利要求1所述的交互式视频投影方法,其特征在于,所述根据特征匹配结果确定所述视频帧上的二维特征点在所述三维地图上对应的三维特征点,包括:
    根据特征匹配结果确定所述二维画面中与所述视频帧上的二维特征点匹配的匹配特征点的坐标;
    根据三维地图和二维画面坐标点的对应关系,确定三维地图中与二维特征点对应的三维特征点的坐标。
  7. The interactive video projection method according to claim 1, wherein determining the camera pose matrix, the focal length information and/or the distortion parameter through the pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points comprises:
    acquiring coordinates of the two-dimensional feature points on the video frame and coordinates of the three-dimensional feature points on the three-dimensional map; and
    substituting the two-dimensional feature point coordinates and the three-dimensional feature point coordinates into a PnP algorithm and a nonlinear optimization algorithm to obtain the camera pose matrix, the focal length information and/or the distortion parameter.
  8. The interactive video projection method according to claim 1, wherein, after determining the camera pose matrix, the focal length information and/or the distortion parameter through the pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points, the method further comprises:
    determining whether the camera pose solving succeeds according to a result of the pose solving algorithm, and issuing a viewing-angle adjustment reminder when the camera pose solving fails, so as to remind an operator to reselect the initial position and posture of the camera.
  9. An interactive video projection apparatus, comprising a two-dimensional rendering module, a feature correspondence module, a pose determination module and a video projection module, wherein:
    the two-dimensional rendering module is configured to render a three-dimensional map based on an initial position and posture of a camera determined in a virtual scene to obtain a two-dimensional picture corresponding to the initial position and posture, the initial position and posture being determined based on a video frame captured by a camera;
    the feature correspondence module is configured to perform feature matching between the video frame captured by the camera and the two-dimensional picture, and determine, according to a feature matching result, three-dimensional feature points on the three-dimensional map corresponding to two-dimensional feature points on the video frame;
    the pose determination module is configured to determine a camera pose matrix, focal length information and/or a distortion parameter through a pose solving algorithm based on the two-dimensional feature points and the three-dimensional feature points; and
    the video projection module is configured to set a camera in the virtual scene according to the camera pose matrix, the focal length information and/or the distortion parameter, and add the video frame to a rendering pipeline for video projection.
  10. A computer device, comprising: a memory and one or more processors;
    the memory being configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the interactive video projection method according to any one of claims 1 to 8.
  11. A storage medium containing computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, are used to execute the interactive video projection method according to any one of claims 1 to 8.
PCT/CN2020/121664 2020-05-14 2020-10-16 一种交互式视频投影方法、装置、设备及存储介质 WO2021227360A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010408983.8A CN111640181A (zh) 2020-05-14 2020-05-14 一种交互式视频投影方法、装置、设备及存储介质
CN202010408983.8 2020-05-14

Publications (1)

Publication Number Publication Date
WO2021227360A1 true WO2021227360A1 (zh) 2021-11-18

Family

ID=72332004

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121664 WO2021227360A1 (zh) 2020-05-14 2020-10-16 一种交互式视频投影方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN111640181A (zh)
WO (1) WO2021227360A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640181A (zh) * 2020-05-14 2020-09-08 佳都新太科技股份有限公司 一种交互式视频投影方法、装置、设备及存储介质
CN112437276B (zh) * 2020-11-20 2023-04-07 埃洛克航空科技(北京)有限公司 一种基于WebGL的三维视频融合方法及系统
CN113096003B (zh) * 2021-04-02 2023-08-18 北京车和家信息技术有限公司 针对多视频帧的标注方法、装置、设备和存储介质
CN113793379A (zh) * 2021-08-12 2021-12-14 视辰信息科技(上海)有限公司 相机姿态求解方法及系统、设备和计算机可读存储介质
CN113870163B (zh) * 2021-09-24 2022-11-29 埃洛克航空科技(北京)有限公司 基于三维场景的视频融合方法以及装置、存储介质、电子装置
CN114237537B (zh) * 2021-12-10 2023-08-04 杭州海康威视数字技术股份有限公司 头戴式设备、远程协助方法及系统
CN114095662B (zh) * 2022-01-20 2022-07-05 荣耀终端有限公司 拍摄指引方法及电子设备
CN114677572B (zh) * 2022-04-08 2023-04-18 北京百度网讯科技有限公司 对象描述参数的生成方法、深度学习模型的训练方法
CN115100327B (zh) * 2022-08-26 2022-12-02 广东三维家信息科技有限公司 动画立体视频生成的方法、装置及电子设备
CN115866254A (zh) * 2022-11-24 2023-03-28 亮风台(上海)信息科技有限公司 一种传输视频帧及摄像参数信息的方法与设备
CN116758157B (zh) * 2023-06-14 2024-01-30 深圳市华赛睿飞智能科技有限公司 一种无人机室内三维空间测绘方法、系统及存储介质
CN117237438A (zh) * 2023-09-18 2023-12-15 共享数据(福建)科技有限公司 一种三维模型和无人机视频数据的范围匹配方法与终端

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136738A (zh) * 2011-11-29 2013-06-05 北京航天长峰科技工业集团有限公司 一种复杂场景下固定摄像机监控视频与三维模型配准方法
CN102622747B (zh) * 2012-02-16 2013-10-16 北京航空航天大学 一种用于视觉测量的摄像机参数优化方法
CN102663767B (zh) * 2012-05-08 2014-08-06 北京信息科技大学 视觉测量系统的相机参数标定优化方法
CN103400409B (zh) * 2013-08-27 2016-08-10 华中师范大学 一种基于摄像头姿态快速估计的覆盖范围3d可视化方法
CN103716586A (zh) * 2013-12-12 2014-04-09 中国科学院深圳先进技术研究院 一种基于三维空间场景的监控视频融合系统和方法
CN104182982B (zh) * 2014-08-27 2017-02-15 大连理工大学 双目立体视觉摄像机标定参数的整体优化方法
CN105118061A (zh) * 2015-08-19 2015-12-02 刘朔 用于将视频流配准至三维地理信息空间中的场景的方法
CN105678748B (zh) * 2015-12-30 2019-01-15 清华大学 三维监控系统中基于三维重构的交互式标定方法和装置
CN105678839A (zh) * 2015-12-30 2016-06-15 天津德勤和创科技发展有限公司 基于计算机三维场景模拟技术的安防设备分布设计方法
CN107564098A (zh) * 2017-08-17 2018-01-09 中山大学 一种大区域网络三维噪声地图的快速渲染方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010043738A1 (en) * 2000-03-07 2001-11-22 Sawhney Harpreet Singh Method of pose estimation and model refinement for video representation of a three dimensional scene
CN105844696A (zh) * 2015-12-31 2016-08-10 清华大学 基于射线模型三维重构的图像定位方法以及装置
CN108830894A (zh) * 2018-06-19 2018-11-16 亮风台(上海)信息科技有限公司 基于增强现实的远程指导方法、装置、终端和存储介质
CN111586360A (zh) * 2020-05-14 2020-08-25 佳都新太科技股份有限公司 一种无人机投影方法、装置、设备及存储介质
CN111640181A (zh) * 2020-05-14 2020-09-08 佳都新太科技股份有限公司 一种交互式视频投影方法、装置、设备及存储介质

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023116430A1 (zh) * 2021-12-23 2023-06-29 奥格科技股份有限公司 视频与城市信息模型三维场景融合方法、系统及存储介质
CN114445541A (zh) * 2022-01-28 2022-05-06 北京百度网讯科技有限公司 处理视频的方法、装置、电子设备及存储介质
CN114827569A (zh) * 2022-04-24 2022-07-29 咪咕视讯科技有限公司 画面显示方法、装置、虚拟现实设备及存储介质
CN114827569B (zh) * 2022-04-24 2023-11-10 咪咕视讯科技有限公司 画面显示方法、装置、虚拟现实设备及存储介质
CN114915727A (zh) * 2022-05-12 2022-08-16 北京国基科技股份有限公司 一种视频监控画面构建方法及装置
CN114915727B (zh) * 2022-05-12 2023-06-06 北京国基科技股份有限公司 一种视频监控画面构建方法及装置
CN115022613A (zh) * 2022-05-19 2022-09-06 北京字节跳动网络技术有限公司 一种视频重建方法、装置、电子设备及存储介质
CN115396644A (zh) * 2022-07-21 2022-11-25 贝壳找房(北京)科技有限公司 基于多段外参数据的视频融合方法及装置
CN115396644B (zh) * 2022-07-21 2023-09-15 贝壳找房(北京)科技有限公司 基于多段外参数据的视频融合方法及装置
CN116580099A (zh) * 2023-07-14 2023-08-11 山东艺术学院 一种基于视频与三维模型融合的林地目标定位方法
CN117218320A (zh) * 2023-11-08 2023-12-12 济南大学 基于混合现实的空间标注方法
CN117218320B (zh) * 2023-11-08 2024-02-27 济南大学 基于混合现实的空间标注方法

Also Published As

Publication number Publication date
CN111640181A (zh) 2020-09-08

Similar Documents

Publication Publication Date Title
WO2021227360A1 (zh) 一种交互式视频投影方法、装置、设备及存储介质
WO2021227359A1 (zh) 一种无人机投影方法、装置、设备及存储介质
WO2018214365A1 (zh) 图像校正方法、装置、设备、系统及摄像设备和显示设备
WO2020001168A1 (zh) 三维重建方法、装置、设备和存储介质
KR101923845B1 (ko) 이미지 처리 방법 및 장치
CN110070564B (zh) 一种特征点匹配方法、装置、设备及存储介质
WO2019042419A1 (zh) 图像跟踪点获取方法、设备及存储介质
CN109389555B (zh) 一种全景图像拼接方法及装置
WO2019052534A1 (zh) 图像拼接方法及装置、存储介质
WO2010028559A1 (zh) 图像拼接方法及装置
US11620730B2 (en) Method for merging multiple images and post-processing of panorama
CN111915483B (zh) 图像拼接方法、装置、计算机设备和存储介质
WO2021035627A1 (zh) 获取深度图的方法、装置及计算机存储介质
CN110781823A (zh) 录屏检测方法、装置、可读介质及电子设备
US20220405968A1 (en) Method, apparatus and system for image processing
CN111866523A (zh) 全景视频合成方法、装置、电子设备和计算机存储介质
TW202244680A (zh) 位置姿勢獲取方法、電子設備及電腦可讀儲存媒體
CN112288878B (zh) 增强现实预览方法及预览装置、电子设备及存储介质
CN112215749A (zh) 基于柱面投影的图像拼接方法、系统、设备及存储介质
WO2023066143A1 (zh) 全景图像的图像分割方法、装置、计算机设备和存储介质
CN112085842A (zh) 深度值确定方法及装置、电子设备和存储介质
CN115514887A (zh) 视频采集的控制方法、装置、计算机设备和存储介质
CN115086625A (zh) 投影画面的校正方法、装置、系统、校正设备和投影设备
Xu et al. Real-time keystone correction for hand-held projectors with an RGBD camera
CN111161148B (zh) 一种全景图像生成方法、装置、设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20934903

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20934903

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20934903

Country of ref document: EP

Kind code of ref document: A1