CN114399553A - Virtual viewpoint generation method and device based on camera posture - Google Patents


Info

Publication number
CN114399553A
Authority
CN
China
Prior art keywords
camera
virtual viewpoint
adjacent
images
matching point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111467673.4A
Other languages
Chinese (zh)
Inventor
桑新柱
叶晓倩
王华春
陈铎
王鹏
刘博阳
齐帅
万华明
李宁驰
王葵如
颜玢玢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202111467673.4A
Publication of CN114399553A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85: Stereo camera calibration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G06T7/62: Analysis of geometric attributes of area, perimeter, diameter or volume
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image
    • G06T2207/10012: Stereo images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30244: Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a virtual viewpoint generation method and device based on camera pose. The method comprises the following steps: constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and the internal parameters and external parameters of the two cameras in a camera array corresponding to the adjacent viewpoint images; and re-projecting each three-dimensional point according to the target camera pose and internal parameters of each target camera corresponding to a selected camera pose, to determine the virtual viewpoint image corresponding to that camera pose. The method can improve the generation efficiency of virtual viewpoints.

Description

Virtual viewpoint generation method and device based on camera posture
Technical Field
The application relates to the technical field of image processing, in particular to a virtual viewpoint generation method and device based on camera pose.
Background
The real world is three-dimensional, but currently mainstream display devices are still two-dimensional. Three-dimensional displays, particularly naked-eye three-dimensional displays, are therefore receiving increasing attention. Naked-eye three-dimensional display requires dense viewpoint images, but acquiring dense viewpoints with a camera array raises many difficulties, such as synchronization among the cameras in the array, camera calibration and pose solving, and data storage and transmission. In practical applications, therefore, a few real binocular cameras are usually used to acquire sparse viewpoints, and virtual viewpoints are generated by a virtual viewpoint generation method.
In the related art, DIBR (Depth-Image-Based Rendering) may be employed to generate virtual viewpoints. However, DIBR requires the left and right images to be rectified so that the epipolar lines of the two images lie on the same horizontal line; it can therefore only generate virtual viewpoints located on that epipolar line, not virtual viewpoint images at arbitrary positions, which results in low generation efficiency of virtual viewpoints.
Disclosure of Invention
The embodiments of the application provide a virtual viewpoint generation method and device based on camera pose, which improve the generation efficiency of virtual viewpoints.
In a first aspect, an embodiment of the present application provides a method for generating a virtual viewpoint based on a camera pose, including:
constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, to determine a virtual viewpoint image corresponding to the camera pose.
In one embodiment, before constructing a three-dimensional point corresponding to a matching point pair between adjacent viewpoint images according to the matching point pair and the internal and external parameters of two cameras corresponding to the adjacent viewpoint images, the method further includes:
extracting a first matching point from a first viewpoint image of two adjacent viewpoint images;
extracting a second matching point matched with the first matching point from a second viewpoint image of two adjacent viewpoint images according to the trained optical flow estimation model;
and forming a matching point pair according to the first matching point and the second matching point.
In one embodiment, the constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and the internal parameters and the external parameters of two cameras corresponding to the adjacent viewpoint images includes:
determining two light ray directions corresponding to two cameras according to a matching point pair between two adjacent viewpoint images and internal parameters and external parameters of the two cameras corresponding to the two adjacent viewpoint images;
and processing the two light ray directions by direct linear transformation to construct a three-dimensional point corresponding to the matching point pair.
In one embodiment, the re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose to determine a virtual viewpoint image corresponding to the camera pose includes:
when the selected camera pose is a horizontal position, leveling the camera position of each target camera according to the three-dimensional coordinates of each target camera in the camera array, and acquiring the leveled coordinates of each target camera;
determining the target camera pose of each target camera according to the leveled coordinates of each target camera and the leveling rotation angle obtained by averaging the rotation angles of the target cameras;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera, and determining a virtual viewpoint image corresponding to the camera pose.
In an embodiment, the re-projecting each three-dimensional point and determining a virtual viewpoint image corresponding to the camera pose includes:
re-projecting each three-dimensional point by taking the two adjacent viewpoint images as reference images, and determining two adjacent initial virtual viewpoint images corresponding to the camera pose;
and hole-filling the left image of the adjacent initial virtual viewpoint images to generate the virtual viewpoint image.
In an embodiment, the hole-filling of the left image of the adjacent initial virtual viewpoint images includes:
acquiring a first hole position at which the hole area in the left image of the adjacent initial virtual viewpoint images is larger than a preset area;
and acquiring the pixels corresponding to the first hole position from the right image and filling them into the first hole position.
In one embodiment, the method further comprises:
acquiring a second hole position at which the hole area in the left image is smaller than or equal to the preset area;
and filling the hole at the second hole position through a closing operation.
In a second aspect, an embodiment of the present application provides a virtual viewpoint generating apparatus based on camera pose, including:
the three-dimensional point construction module is used for constructing three-dimensional points corresponding to matching point pairs according to the matching point pairs between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and the virtual viewpoint generating module is used for re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, and determining a virtual viewpoint image corresponding to the camera pose.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory storing a computer program, where the processor implements the steps of the method for generating a virtual viewpoint based on a camera pose according to the first aspect when executing the program.
In a fourth aspect, the present application provides a computer program product, which includes a computer program that, when being executed by a processor, implements the steps of the camera pose-based virtual viewpoint generation method according to the first aspect.
According to the camera pose-based virtual viewpoint generation method and device, three-dimensional points are first reconstructed from the matching point pairs between two adjacent viewpoint images and the internal and external parameters of the two corresponding cameras. The three-dimensional points are then re-projected according to the target camera pose and internal parameters of each target camera corresponding to a selected camera pose, to determine the virtual viewpoint image corresponding to that camera pose. The reconstructed three-dimensional points can therefore be re-projected under any camera pose to generate the virtual viewpoint image at the corresponding position, and virtual viewpoint images at different positions are obtained simply by selecting different camera poses, without rectifying the left and right images, which improves the generation efficiency of virtual viewpoints.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for generating a virtual viewpoint based on a camera pose provided in an embodiment of the present application;
FIG. 2 is a schematic view of triangulation provided by embodiments of the present application;
fig. 3 is a schematic diagram of a camera position and an optical axis direction in real horizontal shooting according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a position and an optical axis direction of a camera after position and optical axis leveling provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of a virtual viewpoint generating apparatus based on camera pose provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For a better understanding of the solution, the technical terms involved in the embodiments of the present application are explained first.
Camera internal parameters refer to parameters related to the characteristics of the camera itself, such as the focal length and pixel size of the camera;
the camera pose refers to camera external parameters, namely parameters of the camera under a world coordinate system, such as the position and the rotation direction of the camera;
Re-projection refers to a second projection: the first projection occurs when the camera takes a picture, projecting 3D points onto the image. 3D points can be reconstructed from the matching points of an image pair, and projecting the reconstructed 3D points onto an image plane again through a camera pose is called re-projection.
The optical flow refers to the instantaneous speed and direction of pixel motion of a space moving object on an observation imaging plane;
convolutional Neural Networks (CNN), a class of feed forward neural networks that includes convolutional computations and has a deep structure, is one of the representative algorithms for deep learning.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for generating a virtual viewpoint based on camera pose according to an embodiment of the present application; the method is applied to a server or a terminal device to generate virtual viewpoints. As shown in fig. 1, the method for generating a virtual viewpoint based on camera pose provided by this embodiment includes:
step 101, constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and 102, re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, and determining a virtual viewpoint image corresponding to the camera pose.
After the three-dimensional points are reconstructed from the matching point pairs between two adjacent viewpoint images and the internal and external parameters of the two corresponding cameras, they are re-projected according to the target camera pose and internal parameters of each target camera corresponding to the selected camera pose to determine the virtual viewpoint image corresponding to that pose. The reconstructed three-dimensional points can thus be re-projected under any camera pose to generate the virtual viewpoint image at the corresponding position, and virtual viewpoint images at different positions are generated by selecting different camera poses, without rectifying the left and right images, which improves the generation efficiency of virtual viewpoints.
In step 101, the internal parameters and the external parameters of each camera may be obtained by calibrating the camera with a checkerboard image captured by it. Specifically, a real-world coordinate system is defined in advance from a checkerboard image shot by the camera, and the 3D coordinate of each checkerboard corner in the world coordinate system is obtained. Meanwhile, corner detection is carried out on the checkerboard image to obtain the 2D pixel coordinate of each checkerboard corner. After the 3D world coordinate and the 2D pixel coordinate of every checkerboard corner are obtained, the camera shooting the checkerboard image is calibrated according to the projection relation between the 3D coordinate and the 2D pixel coordinate of any checkerboard corner, x = K[R|T]X (Equation 1), so as to obtain the internal parameters and the external parameters of the camera.
Wherein x is a homogeneous 2D pixel coordinate, X is a homogeneous 3D coordinate, K is the camera intrinsic matrix, and R and T are the rotation and translation in the camera extrinsic parameters, respectively.
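Illustratively, this calibration step can be sketched in Python with OpenCV's checkerboard tools. This is a minimal sketch under stated assumptions, not the patent's implementation: the board dimensions, square size, and image file names are hypothetical.

    import cv2
    import numpy as np

    pattern = (9, 6)      # inner corners per row and column (assumed)
    square = 25.0         # checkerboard square size in mm (assumed)

    # 3D coordinates of the checkerboard corners in the world coordinate
    # system defined by the board plane (Z = 0)
    obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_pts, img_pts = [], []
    for path in ["board_0.png", "board_1.png"]:   # hypothetical image files
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            # corner detection refined to sub-pixel 2D pixel coordinates
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(obj)
            img_pts.append(corners)

    # K is the intrinsic matrix of Equation 1; each rvec/tvec pair gives the
    # rotation R (as a Rodrigues vector) and translation T of one view
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)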
In an embodiment, for two adjacent viewpoint images, the two images may be matched by image matching to obtain a first matching point in the first viewpoint image and a second matching point in the second viewpoint image that matches the first matching point, and the first matching point and the second matching point are taken as a matching point pair.
For adjacent viewpoint images captured by the camera array, it is difficult to place the multiple cameras exactly horizontally, so the optical axes are not necessarily parallel and both horizontal parallax and vertical parallax exist between the captured viewpoint images. Therefore, in order to obtain accurate matching point pairs, in an embodiment, before constructing the three-dimensional point corresponding to a matching point pair according to the matching point pair between adjacent viewpoint images and the internal and external parameters of the two cameras corresponding to the adjacent viewpoint images, the method further includes:
extracting a first matching point from a first viewpoint image of two adjacent viewpoint images;
extracting a second matching point matched with the first matching point from a second viewpoint image of two adjacent viewpoint images according to the trained optical flow estimation model;
and forming a matching point pair according to the first matching point and the second matching point.
In one embodiment, a first matching point is extracted from the first viewpoint image, an optical flow estimation model then performs matching estimation between the viewpoints, and a second matching point matched with the first matching point is extracted from the second viewpoint image. Considering that deep convolutional networks achieve high accuracy and efficiency in optical flow estimation, the optical flow estimation model may be obtained by training a convolutional neural network.
Illustratively, the optical flow estimation model may be a RAFT model. Because RAFT uses a recurrent unit to perform multiple iterations, the balance between the computation speed of the matching point pairs and the prediction accuracy can be tuned by increasing or decreasing the number of iterations.
Extracting the two matched points of each matching point pair from the two adjacent viewpoint images through the optical flow estimation model avoids inaccurate matching point pairs caused by the horizontal and vertical parallax between adjacent viewpoint images, improves the accuracy and efficiency of the obtained matching point pairs, and thereby improves the accuracy of the virtual viewpoint images subsequently determined from them.
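By way of illustration only, such a matching step could be sketched with the pretrained RAFT model shipped in torchvision; this specific model and API are assumptions for concreteness, since the embodiment only requires some CNN-based optical flow model. left_u8 and right_u8 are assumed uint8 tensors of shape (3, H, W) holding the two adjacent viewpoint images.

    import torch
    from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

    weights = Raft_Large_Weights.DEFAULT
    model = raft_large(weights=weights).eval()

    # normalize and batch the two viewpoint images as the weights expect;
    # H and W must be divisible by 8 for RAFT
    img1, img2 = weights.transforms()(left_u8.unsqueeze(0), right_u8.unsqueeze(0))

    with torch.no_grad():
        flow = model(img1, img2)[-1]      # last RAFT iteration, shape (1, 2, H, W)

    # first matching points: every pixel of the first viewpoint image;
    # second matching points: the same pixels displaced by the estimated flow
    h, w = flow.shape[-2:]
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    p1 = torch.stack([xs, ys], dim=-1).float()        # (H, W, 2)
    p2 = p1 + flow[0].permute(1, 2, 0)                # matched points in view 2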
In an embodiment, after obtaining the matching point pairs of the two adjacent viewpoint images and the corresponding internal parameters and external parameters of the two cameras, the positions of the three-dimensional points can be obtained through triangulation.
Illustratively, given the first matching point p1 and the second matching point p2 of a matching point pair and the internal and external parameters of the two cameras, the positions O1 and O2 of the two cameras can be derived from the internal and external parameters, and together with p1 and p2 they give the ray directions O1p1 and O2p2 corresponding to the two cameras. If all parameters were calculated absolutely accurately, the two rays O1p1 and O2p2 would intersect at a point P in three-dimensional space, as shown in fig. 2. The point P is the three-dimensional point corresponding to the matching point pair.
However, since there may be errors in camera calibration or optical flow estimation, the rays O1p1 and O2p2 may fail to intersect, in which case the corresponding three-dimensional point cannot be obtained directly, affecting the generation of subsequent virtual viewpoint images. To this end, in an embodiment, the constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and the internal parameters and the external parameters of the two cameras corresponding to the adjacent viewpoint images includes:
determining two light ray directions corresponding to two cameras according to a matching point pair between two adjacent viewpoint images and internal parameters and external parameters of the two cameras corresponding to the two adjacent viewpoint images;
and processing the two light ray directions by direct linear transformation to construct a three-dimensional point corresponding to the matching point pair.
In one embodiment, after the two rays O1p1 and O2p2 are determined, whether they intersect is checked. If they intersect, the intersection point is directly taken as the three-dimensional point. If they do not intersect, the nearest 3D point can be solved by the direct linear transformation method and taken as the reconstructed 3D point; in a specific implementation, this can be solved with the triangulatePoints function in the opencv library, thereby constructing the three-dimensional point corresponding to the matching point pair.
When the two rays of the two cameras corresponding to two adjacent viewpoint images do not intersect, processing the two light ray directions by direct linear transformation to construct the three-dimensional point avoids the situation in which no three-dimensional point can be constructed because of camera calibration errors or matching point errors, further improving the generation efficiency of subsequent virtual viewpoint images.
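Illustratively, triangulating a batch of matching point pairs with the triangulatePoints function mentioned above might look as follows; K1, R1, T1, K2, R2, T2 stand for the calibrated parameters of the two cameras and pts1, pts2 for the matched pixel coordinates (names assumed).

    import cv2
    import numpy as np

    # 3x4 projection matrices P = K [R | T] of the two cameras (Equation 1)
    P1 = K1 @ np.hstack([R1, T1.reshape(3, 1)])
    P2 = K2 @ np.hstack([R2, T2.reshape(3, 1)])

    # pts1, pts2: 2xN float arrays of matched pixel coordinates
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4xN homogeneous points

    # de-homogenize to obtain the Nx3 three-dimensional points; this is the
    # least-squares (direct linear transformation) solution, so it also covers
    # the case where the two rays do not exactly intersect
    X = (X_h[:3] / X_h[3]).T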
In step 102, after the three-dimensional points are obtained, the target camera pose and the internal parameters of each corresponding target camera may be determined from the camera array for any selected camera pose, and each three-dimensional point is then re-projected with the target camera pose (that is, the external parameters) and the internal parameters according to Equation 1 above, so as to obtain the virtual viewpoint image at the position corresponding to the given camera pose.
The arbitrarily selected camera pose may place the target cameras in a horizontal position, that is, target cameras for horizontal shooting, or in a ring arrangement, and so on.
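A simple sketch of the re-projection itself under Equation 1: each reconstructed 3D point is transformed by a chosen virtual pose (R_v, T_v), projected through the intrinsics K_v, and splatted to its nearest pixel with a depth buffer so the closest point wins. The nearest-pixel splatting and depth buffering are assumptions for illustration; unwritten pixels remain as the holes discussed below.

    import numpy as np

    def reproject(X, colors, K_v, R_v, T_v, h, w):
        """X: Nx3 reconstructed 3D points; colors: Nx3 colors sampled from a
        reference viewpoint image."""
        x_cam = R_v @ X.T + T_v.reshape(3, 1)            # camera coordinates, 3xN
        uvw = K_v @ x_cam                                # projection by intrinsics
        uv = np.round(uvw[:2] / uvw[2]).astype(int).T    # Nx2 pixel coordinates
        z = x_cam[2]
        img = np.zeros((h, w, 3), np.uint8)              # unwritten pixels = holes
        zbuf = np.full((h, w), np.inf)
        for (u, v), zi, c in zip(uv, z, colors):
            if 0 <= v < h and 0 <= u < w and 0 < zi < zbuf[v, u]:
                img[v, u] = c                            # keep the nearest point
                zbuf[v, u] = zi
        return img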
In particular, when the selected camera pose is a horizontal position, the positions of the target cameras used for horizontal shooting may not lie on the same horizontal line and their optical axes may not be parallel, as shown in fig. 3, so the finally formed virtual viewpoint image may not be accurate enough. To this end, in an embodiment, the re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, and determining the virtual viewpoint image corresponding to the camera pose includes:
when the selected camera pose is a horizontal position, leveling the camera position of each target camera according to the three-dimensional coordinates of each target camera in the camera array, and acquiring the leveled coordinates of each target camera;
determining the target camera pose of each target camera according to the leveled coordinates of each target camera and the leveling rotation angle obtained by averaging the rotation angles of the target cameras;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera, and determining a virtual viewpoint image corresponding to the camera pose.
In one embodiment, when the selected camera pose is a horizontal position, the three-dimensional coordinates of each target camera can be determined from the internal and external parameters of each target camera in the horizontal position. The vertical coordinates of the target cameras are averaged to obtain the average vertical coordinate, and a straight line is fitted to the X and Y coordinates of the target cameras by the least squares method. Since the target cameras are to remain in a horizontal row, the X coordinate of each camera is unchanged, and a new Y coordinate of each target camera is obtained from the fitted line. The leveled coordinate of each target camera is then formed by its original X coordinate, its new Y coordinate, and the average vertical coordinate.
For example, assume there are 5 target cameras. For the vertical coordinate Z, let the Z coordinates of the 5 camera positions be Z1 to Z5; the average vertical coordinate of the 5 target cameras is then

Z_avg = (Z1 + Z2 + Z3 + Z4 + Z5) / 5

For the X and Y directions, a straight line is fitted by the least squares method, with the error function

E(a, b) = Σi (Yi − (a·Xi + b))²

where a and b are the slope and intercept of the straight line; solving for them gives the line equation. The original X coordinate of each target camera is unchanged, and its new Y coordinate is obtained through the line equation. The leveled coordinate of target camera i after position leveling is therefore

(Xi, a·Xi + b, Z_avg)
Because different cameras have different rotations, the optical axes of the target cameras may not be parallel; to level the optical axes, the rotation matrix in the external parameters of each target camera can be converted into rotation angles. Specifically, assume the rotation matrix R of a target camera is:

R = | r11  r12  r13 |
    | r21  r22  r23 |
    | r31  r32  r33 |

The three rotation angles of the target camera are then:

θx = atan2(r32, r33)
θy = atan2(−r31, √(r32² + r33²))
θz = atan2(r21, r11)

After the rotation angles (θx_i, θy_i, θz_i) of each target camera i are obtained, the rotation angles of all target cameras are averaged to obtain the leveling rotation angle shared by every target camera:

(θx_avg, θy_avg, θz_avg) = (1/N) Σi (θx_i, θy_i, θz_i)

where i represents the serial number of the target camera and N the number of target cameras.
After the leveled coordinates and the leveling rotation angle of the target cameras are obtained, the target cameras can be leveled as a whole, and the target camera pose of each target camera can be determined, as shown in fig. 4.
After the target camera pose of each target camera is obtained, the three-dimensional points can be re-projected according to the target camera pose and the internal parameters of each target camera, and the virtual viewpoint image is determined.
By leveling the camera positions and optical axes of the target cameras before re-projecting the three-dimensional points, the inaccuracy that arises when re-projection is performed through horizontally shooting cameras whose positions are not on the same horizontal line and/or whose optical axes are not parallel is avoided, thereby improving the quality of the generated virtual viewpoint image.
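Putting the leveling formulas above together, a NumPy sketch of the position and optical-axis leveling could read as follows; the function name and array layout are illustrative assumptions.

    import numpy as np

    def level_cameras(C, R):
        """C: Nx3 camera positions (X, Y, Z); R: list of N 3x3 rotation matrices."""
        X, Y, Z = C[:, 0], C[:, 1], C[:, 2]
        z_avg = Z.mean()                        # average vertical coordinate
        a, b = np.polyfit(X, Y, 1)              # least-squares line Y = a*X + b
        C_level = np.stack([X, a * X + b, np.full_like(X, z_avg)], axis=1)

        def to_angles(r):
            # rotation matrix -> three rotation angles, as in the formulas above
            theta_x = np.arctan2(r[2, 1], r[2, 2])
            theta_y = np.arctan2(-r[2, 0], np.hypot(r[2, 1], r[2, 2]))
            theta_z = np.arctan2(r[1, 0], r[0, 0])
            return theta_x, theta_y, theta_z

        angles = np.array([to_angles(r) for r in R])
        theta_avg = angles.mean(axis=0)         # leveling rotation angle, shared
        return C_level, theta_avg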
In an embodiment, the re-projecting each three-dimensional point and determining a virtual viewpoint image corresponding to the camera pose includes:
re-projecting each three-dimensional point by taking the two adjacent viewpoint images as reference images, and determining two adjacent initial virtual viewpoint images corresponding to the camera pose;
and hole-filling the left image of the adjacent initial virtual viewpoint images to generate the virtual viewpoint image.
In specific practice, it is found that the virtual viewpoint image obtained after re-projection may contain holes; filling these holes improves the quality of the virtual viewpoint image.
In one embodiment, holes with a larger area can be filled by bidirectional fusion. Specifically, the hole-filling of the left image of the adjacent initial virtual viewpoint images includes:
acquiring a first hole position at which the hole area in the left image of the adjacent initial virtual viewpoint image is larger than a preset area;
and acquiring a pixel corresponding to the first hole position from the right image and filling the pixel to the first hole position.
Because the reference images for re-projection are the two adjacent viewpoint images, re-projection yields two adjacent initial virtual viewpoint images, a left image Vl and a right image Vr. Since the left and right reference images are translated relative to each other, the hole directions of Vl and Vr are not consistent; therefore, when a hole position with a larger area exists, the corresponding pixels of Vr can be used to fill the hole positions of Vl, and the filled Vl is taken as the virtual viewpoint image.
In one embodiment, the method further comprises:
acquiring a second hole position at which the hole area in the left image is smaller than or equal to the preset area;
and filling the hole at the second hole position through a closing operation.
In one embodiment, fine holes can be filled directly through a closing operation, so as to obtain the virtual viewpoint image.
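Combining the two branches, the hole filling described above might be sketched as follows; holes are assumed to be zero-valued pixels of the re-projected image, and min_area stands in for the preset area threshold (both assumptions for illustration).

    import cv2
    import numpy as np

    def fill_holes(v_l, v_r, min_area=50):
        # unwritten (hole) pixels of the left re-projected image Vl
        hole = np.all(v_l == 0, axis=2).astype(np.uint8)
        n, labels, stats, _ = cv2.connectedComponentsWithStats(hole)
        for i in range(1, n):                             # label 0 = background
            if stats[i, cv2.CC_STAT_AREA] > min_area:
                mask = labels == i
                v_l[mask] = v_r[mask]                     # bidirectional fusion
        # remaining fine holes are removed by a morphological closing
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
        return cv2.morphologyEx(v_l, cv2.MORPH_CLOSE, kernel)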
The following describes a camera pose-based virtual viewpoint generating apparatus provided in an embodiment of the present application, and the camera pose-based virtual viewpoint generating apparatus described below and the camera pose-based virtual viewpoint generating method described above may be referred to in correspondence with each other.
In an embodiment, as shown in fig. 5, there is provided a virtual viewpoint generating apparatus based on camera pose, including:
a three-dimensional point constructing module 210, configured to construct a three-dimensional point corresponding to a matching point pair between two adjacent viewpoint images according to the matching point pair and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
the virtual viewpoint generating module 220 is configured to re-project each three-dimensional point according to the target camera pose and the internal parameter of each target camera corresponding to the selected camera pose, and determine a virtual viewpoint image corresponding to the camera pose.
In an embodiment, the three-dimensional point construction module 210 is further configured to:
extracting a first matching point from a first viewpoint image of two adjacent viewpoint images;
extracting a second matching point matched with the first matching point from a second viewpoint image of two adjacent viewpoint images according to the trained optical flow estimation model;
and forming a matching point pair according to the first matching point and the second matching point.
In an embodiment, the three-dimensional point construction module 210 is specifically configured to:
determining two light ray directions corresponding to two cameras according to a matching point pair between two adjacent viewpoint images and internal parameters and external parameters of the two cameras corresponding to the two adjacent viewpoint images;
and processing the two light ray directions by direct linear transformation to construct a three-dimensional point corresponding to the matching point pair.
In an embodiment, the virtual viewpoint generating module 220 is specifically configured to:
when the selected camera pose is a horizontal position, leveling the camera position of each target camera according to the three-dimensional coordinates of each target camera in the camera array, and acquiring the leveled coordinates of each target camera;
determining the target camera pose of each target camera according to the leveled coordinates of each target camera and the leveling rotation angle obtained by averaging the rotation angles of the target cameras;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera, and determining a virtual viewpoint image corresponding to the camera pose.
In an embodiment, the virtual viewpoint generating module 220 is specifically configured to:
re-projecting each three-dimensional point by taking the two adjacent viewpoint images as reference images, and determining two adjacent initial virtual viewpoint images corresponding to the camera pose;
and hole-filling the left image of the adjacent initial virtual viewpoint images to generate the virtual viewpoint image.
In an embodiment, the virtual viewpoint generating module 220 is specifically configured to:
acquiring a first hole position at which the hole area in the left image of the adjacent initial virtual viewpoint image is larger than a preset area;
and acquiring a pixel corresponding to the first hole position from the right image and filling the pixel to the first hole position.
In an embodiment, the virtual viewpoint generating module 220 is further configured to:
acquiring a second hole position at which the hole area in the left image is smaller than or equal to the preset area;
and filling the hole at the second hole position through a closing operation.
Fig. 6 illustrates a physical structure diagram of an electronic device. As shown in fig. 6, the electronic device may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may invoke a computer program in the memory 830 to perform the steps of the camera pose-based virtual viewpoint generation method, for example comprising:
constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, to determine a virtual viewpoint image corresponding to the camera pose.
In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present application further provides a computer program product, where the computer program product includes a computer program, the computer program may be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, a computer can perform the steps of the camera pose based virtual viewpoint generation method provided in the foregoing embodiments, for example, the steps include:
constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, to determine a virtual viewpoint image corresponding to the camera pose.
On the other hand, embodiments of the present application further provide a processor-readable storage medium, where the processor-readable storage medium stores a computer program, where the computer program is configured to cause a processor to perform the steps of the method provided in each of the above embodiments, for example, including:
constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, to determine a virtual viewpoint image corresponding to the camera pose.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A virtual viewpoint generation method based on camera pose is characterized by comprising the following steps:
constructing a three-dimensional point corresponding to a matching point pair according to the matching point pair between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, to determine a virtual viewpoint image corresponding to the camera pose.
2. The camera pose-based virtual viewpoint generation method according to claim 1, wherein before constructing the three-dimensional points corresponding to the matching point pairs from the matching point pairs between the adjacent viewpoint images and the internal parameters and the external parameters of the two cameras corresponding to the adjacent viewpoint images, the method further comprises:
extracting a first matching point from a first viewpoint image of two adjacent viewpoint images;
extracting a second matching point matched with the first matching point from a second viewpoint image of two adjacent viewpoint images according to the trained optical flow estimation model;
and forming a matching point pair according to the first matching point and the second matching point.
3. The method of claim 1, wherein constructing three-dimensional points corresponding to the pairs of matching points according to the pairs of matching points between two adjacent viewpoint images and the intrinsic parameters and the extrinsic parameters of two cameras corresponding to the two adjacent viewpoint images comprises:
determining two light ray directions corresponding to two cameras according to a matching point pair between two adjacent viewpoint images and internal parameters and external parameters of the two cameras corresponding to the two adjacent viewpoint images;
and processing the two light ray directions by direct linear transformation to construct a three-dimensional point corresponding to the matching point pair.
4. The method of claim 1, wherein the re-projecting each three-dimensional point according to the target camera pose of each target camera corresponding to the selected camera pose and the internal parameters to determine the virtual viewpoint image corresponding to the camera pose comprises:
when the selected camera pose is a horizontal position, leveling the camera position of each target camera according to the three-dimensional coordinates of each target camera in the camera array, and acquiring the leveled coordinates of each target camera;
determining the target camera pose of each target camera according to the leveled coordinates of each target camera and the leveling rotation angle obtained by averaging the rotation angles of the target cameras;
and re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera, and determining a virtual viewpoint image corresponding to the camera pose.
5. The method of claim 1 or 4, wherein the re-projecting each three-dimensional point to determine a virtual viewpoint image corresponding to the camera pose comprises:
re-projecting each three-dimensional point by taking the two adjacent viewpoint images as reference images, and determining two adjacent initial virtual viewpoint images corresponding to the camera pose;
and hole-filling the left image of the adjacent initial virtual viewpoint images to generate the virtual viewpoint image.
6. The method of claim 5, wherein the hole filling of the left image in the adjacent initial virtual viewpoint images comprises:
acquiring a first hole position at which the hole area in the left image of the adjacent initial virtual viewpoint image is larger than a preset area;
and acquiring a pixel corresponding to the first hole position from the right image and filling the pixel to the first hole position.
7. The camera pose-based virtual viewpoint generation method according to claim 6, further comprising:
acquiring a second hole position at which the hole area in the left image is smaller than or equal to the preset area;
and filling the hole at the second hole position through a closing operation.
8. A virtual viewpoint generating apparatus based on a camera pose, comprising:
the three-dimensional point construction module is used for constructing three-dimensional points corresponding to matching point pairs according to the matching point pairs between two adjacent viewpoint images and internal parameters and external parameters of two cameras in a camera array corresponding to the adjacent viewpoint images;
and the virtual viewpoint generating module is used for re-projecting each three-dimensional point according to the target camera pose and the internal parameters of each target camera corresponding to the selected camera pose, and determining a virtual viewpoint image corresponding to the camera pose.
9. An electronic device comprising a processor and a memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the camera pose based virtual viewpoint generation method according to any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, realizes the steps of the camera pose based virtual viewpoint generation method of any one of claims 1 to 7.
CN202111467673.4A 2021-12-03 2021-12-03 Virtual viewpoint generation method and device based on camera posture Pending CN114399553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111467673.4A CN114399553A (en) 2021-12-03 2021-12-03 Virtual viewpoint generation method and device based on camera posture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111467673.4A CN114399553A (en) 2021-12-03 2021-12-03 Virtual viewpoint generation method and device based on camera posture

Publications (1)

Publication Number Publication Date
CN114399553A (en) 2022-04-26

Family

ID=81225713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111467673.4A Pending CN114399553A (en) 2021-12-03 2021-12-03 Virtual viewpoint generation method and device based on camera posture

Country Status (1)

Country Link
CN (1) CN114399553A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116320358A (en) * 2023-05-19 2023-06-23 成都工业学院 Parallax image prediction device and method
CN116320358B (en) * 2023-05-19 2023-12-01 成都工业学院 Parallax image prediction device and method

Similar Documents

Publication Publication Date Title
US11010924B2 (en) Method and device for determining external parameter of stereoscopic camera
CN106529495B (en) Obstacle detection method and device for aircraft
CN109313814B (en) Camera calibration system
WO2019127445A1 (en) Three-dimensional mapping method, apparatus and system, cloud platform, electronic device, and computer program product
CN110176032B (en) Three-dimensional reconstruction method and device
Kang et al. Two-view underwater structure and motion for cameras under flat refractive interfaces
CN105654547B (en) Three-dimensional rebuilding method
EP2847741A1 (en) Camera scene fitting of real world scenes for camera pose determination
CN109840922B (en) Depth acquisition method and system based on binocular light field camera
WO2020119467A1 (en) High-precision dense depth image generation method and device
CN114401391B (en) Virtual viewpoint generation method and device
TW202103106A (en) Method and electronic device for image depth estimation and storage medium thereof
CN109902675B (en) Object pose acquisition method and scene reconstruction method and device
EP3229209B1 (en) Camera calibration system
CN112907727A (en) Calibration method, device and system of relative transformation matrix
CN111080784A (en) Ground three-dimensional reconstruction method and device based on ground image texture
Gadasin et al. Reconstruction of a Three-Dimensional Scene from its Projections in Computer Vision Systems
CN109714587A (en) A kind of multi-view image production method, device, electronic equipment and storage medium
Jang et al. Egocentric scene reconstruction from an omnidirectional video
CN114399553A (en) Virtual viewpoint generation method and device based on camera posture
Sergiyenko et al. Multi-view 3D data fusion and patching to reduce Shannon entropy in Robotic Vision
CN109859313B (en) 3D point cloud data acquisition method and device, and 3D data generation method and system
JP7195785B2 (en) Apparatus, method and program for generating 3D shape data
CN112258635B (en) Three-dimensional reconstruction method and device based on improved binocular matching SAD algorithm
CN110148086B (en) Depth filling method and device for sparse depth map and three-dimensional reconstruction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination