Disclosure of Invention
It is an object of the present invention to provide a method and apparatus for three-dimensional model reconstruction of rare birds based on a neural radiance field that overcomes or at least alleviates at least one of the above-mentioned drawbacks of the prior art.
In order to achieve the above object, the present invention provides a rare bird three-dimensional model reconstruction method based on a neural radiance field, comprising:
step 1, collecting discrete rotating light field data of a rare bird target object;
step 2, estimating the camera poses of the scene multi-viewpoint images corresponding to the discrete rotating light field data;
step 3, according to the camera poses of the scene multi-viewpoint images obtained in step 2, obtaining a transformation matrix of the camera pose of each multi-viewpoint image, and further transforming the discrete rotating light field data of the rare bird target object collected in step 1 into NDC (Normalized Device Coordinate) space through homogeneous coordinate transformation and ray transformation in the NDC;
step 4, generating new viewpoint images of the rotating light field in the NDC space of the discrete rotating light field data by utilizing the neural radiance field;
step 5, reconstructing a three-dimensional model of the rare bird according to the new viewpoint images of the rotating light field generated in step 4.
Further, the neural radiance field method in step 4 specifically includes:
step 41, describing the geometric shape and color information of the bird three-dimensional model with a neural field, comprising a spatial position vector x and a direction vector d; mapping the spatial position vector x and the direction vector d into an N-dimensional space respectively using the following formula (17) to obtain the coordinate code γ(x) and the direction code γ(d); and inputting the coordinate code γ(x) and the direction code γ(d) into the neural radiance field, which outputs the volume density σ(x) of the position and the color value c(x, d) of the position in the direction d:
step 42, for the original ray r = o + td travelling from the camera optical center to the pixel point P of the camera, obtaining the color value C(r) of the pixel point P by the following formula (18):
where σ(r(s)) is the volume density of the ray r(s) at the parameter point s, σ(r(t)) is the volume density of the ray r(t) at the parameter point t, t_n and t_f are the nearest and farthest points along the original ray r, respectively, and c(r(t), d) is the color value of the ray r(t) in the direction d.
Further, the camera pose corresponding to the scene multi-view image in step 2 includes a rotation matrix R, and the method for acquiring the rotation matrix R specifically includes:
in step 21a, the rotation matrix of the camera rotating around the x-axis by an angle α is expressed as the following formula (6), the rotation matrix rotating around the y-axis by an angle β is expressed as the following formula (7), and the rotation matrix rotating around the z-axis by an angle γ is expressed as the following formula (8):
step 22a, the rotation matrix R obtained by multiplying the above formula (6), formula (7) and formula (8) is expressed as the following formula (9):
further, in step 3, the method for homogeneous coordinate transformation in NDC specifically includes:
the homogeneous coordinate point (x, y, z, 1)^T corresponds to the coordinate point P = (x', y', z', 1)^T in NDC space, given by the following formula (13):
where n and f are the distances from the near and far clipping planes to the origin, respectively, and r and a are the right and upper bounds of the scene on the near clipping planes, respectively.
Further, in step 3, the method of ray transformation in NDC specifically includes:
transforming the original ray r into a ray in the NDC space, the origin o 'and direction d' of the ray in the NDC space being represented by the following formulas (14), (15), respectively:
where n and f are the distances from the near and far clipping planes to the origin o, r and a are the right and upper boundaries of the scene on the near clipping plane, o_x, o_y, o_z are the coordinate values of the origin o of the original ray in the x, y and z directions, respectively, and d_x, d_y, d_z are the coordinate values of the direction d of the original ray in the x, y and z directions, respectively.
The invention also provides a rare bird three-dimensional model reconstruction device based on the neural radiance field, comprising:
a data acquisition unit for collecting discrete rotating light field data of a rare bird target object;
a camera pose estimation unit for estimating the camera poses of the scene multi-viewpoint images corresponding to the discrete rotating light field data;
an NDC space conversion unit for obtaining a transformation matrix of the camera pose of each multi-viewpoint image according to the camera poses of the scene multi-viewpoint images acquired by the camera pose estimation unit, and further transforming the discrete rotating light field data of the rare bird target object collected by the data acquisition unit into NDC space through homogeneous coordinate transformation and ray transformation in the NDC;
a new viewpoint image acquisition unit for generating new viewpoint images of the rotating light field in the NDC space of the discrete rotating light field data using a neural radiance field;
and a rare bird three-dimensional model reconstruction unit for reconstructing a rare bird three-dimensional model from the rotating light field new viewpoint images generated by the new viewpoint image acquisition unit.
Further, the neural radiance field of the new viewpoint image acquisition unit specifically includes:
an encoding subunit for describing the geometric shape and color information of the bird three-dimensional model with a neural field, comprising a spatial position vector x and a direction vector d, then mapping the spatial position vector x and the direction vector d into an N-dimensional space respectively using the following formula (17) to obtain the coordinate code γ(x) and the direction code γ(d), and inputting the coordinate code γ(x) and the direction code γ(d) into the neural radiance field, which outputs the volume density σ(x) of the position and the color value c(x, d) of the position in the direction d:
a pixel color value calculation subunit for obtaining, by the following formula (18), the color value C(r) of the pixel point P for the original ray r = o + td travelling from the camera optical center to the pixel point P:
where σ(r(s)) is the volume density of the ray r(s) at the parameter point s, σ(r(t)) is the volume density of the ray r(t) at the parameter point t, t_n and t_f are the nearest and farthest points along the original ray r, respectively, and c(r(t), d) is the color value of the ray r(t) in the direction d.
Further, the camera pose corresponding to the scene multi-view image of the camera pose estimation unit comprises a rotation matrix R, and the method for acquiring the rotation matrix R specifically comprises the following steps:
in step 21a, the rotation matrix of the camera rotating around the x-axis by an angle α is expressed as the following formula (6), the rotation matrix rotating around the y-axis by an angle β is expressed as the following formula (7), and the rotation matrix rotating around the z-axis by an angle γ is expressed as the following formula (8):
step 22a, the rotation matrix R obtained by multiplying the above formula (6), formula (7) and formula (8) is expressed as the following formula (9):
further, the NDC space conversion unit specifically includes a homogeneous coordinate transformation subunit for transforming homogeneous coordinate points (x, y, z, 1) T Transformed into coordinate point p= (x ', y ', z ', 1) in NDC space T Expressed by the following formula (13):
where n and f are the distances from the near and far clipping planes to the origin, respectively, and r and a are the right and upper bounds of the scene on the near clipping planes, respectively.
Further, the NDC space conversion unit specifically includes a ray conversion subunit for converting the original ray r into a ray in the NDC space, and an origin o 'and a direction d' of the ray in the NDC space are expressed as the following formulas (14), (15), respectively:
where n and f are the distances from the near and far clipping planes to the origin o, r and a are the right and upper boundaries of the scene on the near clipping plane, o_x, o_y, o_z are the coordinate values of the origin o of the original ray in the x, y and z directions, respectively, and d_x, d_y, d_z are the coordinate values of the direction d of the original ray in the x, y and z directions, respectively.
According to the invention, the camera poses of the scene multi-viewpoint images are first obtained; a consistent space is then established through the NDC space transformation; viewpoint super-resolution is performed on the scene along the rotating light field through volume rendering, yielding new viewpoint images of the rotating light field; and finally the new viewpoint images are used for three-dimensional reconstruction. The three-dimensional scene reconstructed after super-resolution effectively solves the problems that the traditional three-dimensional reconstruction point cloud is sparse and that occluded and concave-convex parts cannot be reconstructed, completing high-quality, high-precision three-dimensional scene reconstruction.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
The rare bird three-dimensional model reconstruction method based on the neural radiance field specifically comprises the following steps:
and step 1, collecting discrete rotating light field data of rare bird targets.
As shown in fig. 1, the left box of fig. 1 illustrates the bird target, where the coordinate system is the image (camera) coordinate system xyz; equivalently, the image coordinates in the rotating light field may be denoted (x, y), with ω representing the rotation angle. The right side of fig. 1 shows that the optical axis of the camera is perpendicular to the Z axis of the world coordinate system XYZ, and the camera samples around the rotation axis Y at equal intervals to obtain the discrete rotating light field data of the rare bird target, which records the radiance of the three-dimensional scene over 360°.
Let the scene rotation center be o, the camera focal length be f, the coordinates of the camera optical center in the world coordinate system XYZ be C(x_c, y_c), and the distance between the camera (optical center) and the rotation axis Y be R_0. For a point P in three-dimensional space, let its distance to the rotation axis Y be R and its initial angle be Φ. In the X-Z plane of the world coordinate system, the X and Z components can be expressed in polar coordinates as the following formulas (1) and (2), respectively:
X = Rsin(ω + Φ) (1)
Z = R_0 - Rcos(ω + Φ) (2)
Over the rotation angle range ω ∈ (0, 2π), the images sampled at equal intervals form a three-dimensional image volume V(x, y, ω); the feature-point trajectory curve can be expressed as the following formula (3), with its projections in the x and y directions given by formulas (4) and (5), respectively, yielding the discrete rotating light field data:
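The polar parameterization of formulas (1) and (2) can be sketched numerically as follows. This is a minimal illustration with hypothetical values for R_0, R and Φ (they are not specified in the text); it shows that the X-trajectory of a scene point is a pure sinusoid in the rotation angle ω, which is the "sine curve" consistency feature of the rotating light field discussed below.

```python
import numpy as np

# Hypothetical numbers for illustration: R0 is the camera-to-axis distance,
# R and phi are the radius and initial angle of a scene point P.
R0, R, phi = 5.0, 1.0, 0.3

def point_position(omega):
    """Formulas (1)-(2): world X/Z components of point P after rotation by omega."""
    X = R * np.sin(omega + phi)       # formula (1)
    Z = R0 - R * np.cos(omega + phi)  # formula (2)
    return X, Z

# Sample the rotation at equal intervals over (0, 2*pi), as in step 1.
omegas = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
X, Z = point_position(omegas)
# X oscillates sinusoidally with amplitude R; Z oscillates around R0.
```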
the rotation light field consistency characteristic is distributed as a sine curve in a three-dimensional space, the vision consistency characteristic has high-dimensional continuous light intensity distribution characteristic, and the light intensity distribution track contains the relative motion relation between the three-dimensional scene and the camera. The discrete rotating light field data of the rare bird target object collected by the embodiment of the invention can be used for an orthogonal projection model, and is used for accurately estimating the depth of a scene and increasing the foreground and background light information. In the further three-dimensional reconstruction, a high-precision three-dimensional model free from occlusion and noise interference can be generated.
Step 2, estimating the camera poses of the scene multi-viewpoint images corresponding to the discrete rotating light field data, wherein each camera pose comprises a rotation matrix R and a translation vector t.
Consider a point P in three-dimensional space, where (X, Y, Z) represents the position of P; the coordinates P(X, Y, Z) thus fix a point in three-dimensional space. The camera can likewise be regarded as a point in three-dimensional space, and since the camera angles α, β, γ play an important role in reconstructing the projection, an additional 3 degrees of freedom α, β, γ are introduced to represent the rotation of the camera in three-dimensional space. Therefore, locating the camera in the world coordinate system, i.e. specifying the camera pose, requires 6 degrees of freedom: α, β, γ, t_x, t_y, t_z. The degrees of freedom α, β, γ correspond to the rotation matrix R of the camera in the world coordinate system, and t_x, t_y, t_z correspond to the translation vector t of the camera in the world coordinate system.
In one embodiment, the method for obtaining the rotation matrix R specifically includes:
in step 21a, a rotation matrix is obtained in which the camera rotates by an angle alpha around the x-axis, by an angle beta around the y-axis and by an angle gamma around the z-axis, respectively.
For example: the rotation matrix rotated by an angle α around the x-axis may be represented by, but not limited to, the following formula (6), the rotation matrix rotated by an angle β around the y-axis may be represented by, but not limited to, the following formula (7), and the rotation matrix rotated by an angle γ around the z-axis may be represented by, but not limited to, the following formula (8):
step 22a, obtaining a final rotation matrix R by multiplying the above formula (6), formula (7) and formula (8) as the following formula (9):
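A minimal sketch of steps 21a-22a follows. The elementary rotation matrices about the x-, y- and z-axes are the standard ones of formulas (6)-(8); note that the composition order R = R_z · R_y · R_x used here is one common convention and is an assumption for illustration, since the multiplication order in formula (9) is not reproduced above.

```python
import numpy as np

def rot_x(a):
    """Formula (6): rotation by angle a around the x-axis."""
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

def rot_y(b):
    """Formula (7): rotation by angle b around the y-axis."""
    return np.array([[ np.cos(b), 0, np.sin(b)],
                     [ 0,         1, 0        ],
                     [-np.sin(b), 0, np.cos(b)]])

def rot_z(g):
    """Formula (8): rotation by angle g around the z-axis."""
    return np.array([[np.cos(g), -np.sin(g), 0],
                     [np.sin(g),  np.cos(g), 0],
                     [0,          0,         1]])

alpha, beta, gamma = 0.1, 0.2, 0.3
# Formula (9): product of the three elementary rotations (order assumed).
R = rot_z(gamma) @ rot_y(beta) @ rot_x(alpha)
# Any product of rotations is itself a rotation: R^T R = I and det R = 1.
```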
in one embodiment, the method for obtaining the translation vector t specifically includes:
t_x, t_y and t_z represent the translation distances of the camera along the x-axis, y-axis and z-axis directions, respectively; the translation vector t can therefore be obtained from the translation distances of the camera along the x-axis, y-axis and z-axis directions.
Of course, in addition to selecting the camera pose corresponding to the multi-view image of the scene using the rotation matrix R and the translation vector t of the camera in the world coordinate system as in the above embodiment, the person skilled in the art may also represent the camera pose corresponding to the multi-view image of the scene in a plurality of different manners, such as rotation vectors, quaternions, euler angles, and the like.
Step 3, according to the camera poses of the scene multi-viewpoint images obtained in step 2, obtaining the transformation matrix T of the camera pose of each multi-viewpoint image, and further obtaining the coordinate point P = (x', y', z', 1)^T, the origin o' of the ray in NDC space, and the direction d' of the ray in NDC space, through homogeneous coordinate transformation and ray transformation in NDC (Normalized Device Coordinate) space.
In this embodiment, the discrete rotating light field data of the rare bird target collected in step 1 are transformed into NDC space, so that the real scene can subsequently be reconstructed in NDC space.
In one embodiment, the transformation matrix T of the multi-viewpoint image camera pose is obtained by the following equation (10):
T=[R|t] (10)
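Formula (10) can be sketched as follows: the 3×3 rotation R and the 3×1 translation t are stacked into the 3×4 pose matrix T = [R | t], which maps homogeneous world points into camera coordinates. The R and t values here are placeholders for illustration only.

```python
import numpy as np

# Placeholder pose for illustration: identity rotation, 5-unit translation in z.
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# Formula (10): T = [R | t], a 3x4 matrix.
T = np.hstack([R, t[:, None]])

# Applying T to a homogeneous world point (x, y, z, 1)^T gives its
# camera-frame coordinates.
P_world = np.array([1.0, 2.0, 3.0, 1.0])
P_cam = T @ P_world
```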
In one embodiment, the homogeneous coordinate transformation in the NDC transforms the homogeneous coordinate point (x, y, z, 1)^T into the coordinate point (x', y', z', 1)^T in NDC space, where (x, y, z) are the camera-view coordinates and (x', y', z') are the NDC space coordinates. The homogeneous coordinate transformation method in the NDC specifically comprises the following steps:
For the homogeneous coordinate point (x, y, z, 1)^T, the standard three-dimensional perspective projection matrix M under homogeneous coordinates is:
where n and f are the distances from the near and far clipping planes to the origin, respectively, and r and a are the right and upper boundaries of the scene on the near clipping plane, respectively. Left-multiplying the homogeneous coordinate point (x, y, z, 1)^T by M and then dividing by the fourth coordinate yields the following formula (12):
Thus, the homogeneous coordinate point (x, y, z, 1)^T corresponds to the coordinate point P = (x', y', z', 1)^T in NDC space, given by the following formula (13):
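The point transformation of formulas (11)-(13) can be sketched as follows. Since the matrix M is not reproduced in this text, the code assumes the standard OpenGL-style perspective matrix for a camera looking down the -z axis, with n, f the near/far clipping plane distances and r, a the right/top bounds on the near plane; this is an assumption consistent with the surrounding description, not the patent's exact matrix.

```python
import numpy as np

def to_ndc_point(p, n, f, r, a):
    """Map a camera-space point p = (x, y, z) to NDC coordinates (x', y', z').
    Assumes the standard OpenGL-style perspective matrix (camera looks
    down -z); left-multiply by M, then divide by the fourth coordinate."""
    M = np.array([[n / r, 0,     0,                  0],
                  [0,     n / a, 0,                  0],
                  [0,     0,    -(f + n) / (f - n), -2 * f * n / (f - n)],
                  [0,     0,    -1,                  0]])
    ph = M @ np.append(p, 1.0)   # homogeneous coordinates (x, y, z, 1)^T
    return ph[:3] / ph[3]        # perspective divide -> NDC point

p_ndc = to_ndc_point(np.array([0.5, -0.25, -2.0]), n=1.0, f=10.0, r=1.0, a=1.0)
# Visible points land inside the normalized cube, i.e. each |coordinate| <= 1.
```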
In one embodiment, a ray transformation in the NDC transforms the original ray o + td into the ray o' + t'd' in NDC space, where o is the origin of the original ray, d is the direction of the original ray, t is the ray parameter of the original ray o + td, o' is the origin of the transformed ray in NDC space, d' is the direction of the transformed ray in NDC space, and t' is the ray parameter of the ray o' + t'd'.
The method for transforming the rays in the NDC specifically comprises the following steps:
transforming the original ray o+td into a ray in the NDC space, the origin o 'and direction d' of the ray in the NDC space being represented by the following formulas (14), (15), respectively:
where n and f are the distances from the near and far clipping planes to the origin o, r and a are the right and upper boundaries of the scene on the near clipping plane, o_x, o_y, o_z are the coordinate values of the origin o of the original ray in the x, y and z directions, respectively, d_x, d_y, d_z are the coordinate values of the direction d of the original ray in the x, y and z directions, respectively, o'_x, o'_y, o'_z are the three coordinates of the origin o transformed into NDC space, and d'_x, d'_y, d'_z are the three coordinates of the direction d transformed into NDC space.
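The ray transformation of formulas (14)-(15) can be sketched as follows. Since the formulas themselves are not reproduced in this text, the code uses the NDC ray-warping expressions standard in NeRF-style methods (an assumption, stated in the docstring), written in terms of the same symbols n, r, a, o and d defined above, for a camera looking down the -z axis.

```python
import numpy as np

def ndc_ray(o, d, n, r, a):
    """Transform the original ray o + t*d into an NDC-space ray o' + t'*d'.
    Assumes the NDC warp standard in NeRF-style methods (not the patent's
    exact formulas (14)-(15), which are not reproduced here). n: near-plane
    distance; r, a: right/top bounds of the scene on the near plane;
    camera looks down -z, so o_z and d_z are negative."""
    ox, oy, oz = o
    dx, dy, dz = d
    o_prime = np.array([-(n / r) * ox / oz,
                        -(n / a) * oy / oz,
                        1.0 + 2.0 * n / oz])
    d_prime = np.array([-(n / r) * (dx / dz - ox / oz),
                        -(n / a) * (dy / dz - oy / oz),
                        -2.0 * n / oz])
    return o_prime, d_prime

# Example: ray starting on the near plane (z = -n), slightly tilted in x.
o_p, d_p = ndc_ray(o=np.array([0.0, 0.0, -1.0]),
                   d=np.array([0.1, 0.0, -1.0]), n=1.0, r=1.0, a=1.0)
```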
Transforming the original rays into NDC-space rays avoids the phenomenon in which, when a neural radiance field represents a 360° scene, background rays are mistakenly captured in the foreground, leaving a large number of blurry noise points in the output. The present invention therefore models the 360° scene with a normalized grid in NDC space established around the multi-viewpoint images.
Step 4, generating new viewpoint images of the rotating light field in the NDC space by utilizing the neural radiance field, so that viewpoint super-resolution is performed on the scene along the rotating light field through the new viewpoint images. The visual consistency feature of the rotating light field after viewpoint super-resolution has a high-dimensional, continuous light intensity distribution and contains much information previously missing (such as foreground-background boundaries, high-frequency texture details, and concave-convex and occluded regions of the scene), which can further be used to complete high-quality, high-precision three-dimensional scene reconstruction.
In the NDC space of the discrete rotating light field data, the neural radiance field gradually optimizes the volume representation during training, enhancing its implicit expression so as to obtain rendering results at new viewpoints.
The neural radiance field method specifically comprises the following steps:
Step 41, radiance field representation of the three-dimensional model and its neural network parameterization.
The geometric shape and color information of the bird three-dimensional model are described by a neural field, comprising a spatial position vector x = (x, y, z) and a direction vector d = (θ, φ), where θ represents the polar angle and φ represents the azimuth angle; that is, x represents a point of the three-dimensional model and d represents the viewing direction at the point x. The spatial position vector x and the direction vector d are mapped into an N-dimensional space using the following formula (17) to obtain the coordinate code γ(x) and the direction code γ(d); these are then input into the neural radiance field, which outputs the volume density σ of the position and the color value c = (r, g, b) of the position in the direction d, where r, g and b represent the red, green and blue component values of the color, respectively.
In particular, in order to accelerate the convergence of the neural network, the position vector and the direction vector are mapped into a higher-dimensional (N-dimensional) space using a high-frequency mapping before being input to the multi-layer perceptron network. In this way the coordinate code γ(x) and the direction code γ(d) are obtained, and information in the high-frequency space can be captured. The high-frequency mapping function is γ(v) = (sin(2^0 πv), cos(2^0 πv), ..., sin(2^(N-1) πv), cos(2^(N-1) πv)), where v represents the input of the function; γ(·) is applied to both x and d. This gives the following formula (17):
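The high-frequency mapping γ(v) defined above can be sketched directly. The function is applied elementwise to each coordinate of the input vector, so encoding a 3-D position with N frequency bands yields 2·N·3 numbers.

```python
import numpy as np

def positional_encoding(v, N):
    """High-frequency mapping: gamma(v) = (sin(2^0 pi v), cos(2^0 pi v),
    ..., sin(2^(N-1) pi v), cos(2^(N-1) pi v)), applied elementwise to
    each coordinate of v."""
    v = np.atleast_1d(v)
    out = []
    for k in range(N):
        out.append(np.sin(2.0 ** k * np.pi * v))
        out.append(np.cos(2.0 ** k * np.pi * v))
    return np.concatenate(out)

# Encoding a 3-D position x with N = 10 frequency bands gives 2*10*3 = 60 values.
gamma_x = positional_encoding(np.array([0.1, 0.2, 0.3]), N=10)
```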
Step 42, volume rendering.
According to classical volume rendering principles, the volume density can be understood as the probability that a ray terminates at an infinitesimal particle at a given point. The volume density and color are therefore integrated along the ray: N sample points are taken uniformly along the ray, with integration weights given by the accumulated transmittance from near to far, to obtain the volume rendering result. For the original ray r = o + td travelling from the camera optical center to the pixel point P of the camera, the color value C(r) of the pixel point P is obtained by the following formula (18):
where σ(r(s)) is the volume density of the ray r(s) at the parameter point s, σ(r(t)) is the volume density of the ray r(t) at the parameter point t, t_n and t_f are the nearest and farthest points along r, respectively, and c(r(t), d) is the color value of the ray r(t) in the direction d.
After the radiance field is expressed by the neural radiance field, volume rendering can be used to perform viewpoint super-resolution on the scene along the rotating light field.
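The integral of formula (18) can be approximated numerically as described above: sample N points t_i uniformly in [t_n, t_f] and accumulate color with near-to-far transmittance weights (the standard quadrature used by NeRF-style methods). In this sketch the network outputs σ and c are replaced by toy closed-form functions, so it illustrates only the rendering step, not the trained model.

```python
import numpy as np

def render_ray(sigma, color, t_n, t_f, N=128):
    """Discrete approximation of formula (18): C(r) = sum_i T_i * alpha_i * c_i."""
    t = np.linspace(t_n, t_f, N)
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))  # spacing between samples
    s = sigma(t)                                      # volume density at each sample
    alpha = 1.0 - np.exp(-s * delta)                  # per-sample opacity
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): transmittance surviving to sample i.
    T = np.exp(-np.concatenate([[0.0], np.cumsum(s * delta)[:-1]]))
    weights = T * alpha
    return (weights[:, None] * color(t)).sum(axis=0)

# Toy scene: a dense slab around t = 3 with constant red color.
sigma = lambda t: 50.0 * ((t > 2.9) & (t < 3.1))
color = lambda t: np.tile([1.0, 0.0, 0.0], (len(t), 1))
C = render_ray(sigma, color, t_n=0.0, t_f=6.0)
# The slab is nearly opaque, so the rendered color approaches pure red.
```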
Step 5, reconstructing a three-dimensional model of the rare bird according to the new viewpoint images of the rotating light field generated in step 4.
The embodiment of the invention also provides a rare bird three-dimensional model reconstruction device based on the neural radiance field, comprising a data acquisition unit, a camera pose estimation unit, an NDC space conversion unit, a new viewpoint image acquisition unit and a rare bird three-dimensional model reconstruction unit, wherein:
the data acquisition unit is used for acquiring discrete rotating light field data of rare bird targets.
The camera pose estimation unit is used for estimating the camera poses of the scene multi-viewpoint images corresponding to the discrete rotating light field data.
The NDC space conversion unit is used for obtaining a conversion matrix of the camera pose of the multi-view image according to the camera pose corresponding to the multi-view image of the scene acquired by the camera pose estimation unit, and further converting the discrete rotating light field data of the rare bird target object acquired by the data acquisition unit into an NDC space through homogeneous coordinate conversion and ray conversion in the NDC.
The new viewpoint image acquisition unit is used for generating new viewpoint images of the rotating light field in the NDC space of the discrete rotating light field data by utilizing the neural radiance field.
The three-dimensional model reconstruction unit is used for reconstructing the three-dimensional model of the rare bird according to the new viewpoint image of the rotating light field generated by the new viewpoint image acquisition unit.
In one embodiment, the new viewpoint image acquisition unit specifically includes an encoding subunit and a pixel color value calculation subunit:
the encoding subunit is used for describing the geometric shape and color information of the bird three-dimensional model with the neural field, comprising a spatial position vector x and a direction vector d; the spatial position vector x and the direction vector d are then mapped into an N-dimensional space respectively using formula (17) to obtain the coordinate code γ(x) and the direction code γ(d), which are input into the neural radiance field; the neural radiance field outputs the volume density σ(x) of the position and the color value c(x, d) of the position in the direction d.
The pixel color value calculation subunit is configured to obtain, from formula (18), the color value C(r) of the pixel point P for the original ray r = o + td travelling from the camera optical center to the pixel point P.
In one embodiment, the camera pose corresponding to the scene multi-view image of the camera pose estimation unit includes a rotation matrix R, and the method for acquiring the rotation matrix R specifically includes:
in step 21a, the rotation matrix of the camera rotating by an angle α around the x-axis is expressed as formula (6), the rotation matrix rotating by an angle β around the y-axis is expressed as formula (7), and the rotation matrix rotating by an angle γ around the z-axis is expressed as formula (8).
In step 22a, the rotation matrix R obtained by multiplying expression (6), expression (7) and expression (8) is expressed as expression (9).
In one embodiment, the NDC space conversion unit specifically includes a homogeneous coordinate transformation subunit for transforming the homogeneous coordinate point (x, y, z, 1)^T into the coordinate point P = (x', y', z', 1)^T in NDC space, expressed by formula (13).
In one embodiment, the NDC space conversion unit specifically includes a ray transformation subunit for transforming the original ray o+td into a ray in the NDC space, the origin o 'and direction d' of the ray in the NDC space being represented by equations (14), (15), respectively.
In the NDC space of the discrete rotating light field data, new viewpoint images of the 360° scene can be obtained after viewpoint super-resolution of the scene. The visual consistency feature of the new viewpoint images has a high-dimensional, continuous light intensity distribution and contains much information previously missing (such as foreground-background boundaries, high-frequency texture details, and concave-convex and occluded regions of the scene). The three-dimensional scene reconstructed after super-resolution therefore effectively solves the problems that the traditional three-dimensional reconstruction point cloud is sparse and that occluded and concave-convex parts cannot be reconstructed, completing high-quality, high-precision three-dimensional scene reconstruction.
In the experiment, the reconstruction results of the traditional COLMAP method and of the present method were compared on three views of the bird scene: the front view, side view and top view. The conclusion is that the present method can, to a certain extent, solve the problems of sparse point clouds and of occluded and concave-convex parts that cannot be reconstructed, completing a high-quality, high-precision three-dimensional model of the rare bird. Moreover, it can be seen that the COLMAP method loses a large amount of point cloud information at the wings, neck and crown of the bird, so that a complete bird model cannot be reconstructed in the subsequent dense reconstruction; the texture reconstruction at the front wings of the bird is poor, with a large number of noise points; the material of the bird feathers cannot be accurately restored; and the overall reconstruction effect is poor. After viewpoint super-resolution using the neural radiance field, the generated new viewpoint images complete the missing information of the bird scene, data for the concave-convex parts of the bird body can be generated at new viewpoints, and the rich texture details of the bird feathers are successfully restored. The number of sparse point cloud points in the bird scene is increased, after which the high-precision three-dimensional model is densely reconstructed.
Finally, it should be pointed out that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting. Those of ordinary skill in the art will appreciate that: the technical schemes described in the foregoing embodiments may be modified or some of the technical features may be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.