Disclosure of Invention
It is an object of the present invention to provide a method and apparatus for three-dimensional model reconstruction of rare birds based on a neural radiance field, which overcomes or at least alleviates at least one of the above-mentioned drawbacks of the prior art.
In order to achieve the above object, the present invention provides a rare bird three-dimensional model reconstruction method based on a neural radiance field, comprising:
step 1, collecting discrete rotating light field data of a rare bird target object;
step 2, estimating the camera poses corresponding to the scene multi-viewpoint images of the discrete rotating light field data;
step 3, obtaining a transformation matrix of the camera pose of each multi-viewpoint image according to the camera poses obtained in step 2, and further transforming the discrete rotating light field data of the rare bird target object acquired in step 1 into NDC (Normalized Device Coordinates) space through homogeneous coordinate transformation and ray transformation in NDC;
step 4, generating new viewpoint images of the rotating light field in the NDC space of the discrete rotating light field data by using the neural radiance field;
and step 5, reconstructing a three-dimensional model of the rare bird from the new viewpoint images of the rotating light field generated in step 4.
Further, the neural radiance field method in step 4 specifically includes:

step 41, describing the geometric shape and color information of the bird three-dimensional model with a neural field, the inputs comprising the spatial position vector $X$ and the direction vector $d$; the spatial position vector $X$ and the direction vector $d$ are each mapped to an N-dimensional space by the following formula (17) to obtain the coordinate code $\gamma(X)$ and the direction code $\gamma(d)$, which are then input into the neural radiance field; the neural radiance field outputs the volume density $\sigma$ at that position and the color value $c$ of that position in the direction $d$:

$$\gamma(V) = \left(\sin(2^{0}\pi V), \cos(2^{0}\pi V), \ldots, \sin(2^{L-1}\pi V), \cos(2^{L-1}\pi V)\right) \quad (17)$$

where $L$ is the number of frequency bands ($N = 2L$ per input dimension);

step 42, for the original ray $r(t) = o + td$ travelling from the camera optical center to pixel point $P$ of the camera, the color value $C(r)$ of the pixel $P$ is obtained from the following formula (18):

$$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right) \quad (18)$$

where $\sigma(r(s))$ is the volume density of ray $r$ at parameter $s$, $\sigma(r(t))$ is the volume density of ray $r$ at parameter $t$, $t_n$ and $t_f$ are the nearest and farthest points along the original ray $r$, respectively, and $c(r(t), d)$ is the color value of ray $r(t)$ in the direction $d$.
Further, the camera pose corresponding to the scene multi-viewpoint images in step 2 includes a rotation matrix R, and the method for acquiring the rotation matrix R specifically includes:

step 21a, the rotation matrix of the camera rotated by angle $\alpha$ about the x-axis is expressed as the following formula (6), the rotation matrix rotated by angle $\beta$ about the y-axis as the following formula (7), and the rotation matrix rotated by angle $\gamma$ about the z-axis as the following formula (8):

$$R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \quad (6)$$

$$R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \quad (7)$$

$$R_z(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad (8)$$

step 22a, the rotation matrix R obtained by multiplying the above formulas (6), (7) and (8) is expressed as the following formula (9):

$$R = R_x(\alpha)\,R_y(\beta)\,R_z(\gamma) \quad (9)$$
Further, in step 3, the method of homogeneous coordinate transformation in NDC specifically includes:

the homogeneous coordinate point $(x, y, z, 1)^T$ corresponds to the coordinate point in NDC space given by the following formula (13):

$$(x, y, z, 1)^T \mapsto \left(-\frac{n}{r}\frac{x}{z},\; -\frac{n}{t}\frac{y}{z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)z}\right) \quad (13)$$

wherein $n$ and $f$ are the distances from the near and far clipping planes to the origin, and $r$ and $t$ are the right and upper bounds of the scene on the near clipping plane.
Further, in step 3, the method of ray transformation in NDC specifically includes:

transforming the original ray $r(t) = o + td$ into a ray in NDC space, with origin $o'$ and direction $d'$ in NDC space expressed by the following formulas (14) and (15), respectively:

$$o' = \left(-\frac{n}{r}\frac{o_x}{o_z},\; -\frac{n}{t}\frac{o_y}{o_z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)\,o_z}\right) \quad (14)$$

$$d' = \left(-\frac{n}{r}\left(\frac{d_x}{d_z} - \frac{o_x}{o_z}\right),\; -\frac{n}{t}\left(\frac{d_y}{d_z} - \frac{o_y}{o_z}\right),\; -\frac{2fn}{(f-n)\,o_z}\right) \quad (15)$$

wherein $n$ and $f$ are the distances from the near and far clipping planes to the origin, $r$ and $t$ are the right and upper bounds of the scene on the near clipping plane, $o_x$, $o_y$, $o_z$ are the coordinate values of the origin $o$ of the original ray in the x, y, z directions, respectively, and $d_x$, $d_y$, $d_z$ are the coordinate values of the direction $d$ of the original ray in the x, y, z directions, respectively.
The invention also provides a rare bird three-dimensional model reconstruction device based on a neural radiance field, which comprises:
a data acquisition unit for collecting discrete rotating light field data of a rare bird target object;
a camera pose estimation unit for estimating the camera poses corresponding to the scene multi-viewpoint images of the discrete rotating light field data;
an NDC space conversion unit for obtaining a transformation matrix of the camera pose of each multi-viewpoint image according to the camera poses acquired by the camera pose estimation unit, and further converting the discrete rotating light field data of the rare bird target object acquired by the data acquisition unit into NDC space through homogeneous coordinate transformation and ray transformation in NDC;
a new viewpoint image acquisition unit for generating new viewpoint images of the rotating light field in the NDC space of the discrete rotating light field data by using the neural radiance field;
and a rare bird three-dimensional model reconstruction unit for reconstructing a three-dimensional model of the rare bird from the new viewpoint images of the rotating light field generated by the new viewpoint image acquisition unit.
Further, the neural radiance field method of the new viewpoint image acquisition unit specifically includes:

an encoding subunit for describing the geometric shape and color information of the bird three-dimensional model with a neural field, the inputs comprising the spatial position vector $X$ and the direction vector $d$; the spatial position vector $X$ and the direction vector $d$ are each mapped to an N-dimensional space by the following formula (17) to obtain the coordinate code $\gamma(X)$ and the direction code $\gamma(d)$, which are then input into the neural radiance field; the neural radiance field outputs the volume density $\sigma$ at that position and the color value $c$ of that position in the direction $d$:

$$\gamma(V) = \left(\sin(2^{0}\pi V), \cos(2^{0}\pi V), \ldots, \sin(2^{L-1}\pi V), \cos(2^{L-1}\pi V)\right) \quad (17)$$

where $L$ is the number of frequency bands ($N = 2L$ per input dimension);

a pixel color value calculation subunit for obtaining, for the original ray $r(t) = o + td$ travelling from the camera optical center to pixel point $P$ of the camera, the color value $C(r)$ of the pixel $P$ from the following formula (18):

$$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right) \quad (18)$$

where $\sigma(r(s))$ is the volume density of ray $r$ at parameter $s$, $\sigma(r(t))$ is the volume density of ray $r$ at parameter $t$, $t_n$ and $t_f$ are the nearest and farthest points along the original ray $r$, respectively, and $c(r(t), d)$ is the color value of ray $r(t)$ in the direction $d$.
Further, the camera pose corresponding to the scene multi-viewpoint images of the camera pose estimation unit includes a rotation matrix R, and the method for acquiring the rotation matrix R specifically includes:

step 21a, the rotation matrix of the camera rotated by angle $\alpha$ about the x-axis is expressed as the following formula (6), the rotation matrix rotated by angle $\beta$ about the y-axis as the following formula (7), and the rotation matrix rotated by angle $\gamma$ about the z-axis as the following formula (8):

$$R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \quad (6)$$

$$R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \quad (7)$$

$$R_z(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad (8)$$

step 22a, the rotation matrix R obtained by multiplying the above formulas (6), (7) and (8) is expressed as the following formula (9):

$$R = R_x(\alpha)\,R_y(\beta)\,R_z(\gamma) \quad (9)$$
Further, the NDC space conversion unit specifically includes a homogeneous coordinate transformation subunit for transforming the homogeneous coordinate point $(x, y, z, 1)^T$ into the coordinate point in NDC space expressed by the following formula (13):

$$(x, y, z, 1)^T \mapsto \left(-\frac{n}{r}\frac{x}{z},\; -\frac{n}{t}\frac{y}{z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)z}\right) \quad (13)$$

wherein $n$ and $f$ are the distances from the near and far clipping planes to the origin, and $r$ and $t$ are the right and upper bounds of the scene on the near clipping plane.
Further, the NDC space conversion unit specifically includes a ray transformation subunit for transforming the original ray $r(t) = o + td$ into a ray in NDC space, with origin $o'$ and direction $d'$ expressed by the following formulas (14) and (15), respectively:

$$o' = \left(-\frac{n}{r}\frac{o_x}{o_z},\; -\frac{n}{t}\frac{o_y}{o_z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)\,o_z}\right) \quad (14)$$

$$d' = \left(-\frac{n}{r}\left(\frac{d_x}{d_z} - \frac{o_x}{o_z}\right),\; -\frac{n}{t}\left(\frac{d_y}{d_z} - \frac{o_y}{o_z}\right),\; -\frac{2fn}{(f-n)\,o_z}\right) \quad (15)$$

wherein $n$ and $f$ are the distances from the near and far clipping planes to the origin, $r$ and $t$ are the right and upper bounds of the scene on the near clipping plane, $o_x$, $o_y$, $o_z$ are the coordinate values of the origin $o$ of the original ray in the x, y, z directions, respectively, and $d_x$, $d_y$, $d_z$ are the coordinate values of the direction $d$ of the original ray in the x, y, z directions, respectively.
According to the invention, the camera poses corresponding to the scene multi-viewpoint images are first obtained; a consistent space is then established through NDC space transformation, and viewpoint super-resolution of the scene along the rotating light field is performed through volume rendering to obtain new viewpoint images of the rotating light field; finally, the new viewpoint images are used for three-dimensional reconstruction. The three-dimensional scene reconstructed after super-resolution effectively overcomes the problems that the point cloud of traditional three-dimensional reconstruction is sparse and that occluded and concave-convex parts cannot be reconstructed, thereby completing high-quality, high-precision three-dimensional scene reconstruction.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings and examples.
The rare bird three-dimensional model reconstruction method based on the neural radiance field specifically includes the following steps:
Step 1, collecting discrete rotating light field data of a rare bird target object.
As shown in fig. 1, the left box of fig. 1 illustrates a bird target, in which the coordinate system is the image (camera) coordinate system xyz; the image coordinates in the rotating light field can accordingly be expressed as $I(x, y, \theta)$, where $\theta$ denotes the rotation angle. The right side of fig. 1 shows that the optical axis of the camera is perpendicular to the rotation axis Y of the world coordinate system XYZ; the camera samples at equal angular intervals around the rotation axis Y to obtain the discrete rotating light field data of the rare bird target object, which records the light radiance of the three-dimensional scene over 360°.
Let the rotation center of the scene be $o$, the focal length of the camera be $f$, the coordinates of the camera optical center in the world coordinate system XYZ be $(x_c, y_c, z_c)$, and the distance between the camera optical center and the rotation axis Y be $R$. For a point $P$ in three-dimensional space, let its distance to the rotation axis Y be $r_P$ and its initial angle be $\varphi_0$. In the X-Z plane of the world coordinate system, the X and Z components of $P$ can be represented in polar coordinates as the following formulas (1) and (2), respectively:

$$X = r_P\cos\varphi_0 \quad (1)$$

$$Z = r_P\sin\varphi_0 \quad (2)$$

Over the rotation angle range $\theta \in [0, 2\pi)$, the images sampled at equal intervals form a three-dimensional image volume $I(x, y, \theta)$. The locus curve of a feature point can be expressed as the following formula (3), and its projections in the x and y directions are given by formulas (4) and (5), respectively, thereby obtaining the discrete rotating light field data:

$$P(\theta) = \left(r_P\cos(\theta + \varphi_0),\; y_P,\; r_P\sin(\theta + \varphi_0)\right) \quad (3)$$

$$x(\theta) = r_P\cos(\theta + \varphi_0) \quad (4)$$

$$y(\theta) = y_P \quad (5)$$
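The sinusoidal locus described by formulas (1)-(5) can be sketched numerically. The following Python snippet is an illustrative sketch under the orthographic model assumed above; the function name and the sample values are hypothetical:

```python
import numpy as np

# Hypothetical illustration of equations (1)-(5): a scene point at radius
# r_p and initial angle phi0 from the rotation axis Y traces a sinusoid in
# the (x, theta) slice of the sampled image volume (orthographic model).
def point_trajectory(r_p, phi0, y_p, thetas):
    x = r_p * np.cos(thetas + phi0)   # eq. (4): x-projection of the locus
    z = r_p * np.sin(thetas + phi0)   # depth component of eq. (3)
    v = np.full_like(thetas, y_p)     # eq. (5): the height stays constant
    return x, v, z

thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
u, v, _ = point_trajectory(r_p=2.0, phi0=0.3, y_p=1.5, thetas=thetas)

# The trace across the rotation-angle axis is a pure sinusoid:
# amplitude r_p, phase phi0, constant height y_p.
assert np.allclose(u, 2.0 * np.cos(thetas + 0.3))
assert np.allclose(v, 1.5)
```

This is the "visual-consistency feature" referred to below: each scene point contributes one sinusoid to the image volume, and its amplitude and phase encode the point's position relative to the rotation axis.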
The visual-consistency features of the rotating light field are distributed as sinusoids in three-dimensional space; they exhibit a high-dimensional, continuous light-intensity distribution, and the light-intensity trajectories encode the relative motion between the three-dimensional scene and the camera. The discrete rotating light field data of the rare bird target object collected in the embodiment of the invention fits an orthographic projection model and can be used to accurately estimate scene depth and to add foreground and background light information. In the subsequent three-dimensional reconstruction, a high-precision three-dimensional model free from occlusion and noise interference can be generated.
Step 2, estimating the camera poses corresponding to the scene multi-viewpoint images of the discrete rotating light field data, the camera pose including a rotation matrix R and a translation vector t.
A point $P$ in three-dimensional space is specified by its coordinates $(x, y, z)$, which represent the position of $P$; the coordinates $(x, y, z)$ identify a fixed point in three-dimensional space. The camera can also be regarded as a point in three-dimensional space, but its orientation plays an important role in reconstructing the projection, so three additional degrees of freedom $(\alpha, \beta, \gamma)$ are introduced to represent the rotation of the camera in three dimensions. The positioning of the camera in the world coordinate system, i.e. the camera pose, therefore requires six degrees of freedom $(\alpha, \beta, \gamma, t_x, t_y, t_z)$: the degrees of freedom $\alpha$, $\beta$, $\gamma$ correspond to the rotation matrix R of the camera in the world coordinate system, and $t_x$, $t_y$, $t_z$ correspond to the translation vector t of the camera in the world coordinate system.
In one embodiment, the method for obtaining the rotation matrix R specifically includes:
step 21a, acquiring the rotation matrices of the camera rotated by angle $\alpha$ about the x-axis, by angle $\beta$ about the y-axis, and by angle $\gamma$ about the z-axis.
For example, the rotation matrix for a rotation by angle $\alpha$ about the x-axis can be represented by, but is not limited to, the following formula (6); the rotation matrix for a rotation by angle $\beta$ about the y-axis by the following formula (7); and the rotation matrix for a rotation by angle $\gamma$ about the z-axis by the following formula (8):

$$R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \quad (6)$$

$$R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \quad (7)$$

$$R_z(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad (8)$$

step 22a, obtaining the final rotation matrix R by multiplying the above formulas (6), (7) and (8), as the following formula (9):

$$R = R_x(\alpha)\,R_y(\beta)\,R_z(\gamma) \quad (9)$$
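The per-axis rotation matrices of formulas (6)-(8) and their composition in formula (9) can be sketched as follows; the angle values are arbitrary illustrations, and the x-y-z multiplication order is the one assumed in formula (9):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])   # eq. (6)

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # eq. (7)

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # eq. (8)

# eq. (9): multiply the per-axis rotations, in the order of formulas (6)-(8),
# into the full rotation matrix R of the camera pose.
R = rot_x(1.1) @ rot_y(-0.2) @ rot_z(0.3)

# Any proper rotation matrix is orthogonal with determinant +1.
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
```

Note that rotation composition is order-dependent; swapping the factor order yields a different (but equally valid) Euler-angle convention.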
In one embodiment, the method for obtaining the translation vector t specifically includes:
$t_x$, $t_y$, $t_z$ represent the translation distances of the camera along the x-, y- and z-axis directions, respectively; the translation vector $t = (t_x, t_y, t_z)^T$ is therefore obtained directly from these three translation distances.
Of course, besides representing the camera pose corresponding to the scene multi-viewpoint images by the rotation matrix R and the translation vector t in the world coordinate system as in the above embodiment, a person skilled in the art may also represent the camera pose in a number of different ways, such as rotation vectors, quaternions, or Euler angles.
Step 3, according to the camera poses corresponding to the scene multi-viewpoint images obtained in step 2, obtaining a transformation matrix T of the camera pose of each multi-viewpoint image, and further obtaining, through homogeneous coordinate transformation and ray transformation in NDC (Normalized Device Coordinates) space, the ray origin $o'$ and the ray direction $d'$ in NDC space.
In this embodiment, the discrete rotating light field data of the rare bird target object acquired in step 1 are transformed into NDC space, so that the real scene can subsequently be reconstructed in NDC space.
In one embodiment, the transformation matrix T of the multi-viewpoint image camera pose is assembled from the rotation matrix R and the translation vector t by the following formula (10):

$$T = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix} \quad (10)$$
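A minimal sketch of assembling the 4x4 pose transform of formula (10) from R and t; the placeholder rotation and translation values below are hypothetical:

```python
import numpy as np

# Sketch of equation (10): the 4x4 camera-pose transform assembled from a
# rotation matrix R (upper-left 3x3 block) and translation vector t
# (upper-right 3x1 block), with bottom row (0, 0, 0, 1).
def pose_matrix(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

R = np.eye(3)                      # placeholder rotation for illustration
t = np.array([0.5, -1.0, 2.0])     # placeholder translation
T = pose_matrix(R, t)

# A point in homogeneous coordinates maps through T in a single multiply:
# rotation first, then translation.
p = np.array([1.0, 2.0, 3.0, 1.0])
assert np.allclose(T @ p, [1.5, 1.0, 5.0, 1.0])
```

Using the homogeneous 4x4 form lets rotation and translation be composed, chained, and inverted as ordinary matrix operations.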
In one embodiment, the homogeneous coordinate point $(x, y, z, 1)^T$ is transformed into the coordinate point $(x', y', z')$ by the homogeneous coordinate transformation in NDC, wherein $(x, y, z)$ are the camera-view coordinates and $(x', y', z')$ are the NDC space coordinates. The homogeneous coordinate transformation method in NDC specifically includes:

For the homogeneous coordinate point $(x, y, z, 1)^T$, the standard three-dimensional perspective projection matrix M under homogeneous coordinates is:

$$M = \begin{pmatrix} \frac{n}{r} & 0 & 0 & 0 \\ 0 & \frac{n}{t} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix} \quad (11)$$

wherein $n$ and $f$ are the distances from the near and far clipping planes to the origin, and $r$ and $t$ are the right and upper bounds of the scene on the near clipping plane. Left-multiplying the homogeneous coordinate point $(x, y, z, 1)^T$ by M and then dividing by the fourth coordinate yields the following formula (12):

$$\mathrm{project}\big((x, y, z, 1)^T\big) = \left(-\frac{n}{r}\frac{x}{z},\; -\frac{n}{t}\frac{y}{z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)z}\right) \quad (12)$$

Thus, the coordinate point $(x', y', z')$ in NDC space corresponding to the homogeneous coordinate point $(x, y, z, 1)^T$ is given by the following formula (13):

$$(x', y', z') = \left(-\frac{n}{r}\frac{x}{z},\; -\frac{n}{t}\frac{y}{z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)z}\right) \quad (13)$$
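The projection of formulas (11)-(13) can be checked numerically. In this sketch the clipping-plane distances and scene bounds are assumed values, and the bound symbols are renamed `r_b`, `t_b` only to avoid clashing with other variable names:

```python
import numpy as np

# Assumed parameters: n, f are the near/far clipping-plane distances;
# r_b, t_b the right/top bounds of the scene on the near plane.
n, f, r_b, t_b = 1.0, 100.0, 1.0, 1.0

M = np.array([[n / r_b, 0, 0, 0],
              [0, n / t_b, 0, 0],
              [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
              [0, 0, -1, 0]])                       # eq. (11)

def to_ndc(p):
    q = M @ np.append(p, 1.0)    # left-multiply the homogeneous point by M
    return q[:3] / q[3]          # eq. (12): divide by the fourth coordinate

p = np.array([0.2, -0.4, -5.0])  # camera looks down the -z axis
ndc = to_ndc(p)

# eq. (13): closed form of the same mapping, checked term by term.
expected = np.array([-(n / r_b) * p[0] / p[2],
                     -(n / t_b) * p[1] / p[2],
                     (f + n) / (f - n) + 2 * f * n / ((f - n) * p[2])])
assert np.allclose(ndc, expected)
```

The division by the fourth coordinate is what turns the linear matrix map into the nonlinear perspective projection; points between the near and far planes land in a bounded cube.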
In one embodiment, the original ray $r(t) = o + td$ is transformed by the ray transformation in NDC into the ray $r'(t') = o' + t'd'$ in NDC space, wherein $o$ is the origin of the original ray, $d$ is the direction of the original ray, $t$ parameterizes the point $o + td$ on the original ray, $o'$ is the origin of the ray in NDC space, $d'$ is the direction of the ray in NDC space, and $t'$ parameterizes the point $o' + t'd'$ on the NDC-space ray.

The method of ray transformation in NDC specifically includes:

transforming the original ray $r(t) = o + td$ into a ray in NDC space, with origin $o'$ and direction $d'$ expressed by the following formulas (14) and (15), respectively, and with the parameter correspondence given by formula (16):

$$o' = \left(-\frac{n}{r}\frac{o_x}{o_z},\; -\frac{n}{t}\frac{o_y}{o_z},\; \frac{f+n}{f-n} + \frac{2fn}{(f-n)\,o_z}\right) \quad (14)$$

$$d' = \left(-\frac{n}{r}\left(\frac{d_x}{d_z} - \frac{o_x}{o_z}\right),\; -\frac{n}{t}\left(\frac{d_y}{d_z} - \frac{o_y}{o_z}\right),\; -\frac{2fn}{(f-n)\,o_z}\right) \quad (15)$$

$$t' = 1 - \frac{o_z}{o_z + t\,d_z} \quad (16)$$

wherein $n$ and $f$ are the distances from the near and far clipping planes to the origin, $r$ and $t$ in the fractions $\frac{n}{r}$, $\frac{n}{t}$ are the right and upper bounds of the scene on the near clipping plane, $o_x$, $o_y$, $o_z$ are the coordinate values of the origin $o$ of the original ray in the x, y, z directions, $d_x$, $d_y$, $d_z$ are the coordinate values of the direction $d$ of the original ray in the x, y, z directions, and the three components of formulas (14) and (15) are the coordinates of the origin $o'$ and the direction $d'$ transformed into NDC space, respectively.
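The ray transformation of formulas (14)-(16) can be verified by checking that both parameterizations of the ray project to the same NDC points. The scene bounds and the sample ray below are assumed for illustration:

```python
import numpy as np

# Assumed near/far clipping-plane distances and right/top scene bounds
# (bounds renamed r_b, t_b to avoid clashing with the ray parameter t).
n, f, r_b, t_b = 1.0, 100.0, 1.0, 1.0

def project(p):                          # the point mapping of eq. (13)
    return np.array([-(n / r_b) * p[0] / p[2],
                     -(n / t_b) * p[1] / p[2],
                     (f + n) / (f - n) + 2 * f * n / ((f - n) * p[2])])

def ray_to_ndc(o, d):
    o_ndc = project(o)                                           # eq. (14)
    d_ndc = np.array([-(n / r_b) * (d[0] / d[2] - o[0] / o[2]),
                      -(n / t_b) * (d[1] / d[2] - o[1] / o[2]),
                      -2 * f * n / ((f - n) * o[2])])            # eq. (15)
    return o_ndc, d_ndc

o = np.array([0.1, 0.2, -4.0])           # sample ray origin (camera space)
d = np.array([0.05, -0.1, -1.0])         # sample ray direction
o_ndc, d_ndc = ray_to_ndc(o, d)

# For any t, projecting o + t*d must equal o' + t'*d' with t' from eq. (16).
for t in (0.0, 1.0, 3.0):
    t_ndc = 1.0 - o[2] / (o[2] + t * d[2])                       # eq. (16)
    assert np.allclose(project(o + t * d), o_ndc + t_ndc * d_ndc)
```

Note that as $t \to \infty$, $t'$ approaches a finite limit, which is exactly why the NDC parameterization lets the radiance field cover unbounded depth with bounded coordinates.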
After the original rays are transformed into NDC-space rays, the phenomenon whereby, when a neural radiance field represents a 360° scene, background rays are mistakenly captured in the foreground and a large number of blurred noise points appear in the output can be avoided. The invention therefore models the 360° scene on a normalized grid established in NDC space around the multi-viewpoint images.
Step 4, generating new viewpoint images of the rotating light field in the NDC space by using the neural radiance field, so that viewpoint super-resolution of the scene along the rotating light field is performed through the new viewpoint images of the rotating light field. After viewpoint super-resolution, the visual-consistency features of the rotating light field exhibit a high-dimensional, continuous light-intensity distribution and contain much information that was previously missing (such as foreground-background boundaries, high-frequency texture details, and concave-convex and occluded parts of the scene); they can be further used to complete high-quality, high-precision three-dimensional scene reconstruction.
In the NDC space of the discrete rotating light field data, the neural radiance field progressively optimizes the voxels during training and enhances their implicit representation so as to obtain rendering results at new viewpoints.
The neural radiance field method specifically includes:
Step 41, radiance field representation of the three-dimensional model and its neural network encoding.
The geometric shape and color information of the bird three-dimensional model are described with a neural field, the inputs comprising the spatial position vector $X = (x, y, z)$ and the direction vector $d = (\theta, \phi)$, where $\theta$ denotes the polar angle and $\phi$ the azimuth angle; that is, $X$ represents a point of the three-dimensional model and $d$ represents a viewing direction through the point $X$. The spatial position vector $X$ and the direction vector $d$ are mapped to an N-dimensional space by the following formula (17) to obtain the coordinate code $\gamma(X)$ and the direction code $\gamma(d)$, which are then input into the neural radiance field; the neural radiance field outputs the volume density $\sigma$ at that position and the color value $c = (r, g, b)$ of that position in the direction $d$, where r, g, b denote the red, green and blue components of the color value, respectively.
Specifically, in order to accelerate the convergence of the neural network, the position vector and the direction vector are mapped by a high-frequency mapping to a higher-dimensional (N-dimensional) space and then input to the multilayer perceptron network. The resulting coordinate code $\gamma(X)$ and direction code $\gamma(d)$ thereby capture information in a high-frequency space. The high-frequency mapping function $\gamma(\cdot)$, where $V$ denotes the input of the function and $\gamma$ is applied to $X$ and $d$ separately, is given by the following formula (17):

$$\gamma(V) = \left(\sin(2^{0}\pi V), \cos(2^{0}\pi V), \ldots, \sin(2^{L-1}\pi V), \cos(2^{L-1}\pi V)\right) \quad (17)$$

where $L$ is the number of frequency bands ($N = 2L$ per input dimension).
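A minimal sketch of the high-frequency encoding of formula (17); the hyperparameter L (the number of frequency bands) and the sample input are assumptions:

```python
import numpy as np

# Sketch of eq. (17): each input coordinate is expanded into sin/cos pairs
# at frequencies 2^0 * pi ... 2^(L-1) * pi, so low- and high-frequency
# variation of the field both become easy for an MLP to represent.
def positional_encoding(v, L):
    v = np.atleast_1d(v)
    out = []
    for k in range(L):
        out.append(np.sin((2.0 ** k) * np.pi * v))
        out.append(np.cos((2.0 ** k) * np.pi * v))
    return np.concatenate(out)

x = np.array([0.5, -0.25, 0.1])          # a 3-D position vector X
gamma_x = positional_encoding(x, L=10)

# Each of the 3 coordinates yields 2L features: N = 2L per dimension.
assert gamma_x.shape == (3 * 2 * 10,)
```

The same function is applied to the direction vector $d$, typically with a smaller L, since view-dependent color varies more smoothly than geometry.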
Step 42, volume rendering.
According to the classical volume rendering principle, the volume density can be understood as the differential probability that a light ray terminates at an infinitesimal particle at a given location. The volume density and the color are therefore integrated along the ray: N sample points are sampled uniformly on the ray, with the transmittance accumulated from near to far as the integration weight, which yields the volume rendering result. For the original ray $r(t) = o + td$ travelling from the camera optical center to pixel point $P$ of the camera, the color value $C(r)$ of the pixel $P$ is obtained from the following formula (18):

$$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right) \quad (18)$$

where $\sigma(r(s))$ is the volume density of ray $r$ at parameter $s$, $\sigma(r(t))$ is the volume density of ray $r$ at parameter $t$, $t_n$ and $t_f$ are the nearest and farthest points along the original ray $r$, respectively, and $c(r(t), d)$ is the color value of ray $r(t)$ in the direction $d$.
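The integral of formula (18) can be approximated by the uniform-sampling quadrature described above. In this sketch the density and color fields are toy stand-ins for the learned network, chosen so the exact answer is known:

```python
import numpy as np

# Numerical sketch of eq. (18): approximate the volume-rendering integral
# with n_samples uniform samples, weighting each sample's color by its
# density and the transmittance T accumulated from the near bound.
def render_ray(sigma, color, t_near, t_far, n_samples=256):
    t = np.linspace(t_near, t_far, n_samples)
    dt = t[1] - t[0]
    dens = sigma(t)                              # density along the ray
    # T(t_i): exp of minus the density accumulated before sample i.
    T = np.exp(-np.concatenate([[0.0], np.cumsum(dens[:-1] * dt)]))
    weights = T * dens * dt                      # per-sample contribution
    return (weights[:, None] * color(t)).sum(axis=0)

# Toy fields: constant density 0.5 and constant pure-red color.
c = render_ray(lambda t: np.full_like(t, 0.5),
               lambda t: np.tile([1.0, 0.0, 0.0], (t.size, 1)),
               t_near=0.0, t_far=4.0)

# With constant sigma, eq. (18) integrates in closed form to
# (1 - exp(-sigma * (t_f - t_n))) * color, here 1 - e^-2 in the red channel.
assert np.allclose(c, [1.0 - np.exp(-2.0), 0.0, 0.0], atol=1e-2)
```

The weights are differentiable in the density and color, which is what allows the radiance field to be optimized end-to-end from the captured multi-viewpoint images.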
Once the radiance field is expressed by the neural radiance field, volume rendering can be used to perform viewpoint super-resolution on the scene along the rotating light field.
Step 5, reconstructing the three-dimensional model of the rare bird from the new viewpoint images of the rotating light field generated in step 4.
The embodiment of the invention also provides a rare bird three-dimensional model reconstruction device based on a neural radiance field, which comprises a data acquisition unit, a camera pose estimation unit, an NDC space conversion unit, a new viewpoint image acquisition unit and a rare bird three-dimensional model reconstruction unit, wherein:
the data acquisition unit is used for collecting discrete rotating light field data of a rare bird target object;
the camera pose estimation unit is used for estimating the camera poses corresponding to the scene multi-viewpoint images of the discrete rotating light field data;
the NDC space conversion unit is used for obtaining a transformation matrix of the camera pose of each multi-viewpoint image according to the camera poses acquired by the camera pose estimation unit, and further converting the discrete rotating light field data of the rare bird target object acquired by the data acquisition unit into NDC space through homogeneous coordinate transformation and ray transformation in NDC;
the new viewpoint image acquisition unit is used for generating new viewpoint images of the rotating light field in the NDC space of the discrete rotating light field data by using the neural radiance field;
and the rare bird three-dimensional model reconstruction unit is used for reconstructing the three-dimensional model of the rare bird from the new viewpoint images of the rotating light field generated by the new viewpoint image acquisition unit.
In one embodiment, the new viewpoint image acquisition unit specifically includes an encoding subunit and a pixel color value calculation subunit:
The encoding subunit is used for describing the geometric shape and color information of the bird three-dimensional model with a neural field, the inputs comprising the spatial position vector $X$ and the direction vector $d$; it maps the spatial position vector $X$ and the direction vector $d$ to N-dimensional space by formula (17) to obtain the coordinate code $\gamma(X)$ and the direction code $\gamma(d)$, which are input into the neural radiance field; the neural radiance field outputs the volume density $\sigma$ at that position and the color value $c$ of that position in the direction $d$.
The pixel color value calculation subunit is used for obtaining, for the original ray $r(t) = o + td$ travelling from the camera optical center to pixel point $P$ of the camera, the color value $C(r)$ of the pixel $P$ from formula (18).
In one embodiment, the camera pose corresponding to the scene multi-viewpoint images of the camera pose estimation unit includes a rotation matrix R, and the method for acquiring the rotation matrix R specifically includes:
step 21a, expressing the rotation matrix of the camera rotated by angle $\alpha$ about the x-axis as formula (6), the rotation matrix rotated by angle $\beta$ about the y-axis as formula (7), and the rotation matrix rotated by angle $\gamma$ about the z-axis as formula (8);
step 22a, expressing the rotation matrix R obtained by multiplying formulas (6), (7) and (8) as formula (9).
In one embodiment, the NDC space conversion unit specifically includes a homogeneous coordinate transformation subunit for transforming the homogeneous coordinate point $(x, y, z, 1)^T$ into the coordinate point $(x', y', z')$ expressed by formula (13).
In one embodiment, the NDC space conversion unit specifically includes a ray transformation subunit for transforming the original ray $r(t) = o + td$ into a ray in NDC space with origin $o'$ and direction $d'$, expressed by formulas (14) and (15), respectively.
In the NDC space of the discrete rotating light field data, new viewpoint images of the 360° scene can be obtained after viewpoint super-resolution of the scene. The visual-consistency features of the new viewpoint images exhibit a high-dimensional, continuous light-intensity distribution and contain much information that was previously missing (such as foreground-background boundaries, high-frequency texture details, and concave-convex and occluded parts of the scene). The three-dimensional scene reconstructed after super-resolution therefore effectively overcomes the problems that the point cloud of traditional three-dimensional reconstruction is sparse and that occluded and concave-convex parts cannot be reconstructed, completing high-quality, high-precision three-dimensional scene reconstruction.
In the experiments, the reconstruction results of the traditional COLMAP method and of the present method were compared at three viewing angles of the bird scene: front view, side view and top view. The conclusion is that the present method can, to a certain extent, solve the problems of sparse point clouds and of occluded and concave-convex parts that cannot be reconstructed, completing a high-quality, high-precision three-dimensional model of the rare bird. It can also be seen that the COLMAP method loses a large amount of point cloud information at the wings, the neck and the top of the bird, so that a complete bird model cannot be reconstructed in the subsequent dense reconstruction; the texture reconstruction at the front wings of the bird is poor, with a large number of noise points; the material of the bird feathers cannot be accurately restored; and the overall reconstruction effect is poor. After viewpoint super-resolution with the neural radiance field, the generated new viewpoint images complement the missing information of the bird scene, data for the concave-convex parts of the bird body can be generated at the new viewpoints, and the rich texture details of the bird feathers are successfully restored. The number of sparse point cloud points of the bird scene is thus increased, and a high-precision three-dimensional model is then densely reconstructed.
Finally, it should be pointed out that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting. Those of ordinary skill in the art will appreciate that: the technical schemes described in the foregoing embodiments may be modified or some of the technical features may be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.