CN118154713A - Scene rendering method, device, electronic equipment, storage medium and program product - Google Patents

Scene rendering method, device, electronic equipment, storage medium and program product

Info

Publication number
CN118154713A
Authority
CN
China
Prior art keywords
scene
sampling
target
rendering
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410303607.0A
Other languages
Chinese (zh)
Inventor
于金波
刘祥德
赵飞飞
严旭
魏榕
李东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Digital City Research Center
Original Assignee
Beijing Digital City Research Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Digital City Research Center filed Critical Beijing Digital City Research Center
Priority claimed from CN202410303607.0A
Publication of CN118154713A
Legal status: Pending

Landscapes

  • Image Generation (AREA)

Abstract

The application discloses a scene rendering method, a scene rendering device, electronic equipment, a storage medium and a program product, and belongs to the technical field of computer vision. Wherein the method comprises the following steps: decomposing a scene to be rendered into a first scene area and K second scene areas, wherein the first scene area is used for representing a global scene of the scene to be rendered, and the second scene area is used for representing a local scene in the scene to be rendered; constructing a global sampling search grid based on the first scene area, and constructing a local sampling search grid based on the second scene area; s first sampling points are obtained from a first scene area through a global sampling search grid, and T second sampling points are obtained from a target scene area through a target sampling search grid, wherein the target sampling search grid is one of K local sampling search grids; training a neural network based on the S first sampling points and the T second sampling points; rendering a scene to be rendered based on the trained neural network.

Description

Scene rendering method, device, electronic equipment, storage medium and program product
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a scene rendering method, a scene rendering device, electronic equipment, a storage medium and a program product.
Background
A Neural Radiance Field (NeRF) is a novel view synthesis method based on implicit scene representation, and is widely applied in fields such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality. Based on a set of image sequences and camera parameters captured in a physical-world scene, NeRF learns the density and color of 3D points in space and renders new-view images using the graphics technique of volume rendering. However, when NeRF is applied to a scene with complex lighting conditions, the same object may present different colors at different viewing angles, making it difficult for NeRF to fit the scene accurately, so that obvious artifacts are likely to appear in the rendered scene image.
Disclosure of Invention
The embodiments of the application aim to provide a scene rendering method, a device, electronic equipment, a storage medium and a program product, which can solve the problems that, when NeRF is applied to a scene with complex illumination conditions, NeRF is difficult to fit the scene accurately and the rendered scene image is prone to artifacts.
In a first aspect, an embodiment of the present application provides a scene rendering method, including:
Decomposing a scene to be rendered into a first scene area and K second scene areas according to scene data of the scene to be rendered, wherein the first scene area is used for representing a global scene of the scene to be rendered, the second scene area is used for representing a local scene in the scene to be rendered, and K is a positive integer;
constructing a global sampling search grid based on the first scene area, and constructing a local sampling search grid based on the second scene areas, wherein the K second scene areas are in one-to-one correspondence with the K local sampling search grids;
Acquiring S first sampling points from the first scene area through the global sampling search grid based on a preset sampling method, and acquiring T second sampling points from the target scene area through a target sampling search grid based on the preset sampling method, wherein the target sampling search grid is one of K local sampling search grids, and S, T is a positive integer;
Training a neural network based on the S first sampling points and the T second sampling points;
and rendering the scene to be rendered based on the neural network after training is completed.
Optionally, the decomposing the scene to be rendered into a first scene area and K second scene areas according to the scene data of the scene to be rendered includes:
Acquiring scene data of a scene to be rendered;
Determining point cloud data of the first scene area and the scene to be rendered according to the scene data;
and carrying out clustering division on the point cloud data, and determining the K second scene areas.
Optionally, the acquiring S first sampling points from the first scene area through the global sampling search grid based on the preset sampling method, and acquiring T second sampling points from the target scene area through the target sampling search grid based on the preset sampling method includes:
determining whether a target intersecting grid intersecting with a rendering ray exists in the K local sampling search grids, wherein the rendering ray is a ray emitted from a target view angle;
under the condition that the target intersecting grids exist in the K local sampling search grids, determining whether sampling points with voxel values larger than a first threshold value are contained in the target intersecting grids;
And under the condition that sampling points with voxel values larger than the first threshold value are contained in the target intersection grid, acquiring the S first sampling points from the first scene area, and acquiring the T second sampling points from the target scene area.
Optionally, the target intersection grid is determined based on:
determining N pre-selected grids, wherein the pre-selected grids are grids intersecting the rendering rays in the K local sampling search grids, N is a positive integer, and N is smaller than or equal to K;
and determining the preselected mesh with the smallest distance from a preset point in the N preselected meshes as the target intersecting mesh, wherein the preset point is the initial emission point of the rendering ray.
Optionally, the S first sampling points include a first preselected sampling point and a second preselected sampling point, and the method further includes:
acquiring at least one first intersection point of the global sampling search grid and the rendering ray under the condition that the target intersection grid does not exist in the K local sampling search grids;
determining a first intersection point of the at least one first intersection point having a voxel value greater than a second threshold as the first preselected sample point;
And under the condition that the voxel value of a first target intersection point is larger than a third threshold value, determining sampling points around the first target intersection point as the second preselected sampling points, wherein the first target intersection point is the closest one of the at least one first intersection point to a preset point, the preset point is the initial emission point of the rendering ray, and the third threshold value is larger than the second threshold value.
Optionally, the T second sampling points include a third preselected sampling point and a fourth preselected sampling point;
The obtaining the S first sampling points from the first scene area and the T second sampling points from the target scene area when the target intersection grid includes sampling points with voxel values greater than the first threshold value includes:
Under the condition that sampling points with voxel values larger than the first threshold value are contained in the target intersection grid, acquiring the S first sampling points from the first scene area, and acquiring at least one second intersection point of the target scene area and the rendering ray;
Determining a second intersection point of the at least one second intersection point having a voxel value greater than a fourth threshold value and less than the first threshold value as the third preselected sample point;
And determining sampling points around a second target intersection point as the fourth preselected sampling point in the condition that the voxel value of the second target intersection point is larger than the first threshold value, wherein the second target intersection point is any one of the at least one second intersection point.
Optionally, the neural network includes a first network and a second network, the first network is composed of a first diffuse reflection network and a first high-light network, and the second network is composed of K second diffuse reflection networks and K second high-light networks;
the number of fully connected network layers of the first diffuse reflection network is greater than the number of fully connected network layers of the second diffuse reflection network, and the number of fully connected network layers of the first high-light network is greater than the number of fully connected network layers of the second high-light network.
Optionally, the rendering the scene to be rendered based on the neural network after training is completed includes one of the following:
rendering the first scene area based on the first network after training is completed under the condition that the target intersecting grids do not exist in the K local sampling search grids;
Rendering the first scene area based on the first network after training is completed and rendering the target scene area based on the second network after training is completed under the condition that sampling points with voxel values larger than the first threshold are contained in the target intersection grid;
and jointly rendering the scene to be rendered based on the first network and the second network.
Optionally, the method further comprises:
Under the condition that sampling points with voxel values larger than the first threshold are contained in the target intersection grid, carrying out normal vector estimation on point cloud data in the target scene area to obtain normal vectors of the point cloud data;
Determining a normal constraint condition based on the normal vector, and determining an illumination constraint condition based on color characteristics of pixel points in the scene to be rendered;
Training the first network according to the illumination constraint, and training the second network according to the illumination constraint and the normal constraint.
In a second aspect, an embodiment of the present application provides a scene rendering device, including:
The scene decomposition module is used for decomposing the scene to be rendered into a first scene area and K second scene areas according to the scene data of the scene to be rendered, wherein the first scene area is used for representing the global scene of the scene to be rendered, the second scene area is used for representing the local scene in the scene to be rendered, and K is a positive integer;
The grid construction module is used for constructing a global sampling search grid based on the first scene area, constructing a local sampling search grid based on the second scene areas, and the K second scene areas are in one-to-one correspondence with the K local sampling search grids;
The sampling acquisition module is used for acquiring S first sampling points from the first scene area through the global sampling search grid based on a preset sampling method, and acquiring T second sampling points from the target scene area through a target sampling search grid based on the preset sampling method, wherein the target sampling search grid is one of K local sampling search grids, and S, T are all positive integers;
The network training module is used for training the neural network based on the S first sampling points and the T second sampling points;
and the scene rendering module is used for rendering the scene to be rendered based on the neural network after training is completed.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions implementing the steps of the scene rendering method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps of the scene rendering method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising computer instructions which, when executed by a processor, implement the steps of the scene rendering method according to the first aspect.
In the embodiment of the application, a first scene area used for representing a global scene and K second scene areas used for representing local scenes are obtained by carrying out double-level decomposition on the scene to be rendered. And then searching and sampling each scene area to obtain sampling results, namely S first sampling points in the first scene area and T second sampling points in at least one target scene area. The neural network is then trained based on the sampling results. Therefore, the trained neural network can finish rendering the scene to be rendered, so that the scene to be rendered is accurately fitted, the rendering effect of a local scene area is improved, the artifacts appearing in the scene are eliminated, and the method can be widely applied to various complicated-illumination scenes to be rendered.
Drawings
Fig. 1 is a schematic flow chart of a scene rendering method according to an embodiment of the present application;
FIG. 2 is a second flow chart of a scene rendering method according to the embodiment of the application;
Fig. 3 is a schematic structural diagram of a scene rendering device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first", "second", and the like in the description and claims are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second", etc. are generally of one type, and the number of such objects is not limited; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The method, the device and the related equipment for rendering the scene provided by the embodiment of the application are described in detail through specific embodiments and application scenes thereof with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a scene rendering method according to an embodiment of the present application. As shown in fig. 1, the scene rendering method includes the steps of:
step 101, decomposing a scene to be rendered into a first scene area and K second scene areas according to scene data of the scene to be rendered, wherein the first scene area is used for representing a global scene of the scene to be rendered, the second scene area is used for representing a local scene in the scene to be rendered, and K is a positive integer.
It can be understood that, in an indoor scene with complex illumination conditions, because the illumination differs between viewing angles, the colors of the same object are inconsistent across those viewing angles; the original neural radiance field is difficult to fit to such a scene, and obvious artifacts appear. The scene to be rendered in the embodiment of the present application may be an indoor scene with complex lighting conditions, or may be any of various outdoor scenes; the actual scene to which the scene to be rendered belongs is not particularly limited.
In a specific implementation, a dual-level decomposition may be performed for a scene to be rendered, i.e. the scene to be rendered is decomposed into two levels, namely a global scene and a local scene, i.e. a first scene area and K second scene areas. For example, the scene to be rendered is a flower bed, a plurality of potted plants are arranged in the flower bed, each potted plant is planted with one flower, the whole flower bed can be used as a global scene, namely a first scene area, and a local scene can be determined by taking one potted plant as a unit, namely one potted plant and the flower in the potted plant are used as a second scene area. Through the steps, the whole scene to be rendered is divided into two levels, so that subsequent key rendering is conveniently carried out on the global scene and the local scene, the rendering effect is improved, and the occurrence of artifacts is reduced.
Step 102, a global sampling search grid is built based on the first scene area, a local sampling search grid is built based on the second scene areas, and the K second scene areas are in one-to-one correspondence with the K local sampling search grids.
It should be noted that the first sampling points may be obtained from a grid structure corresponding to the first scene area, and the second sampling points may likewise be obtained from a grid structure corresponding to the target scene area. A global sampling search grid is established in the first scene area, and its resolution can be set to 1024 x 1024. The voxels G_A of the global sampling search grid can be initialized from the depth map: the voxel value of a grid cell that contains depth can be set to a preset nonzero initial value, and the voxel values of the other grid cells are 0.
Likewise, a corresponding grid may be created for each second scene area, constructing K local sampling search grids whose voxels are {G_B, G_C, …}. The grid resolution may be set adaptively according to the extent of each second scene area, e.g., as a cube. The resolution may be set to 512 x 512, or to 64 x 64; the present application does not particularly limit it. Since a local sampling search grid is part of the global sampling search grid, the resolution of the global sampling search grid needs to be greater than the resolution of any one local sampling search grid. In addition, the voxel value of a grid cell with depth in a local sampling search grid can be set to a preset nonzero initial value, and the other grid voxel values may be 0. Sampling through the global and local sampling search grids, compared with directly depth-sampling the scene, can enhance the generalization of the neural radiance field and improve the rendering effect.
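As a concrete illustration of this grid-building step, the following sketch voxelizes depth-derived points into a global grid and per-region local grids; the helper name, the bounding boxes, the resolutions, and the nonzero initial voxel value are illustrative assumptions rather than details taken from the application:
```python
import numpy as np

def build_search_grid(points_with_depth, bbox_min, bbox_max, resolution, init_value=1.0):
    """Hedged sketch: voxelize depth-derived points into a sampling search grid.

    points_with_depth: (N, 3) world-space points recovered from the depth map.
    bbox_min/bbox_max: axis-aligned bounds of the scene area covered by the grid.
    resolution: number of voxels per axis (e.g. 1024 for the global grid).
    init_value: assumed nonzero value for voxels that contain depth; others stay 0.
    """
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    extent = np.asarray(bbox_max) - np.asarray(bbox_min)
    # Map each point to a voxel index and mark that voxel as "has depth".
    idx = ((points_with_depth - bbox_min) / extent * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = init_value
    return grid

# Global grid G_A over the first scene area, plus one local grid per second scene area.
# `global_points`, `region_points`, and the bounding boxes are assumed inputs:
# G_A = build_search_grid(global_points, scene_min, scene_max, resolution=1024)
# G_B = build_search_grid(region_points[0], region_min[0], region_max[0], resolution=512)
```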
Step 103, acquiring S first sampling points from the first scene area through the global sampling search grid based on a preset sampling method, and acquiring T second sampling points from the target scene area through a target sampling search grid based on the preset sampling method, wherein the target sampling search grid is one of K local sampling search grids, and S, T is a positive integer.
In the above step, the obtaining of S first sampling points in the first scene area may be equivalent to obtaining S first sampling points in the global sampling search grid. Similarly, the obtaining of T second sampling points in at least one target scene area may be equivalent to obtaining T second sampling points in a local sampling search grid corresponding to the target scene area.
Specifically, to sample within the global sampling search grid, the part where the rendering rays emitted from each view angle intersect the global sampling search grid can first be determined, and a preset number of first sampling points can be acquired at that intersecting part.
Further, since the global sampling search grid is integral and the local sampling search grid is partial, when a rendering ray emitted from a certain view angle or multiple view angles is emitted to the global sampling search grid, the rendering ray also passes through one or more local sampling search grids, however, there are situations that one or more local sampling search grids are blocked and no rendering ray passes through. In order to improve the rendering effect, sampling point acquisition can be performed on the local sampling search grid through which the rendering ray passes, and at the moment, a second scene area corresponding to the local sampling search grid through which the rendering ray passes can be the target scene area. And acquiring a preset number of second sampling points at the intersection part of the local sampling search grid corresponding to the target scene area and the rendering ray, and acquiring grid voxel values of the second sampling points. For example, a certain rendering ray l passes through the target scene area, that is, passes through the local sampling search grid corresponding to the target scene area, a line segment where the rendering ray l intersects the local sampling search grid is taken, M points are determined on the line segment and used as second sampling points to collect voxel values of the grid, and the voxel values of the M points can be greater than a fourth threshold (the fourth threshold is a preset minimum threshold).
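One possible reading of this per-ray sampling rule is sketched below: intersect the rendering ray with the axis-aligned box of a local sampling search grid, place M evenly spaced candidates on the intersected segment, and keep those whose voxel value exceeds the minimum (fourth) threshold. The slab-test helper and all parameter names are assumptions for illustration:
```python
import numpy as np

def ray_box_segment(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) of the ray/box overlap, or None if disjoint."""
    inv = 1.0 / np.where(direction == 0, 1e-9, direction)
    t0 = (box_min - origin) * inv
    t1 = (box_max - origin) * inv
    t_near = np.max(np.minimum(t0, t1))
    t_far = np.min(np.maximum(t0, t1))
    return (t_near, t_far) if t_far >= max(t_near, 0.0) else None

def sample_segment(origin, direction, grid, box_min, box_max, m, fourth_threshold):
    """Place M candidates on the intersected segment; keep voxel values above the threshold."""
    seg = ray_box_segment(origin, direction, box_min, box_max)
    if seg is None:
        return []
    t_vals = np.linspace(seg[0], seg[1], m)
    res = grid.shape[0]  # cubic grid assumed
    kept = []
    for t in t_vals:
        p = origin + t * direction
        idx = np.clip(((p - box_min) / (box_max - box_min) * res).astype(int), 0, res - 1)
        if grid[tuple(idx)] > fourth_threshold:
            kept.append((p, grid[tuple(idx)]))
    return kept
```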
Therefore, the scene to be rendered is decomposed into the global level scene and the local level scene, so that the scene to be rendered is conveniently and accurately sampled, the key area to be rendered (namely the target scene area) is determined, the rendering processing is conveniently carried out on different areas, and the accurate fitting of the scene to be rendered is realized.
Step 104, training the neural network based on the S first sampling points and the T second sampling points.
It should be noted that, in the embodiment of the present application, the neural network may specifically be a neural radiance field (or illumination decomposition network), and may specifically include a first network and a second network. Both the first network and the second network may be formed from fully connected networks, each composed of a diffuse reflection network MLP_d and a highlight network MLP_s. The diffuse reflection networks learn the diffuse parts of the first scene area and the target scene area from the S first sampling points and the T second sampling points, respectively, and the highlight networks learn the highlight parts of the first scene area and the target scene area from the S first sampling points and the T second sampling points, respectively, so that the generalization of the first network and the second network can be enhanced and the scene rendering capability improved.
Specifically, since the first network F_1 and the second network F_2 may be pre-constructed neural radiance fields (or illumination decomposition networks), they may be expressed as:
F_1: (x_1, d_1) → (c_1, ρ_1);
F_2: (x_2, d_2) → (c_2, ρ_2);
wherein x_1 and x_2 represent the position information of a sampling point in the first scene area and in the target scene area, respectively, where the position information x = (x, y, z) may be three-dimensional coordinate data; d_1 and d_2 represent the direction information of a sampling point in the first scene area and in the target scene area, respectively, where the direction information may be a direction vector; c_1 and c_2 represent the colors of a sampling point output by the first network and the second network, respectively, where the color may be c = (r, g, b); and ρ_1 and ρ_2 represent the densities of a sampling point output by the first network and the second network, respectively.
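To make the mapping F: (x, d) → (c, ρ) concrete, the minimal PyTorch sketch below takes a 3D position and a view direction and outputs an RGB color and a density. The layer widths and the omission of positional encoding are simplifications assumed for illustration; this is not the application's exact network:
```python
import torch
import torch.nn as nn

class RadianceField(nn.Module):
    """Minimal sketch of F: (x, d) -> (c, rho); assumptions, not the application's network."""
    def __init__(self, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.density_head = nn.Linear(hidden, 1)                              # rho
        self.color_head = nn.Sequential(nn.Linear(hidden + 3, hidden), nn.ReLU(),
                                        nn.Linear(hidden, 3), nn.Sigmoid())   # c = (r, g, b)

    def forward(self, x, d):
        h = self.trunk(x)                               # features from position x
        rho = torch.relu(self.density_head(h))          # density is non-negative
        c = self.color_head(torch.cat([h, d], dim=-1))  # color depends on view direction d
        return c, rho
```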
In addition, from the color and density information output by the first network and the second network respectively, the opacity of one or more sampling points in the first scene area and of one or more sampling points in the target scene area can be determined; the rendering fit is adjusted through the opacity, achieving accurate fitting and improving the elimination of artifacts.
And 105, rendering the scene to be rendered based on the neural network after training is completed.
In a specific embodiment of the present application, the first opacities corresponding to the S first sampling points may be determined from the color and density output by the first network, and the opacities corresponding to the T second sampling points may be determined from the color and density output by the second network. Opacity describes the extent to which the surface of the current object transmits light: when the opacity is 0, the surface allows all light to pass through, and an opacity value between 0 and 1 produces a partially see-through pixel. Thus, the first opacities corresponding to the S first sampling points indicate the extent to which light (e.g., a rendering ray) from a target line of sight is transmitted to object surfaces in the first scene area.
Specifically, rendering the scene to be rendered may be achieved through opacity acquired by a neural network. To enhance the effect of rendering the object surface, the grid voxel values of the global sample lookup grid and the K local sample lookup grids may be updated by the opacity of the plurality of sample points. In this way, the scene rendering effect is enhanced by adjusting grid voxel values in the global sampling search grid and the K local sampling search grids.
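The opacity-based rendering described above corresponds to standard alpha compositing along a ray. The sketch below derives per-sample opacities from densities and sample spacing and accumulates a pixel color; writing those opacities back into the search-grid voxels, as suggested in the preceding paragraph, is indicated only as an assumed comment:
```python
import torch

def composite(colors, densities, deltas):
    """Alpha-composite the samples along one ray.

    colors:    (M, 3) per-sample colors from the network.
    densities: (M,)   per-sample densities rho.
    deltas:    (M,)   spacing between consecutive samples along the ray.
    """
    alphas = 1.0 - torch.exp(-densities * deltas)       # per-sample opacity in [0, 1]
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0)
    weights = alphas * trans                             # contribution of each sample
    pixel = (weights[:, None] * colors).sum(dim=0)       # rendered pixel color
    return pixel, alphas

# Assumed grid update: write each sample's opacity back into the voxel it falls in,
# so that later rays are sampled more densely where surfaces were found, e.g.
# grid[voxel_index(sample_position)] = max(grid[voxel_index(sample_position)], alpha)
```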
Optionally, the decomposing the scene to be rendered into a first scene area and K second scene areas according to the scene data of the scene to be rendered includes:
Acquiring scene data of a scene to be rendered;
Determining point cloud data of the first scene area and the scene to be rendered according to the scene data;
and carrying out clustering division on the point cloud data, and determining the K second scene areas.
It should be noted that, to decompose the scene into these two levels of scene areas, the scene data of the scene to be rendered needs to be acquired first; the scene data may specifically include image data, depth map data, internal parameter data, external parameter data, and the like. The image data may be acquired by shooting around the scene to be rendered with an image capture device, a photographing device, or the like, or may be directly input into the execution body of the embodiment of the present application. After the image data is acquired, the depth map data can be obtained through a depth device or a depth map generation network; alternatively, pre-stored depth map data can be used directly, omitting the image data acquisition step. In addition, the acquired image data may also be processed with the open-source tool COLMAP for sparse and dense reconstruction based on Structure from Motion (SfM), to obtain the position and pose of the camera in the scene and the positions of the three-dimensional points in the scene, i.e. the internal and external parameter data of the camera or other data acquisition device.
In one embodiment of the present application, after the acquisition of the scene data is completed, the point cloud data of the scene to be rendered may be computed by projection from the depth map data, the internal parameter data, and the external parameter data in the scene data. Meanwhile, the global scene in the acquired image data may be taken as the first scene area, denoted by A. Then, the point cloud data is clustered, dividing the whole scene to be rendered into K second scene areas, which can be denoted by {B, C, …}. Specifically, the method for clustering the point cloud data may be the K-Means algorithm; the present application is not particularly limited in this respect.
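A hedged sketch of this preparation step follows: back-project each depth pixel into world coordinates with the internal and external parameters (a pinhole camera model is assumed), then cluster the resulting point cloud into K second scene areas with K-Means via scikit-learn; the application itself does not fix a specific implementation:
```python
import numpy as np
from sklearn.cluster import KMeans

def depth_to_points(depth, K_intr, c2w):
    """Back-project a depth map (H, W) to world-space points using intrinsics K_intr (3x3)
    and a camera-to-world extrinsic matrix c2w (4x4); pinhole model assumed."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    cam = (np.linalg.inv(K_intr) @ pix.T).T * depth.reshape(-1, 1)     # camera-space points
    cam_h = np.concatenate([cam, np.ones((cam.shape[0], 1))], axis=1)
    return (c2w @ cam_h.T).T[:, :3]                                    # world-space points

def split_into_regions(points, k):
    """Cluster the point cloud into K second scene areas with K-Means."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(points)
    return [points[labels == i] for i in range(k)]
```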
Therefore, the scene to be rendered is divided into the first scene area and the K second scene areas by the embodiment of the application, so that the key areas in the scene are conveniently rendered, and the scene rendering effect is improved.
Optionally, the acquiring S first sampling points from the first scene area through the global sampling search grid based on the preset sampling method, and acquiring T second sampling points from the target scene area through the target sampling search grid based on the preset sampling method includes:
determining whether a target intersecting grid intersecting with a rendering ray exists in the K local sampling search grids, wherein the rendering ray is a ray emitted from a target view angle;
under the condition that the target intersecting grids exist in the K local sampling search grids, determining whether sampling points with voxel values larger than a first threshold value are contained in the target intersecting grids;
And under the condition that sampling points with voxel values larger than the first threshold value are contained in the target intersection grid, acquiring the S first sampling points from the first scene area, and acquiring the T second sampling points from the target scene area.
It can be understood that the target intersection grid may be the grid, among the K local sampling search grids, that is intersected by the rendering ray emitted from the initial point (or preset point) set by the target view angle. Since a rendered scene generally has a relatively complex illumination environment, in practice there may be multiple rendering rays emitted from different view angles, and each rendering ray may or may not intersect with any of the second scene areas. For example, a rendering ray cast into the whole flower-bed scene may intersect a potted flower in the flower bed; because a rendering ray blocked by the flower cannot penetrate it, the ray generally intersects only the second scene area corresponding to that flowerpot. Further, the voxel values corresponding to the target intersection grid may be obtained and compared with the first threshold. If there are sampling points greater than the first threshold, the target scene area may be considered for emphasized rendering.
Therefore, in the embodiment of the application, two conditions are set to determine whether any of the K second scene areas is a local scene area that needs emphasized rendering: first, whether the local sampling search grid corresponding to a second scene area intersected by the rendering ray is a target intersection grid; second, whether the target intersection grid contains sampling points with voxel values greater than the first threshold. Only when both conditions are satisfied does the scene to be rendered contain a local area that needs emphasized rendering.
In some embodiments of the present application, first, a target intersection grid intersecting a rendering ray in K local sampling search grids may be determined, where a rendering ray and a plurality of local sampling search grids all have intersection points, and thus there may be a plurality of pre-selected grids, where each pre-selected grid may be any local sampling search grid in K local sampling search grids. Then, if a voxel value of a pre-selected mesh closest to the initial point is greater than a first threshold, determining the pre-selected mesh closest to the initial point as a target intersecting mesh, wherein the target intersecting mesh is a mesh corresponding to a local scene area needing to be emphasized. In this way, under the condition that a target intersecting grid needing to be subjected to key rendering exists, the target scene area is sampled, T second sampling points in the target scene area are acquired, and meanwhile S first sampling points can be acquired at the intersection part of the first scene area and the rendering ray. In this way, the first scene area and the target scene area are respectively sampled so as to enhance the rendering effect of the part needing to be emphasized in the scene to be rendered, and accurate fitting is realized.
Optionally, the target intersection grid is determined based on:
determining N pre-selected grids, wherein the pre-selected grids are grids intersecting the rendering rays in the K local sampling search grids, N is a positive integer, and N is smaller than or equal to K;
and determining the preselected mesh with the smallest distance from a preset point in the N preselected meshes as the target intersecting mesh, wherein the preset point is the initial emission point of the rendering ray.
It should be appreciated that a rendering ray cast toward the K local sampling search grids may intersect two or more local sampling search grids, which may partially overlap. For example, in the flower-bed and flowerpot scene, a rendering ray may pass between two flowerpots or through their overlapping region and thus intersect the second scene areas where both flowerpots are located; in that case, every local sampling search grid intersecting the rendering ray may be determined as a preselected grid.
In one embodiment of the present application, the preselected grid corresponding to a given rendering ray may be determined by comparing the distances between each preselected grid and the point (or initial point) from which the rendering ray is emitted. That is, the target intersection grid may be determined by comparing the distance between each preselected grid and the preset point (the initial point from which the target view angle emits the rendering ray), and the preselected grid closest to the preset point is taken as the target intersection grid. The closer a grid is to the initial emission point of the ray, the better the transmission of the ray and the better the rendering effect.
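Under this reading, among all local grids intersected by the ray, the one whose entry point is nearest to the ray's initial emission point is selected as the target intersection grid. A minimal sketch follows, reusing the assumed ray_box_segment helper from the earlier sampling sketch:
```python
def pick_target_grid(origin, direction, local_grids):
    """local_grids: list of (grid, box_min, box_max) tuples. Return the intersected grid
    whose entry distance t_near from the ray origin is smallest, or None if no intersection."""
    best, best_t = None, float("inf")
    for grid, box_min, box_max in local_grids:
        seg = ray_box_segment(origin, direction, box_min, box_max)  # assumed helper above
        # every intersected grid is a preselected grid; the nearest one wins
        if seg is not None and seg[0] < best_t:
            best, best_t = (grid, box_min, box_max), seg[0]
    return best
```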
Optionally, the S first sampling points include a first preselected sampling point and a second preselected sampling point, and the method further includes:
acquiring at least one first intersection point of the global sampling search grid and the rendering ray under the condition that the target intersection grid does not exist in the K local sampling search grids;
determining a first intersection point of the at least one first intersection point having a voxel value greater than a second threshold as the first preselected sample point;
And under the condition that the voxel value of a first target intersection point is larger than a third threshold value, determining sampling points around the first target intersection point as the second preselected sampling points, wherein the first target intersection point is the closest one of the at least one first intersection point to a preset point, the preset point is the initial emission point of the rendering ray, and the third threshold value is larger than the second threshold value.
In some embodiments of the present application, when the rendering ray does not pass through any of the second scene areas, which indicates that no scene area needs emphasized rendering, sampling points may be acquired only from the first scene area. Specifically, a collected first sampling point may be a first preselected sampling point or a second preselected sampling point. A first preselected sampling point may be a point, among the multiple first intersection points, whose voxel value is greater than the second threshold; the second threshold may be a preset minimum standard value used to judge whether the point can accurately reflect the degree of illumination at the view angle of the current rendering ray. A second preselected sampling point may be determined from the first target intersection point, i.e., the intersection point closest to the preset point whose voxel value is greater than the third threshold; sampling may be performed around the first target intersection point with a probability distribution function, such as a normal distribution function or a uniform distribution function, to obtain a plurality of second preselected sampling points. Therefore, by collecting the first preselected sampling points and the second preselected sampling points, the fitting degree of the neural radiance field can be improved, the rendering effect improved, and the occurrence of artifacts reduced.
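A sketch of this global-grid-only branch under the above reading: intersections whose voxel value exceeds the second threshold become first preselected sampling points, and extra points are drawn around the nearest intersection if its voxel value exceeds the higher third threshold. The Gaussian spread and all parameter names are assumptions:
```python
import numpy as np

def global_only_sampling(intersections, voxel_values, origin,
                         second_threshold, third_threshold, extra=8, sigma=0.05):
    """intersections: (N, 3) points where the ray crosses the global grid;
    voxel_values: (N,) voxel value at each intersection."""
    first_pre = intersections[voxel_values > second_threshold]   # first preselected points

    second_pre = np.empty((0, 3))
    dists = np.linalg.norm(intersections - origin, axis=1)
    nearest = np.argmin(dists)                                   # first target intersection
    if voxel_values[nearest] > third_threshold:                  # third threshold > second
        # draw extra points around the nearest intersection, e.g. from a normal distribution
        second_pre = intersections[nearest] + np.random.normal(scale=sigma, size=(extra, 3))

    return first_pre, second_pre
```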
Optionally, the T second sampling points include a third preselected sampling point and a fourth preselected sampling point;
The obtaining the S first sampling points from the first scene area and the T second sampling points from the target scene area when the target intersection grid includes sampling points with voxel values greater than the first threshold value includes:
Under the condition that sampling points with voxel values larger than the first threshold value are contained in the target intersection grid, acquiring the S first sampling points from the first scene area, and acquiring at least one second intersection point of the target scene area and the rendering ray;
Determining a second intersection point of the at least one second intersection point having a voxel value greater than a fourth threshold value and less than the first threshold value as the third preselected sample point;
And determining sampling points around a second target intersection point as the fourth preselected sampling point in the condition that the voxel value of the second target intersection point is larger than the first threshold value, wherein the second target intersection point is any one of the at least one second intersection point.
In some embodiments of the present application, when there is a second scene region (i.e., a target scene region) that needs to be rendered with emphasis, sampling may be performed within the target scene region. In particular, a plurality of second intersection points within the target scene area intersecting the rendering ray may be first determined, such that a second sampling point is determined in the second intersection points, and the second sampling point may be a third preselected sampling point or a fourth preselected sampling point. Wherein the third preselected sample point may be a point where the voxel value acquired at the intersection of the rendering ray with the target scene region is greater than a fourth threshold value but less than the first threshold value, the fourth preselected sample point may be acquired around a second target intersection point, which may be a point where the voxel value acquired at the intersection is greater than the first threshold value.
It can be understood that the voxel value of the second target intersection point is greater than the first threshold value, and the first threshold value is greater than the fourth threshold value, which indicates that the illumination condition of the rendering ray under the view angle can be more accurately reflected at the second target intersection point, so that the sampling points around the second target intersection point can be sampled through a probability distribution function, such as a normal distribution function or a uniform distribution function, to obtain a fourth preselected sampling point. Therefore, when a target scene area needing to be subjected to key rendering exists, the illumination effect of the view angle where the current rendering ray is positioned is accurately reflected by collecting the third preselected sampling point and the fourth preselected sampling point, so that accurate fitting is facilitated, and the rendering effect is improved.
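Sampling inside the target scene area follows the same pattern with different thresholds: intersections whose voxel value lies between the fourth and first thresholds become third preselected sampling points, and extra points are drawn around any intersection above the first threshold. A compact sketch under that reading, with assumed parameters:
```python
import numpy as np

def target_region_sampling(intersections, voxel_values,
                           fourth_threshold, first_threshold, extra=8, sigma=0.05):
    """Third/fourth preselected points inside the target scene area (assumed parameters)."""
    mid_band = (voxel_values > fourth_threshold) & (voxel_values < first_threshold)
    third_pre = intersections[mid_band]                          # third preselected points

    fourth_pre = []
    for p, v in zip(intersections, voxel_values):
        if v > first_threshold:                                  # a second target intersection
            fourth_pre.append(p + np.random.normal(scale=sigma, size=(extra, 3)))
    fourth_pre = np.concatenate(fourth_pre) if fourth_pre else np.empty((0, 3))
    return third_pre, fourth_pre
```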
Optionally, the neural network includes a first network and a second network, the first network is composed of a first diffuse reflection network and a first high-light network, and the second network is composed of K second diffuse reflection networks and K second high-light networks;
the number of fully connected network layers of the first diffuse reflection network is greater than the number of fully connected network layers of the second diffuse reflection network, and the number of fully connected network layers of the first high-light network is greater than the number of fully connected network layers of the second high-light network.
In some embodiments of the application, the first network that performs illumination decomposition on the first scene area may be a fully connected network with 10 layers. The second network that performs illumination decomposition on the K second scene areas may include K second diffuse reflection networks and K second highlight networks, where one second diffuse reflection network and one second highlight network perform illumination decomposition on one second scene area; the network performing illumination decomposition on a second scene area may be a fully connected network with 5 layers. Within the first network, the first diffuse reflection network may have 6 layers of 64 dimensions and the first highlight network may have 4 layers of 32 dimensions. Within the second network, a second diffuse reflection network may have 3 layers of 64 dimensions and a second highlight network may have 2 layers of 32 dimensions; the application does not limit the numbers of layers of the first network and the second network in practical applications. By dividing the neural radiance field into two levels, global and local, and adopting fully connected networks with different numbers of layers, the generalization of the whole neural radiance field can be enhanced, the rendering effect on local scenes improved through the second network, and accurate fitting of scene rendering achieved.
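The layer counts quoted above can be laid out as two diffuse/highlight MLP pairs. The PyTorch sketch below mirrors them (a 6-layer, 64-dim diffuse MLP plus a 4-layer, 32-dim highlight MLP for the global network; 3-layer/64-dim plus 2-layer/32-dim for a local network) while leaving the input/output dimensions and positional encoding as assumptions, since the application does not spell them out:
```python
import torch.nn as nn

def mlp(in_dim, hidden, layers, out_dim):
    """Stack `layers` hidden Linear+ReLU blocks of width `hidden`, then an output layer."""
    mods, d = [], in_dim
    for _ in range(layers):
        mods += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)

# First (global) network: 6-layer, 64-dim diffuse MLP and 4-layer, 32-dim highlight MLP.
first_diffuse   = mlp(in_dim=3, hidden=64, layers=6, out_dim=4)   # e.g. diffuse color + density
first_highlight = mlp(in_dim=6, hidden=32, layers=4, out_dim=3)   # e.g. view-dependent color

# Each of the K second (local) networks: 3-layer, 64-dim diffuse and 2-layer, 32-dim highlight.
second_diffuse   = mlp(in_dim=3, hidden=64, layers=3, out_dim=4)
second_highlight = mlp(in_dim=6, hidden=32, layers=2, out_dim=3)
```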
Optionally, the rendering the scene to be rendered based on the neural network after training is completed includes one of the following:
rendering the first scene area based on the first network after training is completed under the condition that the target intersecting grids do not exist in the K local sampling search grids;
Rendering the first scene area based on the first network after training is completed and rendering the target scene area based on the second network after training is completed under the condition that sampling points with voxel values larger than the first threshold are contained in the target intersection grid;
and jointly rendering the scene to be rendered based on the first network and the second network.
In still other embodiments of the present application, different situations may be distinguished when rendering the scene to be rendered. For example, when there is an area needing emphasized rendering in the scene to be rendered, that is, when the target intersection grid includes sampling points with voxel values greater than the first threshold, the first scene area may be rendered through the first network and the target scene area through the second network. When there is no area needing emphasized rendering in the scene to be rendered, only the first scene area may be rendered. In addition, an initial image of the whole scene to be rendered can be rendered jointly through the neural network, i.e., the first network and the second network, so that the scene to be rendered is accurately fitted; the method is thus widely applicable to various scenes to be rendered with complex illumination.
Optionally, the method further comprises:
Under the condition that sampling points with voxel values larger than the first threshold are contained in the target intersection grid, carrying out normal vector estimation on point cloud data in the target scene area to obtain normal vectors of the point cloud data;
Determining a normal constraint condition based on the normal vector, and determining an illumination constraint condition based on color characteristics of pixel points in the scene to be rendered;
Training the first network according to the illumination constraint, and training the second network according to the illumination constraint and the normal constraint.
In some embodiments, when part or all of a third scene area overlaps the target scene area, that is, when part of the scene in a third scene area through which the rendering ray passes also belongs to the target scene area, normal vector estimation may be performed on the point cloud data in the target scene area to obtain the normal vectors of the point cloud data. The normal vector estimation may be performed by principal component analysis (Principal Component Analysis, PCA), and the specific manner adopted for the normal vector estimation is not limited in the present application. Then, the normal constraint condition for training the second network can be constructed from the normal vectors: the cosine of the angle between the emission direction of the rendering ray and the normal vector is calculated and used as the normal constraint condition.
Therefore, the first network is trained through the illumination constraint conditions in the embodiment, and the second network is trained through the normal constraint conditions and the illumination constraint conditions, so that the first network and the second network can be attached to the scene rendering requirements, different illumination conditions of different viewing angles are adapted, and the rendering effect of the local scene area is improved.
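One way to realize these constraints is sketched below: estimate a per-point normal by PCA over a local neighborhood (the eigenvector of the covariance matrix with the smallest eigenvalue), and use the cosine of the angle between the rendering-ray direction and that normal as the normal constraint term. The neighborhood handling and the exact loss form are assumptions for illustration:
```python
import numpy as np

def pca_normal(neighborhood):
    """Estimate the normal of a local point neighborhood (N, 3) as the eigenvector
    of the covariance matrix with the smallest eigenvalue."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, 0]

def normal_constraint(ray_dir, normal):
    """Assumed constraint term: cosine of the angle between the rendering-ray
    direction and the estimated normal (both normalized)."""
    return float(np.dot(ray_dir / np.linalg.norm(ray_dir),
                        normal / np.linalg.norm(normal)))
```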
As shown in fig. 2, the embodiment of the present application further provides a specific implementation flow of a scene rendering method, which includes the following steps:
step 201, obtaining scene data of a scene to be rendered;
step 202, determining point cloud data of the first scene area and the scene to be rendered according to the scene data;
Step 203, performing cluster division on the point cloud data, and determining the K second scene areas;
Step 204, respectively constructing a global sampling search grid and K local sampling search grids according to the first scene area and the K second scene areas, wherein the K second scene areas are in one-to-one correspondence with the K local sampling search grids;
Step 205, determining whether a target intersecting grid intersecting with a rendering ray exists in the K local sampling search grids, wherein the rendering ray is a ray emitted from a target view angle;
Step 206, determining whether sampling points with voxel values larger than a first threshold are included in the target intersecting grids under the condition that the target intersecting grids exist in the K local sampling searching grids;
step 207, when the target intersection grid includes sampling points with voxel values greater than the first threshold, acquiring the S first sampling points from the first scene area, and acquiring the T second sampling points from the target scene area;
Step 208, under the condition that the target intersecting grids do not exist in the K local sampling search grids, S first sampling points are obtained from a first scene area;
step 209, training a neural network based on S first sampling points, or the S first sampling points and the T second sampling points; the neural network is further trained based on normal constraint conditions and illumination constraint conditions;
Step 210, rendering the scene to be rendered based on the neural network after training is completed.
It should be noted that, the specific implementation process of the foregoing embodiment may specifically refer to the description related to the embodiment in fig. 1, so that the same beneficial effects may be achieved, and for avoiding repetition, no further description is provided herein.
Referring to fig. 3, in another embodiment of the present application, a scene rendering apparatus 300 is provided, including:
The scene decomposition module 301 is configured to decompose a scene to be rendered into a first scene area and K second scene areas according to scene data of the scene to be rendered, where the first scene area is used to represent a global scene of the scene to be rendered, the second scene area is used to represent a local scene in the scene to be rendered, and K is a positive integer;
The grid construction module 302 is configured to construct a global sampling search grid based on the first scene area, and construct a local sampling search grid based on the second scene areas, where the K second scene areas are in one-to-one correspondence with the K local sampling search grids;
The sampling collection module 303 is configured to obtain S first sampling points from the first scene area through the global sampling search grid based on a preset sampling method, and obtain T second sampling points from the target scene area through a target sampling search grid based on the preset sampling method, where the target sampling search grid is one of K local sampling search grids, and S, T are all positive integers;
a network training module 304, configured to train a neural network based on the S first sampling points and the T second sampling points;
The scene rendering module 305 is configured to render the scene to be rendered based on the neural network after training is completed.
Optionally, the scene decomposition module 301 includes:
the first acquisition sub-module is used for acquiring scene data of a scene to be rendered;
A first determining submodule, configured to determine, according to the scene data, point cloud data of the first scene area and the scene to be rendered;
and the second determining submodule is used for carrying out clustering division on the point cloud data and determining the K second scene areas.
Optionally, the sampling collection module 303 includes:
A third determining submodule, configured to determine whether a target intersection grid intersecting with a rendering ray exists in the K local sampling search grids, where the rendering ray is a ray sent from a target view angle;
a fourth determining submodule, configured to determine, when the target intersection grid exists in the K local sampling search grids, whether sampling points with voxel values greater than a first threshold are included in the target intersection grid;
And the second acquisition sub-module is used for acquiring the S first sampling points from the first scene area and acquiring the T second sampling points from the target scene area under the condition that sampling points with voxel values larger than the first threshold are included in the target intersection grid.
Optionally, the sampling collection module 303 further includes a grid determination submodule, where the grid determination submodule includes:
the first determining unit is used for determining N pre-selected grids, wherein the pre-selected grids are grids intersecting the rendering rays in the K local sampling search grids, N is a positive integer, and N is smaller than or equal to K;
and the second determining unit is used for determining the preselected mesh with the smallest distance from a preset point in the N preselected meshes as the target intersection mesh, wherein the preset point is the initial emission point of the rendering ray.
Optionally, the S first sampling points include a first preselected sampling point and a second preselected sampling point, and the scene rendering device 300 further includes:
A third obtaining module, configured to obtain at least one first intersection point of the global sampling search grid and the rendering ray when the target intersection grid does not exist in the K local sampling search grids;
A first determining module, configured to determine a first intersection point, where the voxel value of the at least one first intersection point is greater than a second threshold value, as the first preselected sampling point;
And the second determining module is used for determining sampling points around the first target intersection point as the second preselected sampling points under the condition that the voxel value of the first target intersection point is larger than a third threshold value, wherein the first target intersection point is the closest one of the at least one first intersection point to a preset point, the preset point is the initial emission point of the rendering ray, and the third threshold value is larger than the second threshold value.
Optionally, the T second sampling points include a third preselected sampling point and a fourth preselected sampling point;
The second acquisition submodule includes:
A first obtaining unit, configured to obtain, when a sampling point whose voxel value is greater than the first threshold value is included in the target intersection grid, the S first sampling points from within the first scene area, and obtain at least one second intersection point of the target scene area and the rendering ray;
a third determining unit, configured to determine, as the third pre-selected sampling point, a second intersection point, in which a voxel value in the at least one second intersection point is greater than a fourth threshold value and less than the first threshold value;
a fourth determining unit, configured to determine, as the fourth preselected sampling point, sampling points around a second target intersection point, where the second target intersection point is any one of the at least one second intersection point, in a case where a voxel value of the second target intersection point is greater than the first threshold value.
Optionally, the neural network includes a first network and a second network, the first network is composed of a first diffuse reflection network and a first high-light network, and the second network is composed of K second diffuse reflection networks and K second high-light networks;
the number of fully connected network layers of the first diffuse reflection network is greater than the number of fully connected network layers of the second diffuse reflection network, and the number of fully connected network layers of the first high-light network is greater than the number of fully connected network layers of the second high-light network.
Optionally, the scene rendering module 305 includes any one of the following:
The first rendering sub-module is used for rendering the first scene area based on the first network after training is completed under the condition that the target intersecting grids do not exist in the K local sampling search grids;
a second rendering sub-module, configured to render, when the target intersection grid includes sampling points with voxel values greater than the first threshold, the first scene area based on the first network after training is completed, and render the target scene area based on the second network after training is completed;
and the third rendering sub-module is used for jointly rendering the scene to be rendered based on the first network and the second network.
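The dispatch between these three sub-modules reduces to a small branch per rendering ray. In the sketch below the three callables stand in for NeRF-style volume rendering with the trained first and second networks, and how the two per-area results are merged into a final image is deliberately left abstract:

def render_pixel(render_global, render_local, render_joint,
                 target_region, has_high_density_sample):
    # Branch selection for one rendering ray.
    if target_region is None:                    # no target intersection grid
        return render_global()
    if has_high_density_sample:                  # per-area rendering
        return render_global(), render_local(target_region)
    return render_joint(target_region)           # joint rendering

# toy usage with stand-in callables returning RGB tuples
print(render_pixel(lambda: (0.2, 0.3, 0.4),
                   lambda r: (0.8, 0.7, 0.6),
                   lambda r: (0.5, 0.5, 0.5),
                   target_region=2, has_high_density_sample=True))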
Optionally, the scene rendering device 300 further includes:
The normal vector estimation module is used for performing normal vector estimation on point cloud data in the target scene area, to obtain normal vectors of the point cloud data, under the condition that a local scene in the at least one third scene area overlaps the target scene area;
the condition determining module is used for determining a normal constraint condition based on the normal vector and determining an illumination constraint condition based on color characteristics of pixel points in the scene to be rendered;
and the condition training module is used for training the first network according to the illumination constraint condition and training the second network according to the illumination constraint condition and the normal constraint condition.
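A compact sketch of how these constraints could be formed: the normals are estimated from the point cloud by local PCA (one common estimator; the application does not prescribe one), the normal constraint penalises the angle between predicted and estimated normals, and the illumination constraint is written here as a plain photometric error on rendered pixel colours, which is an assumption about its exact form.

import torch
import torch.nn.functional as F

def estimate_normals(points, k=16):
    # PCA normal estimation on a point cloud; points: (N, 3) tensor with N > k.
    dists = torch.cdist(points, points)                    # (N, N) pairwise distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]  # k nearest neighbours, self excluded
    neigh = points[knn]                                    # (N, k, 3)
    centred = neigh - neigh.mean(dim=1, keepdim=True)
    cov = centred.transpose(1, 2) @ centred / k            # (N, 3, 3)
    _, eigvecs = torch.linalg.eigh(cov)                    # eigenvalues in ascending order
    return eigvecs[..., 0]                                 # direction of smallest variance

def normal_constraint(pred_normals, est_normals):
    # Penalise the angle between predicted and estimated normals.
    return (1.0 - F.cosine_similarity(pred_normals, est_normals, dim=-1)).mean()

def illumination_constraint(pred_rgb, gt_rgb):
    # Photometric term on the colour of rendered pixels (stand-in form).
    return F.mse_loss(pred_rgb, gt_rgb)

# The first network would then be trained with illumination_constraint alone,
# and the second network with illumination_constraint + normal_constraint.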
The scene rendering device 300 provided in the embodiment of the present application can implement each process implemented by the method embodiment described in fig. 1; to avoid repetition, details are not repeated here.
As shown in fig. 4, the embodiment of the present application further provides an electronic device, including a processor 400, a memory 420 and a transceiver 410. The memory 420 stores a program or instruction executable on the processor 400; when executed by the processor 400, the program or instruction implements each process of the embodiment of the scene rendering method shown in fig. 1 and achieves the same technical effects, which are not repeated here to avoid repetition.
The embodiment of the present application further provides a readable storage medium, on which a program or instruction is stored; when executed by a processor, the program or instruction implements each process of the embodiment of the scene rendering method described in fig. 1 and achieves the same technical effects, which are not repeated here to avoid repetition.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part of it contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a computing network, or a network device, etc.) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. In light of the present application, those of ordinary skill in the art may make many further forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (13)

1. A method of scene rendering, comprising:
Decomposing a scene to be rendered into a first scene area and K second scene areas according to scene data of the scene to be rendered, wherein the first scene area is used for representing a global scene of the scene to be rendered, the second scene area is used for representing a local scene in the scene to be rendered, and K is a positive integer;
constructing a global sampling search grid based on the first scene area, and constructing a local sampling search grid based on the second scene areas, wherein the K second scene areas are in one-to-one correspondence with the K local sampling search grids;
Acquiring S first sampling points from the first scene area through the global sampling search grid based on a preset sampling method, and acquiring T second sampling points from the target scene area through a target sampling search grid based on the preset sampling method, wherein the target sampling search grid is one of the K local sampling search grids, and S and T are both positive integers;
Training a neural network based on the S first sampling points and the T second sampling points;
and rendering the scene to be rendered based on the neural network after training is completed.
2. The method of claim 1, wherein the decomposing the scene to be rendered into a first scene region and K second scene regions according to scene data of the scene to be rendered comprises:
Acquiring scene data of a scene to be rendered;
Determining point cloud data of the first scene area and the scene to be rendered according to the scene data;
and carrying out clustering division on the point cloud data, and determining the K second scene areas.
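One way to realise the clustering step of claim 2 is sketched below; KMeans over the point cloud followed by an axis-aligned bounding box per cluster is only an illustrative choice, since the claim does not fix the clustering algorithm.

import numpy as np
from sklearn.cluster import KMeans

def decompose_scene(point_cloud, k):
    # Cluster the point cloud into K local (second) scene areas and return an
    # axis-aligned bounding box (box_min, box_max) per cluster.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(point_cloud)
    areas = []
    for c in range(k):
        pts = point_cloud[labels == c]
        areas.append((pts.min(axis=0), pts.max(axis=0)))
    return areas

# toy usage: 1000 random 3D points split into K = 4 local areas
print(decompose_scene(np.random.rand(1000, 3), k=4)[0])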
3. The method of claim 1, wherein the obtaining S first sampling points from the first scene area through the global sampling search grid based on the preset sampling method, and obtaining T second sampling points from the target scene area through the target sampling search grid based on the preset sampling method, comprises:
determining whether a target intersection grid intersecting with a rendering ray exists in the K local sampling search grids, wherein the rendering ray is a ray emitted from a target view angle;
under the condition that the target intersection grid exists in the K local sampling search grids, determining whether sampling points with voxel values larger than a first threshold value are contained in the target intersection grid;
And under the condition that sampling points with voxel values larger than the first threshold value are contained in the target intersection grid, acquiring the S first sampling points from the first scene area, and acquiring the T second sampling points from the target scene area.
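The voxel-value test of claim 3 amounts to an occupancy check over the target intersection grid; a trivial sketch, with the threshold value assumed:

import numpy as np

def has_high_density_sample(voxel_grid, first_threshold):
    # True if any stored voxel value in the target intersection grid exceeds
    # the first threshold; voxel_grid is a 3D array of voxel values.
    return bool((voxel_grid > first_threshold).any())

# toy usage on a random 32^3 grid with an assumed threshold of 0.99
print(has_high_density_sample(np.random.rand(32, 32, 32), first_threshold=0.99))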
4. A method according to claim 3, wherein the target intersection grid is determined based on:
determining N preselected grids, wherein the preselected grids are grids, among the K local sampling search grids, that intersect the rendering ray, N is a positive integer, and N is smaller than or equal to K;
and determining, as the target intersection grid, the preselected grid with the smallest distance from a preset point among the N preselected grids, wherein the preset point is the initial emission point of the rendering ray.
5. A method according to claim 3, wherein the S first sampling points comprise a first preselected sampling point and a second preselected sampling point, the method further comprising:
acquiring at least one first intersection point of the global sampling search grid and the rendering ray under the condition that the target intersection grid does not exist in the K local sampling search grids;
determining a first intersection point of the at least one first intersection point having a voxel value greater than a second threshold as the first preselected sample point;
And under the condition that the voxel value of a first target intersection point is larger than a third threshold value, determining sampling points around the first target intersection point as the second preselected sampling points, wherein the first target intersection point is the one of the at least one first intersection point that is closest to a preset point, the preset point is the initial emission point of the rendering ray, and the third threshold value is larger than the second threshold value.
6. A method according to claim 3, wherein the T second sampling points comprise a third preselected sampling point and a fourth preselected sampling point;
The obtaining the S first sampling points from the first scene area and the T second sampling points from the target scene area when the target intersection grid includes sampling points with voxel values greater than the first threshold value includes:
Under the condition that sampling points with voxel values larger than the first threshold value are contained in the target intersection grid, acquiring the S first sampling points from the first scene area, and acquiring at least one second intersection point of the target scene area and the rendering ray;
Determining a second intersection point of the at least one second intersection point having a voxel value greater than a fourth threshold value and less than the first threshold value as the third preselected sample point;
And under the condition that the voxel value of a second target intersection point is larger than the first threshold value, determining sampling points around the second target intersection point as the fourth preselected sampling point, wherein the second target intersection point is any one of the at least one second intersection point.
7. The method of claim 3, wherein the neural network comprises a first network and a second network, the first network consisting of a first diffuse reflecting network and a first specular network, the second network consisting of K second diffuse reflecting networks and K second specular networks;
the number of fully connected network layers of the first diffuse reflection network is larger than the number of fully connected network layers of the second diffuse reflection network, and the number of fully connected network layers of the first specular network is larger than the number of fully connected network layers of the second specular network.
8. The method of claim 7, wherein the rendering the scene to be rendered based on the neural network after training is completed comprises:
rendering the first scene area based on the first network after training is completed under the condition that the target intersection grid does not exist in the K local sampling search grids;
Rendering the first scene area based on the first network after training is completed and rendering the target scene area based on the second network after training is completed under the condition that sampling points with voxel values larger than the first threshold are contained in the target intersection grid;
and jointly rendering the scene to be rendered based on the first network and the second network.
9. The method of claim 8, wherein the method further comprises:
Under the condition that sampling points with voxel values larger than the first threshold are contained in the target intersection grid, carrying out normal vector estimation on point cloud data in the target scene area to obtain normal vectors of the point cloud data;
Determining a normal constraint condition based on the normal vector, and determining an illumination constraint condition based on color characteristics of pixel points in the scene to be rendered;
Training the first network according to the illumination constraint, and training the second network according to the illumination constraint and the normal constraint.
10. A scene rendering device, comprising:
The scene decomposition module is used for decomposing the scene to be rendered into a first scene area and K second scene areas according to the scene data of the scene to be rendered, wherein the first scene area is used for representing the global scene of the scene to be rendered, the second scene area is used for representing the local scene in the scene to be rendered, and K is a positive integer;
The grid construction module is used for constructing a global sampling search grid based on the first scene area, constructing a local sampling search grid based on the second scene areas, and the K second scene areas are in one-to-one correspondence with the K local sampling search grids;
The sampling acquisition module is used for acquiring S first sampling points from the first scene area through the global sampling search grid based on a preset sampling method, and acquiring T second sampling points from the target scene area through a target sampling search grid based on the preset sampling method, wherein the target sampling search grid is one of the K local sampling search grids, and S and T are both positive integers;
The network training module is used for training the neural network based on the S first sampling points and the T second sampling points;
and the scene rendering module is used for rendering the scene to be rendered based on the neural network after training is completed.
11. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the scene rendering method of any of claims 1 to 9.
12. A computer readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the scene rendering method according to any of claims 1 to 9.
13. A computer program product comprising computer instructions which, when executed by a processor, implement the steps of the scene rendering method of any of claims 1 to 9.
CN202410303607.0A (2024-03-18, pending): Scene rendering method, device, electronic equipment, storage medium and program product

Priority Applications (1)

CN202410303607.0A, priority date and filing date 2024-03-18: Scene rendering method, device, electronic equipment, storage medium and program product

Publications (1)

CN118154713A, published 2024-06-07

Family

ID=91298073

Country Status (1)

CN: CN118154713A (en), pending

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination