WO2022193104A1 - Method for generating light field prediction model, and related apparatus - Google Patents

Method for generating light field prediction model, and related apparatus

Info

Publication number
WO2022193104A1
Authority
WO
WIPO (PCT)
Prior art keywords
distance
cube
small
small cube
light field
Application number
PCT/CN2021/080893
Other languages
French (fr)
Chinese (zh)
Inventor
郑凯
韩磊
李琳
李选富
林天鹏
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN202180095331.6A (published as CN117015966A)
Priority to PCT/CN2021/080893 (published as WO2022193104A1)
Publication of WO2022193104A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Abstract

A method for generating a light field prediction model, and a related apparatus. The method comprises: establishing a cube model surrounding a photographed scene according to the respective photographing orientations of multiple sample images, wherein the cube model comprises multiple small cubes (voxels); respectively calculating multiple truncation distances for each of the multiple small cubes according to the multiple sample images (S402); sampling spatial points from each small cube according to the multiple truncation distances of that small cube, wherein each sampled spatial point has corresponding spatial coordinates; and training a light field prediction model according to the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used for predicting a light field of the scene. The method can improve the sampling efficiency of the voxels, thereby improving the generation efficiency of the light field prediction model.

Description

A Method for Generating a Light Field Prediction Model, and Related Apparatus

Technical Field

The present application relates to the field of computer technology, and in particular to a method for generating a light field prediction model and a related apparatus.

Background

The prospects for virtual reality display technology are very broad. From the card box (cardbox) viewer, the Vive VR head-mounted display, and the Oculus Rift head-mounted display, to the VR Glass headset launched in 2019, virtual reality display hardware is becoming simple, easy to use, and widespread. However, in contrast to the rapid improvement of the hardware, high-quality virtual reality digital content is very limited. Unlike conventionally displayed two-dimensional (2D) digital content, virtual reality content requires the three-dimensional (3D) light field of the scene in order to enhance immersion (for example, so that the displayed content changes as the viewer moves), and capturing the 3D light field of a scene requires very complex hardware, which limits the flexibility of 3D light field acquisition. Using computer vision algorithms to obtain 3D light fields has therefore become a new research direction, but how to obtain 3D light fields efficiently and accurately based on computer vision algorithms is a technical problem facing those skilled in the art.
Summary of the Invention

The embodiments of the present application disclose a method for generating a light field prediction model and a related apparatus, which can improve the generation efficiency of the light field prediction model.

In a first aspect, an embodiment of the present application discloses a method for generating a light field prediction model. The method includes: establishing a cube model surrounding a photographed scene according to the respective shooting orientations of multiple sample images, where the cube model includes multiple small cubes (voxels); then calculating, according to the multiple sample images, multiple truncation distances for each of the multiple small cubes, where calculating one truncation distance of each small cube according to a first sample image includes: determining the truncation distance according to the distance from the camera to the small cube when the first sample image was captured and the distance from the camera to an object surface in the scene, the first sample image being any one of the multiple sample images; then sampling spatial points from the small cubes according to the multiple truncation distances of each small cube, where each sampled spatial point has corresponding spatial coordinates; and then training a light field prediction model according to the spatial coordinates of the sampled spatial points, where the light field prediction model is used to predict the light field of the scene.

In the above method, the voxel sampling points used to train the deep learning network (also called the light field prediction model, which is used to predict the three-dimensional light field) are obtained based on the depth information of the images. Specifically, a truncation distance is calculated from the depth information and the distance from the voxel to the camera, and differentiated sampling is then performed according to the magnitude of the truncation distance. On the one hand, this sampling approach quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are essentially concentrated near object surfaces, so a deep learning network subsequently trained on such voxels can better represent the texture details of objects when predicting images, reducing blur and structural errors.
With reference to the first aspect, in an optional solution of the first aspect, after the training of the light field prediction model according to the spatial coordinates of the sampled spatial points, the method further includes: predicting the light field of the scene through the light field prediction model. That is, after training the light field prediction model, the model training device also predicts the light field through the model.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, the sampling of spatial points from the small cubes according to the multiple truncation distances of each small cube includes: performing a fusion calculation on the multiple truncation distances of each small cube to obtain a fused truncation distance of the small cube; and sampling spatial points from each small cube according to its fused truncation distance. In this implementation, a fused truncation distance is calculated for every small cube, and sampling is then performed according to the fused truncation distance.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, the sampling of spatial points from the small cubes according to the multiple truncation distances of each small cube includes: determining first small cubes, each of which has at least one truncation distance whose absolute value is smaller than a preset threshold; performing a fusion calculation on the multiple truncation distances of each first small cube to obtain a fused truncation distance of the first small cube; and sampling spatial points from the first small cube according to its fused truncation distance. In this approach, the fused truncation distance is not calculated for all small cubes but only for those with at least one truncation distance whose absolute value is smaller than the preset threshold: when the truncation distance is large, the corresponding small cube is far from the object surface and there is little need to sample it later. Therefore, the present application does not perform the fusion calculation on the truncation distances of such small cubes, which is equivalent to excluding them from the sampling scope in advance; this reduces the amount of calculation and improves the generation efficiency of the light field prediction model with essentially no loss in subsequent sampling quality.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, in the process of sampling spatial points from the small cubes, more spatial points are sampled from small cubes with smaller fused truncation distances. It can be understood that a cube with a smaller fused truncation distance is closer to the object surface; compared with the spatial points in other small cubes on the same camera ray, the spatial points in such small cubes better reflect the pixel information. Therefore, training the light field prediction model more on the spatial points in such small cubes helps the model subsequently predict more accurate images.
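For illustration only, a minimal Python sketch of such differentiated sampling follows; the inverse-distance allocation rule and all names are assumptions of this sketch, since the text does not fix a specific formula:

    import numpy as np

    def allocate_samples(fused_d, total_samples):
        # fused_d: (N,) fused truncation distances of the voxels on one camera ray
        weights = 1.0 / (np.abs(fused_d) + 1e-3)   # smaller |d| -> larger weight
        weights /= weights.sum()
        return np.round(weights * total_samples).astype(int)  # samples per voxel

    def sample_points(voxel_centers, voxel_size, counts):
        # draw the allotted number of uniform random points inside each voxel
        pts = [c + (np.random.rand(k, 3) - 0.5) * voxel_size
               for c, k in zip(voxel_centers, counts) if k > 0]
        return np.concatenate(pts) if pts else np.empty((0, 3))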
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, the fusion calculation includes a weighted average calculation. It can be understood that a fused truncation distance obtained by weighted averaging more accurately reflects how far the small cube is from the object surface.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, the weight of the truncation distance of a second small cube calculated based on the first sample image in the weighted average calculation is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with a first included angle, the first included angle being the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube, the second small cube being any small cube in the cube model.

In this approach, weight values calculated from each sample image are used when computing the fused truncation distance. Since each weight is negatively correlated with the distance to the camera at capture time, and/or positively correlated with the first included angle, incorporating these weights into the fused truncation distance more accurately reflects the influence of the different orientations on the fused truncation distance.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, before the truncation distances of each small cube in the cube model are calculated according to the multiple sample images, the method further includes: when each of the multiple sample images is captured, collecting depth information from the shooting viewpoint of that sample image, where the depth information represents the distance from the camera to object surfaces in the scene.

In this approach, collecting depth information from the shooting viewpoint of each sample image more accurately reflects the distance from the camera to the object surfaces in the photographed scene, which helps calculate more accurate truncation distances.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, in the process of calculating the fused truncation distance of the second small cube, the weight of the truncation distance of the second small cube calculated based on the first sample image in the weighted average calculation, which may also be called the weight value w(p) of the second small cube calculated according to the first sample image, satisfies the following relationship:

w(p) = cos(θ)/distance(v)

where θ is the first included angle and distance(v) is the distance from the second small cube to the camera when the first sample image was captured.

As mentioned above, the weight value of the second small cube calculated from the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with the first included angle; the expression for w(p) here is one optional expression of this idea. When this weight is incorporated into the calculation of the fused truncation distance, the influence of the different orientations on the fused truncation distance is reflected more accurately.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, the truncation distance d(p) of the second small cube calculated according to the first sample image satisfies the following relationship:

d(p) = sdf(p)/|u|

where sdf(p) is the difference between the distance from the camera to the second small cube when the first sample image was captured and the distance from the camera to the object surface in the scene, and u is a preset threshold.

It can be understood that this expression for d(p) is only one optional calculation formula for the truncation distance; other expressions may be used in practical applications.

With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in yet another optional solution of the first aspect, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1.

In this approach, the truncation distance of small cubes in one range is assigned the value 1 and that of small cubes in another range is assigned the value -1, which facilitates subsequently treating these two kinds of small cubes identically, thereby improving calculation efficiency.
In a second aspect, an embodiment of the present application provides an apparatus for generating a light field prediction model. The apparatus includes:

an establishing unit, configured to establish a cube model surrounding the photographed scene according to the respective shooting orientations of multiple sample images, where the cube model includes multiple small cubes (voxels);

a first calculation unit, configured to calculate, according to the multiple sample images, multiple truncation distances for each of the multiple small cubes, where calculating one truncation distance of each small cube according to a first sample image includes: determining the truncation distance according to the distance from the camera to the small cube when the first sample image was captured and the distance from the camera to an object surface in the scene, the first sample image being any one of the multiple sample images;

a sampling unit, configured to sample spatial points from the small cubes according to the multiple truncation distances of each small cube, where each sampled spatial point has corresponding spatial coordinates; and

a second calculation unit, configured to train a light field prediction model according to the spatial coordinates of the sampled spatial points, where the light field prediction model is used to predict the light field of the scene.

In the above apparatus, the voxel sampling points used to train the deep learning network (also called the light field prediction model, which is used to predict the three-dimensional light field) are obtained based on the depth information of the images. Specifically, a truncation distance is calculated from the depth information and the distance from the voxel to the camera, and differentiated sampling is then performed according to the magnitude of the truncation distance. On the one hand, this sampling approach quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are essentially concentrated near object surfaces, so a deep learning network subsequently trained on such voxels can better represent the texture details of objects when predicting images, reducing blur and structural errors.
With reference to the second aspect, in an optional solution of the second aspect, the apparatus further includes:

a prediction unit, configured to predict the light field of the scene through the light field prediction model. That is, after training the light field prediction model, the model training device also predicts the light field through the model.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, in terms of sampling spatial points from the small cubes according to the multiple truncation distances of each small cube, the sampling unit is specifically configured to: perform a fusion calculation on the multiple truncation distances of each small cube to obtain a fused truncation distance of the small cube; and sample spatial points from each small cube according to its fused truncation distance. In this implementation, a fused truncation distance is calculated for every small cube, and sampling is then performed according to the fused truncation distance.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, in terms of sampling spatial points from the small cubes according to the multiple truncation distances of each small cube, the sampling unit is specifically configured to: determine first small cubes, each of which has at least one truncation distance whose absolute value is smaller than a preset threshold; perform a fusion calculation on the multiple truncation distances of each first small cube to obtain a fused truncation distance of the first small cube; and sample spatial points from the first small cube according to its fused truncation distance. In this approach, the fused truncation distance is not calculated for all small cubes but only for those with at least one truncation distance whose absolute value is smaller than the preset threshold: when the truncation distance is large, the corresponding small cube is far from the object surface and there is little need to sample it later. Therefore, the present application does not perform the fusion calculation on the truncation distances of such small cubes, which is equivalent to excluding them from the sampling scope in advance; this reduces the amount of calculation and improves the generation efficiency of the light field prediction model with essentially no loss in subsequent sampling quality.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, in the process of sampling spatial points from the small cubes, more spatial points are sampled from small cubes with smaller fused truncation distances. It can be understood that a cube with a smaller fused truncation distance is closer to the object surface; compared with the spatial points in other small cubes on the same camera ray, the spatial points in such small cubes better reflect the pixel information. Therefore, training the light field prediction model more on the spatial points in such small cubes helps the model subsequently predict more accurate images.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, the fusion calculation includes a weighted average calculation. It can be understood that a fused truncation distance obtained by weighted averaging more accurately reflects how far the small cube is from the object surface.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, the weight of the truncation distance of a second small cube calculated based on the first sample image in the weighted average calculation is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with a first included angle, the first included angle being the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube, the second small cube being any small cube in the cube model.

In this approach, weight values calculated from each sample image are used when computing the fused truncation distance. Since each weight is negatively correlated with the distance to the camera at capture time, and/or positively correlated with the first included angle, incorporating these weights into the fused truncation distance more accurately reflects the influence of the different orientations on the fused truncation distance.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, the apparatus further includes:

an acquisition unit, configured to collect, when each of the multiple sample images is captured, depth information from the shooting viewpoint of that sample image, where the depth information represents the distance from the camera to object surfaces in the scene.

In this approach, collecting depth information from the shooting viewpoint of each sample image more accurately reflects the distance from the camera to the object surfaces in the photographed scene, which helps calculate more accurate truncation distances.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, in the process of calculating the fused truncation distance of the second small cube, the weight of the truncation distance calculated based on the first sample image in the weighted average calculation, which may also be called the weight value w(p) of the second small cube calculated according to the first sample image, satisfies the following relationship:

w(p) = cos(θ)/distance(v)

where θ is the first included angle and distance(v) is the distance from the second small cube to the camera when the first sample image was captured.

As mentioned above, the weight value of the second small cube calculated from the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with the first included angle; the expression for w(p) here is one optional expression of this idea. When this weight is incorporated into the calculation of the fused truncation distance, the influence of the different orientations on the fused truncation distance is reflected more accurately.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, the truncation distance d(p) of the second small cube calculated according to the first sample image satisfies the following relationship:

d(p) = sdf(p)/|u|

where sdf(p) is the difference between the distance from the camera to the second small cube when the first sample image was captured and the distance from the camera to the object surface in the scene, and u is a preset threshold.

It can be understood that this expression for d(p) is only one optional calculation formula for the truncation distance; other expressions may be used in practical applications.

With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in yet another optional solution of the second aspect, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1.

In this approach, the truncation distance of small cubes in one range is assigned the value 1 and that of small cubes in another range is assigned the value -1, which facilitates subsequently treating these two kinds of small cubes identically, thereby improving calculation efficiency.
In a third aspect, an embodiment of the present application provides a device for generating a light field prediction model, including a processor and a memory, where the memory is configured to store a computer program that, when run on the processor, implements the method described in the first aspect or any optional solution of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when run on a processor, implements the method described in the first aspect or any optional solution of the first aspect.

By implementing the embodiments of the present application, the voxel sampling points used to train the deep learning network (also called the light field prediction model, used to predict the three-dimensional light field) are obtained based on the depth information of the images. Specifically, a truncation distance is calculated from the depth information and the distance from the voxel to the camera, and differentiated sampling is then performed according to the magnitude of the truncation distance. On the one hand, this sampling approach quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are essentially concentrated near object surfaces, so a deep learning network subsequently trained on such voxels can better represent the texture details of objects when predicting images, reducing blur and structural errors.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a scenario of acquiring a NeRF according to an embodiment of the present application;

FIG. 2A is a schematic diagram of a voxel sampling scenario according to an embodiment of the present application;

FIG. 2B is a schematic diagram of how a three-dimensional light field and RGB information vary with depth information according to an embodiment of the present application;

FIG. 3 is a schematic architectural diagram of model training according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a method for determining the three-dimensional light field of a scene according to an embodiment of the present application;

FIG. 5 is a schematic diagram of the distances from a camera to a voxel and to an object surface according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a truncation distance scenario according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a truncation distance distribution according to an embodiment of the present application;

FIG. 8 is a schematic comparison diagram of the prediction effects of a light field prediction model according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an apparatus for generating a light field prediction model according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of another apparatus for generating a light field prediction model according to an embodiment of the present application.
Detailed Description of Embodiments

The embodiments of the present application are described below with reference to the accompanying drawings.
Please refer to FIG. 1, which is a schematic diagram of a scenario of acquiring a neural radiance field (NeRF). The method shown in FIG. 1 uses a sparse set of images to synthesize the three-dimensional light field of a complex scene. Specifically: as in part (a), for a scene represented in a five-dimensional (5D) coordinate system, a single 5D coordinate (x, y, z, θ, φ) on a camera ray is input to a deep learning network in fully connected form, where the coordinate (x, y, z, θ, φ) contains the spatial position (x, y, z) and the viewing direction (θ, φ). As in part (b), the deep learning network reconstructs (that is, outputs) the RGB information corresponding to the coordinate (x, y, z, θ, φ), which can be denoted RGBσ and includes density and color. As in part (c), volume rendering is performed on RGBσ, and the result is compared with the actual RGB information at the coordinate (x, y, z, θ, φ) to obtain a rendering loss, as in part (d); the deep learning network continues to be trained based on this rendering loss. After the deep learning network has been trained on the 5D coordinates of the spatial points collected on the camera rays according to the above flow (a), (b), (c), (d), it can predict the RGB information of new 5D coordinates. Therefore, for a scene represented in the 5D coordinate system, the deep learning network can predict the view of the scene from any angle, and the set of views from all viewing angles constitutes the three-dimensional light field of the scene.
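To make the rendering and loss steps in parts (c) and (d) concrete, the following is a minimal sketch of the quadrature commonly used in the NeRF literature (this formula is background to, not part of, the present application; the network outputs rgb and sigma are assumed to be given):

    import numpy as np

    def volume_render(rgb, sigma, t):
        # rgb: (N, 3) colors and sigma: (N,) densities output by the network
        # for N samples at depths t (N,) along one camera ray
        delta = np.diff(t, append=t[-1] + 1e10)            # spacing between samples
        alpha = 1.0 - np.exp(-sigma * delta)               # per-segment opacity
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
        weights = alpha * trans                            # contribution per sample
        return (weights[:, None] * rgb).sum(axis=0)        # rendered pixel color

    # rendering loss against the ground-truth pixel color pixel_rgb:
    # loss = np.sum((volume_render(rgb, sigma, t) - pixel_rgb) ** 2)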
After analyzing the operating principle of the deep learning network used in the scenario shown in FIG. 1, the inventors of the present application found that, for a given camera ray (also called a viewing ray), if it passes through a point on an object surface, the RGB information (that is, the pixel) corresponding to that camera ray is mainly characterized by the depth and color of that point on the object surface. Under this premise, in the process of training the above deep learning network to obtain the three-dimensional light field, sampling points are first obtained by uniformly sampling along each camera ray; as shown in FIG. 2A, parts (e) and (f) both show uniform sampling, with part (f) sampled at a finer granularity. Each sampling point is then analyzed during training to find the sampling points near the object surface; based on these, the approximate range in which the "object surface" lies can be determined, and further training and analysis based on this approximate range yields the sampling points on the object surface. Since the depth and color of the sampling points on the object surface essentially reflect the RGB information (pixel) corresponding to the camera ray, a network for predicting the three-dimensional light field can be trained based on those sampling points. As shown in FIG. 2B, the horizontal axis represents the variation of the image depth information along one camera ray, and the vertical axis represents the variation with depth of the three-dimensional light field on that camera ray and of the RGB information on that ray; the weights of the deep learning network used to predict the three-dimensional light field and the RGB information therefore coincide to a high degree in how they are affected by depth. Since depth information reflects the distance from the camera to the object surface, it can be considered that the network weights and the RGB information also coincide to a high degree in how they are affected by the object surface.
In this process, because the deep learning network samples uniformly along each camera ray, finding the sampling points near and on the object surface for each camera ray requires point-by-point trial and error. This trial-and-error approach places a very heavy computational burden on the deep learning network, so the network converges slowly; moreover, trial and error cannot accurately locate the sampling points on the object surface along a camera ray, so a deep learning network trained on such sampling points has limited accuracy, and the three-dimensional light field it predicts has a relatively large error.
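For reference, the uniform (stratified) ray sampling criticized above can be sketched as follows (a minimal illustration of the general idea, not code from the present application):

    import numpy as np

    def uniform_ray_samples(near, far, n_samples):
        # one stratified-uniform depth sample per equal-width bin along the ray
        edges = np.linspace(near, far, n_samples + 1)
        return edges[:-1] + (edges[1:] - edges[:-1]) * np.random.rand(n_samples)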
The inventors of the present application consider that, since depth information reflects the distance from the camera to the object surface, importing the depth information into the deep learning network can reduce its wasted computation and improve its convergence speed and efficiency. Specifically, the depth information is used to concentrate sampling and training near the depth value on each camera ray, ensuring that early in training the network rapidly converges to the object surface along each camera ray and concentrates its computing power on representing the texture details of objects, avoiding blur and structural errors.

Please refer to FIG. 3, which is a schematic architectural diagram of model training according to an embodiment of the present application. The architecture includes a model training device 301 and one or more model using devices 302, which communicate with each other in a wired or wireless manner. The model training device 301 can therefore send the trained deep learning network (or light field prediction model) for predicting the three-dimensional light field to a model using device 302; correspondingly, the model using device 302 predicts the three-dimensional light field in a specific scene through the received deep learning network. Of course, the model training device may also itself predict the three-dimensional light field in the scene based on the trained deep learning network.

Optionally, the model using device 302 may feed the results predicted based on the model back to the model training device 301, so that the model training device 301 can further train the model based on those prediction results; the retrained model can then be sent to the model using device 302 to update the original model.

The model training device 301 may be a device with relatively strong computing power, for example a server, or a server cluster composed of multiple servers.

The model using device 302 is a device that needs to obtain the three-dimensional light field of a specific scene, for example a handheld device (such as a mobile phone, a tablet computer, or a palmtop computer), a vehicle-mounted device (such as a car, a bicycle, an electric vehicle, an aircraft, or a ship), a wearable device (such as a smart watch (for example an iWatch), a smart band, or a pedometer), a smart home device (such as a refrigerator, a television, an air conditioner, or an electricity meter), a smart robot, workshop equipment, and so on.

The following takes a car and a mobile phone as examples of the model using device 302.

For example, with economic development the number of cars worldwide keeps increasing, and navigation maps play a key role in improving the efficiency with which cars travel on roads. On some complicated road sections it is often difficult for users to obtain complete information about the road surface; however, the method of the embodiments of the present application can predict the three-dimensional light field of a specific scene, so the user can be shown the complicated road section viewed from every direction, which helps the user control the vehicle accordingly and improves traffic efficiency.

As another example, online shopping is now very common, and consumers learn about the form of a product by viewing its photos online. At present the photos of many products are limited, and consumers can only see the product from some directions; however, the method of the embodiments of the present application can predict the three-dimensional light field of the product, so the user can view the product form from every angle, which helps the user choose a product that suits them better.

There are many other cases where a three-dimensional light field needs to be predicted, such as VR house viewing, VR movies, games, and street view production.

Please refer to FIG. 4, which is a schematic flowchart of a method for determining the three-dimensional light field of a scene according to an embodiment of the present application. The method may be implemented based on the architecture shown in FIG. 3 or on other architectures. When it is implemented based on the architecture shown in FIG. 3, steps S400-S405 may be performed by the model training device 301 and step S406 by the model using device 302. When it is implemented based on other architectures, steps S400-S406 may be completed by one device or cooperatively by multiple devices; the application field of the device or devices is not limited here, as long as the corresponding computing capability and/or communication capability is provided. Steps S400-S406 are as follows:
Step S400: Input multiple sample images and information about the shooting orientations of the multiple sample images into the deep learning network.

The multiple sample images are images of the same scene captured by a camera at different orientations. Optionally, an orientation (pose) includes position coordinates (x, y, z) and a viewing direction (θ, φ). For example, with the world coordinate system as the reference, x, y, and z in the position coordinates (x, y, z) represent longitude, latitude, and altitude respectively, and θ and φ in the viewing direction (θ, φ) represent the horizontal angle and the vertical angle respectively. Of course, the orientation can also be expressed in other ways.
Step S401: Establish a cube model surrounding the photographed scene according to the shooting orientations at which the multiple sample images were captured.

It can be understood that a cube model enclosing the scene can be constructed based on the multiple different orientations at which the camera captured the photos. Optionally, the length, width, and height of the cube model are respectively the maxima of the scene length C, width W, and height H calculated from the multiple orientations. The scene is not limited here; for example, it may be a scene whose main object is a person, a scene whose main object is a tree, or a scene whose main content is the interior structure of a house, and so on. The multiple different orientations mentioned here may be the four directions east, west, south, and north, or different orientations relative to other references.

The cube model is divided into multiple small cubes (grid voxels), voxels for short. Optionally, the voxels can be obtained by dividing the cube model into N equal parts, and their size can be set according to actual needs. It can be understood that smaller voxels help improve the training accuracy of the subsequent deep learning network but also increase the computational load during training, so the voxel size is usually chosen by weighing accuracy against the computing power of the device; for example, the voxel size may be set to 2 centimeters (cm). In general, any voxel is considered to be either on the surface of an object in the scene or not on an object surface (in which case it can be regarded as lying in an empty region of the scene).
Optionally, if the position of a voxel in the cube model is expressed as three-dimensional position coordinates g, that is (x, y, z), then when the voxels of the cube model are subsequently fed into the deep learning network for training, one GPU thread can process the voxels at one (x, y), that is, one GPU thread scans and processes the lattice column at one (x, y) coordinate. Optionally, the coordinates of the center point of a voxel are generally taken as the coordinates of that voxel; of course, the coordinates of another point in the voxel may also be used, for example its upper-left corner or its upper-right corner.
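As an illustration of the grid construction described in step S401, the following minimal Python sketch builds the voxel centers; the bounding margin and the derivation of the bounding cube from the camera positions are assumptions of this sketch, not details fixed by the application:

    import numpy as np

    def build_voxel_grid(cam_positions, voxel_size=0.02, margin=1.0):
        # bounding cube derived from the camera positions, padded by a margin
        lo = cam_positions.min(axis=0) - margin
        hi = cam_positions.max(axis=0) + margin
        n = np.ceil((hi - lo) / voxel_size).astype(int)   # voxels per axis
        idx = np.stack(np.meshgrid(*[np.arange(k) for k in n],
                                   indexing="ij"), axis=-1)
        centers = lo + (idx + 0.5) * voxel_size           # voxel center coordinates
        return centers.reshape(-1, 3), n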
Step S402: Calculate multiple truncation distances for each of the multiple small cubes according to the multiple sample images.

The multiple sample images are captured by the camera at different orientations; therefore, the truncation distance of each small cube in the cube model needs to be calculated separately for the sample image captured at each orientation. For ease of understanding, the following takes a first sample image as an example, the first sample image being any one of the multiple sample images. The truncation distance of each small cube calculated according to the first sample image is determined according to the distance from the camera to the small cube when the first sample image was captured and the distance from the camera to an object surface in the scene, as illustrated below.

Suppose the first sample image is captured by the camera at a first orientation. When shooting at the first orientation, the camera also collects depth information of the scene, which reflects the distance from the camera to object surfaces in the scene. Therefore, in the image captured at the first orientation, each pixel x corresponds to a depth value; the depth value value(x) of pixel x reflects the distance from the camera to the voxel on the object surface along the camera ray corresponding to pixel x.

The following takes a second small cube (any voxel in the cube model) as an example. Its position is known, and when the camera shoots at the first orientation, the camera position is also known. Therefore, the distance from the second small cube to the camera when the camera shoots at the first orientation is known and can be denoted distance(v).

Therefore, when the camera shoots at the first orientation, the distance from the second small cube in the cube model to the object surface in the scene can be written as sdf(p) = value(x) - distance(v). The specific scenario and geometric relationship are shown in FIG. 5.

The embodiments of the present application focus on voxels near object surfaces. Assuming that a voxel whose distance to the object surface does not exceed a preset threshold u is considered to be near the surface, whether a voxel is near the surface, and how near, can be expressed by a truncation distance d(p), where the truncation distance d(p) of the second small cube voxel can be calculated as follows:

d(p) = sdf(p)/|u|

Optionally, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1. FIG. 6 illustrates the functional relationship between the truncation distance d(p) and u. FIG. 7 illustrates the distribution of the truncation distances d(p) of some of the small cube voxels in the cube model corresponding to the photographed scene.
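The formula and the clamping rule above transcribe directly into code (a minimal sketch; the variable names are illustrative only):

    def truncation_distance(sdf_p, u):
        # d(p) = sdf(p)/|u|, clamped to +1 / -1 outside the band of width |u|
        if sdf_p > abs(u):
            return 1.0
        if sdf_p < 0 and abs(sdf_p) > abs(u):
            return -1.0
        return sdf_p / abs(u)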
In an optional solution, a weight value can also be computed for each voxel. The weight value of the second small cube computed from the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with the cosine of a first included angle, the first included angle being the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube. This weight value is used in the subsequent fusion calculation of the second small cube's truncated distances. The weight is determined this way because, when the camera shoots at the first orientation, the pixel information of any voxel in the cube model (density, color, and so on) depends on several factors: the closer the voxel is to the camera, the more pixel information it carries, and the smaller its first included angle (i.e., the less the viewing ray deviates from the surface normal), the more pixel information it carries.
Optionally, the weight value w(p) of the second small cube can be expressed as:
w(p) = cos(θ)/distance(v)
where θ is the first included angle and distance(v) is the distance from the second small cube to the camera when the first sample image was captured.
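Under the same naming assumptions, the per-view weight can be sketched as:

```python
import math

def voxel_weight(theta: float, distance_v: float) -> float:
    """w(p) = cos(theta) / distance(v) for one voxel and one sample image.

    theta      -- first included angle (radians) between the voxel's camera ray
                  and the normal of the nearest object surface
    distance_v -- camera-to-voxel distance for this sample image
    """
    return math.cos(theta) / distance_v
```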
Based on the above, the truncated distance d(p) and weight value w(p) of every voxel in the cube model can be computed from the first sample image; this set of d(p) and w(p) values is relative to the first orientation. Applying the same computation to the other sample images yields, for each image, another set of d(p) and w(p) values for every voxel, relative to the corresponding orientation.
It should be noted that, before computing d(p) and w(p) for each voxel, the voxel coordinates and the camera coordinates can first be unified into one coordinate system to simplify the computation. For example, according to the size and the number of the voxels, the position g of each voxel in the cube model can be converted into a point p in the world coordinate system; p is then mapped to a point v in the camera coordinate system according to the camera pose matrix, and the corresponding pixel x in the depth image is determined from v according to the camera intrinsic matrix. The depth value of pixel x is then read; it equals the distance value(x) from the camera to the surface voxel on the camera ray through position g. The distance from the mapping point v to the origin of the camera coordinate system is recorded as distance(v). The obtained value(x) and distance(v) can then be used to compute the truncated distance d(p) and the weight value w(p).
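The coordinate pipeline above can be sketched as follows, assuming a standard pinhole camera model; the pose matrix T_wc, intrinsic matrix K and grid layout below are illustrative assumptions, since the original does not fix these conventions.

```python
import numpy as np

def voxel_to_pixel(g, voxel_size, origin, T_wc, K):
    """Map a voxel grid index g to its depth-image pixel x and distance(v).

    g          -- integer grid index (i, j, k) of the voxel in the cube model
    voxel_size -- edge length of one voxel
    origin     -- world coordinates of the cube model's corner
    T_wc       -- 4x4 world-to-camera pose matrix for this sample image
    K          -- 3x3 camera intrinsic matrix
    """
    p = origin + (np.asarray(g) + 0.5) * voxel_size  # grid index -> world point p
    v = (T_wc @ np.append(p, 1.0))[:3]               # world point -> camera point v
    uv = K @ v                                       # project into the image
    x = (uv[:2] / uv[2]).round().astype(int)         # pixel x in the depth image
    distance_v = np.linalg.norm(v)                   # distance(v) to camera origin
    return x, distance_v
```

The depth value(x) is then read from the depth image at pixel x, e.g. value_x = depth_image[x[1], x[0]], and the pair value(x), distance(v) feeds the d(p) and w(p) computations above.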
In an optional solution, when each of the plurality of sample images is captured, depth information is collected from the shooting viewpoint of that sample image; the depth information characterizes the distance from the camera to object surfaces in the scene, i.e., the value(x) above. The depth information can be collected by a sensor, for example a radar or infrared sensor, deployed on or near the camera.
Step S403: Perform fusion calculation on the truncated distances computed from the plurality of sample images, to obtain fused truncated distances of the small cubes in the cube model.
In a first optional solution, fusion calculation is performed on the plurality of truncated distances of every small cube, yielding a fused truncated distance for each; during subsequent sampling, spatial points are sampled from each small cube according to its fused truncated distance.
In a second optional solution, the first small cubes are determined, a first small cube being a small cube for which at least one truncated distance has an absolute value smaller than a preset threshold; fusion calculation is then performed only on the truncated distances of the first small cubes computed from the plurality of sample images, yielding their fused truncated distances. For example, if multiple truncated distances of a voxel have been computed from multiple sample images, and the smallest of them is below the preset threshold (which can be set to 1, for example), that voxel is regarded as a first small cube. During subsequent sampling, spatial points are sampled from the first small cubes according to their fused truncated distances.
The principle of the fusion calculation is illustrated below, taking the second optional solution as an example.
Optionally, the fusion calculation can work as follows: the fused truncated distance of the second small cube (any small cube in the cube model) is a weighted average of the truncated distances computed from the different sample images, where the weight of the truncated distance computed from the first sample image equals the weight value w(p) of the second small cube computed from the first sample image, as described above. For example, the following operations are performed over the plurality of sample images: the truncated distance d(p) computed from one sample image is taken as the initial fused truncated distance D(p), and the weight value w(p) computed from that image is taken as the initial fused weight value W(p); then, for each remaining sample image in turn, the d(p) computed from the current image is merged into the existing D(p) to update it, and the w(p) computed from the current image is merged into the existing W(p) to update it, until the truncated distances and weight values computed from all sample images have been fused.
Optionally, the fusion calculation can be expressed as:
D(p) = (W(p)*D(p) + w(p)*d(p)) / (W(p) + w(p))
W(p) = W(p) + w(p)
where D(p) is the fused truncated distance of the second small cube, W(p) is its fused weight value, d(p) is the truncated distance of the second small cube computed from the current sample image, and w(p) is the weight value of the second small cube computed from the current sample image.
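A minimal running-average sketch of this fusion, assuming the per-view d(p) and w(p) values have already been computed and stacked into arrays (the array layout is an assumption):

```python
import numpy as np

def fuse_tsdf(d_per_view: np.ndarray, w_per_view: np.ndarray):
    """Fuse per-view truncated distances into D(p) and W(p) for every voxel.

    d_per_view -- shape (num_views, num_voxels): truncated distances d(p)
    w_per_view -- shape (num_views, num_voxels): weight values w(p)
    """
    D = d_per_view[0].copy()           # initial fused truncated distance
    W = w_per_view[0].copy()           # initial fused weight value
    for d, w in zip(d_per_view[1:], w_per_view[1:]):
        D = (W * D + w * d) / (W + w)  # weighted running average of d(p)
        W = W + w                      # accumulate the weights
    return D, W
```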
It can be understood that, in this way, the final fused truncated distance D(p) and fused weight value W(p) of all voxels in the cube model can be computed. Optionally, the final D(p) and W(p) of all voxels can be fed into Marching Cubes to compute triangular faces that render the truncated distance field of the cube model.
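For visualization only, the fused field can be meshed with an off-the-shelf Marching Cubes implementation, for example scikit-image; the grid resolution below and the reshape of D(p) onto a 3-D grid are assumptions:

```python
import numpy as np
from skimage import measure

nx = ny = nz = 64                           # assumed grid resolution of the cube model
D, W = fuse_tsdf(d_per_view, w_per_view)    # fused field from the sketch above
verts, faces, normals, values = measure.marching_cubes(D.reshape(nx, ny, nz),
                                                       level=0.0)
```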
Step S404: Sample spatial points from the small cubes according to the fused truncated distances of the small cubes in the cube model.
Specifically, spatial points are sampled from the voxels of the cube model according to their fused truncated distances. For the first optional solution in step S403, spatial points can be sampled from every voxel according to its fused truncated distance; for the second optional solution in step S403, spatial points can be sampled from the first small cubes (i.e., the first voxels) according to their fused truncated distances.
The sampling principle is that small cubes with smaller fused truncated distances yield more sampled spatial points. For example, for any image plane, along a camera ray emitted from that plane, the closer a voxel's fused truncated distance D(p) is to 0, the higher the sampling density and the more times the voxel is sampled; the closer D(p) is to 1 or -1, the lower the sampling density and the fewer the samples. Optionally, the sample count of cubes whose fused truncated distance is greater than or equal to a preset threshold in absolute value is zero; for example, voxels with D(p) equal to 1 or -1 may not be sampled at all.
Optionally, the sample count n of a voxel on a camera ray and that voxel's fused truncated distance D(p) satisfy:
n ∝ (1 - |D(p)|)
where D(p) is the fused truncated distance of the voxel and n is the number of times the voxel is sampled.
It should be noted that a voxel is not a single point in the cube space but contains a large number of spatial points; after the sample count of each voxel has been computed as above, a subset of its spatial points can therefore be sampled. For example, take 10 voxels, each containing 1000 spatial points, denoted voxel-1 through voxel-10, whose fused truncated distances D(p) are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1 respectively. Then the numbers of spatial points sampled from voxel-1 through voxel-10 are 90, 80, 70, 60, 50, 40, 30, 20, 10 and 0 respectively. Each sampled spatial point has a spatial coordinate (x, y, z).
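A sketch of this proportionality, reproducing the 10-voxel example above (the base count of 100 spatial points for a fully sampled voxel is inferred from that example, i.e. n = 100 · (1 - |D(p)|)):

```python
import numpy as np

def samples_per_voxel(D: np.ndarray, base: int = 100) -> np.ndarray:
    """Number of spatial points to sample per voxel: n ∝ (1 - |D(p)|)."""
    return np.round(base * (1.0 - np.abs(D))).astype(int)

D = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
print(samples_per_voxel(D))  # -> [90 80 70 60 50 40 30 20 10  0]
```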
In embodiments of the present application, the distances from some voxels on a camera ray to the camera cannot be detected by the relevant sensors; the truncated distances of such voxels cannot be obtained, so sampling based on the truncated distance is impossible for them. Such voxels can instead be sampled in the traditional uniform (or average) manner.
Step S405: Train the deep learning network with the spatial coordinates of the sampled spatial points and the plurality of sample images.
The plurality of sample images correspond to a plurality of orientations. The deep learning network reconstructs (or computes) the image of each of these orientations from the spatial coordinates of the sampled spatial points, until the loss (for example, the difference in RGB information) between the reconstructed image of each orientation and the original sample image of that orientation is smaller than a preset value; once every orientation satisfies this condition, training of the deep learning network ends.
The reconstruction process is illustrated with an example below.
Optionally, suppose sample image 1 was captured from orientation 1. The pixel value of pixel A in image 1 is associated with the spatial coordinates of the spatial points sampled on the camera ray corresponding to pixel A, and the same operation is applied to every other pixel of image 1. After all pixels of image 1 have been processed, an image can be rendered; this is the reconstructed image A of orientation 1. The ideal result of reconstruction is that the loss (for example, the RGB information difference) between the image A reconstructed by the deep learning network for orientation 1 and sample image 1 is smaller than the preset value, and that the sample and reconstructed images of every other orientation satisfy the same relationship. The deep learning network obtained after this training can also be called the light field prediction model.
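A highly simplified training-loop sketch under these assumptions; the model, the renderer render_fn and the ray/point inputs are placeholders, since the original does not specify a network architecture or renderer:

```python
import torch

def train_light_field_model(model, render_fn, rays, sample_points, target_rgb,
                            loss_threshold=1e-3, max_iters=10000, lr=5e-4):
    """Fit the model until the per-view RGB reconstruction loss is small enough.

    model         -- torch.nn.Module mapping sampled (x, y, z) points to color/density
    render_fn     -- composites the model's outputs along each camera ray into pixels
    rays          -- camera rays of the sample images
    sample_points -- TSDF-guided sample points along each ray
    target_rgb    -- ground-truth pixel colors from the sample images
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_iters):
        pred_rgb = render_fn(model, rays, sample_points)  # reconstruct the views
        loss = torch.mean((pred_rgb - target_rgb) ** 2)   # RGB information difference
        if loss.item() < loss_threshold:                  # every view fits well enough
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```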
Step S406: Predict, through the deep learning network (light field prediction model), the image that would be captured when the above scene is shot from a new orientation.
Specifically, the new orientation can be any orientation other than the above-mentioned plurality of orientations, and can be represented by five-dimensional (5D) coordinates, for example orientation 1 = (x, y, z, θ, φ), where (x, y, z) represents the position and (θ, φ) represents the viewing direction. It can be understood that, in this way, the image of the scene captured from an arbitrary orientation can be predicted; generally speaking, once the images of the scene from arbitrary orientations can be obtained, the three-dimensional light field of the scene is considered to have been obtained.
Optionally, the predicted image can be an RGB image.
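Once trained, querying the model at a new 5D orientation can be sketched as follows; make_rays and uniform_samples are hypothetical helpers (no TSDF guidance exists for an unseen view, so uniform sampling along the new rays is assumed):

```python
import torch

def predict_view(model, render_fn, make_rays, uniform_samples, pose_5d):
    """Predict the RGB image seen from a new orientation (x, y, z, theta, phi)."""
    rays = make_rays(pose_5d)            # one camera ray per output pixel
    points = uniform_samples(rays)       # sample points along each new ray
    with torch.no_grad():                # inference only
        rgb = render_fn(model, rays, points)
    return rgb
```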
Table 1 compares the prediction performance of the prior-art trained deep learning network illustrated in Figure 1 with that of the deep learning network trained in the present application; the PSNR comparison results can be output by simulation:
Table 1

Metric         Prior art    Embodiment of this application
PSNR-Coarse    33           37
PSNR-Fine      36           37.2
PSNR-Test      35.5         36.5
In Table 1, peak signal-to-noise ratio (PSNR) is the metric used to measure prediction performance. PSNR-Coarse is the performance of coarse-grained prediction of images at the orientations of the existing sample images; PSNR-Fine is the performance of fine-grained prediction at those orientations; PSNR-Test is the performance when predicting the captured image of a new orientation. The larger the PSNR value, the better the prediction performance of the trained deep learning network.
Figure 8 presents, in image form, a comparison between the prediction performance of the deep learning network trained in the prior art and that of the deep learning network trained in the present application. In Figure 8, part (g) is a captured image of a scene, part (h) is the result of reconstructing this image with the prior-art deep learning network, and part (i) is the result of reconstructing it with the deep learning network of an embodiment of the present application. It can be seen that the embodiment of the present application expresses the surface textures of objects in richer and clearer detail and better reflects the depth layering of the scene.
In the method shown in Figure 4, the voxel sample points used to train the deep learning network (which predicts the three-dimensional light field) are obtained from the depth information of the images: a truncated distance is computed from the depth information and the voxel-to-camera distance, and sampling is differentiated according to the magnitude of the truncated distance. On the one hand, this sampling scheme quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are concentrated near object surfaces, so the deep learning network subsequently trained on them can better characterize the texture details of objects during image prediction and can reduce blur and structural errors.
The methods of the embodiments of the present application have been described in detail above; the apparatuses of the embodiments of the present application are provided below.
Referring to Figure 9, Figure 9 is a schematic structural diagram of an apparatus 90 for generating a light field prediction model provided by an embodiment of the present application. The apparatus 90 may include an establishing unit 901, a first computing unit 902, a sampling unit 903 and a second computing unit 904, described in detail as follows.
The establishing unit 901 is configured to establish, according to the respective shooting orientations of a plurality of sample images, a cube model surrounding the captured scene, wherein the cube model includes a plurality of small cubes (voxels).
The first computing unit 902 is configured to compute, from the plurality of sample images, a plurality of truncated distances of each of the plurality of small cubes, wherein computing one truncated distance of each small cube from a first sample image includes: determining the truncated distance according to the distance from the camera to the small cube and the distance from the camera to object surfaces in the scene when the first sample image was captured, the first sample image being any one of the plurality of sample images.
The sampling unit 903 is configured to sample spatial points from the small cubes according to the plurality of truncated distances of each small cube, wherein each sampled spatial point corresponds to a spatial coordinate.
The second computing unit 904 is configured to train a light field prediction model according to the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used to predict the light field of the scene.
In the above apparatus, the voxel sample points used to train the deep learning network (also called the light field prediction model, used to predict the three-dimensional light field) are obtained from the depth information of the images: a truncated distance is computed from the depth information and the voxel-to-camera distance, and sampling is differentiated according to the magnitude of the truncated distance. On the one hand, this sampling scheme quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are concentrated near object surfaces, so the deep learning network subsequently trained on them can better characterize the texture details of objects during image prediction and can reduce blur and structural errors.
In an optional solution, the apparatus 90 further includes:
a prediction unit, configured to predict the light field of the scene through the light field prediction model. That is, after training the light field prediction model, the model training device also predicts the light field through the model.
In another optional solution, in terms of sampling spatial points from the small cubes according to the plurality of truncated distances of each small cube, the sampling unit 903 is specifically configured to: perform fusion calculation on the plurality of truncated distances of each small cube to obtain its fused truncated distance; and sample spatial points from each small cube according to its fused truncated distance. In this implementation, a fused truncated distance is computed for every small cube, and sampling is performed according to it.
In another optional solution, in terms of sampling spatial points from the small cubes according to the plurality of truncated distances of each small cube, the sampling unit 903 is specifically configured to: determine the first small cubes, i.e., those small cubes having at least one truncated distance whose absolute value is smaller than a preset threshold; perform fusion calculation on the plurality of truncated distances of each first small cube to obtain its fused truncated distance; and sample spatial points from the first small cubes according to their fused truncated distances. In this way, the fused truncated distance is not computed for all small cubes, but only for those whose truncated distance has an absolute value below the preset threshold. The larger a small cube's truncated distance, the farther it is from the object surface and the less it needs to be sampled later; the present application therefore does not perform fusion calculation on the truncated distances of such small cubes, which is equivalent to excluding them from the sampling scope in advance. This reduces the amount of computation and improves the generation efficiency of the light field prediction model while essentially preserving the subsequent sampling quality.
In another optional solution, in the process of sampling spatial points from the small cubes, small cubes with smaller fused truncated distances yield more sampled spatial points. It can be understood that a cube with a smaller fused truncated distance is closer to the object surface, and the spatial points in such a small cube reflect the pixel information better than the spatial points in other small cubes on the same camera ray. Training the light field prediction model more heavily on spatial points from such small cubes therefore helps the model predict more accurate images later.
In another optional solution, the fusion calculation includes a weighted average calculation. It can be understood that a fused truncated distance obtained by weighted averaging reflects the distance from a small cube to the object surface more accurately.
In another optional solution, the weight, in the weighted average calculation, of the truncated distance of the second small cube computed from the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with the cosine of the first included angle, the first included angle being the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube, the second small cube being any small cube in the cube model.
In this way, the weight values computed from each sample image are used in computing the fused truncated distance. Because a weight value is negatively correlated with the camera distance at capture time and/or positively correlated with the cosine of the first included angle, incorporating it into the fusion reflects the influence of the different orientations on the fused truncated distance more accurately.
In another optional solution, the apparatus 90 further includes:
an acquisition unit, configured to collect, when each of the plurality of sample images is captured, depth information from the shooting viewpoint of that sample image, wherein the depth information characterizes the distance from the camera to object surfaces in the scene.
In this way, collecting depth information from the shooting viewpoint of each sample image reflects the camera-to-surface distance more accurately, which helps compute a more accurate truncated distance.
In another optional solution, in the process of computing the fused truncated distance of the second small cube, the weight of the truncated distance computed from the first sample image in the weighted average calculation, which can also be called the weight value w(p) of the second small cube computed from the first sample image, satisfies:
w(p) = cos(θ)/distance(v)
where θ is the first included angle and distance(v) is the distance from the second small cube to the camera when the first sample image was captured.
As mentioned above, the weight value of the second small cube computed from the first sample image is negatively correlated with the camera distance at capture time and/or positively correlated with the cosine of the first included angle; this expression for w(p) is one optional realization of that idea, so incorporating this weight into the fusion reflects the influence of the different orientations on the fused truncated distance more accurately.
In another optional solution, the truncated distance d(p) of the second small cube computed from the first sample image satisfies:
d(p) = sdf(p)/|u|
where sdf(p) is the difference between the distance from the camera to the object surface in the scene and the distance from the camera to the second small cube when the first sample image was captured, and u is a preset threshold.
It can be understood that this d(p) is only one optional formula for the truncated distance; other expressions are possible in practical applications.
In another optional solution, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1.
In this way, the truncated distance of small cubes in one range is assigned the value 1 and that of small cubes in another range is assigned the value -1, which facilitates treating these two classes of small cubes identically later and thereby improves computational efficiency.
It should be noted that, for the implementation of each unit, reference may also be made to the corresponding description of the method embodiment shown in Figure 4.
Referring to Figure 10, Figure 10 shows a device 100 for generating a light field prediction model provided by an embodiment of the present application. The device 100 includes a processor 1001, a memory 1002 and a communication interface 1003, which are interconnected through a bus.
The memory 1002 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or compact disc read-only memory (CD-ROM), and is used for related instructions and data. The communication interface 1003 is used to receive and send data.
The processor 1001 may be one or more central processing units (CPUs); when the processor 1001 is one CPU, it may be a single-core or a multi-core CPU.
Since the device 100 needs a plurality of sample images when training the light field prediction model, it must obtain them, either by receiving sample images sent by other devices through the communication interface 1003, or through a camera (also called an image sensor or photographing apparatus) configured on the device 100 that can capture the plurality of sample images. Optionally, when the device 100 is configured with a camera, it may also be configured with a depth sensor for collecting the depth of the objects in the captured scene; the type of the depth sensor is not limited here.
The processor 1001 in the device 100 is configured to read the program code stored in the memory 1002 and perform the following operations: establish, according to the respective shooting orientations of a plurality of sample images, a cube model surrounding the captured scene, wherein the cube model includes a plurality of small cubes (voxels); compute, from the plurality of sample images, a plurality of truncated distances of each of the plurality of small cubes, wherein computing one truncated distance of each small cube from a first sample image includes determining the truncated distance according to the distance from the camera to the small cube and the distance from the camera to object surfaces in the scene when the first sample image was captured, the first sample image being any one of the plurality of sample images; sample spatial points from the small cubes according to the plurality of truncated distances of each small cube, wherein each sampled spatial point corresponds to a spatial coordinate; and train a light field prediction model according to the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used to predict the light field of the scene.
In the above method, the voxel sample points used to train the deep learning network (also called the light field prediction model, used to predict the three-dimensional light field) are obtained from the depth information of the images: a truncated distance is computed from the depth information and the voxel-to-camera distance, and sampling is differentiated according to the magnitude of the truncated distance. On the one hand, this sampling scheme quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are concentrated near object surfaces, so the deep learning network subsequently trained on them can better characterize the texture details of objects during image prediction and can reduce blur and structural errors.
In an optional solution, after training the light field prediction model according to the spatial coordinates of the sampled spatial points, the processor 1001 is specifically configured to: predict the light field of the scene through the light field prediction model. That is, after training the light field prediction model, the model training device also predicts the light field through the model.
In another optional solution, in terms of sampling spatial points from the small cubes according to the plurality of truncated distances of each small cube, the processor is specifically configured to: perform fusion calculation on the plurality of truncated distances of each small cube to obtain its fused truncated distance; and sample spatial points from each small cube according to its fused truncated distance. In this implementation, a fused truncated distance is computed for every small cube, and sampling is performed according to it.
In another optional solution, in terms of sampling spatial points from the small cubes according to the plurality of truncated distances of each small cube, the processor is specifically configured to: determine the first small cubes, i.e., those small cubes having at least one truncated distance whose absolute value is smaller than a preset threshold; perform fusion calculation on the plurality of truncated distances of each first small cube to obtain its fused truncated distance; and sample spatial points from the first small cubes according to their fused truncated distances. In this way, the fused truncated distance is not computed for all small cubes, but only for those whose truncated distance has an absolute value below the preset threshold. The larger a small cube's truncated distance, the farther it is from the object surface and the less it needs to be sampled later; the present application therefore does not perform fusion calculation on the truncated distances of such small cubes, which is equivalent to excluding them from the sampling scope in advance. This reduces the amount of computation and improves the generation efficiency of the light field prediction model while essentially preserving the subsequent sampling quality.
In another optional solution, in the process of sampling spatial points from the small cubes, small cubes with smaller fused truncated distances yield more sampled spatial points. It can be understood that a cube with a smaller fused truncated distance is closer to the object surface, and the spatial points in such a small cube reflect the pixel information better than the spatial points in other small cubes on the same camera ray. Training the light field prediction model more heavily on spatial points from such small cubes therefore helps the model predict more accurate images later.
In another optional solution, the fusion calculation includes a weighted average calculation. It can be understood that a fused truncated distance obtained by weighted averaging reflects the distance from a small cube to the object surface more accurately.
In another optional solution, the weight, in the weighted average calculation, of the truncated distance of the second small cube computed from the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with the cosine of the first included angle, the first included angle being the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube, the second small cube being any small cube in the cube model.
In this way, the weight values computed from each sample image are used in computing the fused truncated distance. Because a weight value is negatively correlated with the camera distance at capture time and/or positively correlated with the cosine of the first included angle, incorporating it into the fusion reflects the influence of the different orientations on the fused truncated distance more accurately.
In another optional solution, before the truncated distance of each small cube in the cube model is computed from the plurality of sample images, the processor is further configured to: collect, when each of the plurality of sample images is captured, depth information from the shooting viewpoint of that sample image, wherein the depth information characterizes the distance from the camera to object surfaces in the scene.
In this way, collecting depth information from the shooting viewpoint of each sample image reflects the camera-to-surface distance more accurately, which helps compute a more accurate truncated distance.
In another optional solution, in the process of computing the fused truncated distance of the second small cube, the weight of the truncated distance computed from the first sample image in the weighted average calculation, which can also be called the weight value w(p) of the second small cube computed from the first sample image, satisfies:
w(p) = cos(θ)/distance(v)
where θ is the first included angle and distance(v) is the distance from the second small cube to the camera when the first sample image was captured.
As mentioned above, the weight value of the second small cube computed from the first sample image is negatively correlated with the camera distance at capture time and/or positively correlated with the cosine of the first included angle; this expression for w(p) is one optional realization of that idea, so incorporating this weight into the fusion reflects the influence of the different orientations on the fused truncated distance more accurately.
In another optional solution, the truncated distance d(p) of the second small cube computed from the first sample image satisfies:
d(p) = sdf(p)/|u|
where sdf(p) is the difference between the distance from the camera to the object surface in the scene and the distance from the camera to the second small cube when the first sample image was captured, and u is a preset threshold.
It can be understood that this d(p) is only one optional formula for the truncated distance; other expressions are possible in practical applications.
With reference to the first aspect or any of the above possible implementations of the first aspect, in yet another optional solution of the first aspect, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1.
In this way, the truncated distance of small cubes in one range is assigned the value 1 and that of small cubes in another range is assigned the value -1, which facilitates treating these two classes of small cubes identically later and thereby improves computational efficiency.
It should be noted that, for the implementation of each operation, reference may also be made to the corresponding description of the method embodiment shown in Figure 4.
An embodiment of the present application further provides a chip system. The chip system includes at least one processor, a memory and an interface circuit, which are interconnected through lines; instructions are stored in the at least one memory, and when the instructions are executed by the processor, the method flow shown in Figure 4 is implemented.
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores instructions which, when run on a processor, implement the method flow shown in Figure 4.
An embodiment of the present application further provides a computer program product which, when run on a processor, implements the method flow shown in Figure 4.
In summary, by implementing the embodiments of the present application, the voxel sample points used to train the deep learning network (also called the light field prediction model, used to predict the three-dimensional light field) are obtained from the depth information of the images: a truncated distance is computed from the depth information and the voxel-to-camera distance, and sampling is differentiated according to the magnitude of the truncated distance. On the one hand, this sampling scheme quickly concentrates the samples in the key regions, improving sampling efficiency; on the other hand, the sampled voxels are concentrated near object surfaces, so the deep learning network subsequently trained on them can better characterize the texture details of objects during image prediction and can reduce blur and structural errors.
A person of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes media that can store program code, such as ROM, random access memory (RAM), magnetic disks or optical discs.

Claims (16)

  1. A method for generating a light field prediction model, characterized by comprising:
    establishing, according to the respective shooting orientations of a plurality of sample images, a cube model surrounding a captured scene, wherein the cube model comprises a plurality of small cubes (voxels);
    computing, from the plurality of sample images, a plurality of truncated distances of each small cube of the plurality of small cubes, wherein computing one truncated distance of each small cube from a first sample image comprises: determining the truncated distance according to the distance from a camera to the small cube and the distance from the camera to an object surface in the scene when the first sample image was captured, the first sample image being any one of the plurality of sample images;
    sampling spatial points from the small cubes according to the plurality of truncated distances of each small cube, wherein each sampled spatial point corresponds to a spatial coordinate; and
    training a light field prediction model according to the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used to predict a light field of the scene.
  2. The method according to claim 1, characterized in that, after the training of the light field prediction model according to the spatial coordinates of the sampled spatial points, the method further comprises:
    predicting the light field of the scene through the light field prediction model.
  3. The method according to claim 1 or 2, characterized in that the sampling of spatial points from the small cubes according to the plurality of truncated distances of each small cube comprises:
    performing fusion calculation on the plurality of truncated distances of each small cube to obtain a fused truncated distance of each small cube; and
    sampling spatial points from each small cube according to the fused truncated distance of that small cube.
  4. The method according to claim 1 or 2, characterized in that the sampling of spatial points from the small cubes according to the plurality of truncated distances of each small cube comprises:
    determining, among the small cubes, the first small cubes having at least one truncated distance whose absolute value is smaller than a preset threshold;
    performing fusion calculation on the plurality of truncated distances of the first small cube to obtain a fused truncated distance of the first small cube; and
    sampling spatial points from the first small cube according to the fused truncated distance of the first small cube.
  5. The method according to claim 3 or 4, characterized in that, in the process of sampling spatial points from the small cubes, more spatial points are sampled from small cubes with smaller fused truncated distances.
  6. The method according to any one of claims 3-5, characterized in that the fusion calculation comprises a weighted average calculation.
  7. The method according to claim 6, characterized in that the weight, in the weighted average calculation, of the truncated distance of a second small cube computed from the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or positively correlated with the cosine of a first included angle, the first included angle being the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube, the second small cube being any small cube in the cube model.
  8. An apparatus for generating a light field prediction model, characterized by comprising:
    an establishing unit, configured to establish, according to the respective shooting orientations of a plurality of sample images, a cube model surrounding a captured scene, wherein the cube model comprises a plurality of small cubes (voxels);
    a first computing unit, configured to compute, from the plurality of sample images, a plurality of truncated distances of each small cube of the plurality of small cubes, wherein computing one truncated distance of each small cube from a first sample image comprises: determining the truncated distance according to the distance from a camera to the small cube and the distance from the camera to an object surface in the scene when the first sample image was captured, the first sample image being any one of the plurality of sample images;
    a sampling unit, configured to sample spatial points from the small cubes according to the plurality of truncated distances of each small cube, wherein each sampled spatial point corresponds to a spatial coordinate; and
    a second computing unit, configured to train a light field prediction model according to the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used to predict a light field of the scene.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a prediction unit, configured to predict the light field of the scene through the light field prediction model.
  10. The apparatus according to claim 8 or 9, characterized in that, in terms of sampling spatial points from the small cubes according to the plurality of truncated distances of each small cube, the sampling unit is specifically configured to:
    perform fusion calculation on the plurality of truncated distances of each small cube to obtain a fused truncated distance of each small cube; and
    sample spatial points from each small cube according to the fused truncated distance of that small cube.
  11. The apparatus according to claim 8 or 9, wherein, in terms of sampling spatial points from the small cubes according to the plurality of truncation distances of each small cube, the sampling unit is specifically configured to:
    determine, among the small cubes, first small cubes each having at least one truncation distance whose absolute value is less than a preset threshold;
    perform a fusion calculation on the plurality of truncation distances of each first small cube to obtain a fused truncation distance of that first small cube; and
    sample spatial points from each first small cube according to its fused truncation distance.
  12. The apparatus according to claim 10 or 11, wherein, in the process of sampling spatial points from the small cubes, more spatial points are sampled from a small cube whose fused truncation distance is smaller (an illustrative sampling sketch follows the claims).
  13. The apparatus according to any one of claims 10 to 12, wherein the fusion calculation comprises a weighted average calculation.
  14. The apparatus according to claim 13, wherein the weight value, in the weighted average calculation, of the truncation distance of a second small cube calculated based on the first sample image is negatively correlated with the distance from the second small cube to the camera when the first sample image was captured, and/or is positively correlated with a first included angle, where the first included angle is the angle between the camera ray on which the second small cube lies and the normal vector of the object surface closest to the second small cube, and the second small cube is any small cube in the cube model.
  15. A device for generating a light field prediction model, comprising a processor and a memory, wherein the memory is configured to store a computer program which, when run on the processor, implements the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when run on a processor, implements the method according to any one of claims 1 to 7.
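The following sketches are editorial illustrations in Python of the techniques recited in the claims; they are not the patented implementation. First, the per-image truncation distance computed by the first computing unit of claim 8: the signed-difference formulation and the truncation band `trunc` are assumptions, since the claims only require that the distance be determined from the camera-to-voxel and camera-to-surface distances.

```python
import numpy as np

def truncation_distance(d_voxel: float, d_surface: float, trunc: float = 0.05) -> float:
    """One truncation distance of a voxel for one sample image.

    d_voxel   -- distance from the camera to the voxel when the image was taken
    d_surface -- distance from the camera to the object surface along the same ray
    trunc     -- assumed truncation band (not specified in the claims)
    """
    # Signed difference: positive in front of the surface, negative behind it,
    # clamped so that only voxels near a surface carry useful information.
    return float(np.clip(d_surface - d_voxel, -trunc, trunc))
```

Clamping to a narrow band is what lets the later claims skip empty space: a voxel whose every truncation distance sits at the clamp limit is far from any surface and need not be sampled densely.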
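The weighted-average fusion of claims 6-7 and 13-14 might then look as follows. The concrete weight function `angle / distance` is an assumption, chosen only to satisfy the claimed correlations: the weight falls with the camera-to-voxel distance and grows with the first included angle.

```python
import numpy as np

def observation_weight(voxel_center, cam_pos, surface_normal):
    """Assumed weight of one per-image truncation distance of a voxel."""
    ray = np.asarray(voxel_center, dtype=float) - np.asarray(cam_pos, dtype=float)
    distance = np.linalg.norm(ray)                   # camera-to-voxel distance
    n = np.asarray(surface_normal, dtype=float)
    cos_a = np.dot(ray / distance, n / np.linalg.norm(n))
    angle = np.arccos(np.clip(cos_a, -1.0, 1.0))     # the "first included angle"
    return angle / max(distance, 1e-6)               # grows with angle, falls with distance

def fused_truncation_distance(distances, weights):
    """Weighted average of a voxel's per-image truncation distances (claims 6/13)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, np.asarray(distances, dtype=float)) / w.sum())
```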
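Claims 10-12 together describe selecting near-surface voxels, fusing their truncation distances, and concentrating samples where the fused distance is small. In the sketch below, the preset threshold value, the inverse-magnitude allocation rule, and uniform sampling inside each voxel are all illustrative choices; the claims fix only the ordering (smaller fused distance, more samples).

```python
import numpy as np

def sample_spatial_points(voxels, threshold=0.04, total=100_000, rng=None):
    """voxels: list of dicts with 'center' (3,), 'size' (edge length),
    'tsdf' (per-image truncation distances) and 'w' (their fusion weights)."""
    rng = rng or np.random.default_rng(0)
    # Claim 11: keep voxels having at least one |truncation distance| < threshold.
    kept = [v for v in voxels if np.min(np.abs(v["tsdf"])) < threshold]
    if not kept:
        return np.empty((0, 3))
    # Claims 10/13: weighted-average fusion of each kept voxel's distances.
    fused = np.array([np.dot(v["w"], v["tsdf"]) / np.sum(v["w"]) for v in kept])
    # Claim 12: more samples where the fused truncation distance is small
    # (inverse-magnitude weighting is an assumed concrete rule).
    scores = 1.0 / (np.abs(fused) + 1e-4)
    counts = np.floor(total * scores / scores.sum()).astype(int)
    points = [np.asarray(v["center"]) + (rng.random((n, 3)) - 0.5) * v["size"]
              for v, n in zip(kept, counts) if n > 0]
    return np.concatenate(points) if points else np.empty((0, 3))
```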
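Finally, the prediction unit of claim 9 is architecture-agnostic. Purely for illustration, a NeRF-style query interface (spatial coordinate plus view direction in, color plus density out) is assumed below; the claims do not prescribe the model's inputs, outputs, or structure.

```python
import numpy as np

def predict_light_field(model, points, view_dirs):
    """Query a trained light field prediction model at sampled points.

    model     -- assumed callable mapping an (N, 6) array of
                 (x, y, z, dx, dy, dz) to an (N, 4) array of (r, g, b, density)
    points    -- (N, 3) spatial coordinates of sampled points
    view_dirs -- (N, 3) unit viewing directions
    """
    out = np.asarray(model(np.concatenate([points, view_dirs], axis=-1)))
    return out[..., :3], out[..., 3]     # per-point color and volume density
```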
PCT/CN2021/080893 2021-03-15 2021-03-15 Method for generating light field prediction model, and related apparatus WO2022193104A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180095331.6A CN117015966A (en) 2021-03-15 2021-03-15 Method and related device for generating light field prediction model
PCT/CN2021/080893 WO2022193104A1 (en) 2021-03-15 2021-03-15 Method for generating light field prediction model, and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/080893 WO2022193104A1 (en) 2021-03-15 2021-03-15 Method for generating light field prediction model, and related apparatus

Publications (1)

Publication Number Publication Date
WO2022193104A1 (en) 2022-09-22

Family

ID=83321601

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080893 WO2022193104A1 (en) 2021-03-15 2021-03-15 Method for generating light field prediction model, and related apparatus

Country Status (2)

Country Link
CN (1) CN117015966A (en)
WO (1) WO2022193104A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101982741A (en) * 2010-09-08 2011-03-02 北京航空航天大学 Underwater light field sampling and simulating method
CN106165387A (en) * 2013-11-22 2016-11-23 维迪诺蒂有限公司 Light field processing method
CN107018293A (en) * 2015-09-17 2017-08-04 汤姆逊许可公司 The method and apparatus that generation represents the data of light field
US20190174115A1 (en) * 2015-09-17 2019-06-06 Thomson Licensing Light field data representation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TANG SHENGZE, GUI TAO: "Model of 5 Light Field for Navigation in Occluded Environment", Journal of Tsinghua University (Science and Technology), vol. 38, no. S1, 31 December 1998 (1998-12-31), pages 10-14, XP055968315, DOI: 10.16511/j.cnki.qhdxxb.1998.s1.003 *

Also Published As

Publication number Publication date
CN117015966A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN112894832B (en) Three-dimensional modeling method, three-dimensional modeling device, electronic equipment and storage medium
CN108986161B (en) Three-dimensional space coordinate estimation method, device, terminal and storage medium
CN106780590B (en) Method and system for acquiring depth map
US11557083B2 (en) Photography-based 3D modeling system and method, and automatic 3D modeling apparatus and method
CN110910437B (en) Depth prediction method for complex indoor scene
CN112862874B (en) Point cloud data matching method and device, electronic equipment and computer storage medium
CN112288853B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, and storage medium
WO2019238114A1 (en) Three-dimensional dynamic model reconstruction method, apparatus and device, and storage medium
WO2021093679A1 (en) Visual positioning method and device
US20170038212A1 (en) Automatic connection of images using visual features
WO2023015409A1 (en) Object pose detection method and apparatus, computer device, and storage medium
WO2022247548A1 (en) Positioning method, apparatus, electronic device, and storage medium
CN113129352A (en) Sparse light field reconstruction method and device
WO2021151380A1 (en) Method for rendering virtual object based on illumination estimation, method for training neural network, and related products
CN113902802A (en) Visual positioning method and related device, electronic equipment and storage medium
CN116071504B (en) Multi-view three-dimensional reconstruction method for high-resolution image
CN115578515B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
WO2022193104A1 (en) Method for generating light field prediction model, and related apparatus
Chen et al. Densefusion: Large-scale online dense pointcloud and dsm mapping for uavs
Lyu et al. 3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement
Mo et al. Cross-based dense depth estimation by fusing stereo vision with measured sparse depth
CN112288817A (en) Three-dimensional reconstruction processing method and device based on image
Kolhatkar et al. Real-time virtual viewpoint generation on the GPU for scene navigation
Hou et al. Depth estimation and object detection for monocular semantic SLAM using deep convolutional network
Liao et al. VI-NeRF-SLAM: a real-time visual–inertial SLAM with NeRF mapping

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21930700

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180095331.6

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21930700

Country of ref document: EP

Kind code of ref document: A1