CN117015966A - Method and related device for generating light field prediction model - Google Patents

Method and related device for generating light field prediction model

Info

Publication number
CN117015966A
Authority
CN
China
Prior art keywords
distance
small cube
cube
small
light field
Prior art date
Legal status
Pending
Application number
CN202180095331.6A
Other languages
Chinese (zh)
Inventor
郑凯
韩磊
李琳
李选富
林天鹏
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN117015966A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

A method and a related device for generating a light field prediction model are provided, wherein the method comprises the following steps: establishing a cube model surrounding a shot scene according to the respective shooting orientations of a plurality of sample images, wherein the cube model comprises a plurality of small cubes (voxels); calculating a plurality of cutoff distances of each of the plurality of small cubes based on the plurality of sample images, respectively (S402); sampling space points from the small cubes according to the plurality of cutoff distances of each small cube, wherein each sampled space point corresponds to a space coordinate; and training a light field prediction model according to the spatial coordinates of the sampled space points, wherein the light field prediction model is used for predicting a light field of the scene. The method can improve the sampling efficiency of the voxels, thereby improving the generation efficiency of the light field prediction model.

Description

Method and related device for generating light field prediction model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a light field prediction model.
Background
The prospects of virtual reality display technology are very broad, and virtual reality display hardware devices are becoming simple, easy to use, and popular, from cardboard viewers (Cardboard), virtual reality head mounted displays (Vive VR), and head mounted displays (Oculus Rift) to the virtual reality helmet (VR Glass) introduced in 2019. However, unlike the rapid improvement of virtual reality display hardware, high-quality virtual reality digital content is very limited. Unlike conventionally displayed two-dimensional (2D) digital content, virtual reality content requires a three-dimensional light field of a scene in order to enhance the sense of immersion (e.g., the displayed content changes with human motion); capturing the three-dimensional light field of a scene requires very complex hardware equipment, and the flexibility of three-dimensional light field acquisition is limited. Therefore, using computer vision algorithms to acquire three-dimensional light fields is a new research direction, but how to efficiently and accurately acquire three-dimensional light fields based on computer vision algorithms is a technical problem faced by those skilled in the art.
Disclosure of Invention
The embodiment of the application discloses a method and a related device for generating a light field prediction model, which can improve the generation efficiency of the light field prediction model.
In a first aspect, an embodiment of the present application discloses a method for generating a light field prediction model, where the method includes: establishing a cube model surrounding a shot scene according to the respective shooting orientations of a plurality of sample images, wherein the cube model comprises a plurality of small cubes (voxels); then, respectively calculating a plurality of cut-off distances of each small cube in the plurality of small cubes according to the plurality of sample images, wherein calculating one cut-off distance of each small cube according to a first sample image comprises: determining the cut-off distance according to the distance from the camera to each small cube and the distance from the camera to the surface of an object in the scene when the first sample image is shot, wherein the first sample image is any one of the plurality of sample images; sampling space points from the small cubes according to the plurality of cut-off distances of each small cube, wherein each sampled space point corresponds to a space coordinate; and then training a light field prediction model from the spatial coordinates of the sampled space points, wherein the light field prediction model is used to predict a light field of the scene.
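For readability, the sketch below outlines these four steps as Python function signatures only; the function names and parameter lists are illustrative assumptions and are not taken from the application itself.

```python
# Illustrative outline only; names and signatures are assumptions, not part of the application.
def build_cube_model(shooting_orientations):
    """Step 1: build a cube model that encloses the shot scene from the shooting
    orientations of the sample images; return the centres of its small cubes (voxels)."""
    raise NotImplementedError

def compute_cutoff_distances(voxel_centres, sample_images, depth_maps, shooting_orientations):
    """Step 2: for every voxel and every sample image, compare the camera-to-voxel
    distance with the camera-to-surface distance to obtain the cut-off distances."""
    raise NotImplementedError

def sample_space_points(voxel_centres, cutoff_distances):
    """Step 3: sample more space points from voxels whose cut-off distance is small
    (i.e. voxels close to the object surface); each point has a coordinate (x, y, z)."""
    raise NotImplementedError

def train_light_field_model(space_points, sample_images, shooting_orientations):
    """Step 4: train the light field prediction model on the sampled space points."""
    raise NotImplementedError
```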
In this method, the voxel sampling points used to train the deep learning network (also called a light field prediction model, used to predict a three-dimensional light field) are obtained based on the depth information of the images. Specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differentiated sampling is then carried out according to the magnitude of the cut-off distance. On the one hand, this sampling manner quickly concentrates the sampling on key regions and improves sampling efficiency; on the other hand, the sampled points are basically concentrated near the object surface, so that when the deep learning network subsequently trained on these points is used for image prediction, the texture details of the object can be better represented and blurring and structural errors can be reduced.
With reference to the first aspect, in an optional implementation manner of the first aspect, after the training of a light field prediction model according to the spatial coordinates of the sampled space points, the method further includes: predicting a light field of the scene by the light field prediction model. That is, after training the light field prediction model, the model training device predicts the light field through the light field prediction model.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, the sampling spatial points from the small cubes according to the plurality of truncation distances of each small cube includes: performing fusion calculation on the plurality of cutoff distances of each small cube to obtain the fusion cutoff distance of each small cube; sampling space points from each small cube according to the fusion cut-off distance of each small cube. In the implementation mode, the fusion cut-off distance of each small cube is calculated, and then sampling is carried out according to the fusion cut-off distance.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, the sampling of space points from the small cubes according to the plurality of cut-off distances of each small cube includes: determining, among the small cubes, the first small cubes, namely those for which the absolute value of at least one cut-off distance is smaller than a preset threshold; performing fusion calculation on the plurality of cut-off distances of each first small cube to obtain the fusion cut-off distance of the first small cube; and sampling space points from the first small cube according to its fusion cut-off distance. In this way, the fusion cut-off distance is not calculated for all the small cubes but only for those whose cut-off distance has an absolute value smaller than the preset threshold. The larger the absolute value of the cut-off distance, the farther the corresponding small cube is from the object surface and the less necessary it is to sample it; therefore, the application does not perform fusion calculation on the cut-off distances of such small cubes, which is equivalent to excluding them from sampling in advance, and this reduces the amount of calculation and improves the generation efficiency of the light field prediction model while basically not reducing the subsequent sampling effect.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, when sampling space points from the small cubes, the smaller the fusion cut-off distance of a small cube, the more space points are sampled from it. It can be understood that the smaller the fusion cut-off distance, the closer the small cube is to the object surface; compared with the space points in other small cubes on the same camera ray, the space points in such small cubes can better embody the pixel information, so training the light field prediction model based on the space points in these small cubes helps the light field prediction model subsequently predict more accurate images.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further alternative aspect of the first aspect, the fusion calculation includes a weighted average calculation. It will be appreciated that the fusion cut-off distance calculated by weighted average can more accurately reflect the distance of the small cube to the object surface.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, the weight value occupied, during the weighted average calculation, by the truncated distance of a second small cube calculated based on the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is captured, and/or positively related to a first included angle, where the first included angle is the included angle between the camera ray (view line) on which the second small cube is located and the normal vector of the object surface closest to the second small cube, and the second small cube is any small cube in the cube model.
In this way, the weight value calculated based on each sample image is used when calculating the fusion cut-off distance; because the weight value is inversely related to the distance from the second small cube to the camera and/or positively related to the first included angle, combining the weight values to calculate the fusion cut-off distance can more accurately reflect the influence of the different orientations on the fusion cut-off distance.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, before the calculating a truncated distance of each small cube in the cube model according to the plurality of sample images, the method further includes: in capturing each sample image of the plurality of sample images, depth information is acquired from a capture perspective of each sample image, wherein the depth information is used to characterize a distance of the camera from a surface of an object in the scene.
In this way, the depth information is acquired from the shooting angle of view of each sample image, which can more accurately reflect the distance of the camera to the object surface in the shot scene, which is advantageous for calculating a more accurate cut-off distance.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, in calculating a fusion truncated distance of a second small cube, a weight value occupied by the truncated distance of the second small cube calculated based on the first sample image during weighted average calculation may also be referred to as a weight value w (p) of the second small cube calculated according to the first sample image, where the weight value w (p) satisfies the following relationship:
w(p)=cos(θ)/distance(v)
wherein θ is the first included angle, and distance (v) is the distance from the second small cube to the camera when the first sample image is captured.
As mentioned above, the weight value of the second small cube calculated from the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or is positively related to the first included angle, where the expression of w (p) is an optional expression of this concept, so that when the weight value is combined to calculate the fusion cut-off distance, the influence of different orientations on the fusion cut-off distance can be reflected more accurately.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further optional implementation manner of the first aspect, the truncated distance d (p) of the second small cube calculated according to the first sample image satisfies the following relationship:
d(p)=sdf(p)/|u|
wherein sdf(p) is the difference between the distance from the camera to the surface of the object in the scene and the distance from the camera to the second small cube when the first sample image is shot, and u is a preset threshold.
It will be appreciated that d (p) is only an alternative calculation formula for the truncated distance, and other expressions are possible in practical applications.
With reference to the first aspect or any one of the foregoing possible implementation manners of the first aspect, in a further alternative aspect of the first aspect, if sdf (p) > |u|, d (p) =1; if sdf (p) < 0 and |sdf (p) | > |u|, then d (p) = -1.
In this way, the truncated distance of the small cubes in one range is assigned the value 1, and the truncated distance of the small cubes in another range is assigned the value -1, which facilitates subsequently applying the same processing to the two types of small cubes, thereby improving the calculation efficiency.
In a second aspect, an embodiment of the present application provides a device for generating a light field prediction model, where the device includes:
The device comprises a building unit, configured to build a cube model surrounding a shot scene according to the respective shooting orientations of a plurality of sample images, wherein the cube model comprises a plurality of small cubes (voxels);
a first calculating unit, configured to calculate, according to the plurality of sample images, a plurality of cut-off distances of each of the plurality of small cubes, where calculating one cut-off distance of each small cube according to a first sample image includes: determining the cut-off distance according to the distance from the camera to each small cube and the distance from the camera to the surface of an object in the scene when the first sample image is shot, wherein the first sample image is any one of the plurality of sample images;
the sampling unit is used for sampling space points from the small cubes according to the plurality of cutoff distances of each small cube, wherein each sampled space point corresponds to a space coordinate;
a second calculation unit for training a light field prediction model based on the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used for predicting a light field of the scene.
In the device, the voxel sampling points used to train the deep learning network (also called a light field prediction model, used to predict a three-dimensional light field) are obtained based on the depth information of the images. Specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differentiated sampling is then carried out according to the magnitude of the cut-off distance. On the one hand, this sampling manner quickly concentrates the sampling on key regions and improves sampling efficiency; on the other hand, the sampled points are basically concentrated near the object surface, so that when the deep learning network subsequently trained on these points is used for image prediction, the texture details of the object can be better represented and blurring and structural errors can be reduced.
With reference to the second aspect, in an optional aspect of the second aspect, the apparatus further includes:
and the prediction unit, configured to predict the light field of the scene through the light field prediction model. That is, after training the light field prediction model, the model training device predicts the light field through the light field prediction model.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further optional implementation manner of the second aspect, in sampling space points from the small cubes according to the plurality of cut-off distances of each small cube, the sampling unit is specifically configured to: perform fusion calculation on the plurality of cut-off distances of each small cube to obtain the fusion cut-off distance of each small cube; and sample space points from each small cube according to the fusion cut-off distance of each small cube. In this implementation, the fusion cut-off distance of each small cube is calculated, and sampling is then carried out according to the fusion cut-off distance.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further optional implementation manner of the second aspect, in sampling space points from the small cubes according to the plurality of cut-off distances of each small cube, the sampling unit is specifically configured to: determine, among the small cubes, the first small cubes, namely those for which the absolute value of at least one cut-off distance is smaller than a preset threshold; perform fusion calculation on the plurality of cut-off distances of each first small cube to obtain the fusion cut-off distance of the first small cube; and sample space points from the first small cube according to its fusion cut-off distance. In this way, the fusion cut-off distance is not calculated for all the small cubes but only for those whose cut-off distance has an absolute value smaller than the preset threshold. The larger the absolute value of the cut-off distance, the farther the corresponding small cube is from the object surface and the less necessary it is to sample it; therefore, the application does not perform fusion calculation on the cut-off distances of such small cubes, which is equivalent to excluding them from sampling in advance, and this reduces the amount of calculation and improves the generation efficiency of the light field prediction model while basically not reducing the subsequent sampling effect.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further optional implementation manner of the second aspect, when sampling space points from the small cubes, the smaller the fusion cut-off distance of a small cube, the more space points are sampled from it. It can be understood that the smaller the fusion cut-off distance, the closer the small cube is to the object surface; compared with the space points in other small cubes on the same camera ray, the space points in such small cubes can better embody the pixel information, so training the light field prediction model based on the space points in these small cubes helps the light field prediction model subsequently predict more accurate images.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further alternative aspect of the second aspect, the fusion calculation includes a weighted average calculation. It will be appreciated that the fusion cut-off distance calculated by weighted average can more accurately reflect the distance of the small cube to the object surface.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further optional implementation manner of the second aspect, the weight value occupied, during the weighted average calculation, by the truncated distance of a second small cube calculated based on the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is captured, and/or positively related to a first included angle, where the first included angle is the included angle between the camera ray (view line) on which the second small cube is located and the normal vector of the object surface closest to the second small cube, and the second small cube is any small cube in the cube model.
In this way, the weight value calculated based on each sample image is used when calculating the fusion cut-off distance; because the weight value is inversely related to the distance from the second small cube to the camera and/or positively related to the first included angle, combining the weight values to calculate the fusion cut-off distance can more accurately reflect the influence of the different orientations on the fusion cut-off distance.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further alternative aspect of the second aspect, the apparatus further includes:
and the acquisition unit is used for acquiring depth information from the shooting view angle of each sample image when each sample image in the plurality of sample images is shot, wherein the depth information is used for representing the distance from the camera to the surface of an object in the scene.
In this way, the depth information is acquired from the shooting angle of view of each sample image, which can more accurately reflect the distance of the camera to the object surface in the shot scene, which is advantageous for calculating a more accurate cut-off distance.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further alternative implementation manner of the second aspect, in calculating a fusion truncated distance of a second small cube, a weight value occupied by the truncated distance calculated based on the first sample image during weighted average calculation may also be referred to as a weight value w (p) of the second small cube calculated according to the first sample image, where the weight value w (p) satisfies the following relationship:
w(p)=cos(θ)/distance(v)
Wherein θ is the first included angle, and distance (v) is the distance from the second small cube to the camera when the first sample image is captured.
As mentioned above, the weight value of the second small cube calculated from the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or is positively related to the first included angle, where the expression of w (p) is an optional expression of this concept, so that when the weight value is combined to calculate the fusion cut-off distance, the influence of different orientations on the fusion cut-off distance can be reflected more accurately.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further alternative aspect of the second aspect, the truncated distance d (p) of the second small cube calculated according to the first sample image satisfies the following relationship:
d(p)=sdf(p)/|u|
wherein sdf(p) is the difference between the distance from the camera to the surface of the object in the scene and the distance from the camera to the second small cube when the first sample image is shot, and u is a preset threshold.
It will be appreciated that d (p) is only an alternative calculation formula for the truncated distance, and other expressions are possible in practical applications.
With reference to the second aspect or any one of the foregoing possible implementation manners of the second aspect, in a further alternative aspect of the second aspect, if sdf (p) > |u|, d (p) =1; if sdf (p) < 0 and |sdf (p) | > |u|, then d (p) = -1.
In this way, the truncated distance of the small cubes in one range is assigned the value 1, and the truncated distance of the small cubes in another range is assigned the value -1, which facilitates subsequently applying the same processing to the two types of small cubes, thereby improving the calculation efficiency.
In a third aspect, an embodiment of the present application provides a device for generating a light field prediction model, comprising a processor and a memory, wherein the memory is configured to store a computer program, which when run on the processor implements the method described in the first aspect or any of the alternatives of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, which when run on a processor, implements the method described in the first aspect or any of the alternatives of the first aspect.
By implementing the embodiments of the application, the voxel sampling points used to train the deep learning network (also called a light field prediction model, used to predict a three-dimensional light field) are obtained based on the depth information of the images. Specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differentiated sampling is then carried out according to the magnitude of the cut-off distance. On the one hand, this sampling manner quickly concentrates the sampling on key regions and improves sampling efficiency; on the other hand, the sampled points are basically concentrated near the object surface, so that when the deep learning network subsequently trained on these points is used for image prediction, the texture details of the object can be better represented and blurring and structural errors can be reduced.
Drawings
Fig. 1 is a schematic view of a scene of acquiring NeRF according to an embodiment of the present application;
fig. 2A is a schematic diagram of a scenario of sampling voxel according to an embodiment of the present application;
FIG. 2B is a schematic diagram of a three-dimensional light field and RGB information as a function of depth information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a model training architecture provided by an embodiment of the present application;
FIG. 4 is a flow chart of a method for determining a three-dimensional light field of a scene according to an embodiment of the present application;
FIG. 5 is a schematic view of the distances between a camera and the surface of an object according to an embodiment of the present application;
fig. 6 is a schematic view of a truncated distance scenario provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a truncated distance distribution according to an embodiment of the present application;
FIG. 8 is a schematic diagram showing the comparison of the prediction effect of a light field prediction model according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a device for generating a light field prediction model according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a generating device of another light field prediction model according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
Referring to fig. 1, fig. 1 is a schematic view of a scene for acquiring a neural radiance field (Neural Radiance Fields, NeRF). In the method shown in fig. 1, a sparse image dataset is used to synthesize the three-dimensional light field of a complex scene, which specifically includes the following. As in (a), for a scene represented by a five-dimensional (5D) coordinate system, a single 5D coordinate (x, y, z, θ, φ) on a view line (camera ray) is input to a deep learning network in fully connected form, where the coordinate comprises the spatial position (x, y, z) and the viewing direction (θ, φ). As in (b), the deep learning network reconstructs (i.e., outputs) the RGB information corresponding to the coordinate, which may be expressed as RGBσ and includes density and color. After volume rendering is performed on RGBσ, the result is compared, as in (c), with the actual RGB information at the coordinate to derive a rendering loss, as in (d), and training of the deep learning network is continued based on the rendering loss. After the deep learning network has been trained according to procedures (a), (b), (c) and (d) on the 5D coordinates of the spatial points sampled on the camera rays, the deep learning network can predict the RGB information of new 5D coordinates; therefore, for the scene represented by the 5D coordinate system, the view of the scene from any viewing angle can be predicted through the deep learning network, and the set of views from all viewing angles of the scene is the three-dimensional light field of the scene.
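As a rough illustration of the input/output interface just described (not the actual network of fig. 1), the following toy sketch maps a batch of 5D coordinates (x, y, z, θ, φ) to colour and density with a randomly initialised two-layer perceptron; the layer sizes and names are assumptions.

```python
import numpy as np

def toy_radiance_mlp(coords_5d, rng=np.random.default_rng(0)):
    """Toy stand-in for the fully connected network: 5D coordinate -> (RGB, sigma)."""
    w1 = rng.normal(size=(5, 64))
    w2 = rng.normal(size=(64, 4))
    h = np.maximum(coords_5d @ w1, 0.0)          # ReLU hidden layer
    out = h @ w2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))      # colour in [0, 1]
    sigma = np.maximum(out[:, 3:], 0.0)          # non-negative density
    return rgb, sigma

# one sampled point on a camera ray: position (x, y, z) and viewing direction (theta, phi)
rgb, sigma = toy_radiance_mlp(np.array([[0.1, 0.2, 0.3, 0.5, 1.2]]))
```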
After analyzing the operation principle of the deep learning network used in the scene shown in fig. 1, the inventor found that, for a certain view line (camera ray, also called viewing ray), if it passes through a point on the surface of an object, the RGB information (or pixel) corresponding to the camera ray is mainly characterized by the depth and color of that point on the object surface. Under this premise, in the process of training to obtain a three-dimensional light field, the deep learning network needs to sample uniformly from the camera ray to obtain sampling points, as shown in fig. 2A, where the sampling granularity of part (e) and part (f) is relatively small; then, each sampling point is trained and analyzed to obtain the sampling points near the object surface, the approximate range of the "object surface" is determined based on these sampling points near the surface, and the sampling points on the object surface are obtained by further training and analysis on that approximate range. Since the depth and color of the sampling points on the object surface can basically reflect the RGB information (or pixels) corresponding to the camera ray, the deep learning network (or light field prediction model) for predicting the three-dimensional light field can be trained based on the sampling points on the object surface. As shown in fig. 2B, the horizontal direction represents the change of the depth information of the image along one camera ray, and the vertical direction represents the variation of the three-dimensional light field along the camera ray with depth and the variation of the RGB information along the camera ray with depth; it can be seen that the weights of the deep learning network used for predicting the three-dimensional light field and the RGB information have a high degree of coincidence in how they are affected by depth. Since the depth information can reflect the distance from the camera to the object surface, the weights of the deep learning network used for predicting the three-dimensional light field and the RGB information can be considered to have a high degree of coincidence in how they are affected by the object surface.
In this process, since the deep learning network samples uniformly from the camera ray, for the sampling points on each camera ray, the sampling points near the object surface and the sampling points on the object surface need to be searched for in a trial-and-error manner. This trial-and-error manner puts a very high computational pressure on the deep learning network, so its convergence speed is low; moreover, the trial-and-error manner cannot accurately locate the sampling points on the object surface on the camera ray, so the precision of the deep learning network trained on these sampling points is not high, and the error of the three-dimensional light field it predicts is relatively large.
The inventor considered that the depth information can reflect the distance from the camera to the object surface; therefore, by importing the depth information into the whole deep learning network, invalid calculation of the deep learning network can be reduced, and its convergence speed and efficiency can be improved. In particular, the depth information is used to perform focused sampling and training near the depth value on each camera ray, ensuring that the deep learning network quickly converges onto the object surface on the camera ray in the initial stage of training, so that computing power is concentrated on representing the texture details of the object and blurring and structural errors are avoided.
Referring to fig. 3, fig. 3 is a schematic diagram of a model training architecture provided by an embodiment of the present application, where the architecture includes a model training device 301 and one or more model using devices 302, where the model training device 301 communicates with the model using devices 302 in a wired or wireless manner, so that the model training device 301 may send a trained deep learning network (or a light field prediction model) for predicting a three-dimensional light field to the model using devices 302; accordingly, the model uses the device 302 to predict the three-dimensional light field in a particular scene through the received deep learning network. Of course, it is also possible that the model training device itself predicts the three-dimensional light field in the scene based on the training derived deep learning network.
Optionally, the model using device 302 may feed back the result predicted based on the model to the model training device 301, so that the model training device 301 may further train the model based on the predicted result of the model using device 302; the retrained model may be sent to model-using device 302 for updating the original model.
The model training device 301 may be a device with a relatively high computing power, for example, a server, or a server cluster made up of a plurality of servers.
The model-using device 302 is a device that needs to acquire a three-dimensional light field of a particular scene, such as a handheld device (e.g., a cell phone, a tablet computer, a palm computer, etc.), a vehicle-mounted device (e.g., an automobile, a bicycle, an electric car, an airplane, a ship, etc.), a wearable device (e.g., a smart watch (e.g., iWatch, etc.), a smart bracelet, a pedometer, etc.), a smart home device (e.g., a refrigerator, a television, an air conditioner, an ammeter, etc.), a smart robot, a workshop device, etc.
The model-use device 302 is exemplified as an automobile and a mobile phone, respectively.
For example, with the development of economy, the number of automobiles worldwide is increasing, and the navigation map plays a very critical role in improving the passing efficiency of automobiles on roads; in some road surfaces with complex roads, users often have difficulty in acquiring comprehensive information of the road surfaces, but the method provided by the embodiment of the application can predict the three-dimensional light field of a specific scene, so that the effect of watching the complex road surfaces from all directions can be presented to the users, the driving control of the users is facilitated, and the automobile passing efficiency is improved.
For example, online shopping is very popular, consumers know the product form of an article by checking the photos of the article on the internet, however, the photos of many articles are limited, and the consumers can only see the effect when looking at the article from a part of directions, but the method provided by the embodiment of the application can predict the three-dimensional light field of the article, so that the consumers can look at the product form of the article from all angles, and the method is beneficial to helping the users to choose products more suitable for themselves.
There are many other scenarios that require predicting a three-dimensional light field, such as VR house viewing, VR movies, games, street view production, and the like.

Referring to fig. 4, fig. 4 is a flowchart of a method for determining a three-dimensional light field of a scene according to an embodiment of the present application. The method may be implemented based on the architecture shown in fig. 3, or based on other architectures. When implemented based on the architecture shown in fig. 3, steps S400-S405 may be implemented by the model training device 301 and step S406 may be implemented by the model using device 302. When implemented based on other architectures, steps S400-S406 may be performed by one device or cooperatively by a plurality of devices; the field of application of the device or devices is not limited here, as long as they can provide the corresponding computing capabilities and/or communication capabilities. Steps S400-S406 are specifically as follows:
step S400: the method includes inputting a plurality of sample images and information of shooting orientations of the plurality of sample images to a deep learning network.
Wherein the plurality of sample images are images of the same scene taken by the camera in different orientations. Optionally, an orientation (pose) comprises position coordinates (x, y, z) and a viewing direction (θ, φ). For example, with reference to the world coordinate system, x, y and z in the position coordinates (x, y, z) represent longitude, latitude and altitude, respectively, and θ and φ in the viewing direction represent the horizontal angle and the vertical angle, respectively. Of course, the orientation may also be expressed in other ways.
Step S401: and establishing a cube model surrounding the shot scene according to shooting orientations when a plurality of sample images are shot.
It will be appreciated that a cube model may be constructed to surround the scene based on a plurality of different orientations of the camera when the plurality of photographs are taken, and optionally the length, width, and height of the cube model are the maximum values of the scene length C, width W, and height H calculated from the plurality of different orientations, respectively. The scene is not limited, and may be, for example, a scene with a person as a main object, for example, a scene with a tree as a main object, for example, a scene with a house interior structure as a main scene, or the like. The plurality of different orientations mentioned herein may be four orientations of east, west, south and north, and may also be different orientations under other references.
The cube model is divided into a plurality of small cubes (grid cubes), abbreviated as voxels. Optionally, the voxels can be obtained by dividing the cube model into N equal parts. The size of the voxels can be set according to actual needs; it can be understood that the smaller the voxel, the higher the training precision of the subsequent deep learning network, but also the greater the computational pressure during training of the deep learning network, so the voxel size is generally chosen by jointly considering precision and device computing power. For example, the specification (size) of a voxel can be set to 2 centimeters (cm). Generally, any voxel is considered either to be located on the surface of an object in the scene or not on the surface of an object (in which case it can be considered to be at an empty location in the scene).
Alternatively, if the position of a voxel in the cube model is expressed as a three-dimensional position coordinate g, that is, (x, y, z), then when the voxels in the cube model are subsequently sent to the deep learning network for training, one GPU thread may process the voxels at one (x, y), that is, one GPU thread scans the column of voxels at one (x, y) coordinate. Alternatively, the coordinates of the center point of a voxel are generally taken as the coordinates of the voxel, but the coordinates of another point in the voxel may also be taken as its coordinates, for example, the upper left corner of the voxel, the upper right corner of the voxel, and the like.
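A minimal sketch of such a voxel grid is given below, assuming an axis-aligned bounding box already estimated from the shooting orientations; the helper name and the 2 cm default are illustrative assumptions.

```python
import numpy as np

def build_voxel_grid(bbox_min, bbox_max, voxel_size=0.02):
    """Divide the cube model enclosing the scene into small cubes (voxels) of edge
    `voxel_size` metres and return the world coordinates of each voxel centre,
    as an array of shape (Nx, Ny, Nz, 3)."""
    bbox_min = np.asarray(bbox_min, dtype=float)
    bbox_max = np.asarray(bbox_max, dtype=float)
    counts = np.ceil((bbox_max - bbox_min) / voxel_size).astype(int)
    axes = [bbox_min[i] + (np.arange(counts[i]) + 0.5) * voxel_size for i in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1)

centres = build_voxel_grid([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])  # 50 x 50 x 50 voxel centres
```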
Step S402: and respectively calculating a plurality of cutoff distances of each small cube in the plurality of small cubes according to the plurality of sample images.
Since the plurality of sample images are captured by the camera in different directions, it is necessary to calculate the cut-off distance of each small cube in the cube model for each of the sample images captured in different directions, and for ease of understanding, a first sample image, which is any one of the plurality of sample images, will be described below as an example. The truncated distance of each small cube calculated from the first sample image is determined according to the distance from the camera to each small cube and the distance from the camera to the object surface in the scene when the first sample image is taken, and is illustrated below.
Assuming that the first sample image is captured by the camera in a first orientation, the camera also collects depth information of the scene when shooting in the first orientation. The depth information reflects the distance from the camera to the surface of an object in the scene, so each pixel point x in the image captured by the camera in the first orientation corresponds to a depth value, and the depth value value(x) corresponding to the pixel point x reflects the distance from the voxel on the object surface to the camera on the camera ray corresponding to the pixel point x.
In the following, a second small cube (any small cube in the cube model) is taken as an example. Its position is known, and the position of the camera when shooting in the first orientation is also known; therefore, for the second small cube, the distance from the second small cube to the camera when the camera shoots in the first orientation is known, and can be denoted as distance(v).
Thus, when the camera is taking a picture in a first orientation, the distance of the second small cube in the cube model to the surface of the object in the scene can be marked as: sdf (p) =value (x) -distance (v), and the specific scene and geometric relationship are shown in fig. 5.
In the embodiment of the application, the voxel near the surface of the object is emphasized, and if the distance from the surface of the object is not more than the preset threshold u, the voxel is considered to be near the surface of the object, and whether the voxel is near and to what extent can be represented by a cutoff distance d (p), wherein the calculation mode of the cutoff distance d (p) of the second small cube voxel can be as follows:
d(p)=sdf(p)/|u|
Alternatively, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1. Fig. 6 illustrates the functional relationship between the truncation distance d(p) and u. Fig. 7 illustrates the distribution of the cut-off distances d(p) of a part of the small cubes (voxels) in the cube model corresponding to the scene captured by the camera.
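A minimal sketch of the computation just described, combining sdf(p) = value(x) - distance(v) with the truncation and clamping to ±1, is given below; the variable names follow the text and the function name is an assumption.

```python
def truncated_distance(value_x, distance_v, u):
    """Cut-off distance d(p) of one voxel for one sample image.
    value_x    : depth at the pixel the voxel projects to (camera -> object surface)
    distance_v : distance from the camera to the voxel
    u          : truncation threshold."""
    sdf = value_x - distance_v        # signed distance of the voxel to the object surface
    d = sdf / abs(u)
    return max(-1.0, min(1.0, d))     # clamp: d(p) = 1 far in front, -1 far behind

print(truncated_distance(value_x=2.05, distance_v=2.00, u=0.10))  # ≈ 0.5, i.e. near the surface
```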
In an alternative scheme, a weight value can also be calculated for each voxel. The weight value of the second small cube calculated according to the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is shot, and/or positively related to a first included angle, where the first included angle is the included angle between the camera ray (view line) on which the second small cube is located and the normal vector of the object surface closest to the second small cube; the weight value is subsequently used in the fusion calculation of the cut-off distances of the second small cube. The weight value of each voxel is determined in this way because, when the camera shoots in the first orientation, the pixel information (including density, color, etc.) contributed by any voxel in the above cube model is affected in a number of ways: for example, the closer the voxel is to the camera, the more pixel information it contributes, and the smaller the corresponding first included angle (corresponding to a smaller deviation from the camera's shooting direction), the more pixel information it contributes.
Alternatively, the expression of the weight value w (p) of the second small cube voxel may be as follows:
w(p)=cos(θ)/distance(v)
wherein θ is the first included angle, and distance (v) is the distance from the second small cube to the camera when the first sample image is captured.
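A minimal sketch of the weight just defined follows; θ and distance(v) are assumed to be computed beforehand, and the function name is illustrative.

```python
import math

def fusion_weight(theta, distance_v):
    """Weight w(p) = cos(theta) / distance(v): larger when the camera ray is closer to
    the surface normal (small theta) and when the voxel is closer to the camera."""
    return math.cos(theta) / distance_v

print(fusion_weight(theta=0.2, distance_v=2.0))   # ≈ 0.49
```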
Based on the above description, based on the above first sample image, the cutoff distance d (p) and the weight value w (p) of each voxel in the cube model can be calculated, and the set of cutoff distance d (p) and weight value w (p) are parameters with respect to the first orientation. Based on other sample images, the same calculation principle is adopted, and the cut-off distance d (p) and the weight value w (p) of each voxel in the cube model can be calculated, wherein the cut-off distance d (p) and the weight value w (p) are parameters relative to another azimuth.
It should be noted that, before calculating the cut-off distance d(p) and the weight value w(p) of each voxel, the coordinates of the voxels and of the camera may be unified into one coordinate system to facilitate calculation. For example, the position g of each voxel in the cube model can be converted into a position point p in the world coordinate system according to the voxel size and the number of voxels; then a mapping point v of the position point p in the camera coordinate system is determined according to the camera pose matrix, and the corresponding pixel point x in the depth image is determined according to the camera intrinsic matrix and the mapping point v. Then the depth value of the pixel point x is obtained: this depth value, value(x), is the distance from the point on the object surface to the camera on the camera ray on which the voxel at position g in the cube model is located; and the distance from the mapping point v to the origin of the camera coordinate system is recorded as distance(v). The obtained value(x) and distance(v) can then be used for the calculation of the cut-off distance d(p) and the weight value w(p).
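The sketch below follows this coordinate chain (voxel grid position g, world point p, camera-frame point v, pixel x, depth value); the 4x4 world-to-camera pose matrix and 3x3 intrinsic matrix are assumed inputs, and the function name is illustrative.

```python
import numpy as np

def voxel_depth_lookup(g, voxel_size, grid_origin, T_cw, K, depth_map):
    """Return (value_x, distance_v) for the voxel with grid index g, or None if the
    voxel projects outside the depth image or lies behind the camera."""
    p_world = np.asarray(grid_origin) + (np.asarray(g) + 0.5) * voxel_size   # g -> world point p
    v_cam = (T_cw @ np.append(p_world, 1.0))[:3]                             # p -> camera frame v
    if v_cam[2] <= 0:
        return None                                                          # behind the camera
    px = K @ (v_cam / v_cam[2])                                              # v -> pixel x
    col, row = int(round(px[0])), int(round(px[1]))
    h, w = depth_map.shape
    if not (0 <= col < w and 0 <= row < h):
        return None
    value_x = float(depth_map[row, col])            # camera -> object surface along this ray
    distance_v = float(np.linalg.norm(v_cam))       # mapping point v to camera-frame origin
    return value_x, distance_v
```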
In an alternative solution, when each sample image of the plurality of sample images is captured, depth information is acquired from a capturing view angle of each sample image, where the depth information is used to characterize a distance from the camera to a surface of an object in the scene, i.e. the value (x) described above. The depth information may be acquired in particular by a sensor, which may be, for example, a radar sensor, an infrared sensor or the like, which may be deployed on or near the camera.
Step S403: and carrying out fusion calculation on the cut-off distances respectively calculated according to the plurality of sample images to obtain the fusion cut-off distance of the small cubes in the cube model.
In a first alternative, specifically, fusion calculation is performed on the multiple truncated distances of each small cube to obtain a fusion truncated distance of each small cube; and when the sampling is carried out subsequently, sampling space points from each small cube according to the fusion cut-off distance of each small cube.
In a second alternative, specifically, the first small cubes are determined among the small cubes, namely those for which the absolute value of at least one cut-off distance is smaller than a preset threshold; then fusion calculation is performed on the cut-off distances of each first small cube calculated from the plurality of sample images respectively, to obtain the fusion cut-off distances of the first small cubes in the cube model. For example, if a plurality of cut-off distances of one voxel are calculated based on the plurality of sample images and the smallest absolute value among them is smaller than the preset threshold, that voxel is considered to be a first small cube; the preset threshold can be set to 1, for example. When sampling is carried out subsequently, space points are sampled from the first small cubes according to their fusion cut-off distances.
The principle of fusion calculation is illustrated below using the second alternative as an example.
Alternatively, the principle of the fusion calculation may be that the fusion cut-off distance of the second small cube is obtained by performing a weighted average over the cut-off distances calculated based on the different sample images, where the second small cube is any small cube in the cube model. The weight value that the cut-off distance of the second small cube calculated based on the first sample image occupies in the weighted average calculation is equal to the aforementioned "weight value w(p) of the second small cube calculated based on the first sample image". For example, the following operations are performed for the plurality of sample images: the cut-off distance d(p) calculated based on one of the sample images is taken as the initial fusion cut-off distance D(p), and the weight value w(p) calculated based on that sample image is taken as the initial fusion weight value W(p); then the following operations are performed in turn for the other sample images: the cut-off distance d(p) calculated based on the current sample image is fused into the existing fusion cut-off distance D(p) to update the fusion cut-off distance D(p), and the weight value w(p) calculated based on the current sample image is fused into the existing fusion weight value W(p) to update the fusion weight value W(p), until the cut-off distance d(p) and the weight value w(p) calculated based on every sample image have been fused.
Alternatively, the fused calculated expression may be as follows:
D(p)=(W(p)*D(p)+w(p)d(p))/(W(p)+w(p))
W(p)=W(p)+w(p)
wherein D (p) is the fusion cut-off distance of the second small cube, W (p) is the fusion weight value of the second small cube, D (p) is the cut-off distance of the second small cube calculated based on the current sample image, and W (p) is the weight value of the second small cube calculated based on the current sample image.
It will be appreciated that in this way the final fusion cut-off distance D(p) and fusion weight value W(p) of all voxels in the cube model can be calculated. Alternatively, the final fusion cut-off distance D(p) and fusion weight value W(p) of all the voxels in the cube model may be input into the Marching Cubes algorithm, and triangular faces may be computed to present the truncated distance field of the cube model.
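A minimal sketch of one incremental fusion step for a single voxel, directly following the two update formulas above, is given below; the running values are assumed to be initialised from the first sample image.

```python
def fuse_step(D, W, d, w):
    """Fold the cut-off distance d and weight w computed from the current sample
    image into the running fused values D(p) and W(p) of one voxel."""
    D_new = (W * D + w * d) / (W + w)
    W_new = W + w
    return D_new, W_new

D, W = 0.4, 1.0                       # values from the first sample image
D, W = fuse_step(D, W, d=0.1, w=2.0)  # second image pulls the fused distance toward 0.1
print(D, W)                           # fused distance ≈ 0.2, fused weight = 3.0
```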
Step S404: and sampling space points from the small cubes according to the fusion cut-off distance of the small cubes in the cube model.
Specifically, space points are sampled from the voxels in the cube model according to their fusion cut-off distances: for the first alternative in step S403, space points may be sampled from each voxel based on the fusion cut-off distance of that voxel; for the second alternative in step S403, space points may be sampled from the first small cubes (i.e., the first voxels) based on their fusion cut-off distances.
The sampling idea is that, in the cube model, the smaller the fusion cut-off distance of a small cube, the more space points are sampled from it. For example, on a camera ray emitted from any point of the image plane, the closer the fusion cut-off distance D(p) of a voxel is to 0, the greater its sampling density and the greater the number of samples; if the fusion cut-off distance D(p) is close to 1 or -1, the sampling density and the number of samples for such a voxel are smaller. Alternatively, the number of samples of the small cubes in the cube model whose fusion cut-off distance is greater than or equal to the preset threshold is zero; for example, a voxel whose fusion cut-off distance D(p) equals 1 or -1 may not be sampled at all.
Optionally, the number of samples n of a voxel on the camera ray and the fusion cut-off distance D(p) of the voxel satisfy the following relationship:
n∝(1-|D(p)|)
wherein D(p) is the fusion cut-off distance of the voxel, and n is the number of times the voxel is sampled.
It should be noted that each voxel is not a single point in the cube space but contains a large number of spatial points; therefore, after the number of samples of each voxel is calculated in the above manner, that number of space points may be sampled from the large number of spatial points contained in the voxel. For example, take 10 voxels, each containing 1000 spatial points and denoted voxel-1, voxel-2, ..., voxel-10, whose fusion cut-off distances D(p) are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1, respectively; then the number of space points sampled from voxel-1 is 90, from voxel-2 is 80, from voxel-3 is 70, from voxel-4 is 60, from voxel-5 is 50, from voxel-6 is 40, from voxel-7 is 30, from voxel-8 is 20, from voxel-9 is 10, and from voxel-10 is 0. Each sampled space point has a spatial coordinate (x, y, z).
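A minimal sketch of the differentiated sampling rule n ∝ (1 - |D(p)|) follows, reproducing the 10-voxel example above; the proportionality constant of 100 is an assumption chosen to match that example.

```python
import numpy as np

def samples_per_voxel(fused_d, scale=100):
    """Number of space points to sample from each voxel: proportional to 1 - |D(p)|,
    so voxels near the object surface (D close to 0) are sampled the most."""
    fused_d = np.asarray(fused_d, dtype=float)
    return np.round(scale * (1.0 - np.abs(fused_d))).astype(int)

print(samples_per_voxel([0.1, 0.2, 0.5, 0.9, 1.0]))   # -> 90, 80, 50, 10, 0
```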
In the embodiment of the application, for the voxels on some camera rays, the relevant sensor cannot detect the corresponding distance, so the cut-off distances of these voxels cannot be obtained, and sampling based on the cut-off distance cannot be performed for them. Such voxels may be sampled in a conventional uniform sampling (or average sampling) manner.
Step S405: the deep learning network is trained by the spatial coordinates of the sampled spatial points and the plurality of sample images.
The plurality of sample images correspond to a plurality of orientations respectively. The deep learning network reconstructs (or calculates) an image for each of the plurality of orientations according to the spatial coordinates of the sampled space points, until the loss (for example, the RGB information difference) between the reconstructed image of each orientation and the original sample image of that orientation is smaller than a preset value, at which point the training of the deep learning network ends.
The following is an illustration of the reconstruction process.
Optionally, if sample image 1 is taken from orientation 1, the pixel value of pixel point A in image 1 is associated with the spatial coordinates of the spatial points sampled on the camera ray corresponding to pixel point A, and similar operations are performed for the other pixel points in image 1. After all pixel points in image 1 have been processed, an image can be rendered; this image is the reconstructed image A of orientation 1. The ideal result of the reconstruction is that the loss (for example, the RGB information difference) between the reconstructed image A of orientation 1 output by the deep learning network and sample image 1 is smaller than the preset value, and the sample images taken from the other orientations and their reconstructed images satisfy the same relationship. The deep learning network whose training has finished may also be referred to as a light field prediction model.
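A highly simplified sketch of this reconstruct-and-compare loop is given below. Everything concrete in it is an assumption made for illustration: the tiny MLP, the averaging of point colours along a ray as a stand-in for the actual rendering of reconstructed pixels (which the passage does not specify), the mean-squared RGB loss and the 0.01 stopping threshold. Only the overall idea, reconstructing each orientation from sampled spatial coordinates and stopping once the loss falls below a preset value, comes from the text above.

```python
import torch
import torch.nn as nn

class TinyLightFieldNet(nn.Module):
    # Maps a spatial coordinate (x, y, z) to an RGB value in [0, 1].
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),
        )

    def forward(self, xyz):            # xyz: (num_samples, 3)
        return self.mlp(xyz)

def reconstruct_pixel(model, ray_points):
    # ray_points: spatial points sampled on the camera ray of one pixel.
    # Averaging predicted colours is a crude placeholder for the real rendering step.
    return model(ray_points).mean(dim=0)

def train(model, rays, target_pixels, preset_loss=0.01, max_iters=1000):
    # rays: list of (num_samples, 3) tensors, one per pixel of the sample images.
    # target_pixels: (num_pixels, 3) ground-truth RGB values from the sample images.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_iters):
        pred = torch.stack([reconstruct_pixel(model, r) for r in rays])
        loss = torch.mean((pred - target_pixels) ** 2)
        if loss.item() < preset_loss:  # training ends once the RGB loss is small enough
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```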
Step S406: the image obtained when the scene is captured from a new orientation is predicted by the deep learning network (light field prediction model).
Specifically, the new orientation here may be an orientation other than the plurality of orientations described above, and it may be expressed by five-dimensional (5D) coordinates; for example, orientation 1 may be expressed as (x, y, z, θ, φ), wherein (x, y, z) represents a position and (θ, φ) indicates the viewing angle direction. It will be appreciated that, in this way, the image obtained when the scene is captured from any orientation can be predicted, and in general the three-dimensional light field of the scene is considered to be acquired once the image captured from any orientation can be obtained.
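Continuing the previous sketch, a new orientation can be queried roughly as follows. The spherical-angle convention for (θ, φ), the ray depth range and the single-ray simplification (a full image would repeat this per pixel) are assumptions for illustration only.

```python
import math
import torch

def render_new_view(model, position, theta, phi, num_samples=32, far=2.0):
    # Convert the viewing angles into a unit direction vector (assumed convention).
    direction = torch.tensor([
        math.cos(phi) * math.sin(theta),
        math.sin(phi),
        math.cos(phi) * math.cos(theta),
    ])
    ts = torch.linspace(0.1, far, num_samples).unsqueeze(1)          # depths along the ray
    points = torch.tensor(position, dtype=torch.float32) + ts * direction
    return model(points).mean(dim=0)                                 # predicted RGB for this ray

# Example: query the trained model from a new 5D orientation (x, y, z, θ, φ).
# rgb = render_new_view(trained_model, position=[0.0, 0.0, -1.0], theta=0.3, phi=0.1)
```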
Alternatively, the predicted image may be an RGB image.
Table 1 compares the prediction effect of the deep learning network trained according to the prior art shown in fig. 1 with that of the deep learning network trained according to the present application; the PSNR comparison results below are obtained through simulation:
TABLE 1
Network type Prior Art Embodiments of the application
PSNR-Coarse 33 37
PSNR-Fine 36 37.2
PSNR-Test 35.5 36.5
In table 1, the peak signal-to-noise ratio (PSNR) is an index for measuring the prediction effect. PSNR-Coarse represents the effect of coarse-grained prediction of an image captured from an orientation corresponding to an existing sample image, PSNR-Fine represents the effect of fine-grained prediction of an image captured from an orientation corresponding to an existing sample image, and PSNR-Test represents the effect of prediction of an image captured from a new orientation. A larger PSNR value indicates a better prediction effect of the trained deep learning network.
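For reference, a PSNR value of this kind can be computed as sketched below. The normalisation of images to [0, 1] and the use of mean squared error follow the usual convention and are assumptions here, since the text does not spell out the computation; the two random images are placeholders.

```python
import numpy as np

def psnr(predicted, reference, peak=1.0):
    # Peak signal-to-noise ratio in dB for images normalised to [0, peak].
    mse = np.mean((predicted - reference) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

pred = np.clip(np.random.rand(64, 64, 3), 0.0, 1.0)                   # placeholder prediction
ref = np.clip(pred + 0.01 * np.random.randn(64, 64, 3), 0.0, 1.0)     # placeholder ground truth
print(f"PSNR: {psnr(pred, ref):.1f} dB")
```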
As shown in fig. 8, fig. 8 illustrates the prediction effects of the prior-art and of the above-described trained deep learning networks, presented in the form of images. In fig. 8, part (g) is a captured image of a certain scene, part (h) is the result of reconstructing the image with the prior-art deep learning network, and part (i) is the result of reconstructing the image with the deep learning network of the embodiment of the present application. It can be seen that in part (i) the texture of the object surface is expressed more clearly and the layers of the scene are better reflected.
In the method shown in fig. 4, the voxel sampling points used for training the deep learning network (which is used for predicting the three-dimensional light field) are obtained based on the depth information of the images. Specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differential sampling is then performed according to the magnitude of the cut-off distance. On the one hand, this sampling mode quickly concentrates the sampling on the key region, improving the sampling efficiency; on the other hand, the spatial points sampled in this way are essentially concentrated near the surface of the object, so that when the deep learning network subsequently trained on them is used for image prediction, the texture detail information of the object can be better represented and blurring and structural errors can be reduced.
The method of the embodiments of the present application has been described in detail above; the apparatus of the embodiments of the present application is described below.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a light field prediction model generating device 90 according to an embodiment of the present application, where the device 90 may include a building unit 901, a first calculating unit 902, a sampling unit 903, and a second calculating unit 904, where the details of each unit are as follows.
A building unit 901, configured to build a cube model surrounding a photographed scene according to respective photographing orientations of a plurality of sample images, where the cube model includes a plurality of small cubes voxel;
a first calculating unit 902, configured to calculate, according to the plurality of sample images, a plurality of cut-off distances for each of the plurality of small cubes, where calculating one cut-off distance of each small cube according to a first sample image includes: determining the cut-off distance according to the distance from the camera to the small cube and the distance from the camera to the surface of an object in the scene when the first sample image is taken, wherein the first sample image is any one of the plurality of sample images;
a sampling unit 903, configured to sample spatial points from the microcubes according to the multiple cutoff distances of each microcubes, where each sampled spatial point corresponds to a spatial coordinate;
A second calculation unit 904 for training a light field prediction model based on the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used for predicting a light field of the scene.
In the device, the voxel sampling points used for training the deep learning network (also called a light field prediction model, used for predicting the three-dimensional light field) are obtained based on the depth information of the images. Specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differential sampling is then performed according to the magnitude of the cut-off distance. On the one hand, this sampling mode quickly concentrates the sampling on the key region, improving the sampling efficiency; on the other hand, the spatial points sampled in this way are essentially concentrated near the surface of the object, so that when the deep learning network subsequently trained on them is used for image prediction, the texture detail information of the object can be better represented and blurring and structural errors can be reduced.
In an alternative, the apparatus 90 further includes:
and the prediction unit is used for predicting the light field of the scene through the light field prediction model. That is, after training the light field prediction model, the model training device predicts the light field through that model.
In yet another alternative, the sampling unit 903 is specifically configured to, in sampling spatial points from the small cubes according to the plurality of cut-off distances of each small cube: perform fusion calculation on the plurality of cut-off distances of each small cube to obtain the fusion cut-off distance of each small cube; and sample spatial points from each small cube according to its fusion cut-off distance. In this implementation, the fusion cut-off distance of each small cube is calculated first, and sampling is then performed according to the fusion cut-off distance.
In yet another alternative, the sampling unit 903 is specifically configured to, in sampling spatial points from the small cubes according to the plurality of cut-off distances of each small cube: determine a first small cube, among the small cubes, that has at least one cut-off distance whose absolute value is smaller than a preset threshold; perform fusion calculation on the plurality of cut-off distances of the first small cube to obtain the fusion cut-off distance of the first small cube; and sample spatial points from the first small cube according to its fusion cut-off distance. In this way, the fusion cut-off distance is not calculated for all small cubes, but only for those having a cut-off distance whose absolute value is smaller than the preset threshold. Since the farther a small cube is from the surface of the object, the less necessary it is to sample it, the application does not perform fusion calculation on the cut-off distances of such small cubes, which is equivalent to excluding them from sampling in advance; this reduces the amount of calculation and improves the generation efficiency of the light field prediction model while essentially not degrading the subsequent sampling effect.
In yet another alternative, in sampling spatial points from the small cubes, the smaller the fusion cut-off distance of a small cube, the more spatial points are sampled from it. It can be understood that the smaller the fusion cut-off distance, the closer the small cube is to the object surface, and compared with the spatial points of the other small cubes on the same camera ray, the spatial points in such a small cube better embody the pixel information; training the light field prediction model based on these spatial points therefore helps the model to subsequently predict more accurate images.
In yet another alternative, the fusion calculation includes a weighted average calculation. It will be appreciated that the fusion cut-off distance calculated by weighted average can more accurately reflect the distance of the small cube to the object surface.
In still another alternative, the weight value of the cut-off distance of a second small cube calculated based on the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or positively related to a first included angle, where the first included angle is the angle between the camera ray on which the second small cube is located and the normal vector of the object surface closest to the second small cube, and the second small cube is any small cube in the cube model.
In this way, a weight value calculated from each sample image is used when calculating the fusion cut-off distance; because the weight value is inversely related to the distance from the small cube to the camera and/or positively related to the first included angle, combining these weight values when calculating the fusion cut-off distance reflects the influence of the different orientations on the fusion cut-off distance more accurately.
In yet another alternative, the apparatus 90 further includes:
and the acquisition unit is used for acquiring depth information from the shooting view angle of each sample image when each sample image in the plurality of sample images is shot, wherein the depth information is used for representing the distance from the camera to the surface of an object in the scene.
In this way, the depth information is acquired from the shooting angle of view of each sample image, which more accurately reflects the distance from the camera to the object surface in the shot scene and is advantageous for calculating a more accurate cut-off distance.
In still another alternative, in the process of calculating the fusion cut-off distance of the second small cube, the weight value occupied in the weighted-average calculation by the cut-off distance calculated based on the first sample image may also be referred to as the weight value w(p) of the second small cube calculated from the first sample image, where w(p) satisfies the following relationship:
w(p)=cos(θ)/distance(v)
Wherein θ is the first included angle, and distance (v) is the distance from the second small cube to the camera when the first sample image is captured.
As mentioned above, the weight value of the second small cube calculated from the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or positively related to the first included angle; the expression for w(p) given here is one optional embodiment of that idea, so that combining the weight values when calculating the fusion cut-off distance reflects the influence of the different orientations on the fusion cut-off distance more accurately.
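A small numeric sketch of the weight rule w(p) = cos(θ)/distance(v) and of the weighted-average fusion it feeds into is shown below. The particular angles, camera distances and per-image cut-off distances are made-up values for illustration only.

```python
import math

def weight(theta_rad, dist_to_camera):
    # w(p) = cos(θ) / distance(v)
    return math.cos(theta_rad) / dist_to_camera

def fused_cutoff_distance(cutoffs, weights):
    # Weighted average of the per-image cut-off distances of one small cube.
    return sum(d * w for d, w in zip(cutoffs, weights)) / sum(weights)

# Cut-off distances of one small cube computed from three sample images, together
# with the included angle (degrees) and camera distance of each shot.
cutoffs = [0.12, 0.30, -0.05]
shots = [(10, 1.5), (40, 2.0), (25, 1.2)]
weights = [weight(math.radians(a), d) for a, d in shots]
print(f"fused cut-off distance: {fused_cutoff_distance(cutoffs, weights):.3f}")
```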
In yet another alternative, the truncated distance d (p) of the second small cube calculated from the first sample image satisfies the following relationship:
d(p)=sdf(p)/|u|
wherein sdf(p) is the difference between the distance from the camera to the second small cube when the first sample image is taken and the distance from the camera to the surface of the object in the scene, and u is a preset threshold value.
It will be appreciated that d (p) is only an alternative calculation formula for the truncated distance, and other expressions are possible in practical applications.
In yet another alternative, if sdf (p) > |u|, then d (p) =1; if sdf (p) < 0 and |sdf (p) | > |u|, then d (p) = -1.
In this way, the cut-off distance of the small cubes in one range is assigned the value 1, and the cut-off distance of the small cubes in another range is assigned the value -1, which makes it convenient to process these two types of small cubes in the same way subsequently and thereby improves the calculation efficiency.
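A minimal sketch of the formula d(p) = sdf(p)/|u| together with the clamping to ±1 described above is given below. The sign convention used for sdf(p) (surface distance minus voxel distance, so that voxels in front of the surface get positive values) follows the usual truncated-signed-distance convention and is an assumption here, as are the concrete distances in the example calls.

```python
def cutoff_distance(dist_cam_to_voxel, dist_cam_to_surface, u):
    sdf = dist_cam_to_surface - dist_cam_to_voxel   # signed distance along the camera ray (assumed sign)
    if sdf > abs(u):
        return 1.0                                  # well in front of the surface
    if sdf < 0 and abs(sdf) > abs(u):
        return -1.0                                 # well behind the surface
    return sdf / abs(u)                             # within the truncation band

print(cutoff_distance(1.20, 1.5, u=0.1))   # far in front of the surface -> 1.0
print(cutoff_distance(1.48, 1.5, u=0.1))   # close to the surface        -> ~0.2
print(cutoff_distance(1.80, 1.5, u=0.1))   # far behind the surface      -> -1.0
```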
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 4.
Referring to fig. 10, fig. 10 is a schematic diagram of a light field prediction model generating device 100 according to an embodiment of the present application, where the device 100 includes a processor 1001, a memory 1002, and a communication interface 1003, and the processor 1001, the memory 1002, and the communication interface 1003 are connected to each other by a bus.
Memory 1002 includes, but is not limited to, random access memory (random access memory, RAM), read-only memory (ROM), erasable programmable read-only memory (erasable programmable read only memory, EPROM), or portable read-only memory (compact disc read-only memory, CD-ROM), with memory 1002 for the associated instructions and data. The communication interface 1003 is used to receive and transmit data.
The processor 1001 may be one or more central processing units (central processing unit, CPU), and in the case where the processor 1001 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
Since the device 100 needs multiple sample images when training the light field prediction model, the device 100 needs to acquire the multiple sample images. They may be obtained by receiving sample images sent by other devices through the communication interface 1003, or a camera (also referred to as an image sensor or a photographing apparatus) may be configured on the device 100 to photograph the multiple sample images. Optionally, when a camera is configured on the device 100, a depth sensor may also be configured on the device 100 for acquiring the depth of objects in the photographed scene; the type of the depth sensor is not limited here.
The processor 1001 in the device 100 is configured to read the program code stored in the memory 1002, and perform the following operations: establishing a cube model surrounding a shot scene according to respective shooting orientations of a plurality of sample images, wherein the cube model comprises a plurality of small cubes voxel; then, respectively calculating a plurality of cutoff distances of each small cube in the plurality of small cubes according to the plurality of sample images, wherein one cutoff distance of each small cube calculated according to the first sample image comprises: determining the cut-off distance according to the distance from a camera to each small cube and the distance from the camera to the surface of an object in the scene when a first sample image is shot, wherein the first sample image is any one of the plurality of sample images; sampling space points from the small cubes according to the plurality of cut-off distances of each small cube, wherein each sampled space point corresponds to a space coordinate; a light field prediction model is then trained from the sampled spatial coordinates of the spatial points, wherein the light field prediction model is used to predict a light field of the scene.
In the method, the voxel sampling points used for training the deep learning network (also called a light field prediction model, used for predicting the three-dimensional light field) are obtained based on the depth information of the images. Specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differential sampling is then performed according to the magnitude of the cut-off distance. On the one hand, this sampling mode quickly concentrates the sampling on the key region, improving the sampling efficiency; on the other hand, the spatial points sampled in this way are essentially concentrated near the surface of the object, so that when the deep learning network subsequently trained on them is used for image prediction, the texture detail information of the object can be better represented and blurring and structural errors can be reduced.
In an alternative, after training the light field prediction model according to the spatial coordinates of the sampled spatial points, the processor 1001 is specifically configured to: predict the light field of the scene by the light field prediction model. That is, after training the light field prediction model, the model training device predicts the light field through that model.
In yet another alternative, the processor is specifically configured to, in sampling spatial points from each small cube according to the plurality of cut-off distances of the small cube: perform fusion calculation on the plurality of cut-off distances of each small cube to obtain the fusion cut-off distance of each small cube; and sample spatial points from each small cube according to its fusion cut-off distance. In this implementation, the fusion cut-off distance of each small cube is calculated first, and sampling is then performed according to the fusion cut-off distance.
In yet another alternative, the processor is specifically configured to, in sampling spatial points from each small cube according to the plurality of cut-off distances of the small cube: determine a first small cube, among the small cubes, that has at least one cut-off distance whose absolute value is smaller than a preset threshold; perform fusion calculation on the plurality of cut-off distances of the first small cube to obtain the fusion cut-off distance of the first small cube; and sample spatial points from the first small cube according to its fusion cut-off distance. In this way, the fusion cut-off distance is not calculated for all small cubes, but only for those having a cut-off distance whose absolute value is smaller than the preset threshold. Since the farther a small cube is from the surface of the object, the less necessary it is to sample it, the application does not perform fusion calculation on the cut-off distances of such small cubes, which is equivalent to excluding them from sampling in advance; this reduces the amount of calculation and improves the generation efficiency of the light field prediction model while essentially not degrading the subsequent sampling effect.
In yet another alternative, in sampling spatial points from the small cubes, the smaller the fusion cut-off distance of a small cube, the more spatial points are sampled from it. It can be understood that the smaller the fusion cut-off distance, the closer the small cube is to the object surface, and compared with the spatial points of the other small cubes on the same camera ray, the spatial points in such a small cube better embody the pixel information; training the light field prediction model based on these spatial points therefore helps the model to subsequently predict more accurate images.
In yet another alternative, the fusion calculation includes a weighted average calculation. It will be appreciated that the fusion cut-off distance calculated by weighted average can more accurately reflect the distance of the small cube to the object surface.
In still another alternative, the weight value of the cut-off distance of a second small cube calculated based on the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or positively related to a first included angle, where the first included angle is the angle between the camera ray on which the second small cube is located and the normal vector of the object surface closest to the second small cube, and the second small cube is any small cube in the cube model.
In this way, a weight value calculated from each sample image is used when calculating the fusion cut-off distance; because the weight value is inversely related to the distance from the small cube to the camera and/or positively related to the first included angle, combining these weight values when calculating the fusion cut-off distance reflects the influence of the different orientations on the fusion cut-off distance more accurately.
In yet another alternative, before separately calculating the cut-off distances of each small cube in the cube model from the plurality of sample images, the processor is further configured to: when each sample image of the plurality of sample images is captured, acquire depth information from the capture perspective of that sample image, wherein the depth information is used to characterize the distance from the camera to the surface of an object in the scene.
In this way, the depth information is acquired from the shooting angle of view of each sample image, which more accurately reflects the distance from the camera to the object surface in the shot scene and is advantageous for calculating a more accurate cut-off distance.
In still another alternative, in the process of calculating the fusion cut-off distance of the second small cube, the weight value occupied in the weighted-average calculation by the cut-off distance calculated based on the first sample image may also be referred to as the weight value w(p) of the second small cube calculated from the first sample image, where w(p) satisfies the following relationship:
w(p)=cos(θ)/distance(v)
Wherein θ is the first included angle, and distance (v) is the distance from the second small cube to the camera when the first sample image is captured.
As mentioned above, the weight value of the second small cube calculated from the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or positively related to the first included angle; the expression for w(p) given here is one optional embodiment of that idea, so that combining the weight values when calculating the fusion cut-off distance reflects the influence of the different orientations on the fusion cut-off distance more accurately.
In yet another alternative, the truncated distance d (p) of the second small cube calculated from the first sample image satisfies the following relationship:
d(p)=sdf(p)/|u|
wherein sdf(p) is the difference between the distance from the camera to the second small cube when the first sample image is taken and the distance from the camera to the surface of the object in the scene, and u is a preset threshold value.
It will be appreciated that d (p) is only an alternative calculation formula for the truncated distance, and other expressions are possible in practical applications.
In yet another alternative, if sdf(p) > |u|, then d(p) = 1; if sdf(p) < 0 and |sdf(p)| > |u|, then d(p) = -1.
In this way, the cut-off distance of the small cubes in one range is assigned the value 1, and the cut-off distance of the small cubes in another range is assigned the value -1, which makes it convenient to process these two types of small cubes in the same way subsequently and thereby improves the calculation efficiency.
It should be noted that the implementation of the respective operations may also correspond to the corresponding description of the method embodiment shown with reference to fig. 4.
The embodiment of the application also provides a chip system, which comprises at least one processor, a memory and an interface circuit, wherein the memory, the interface circuit and the at least one processor are interconnected through lines, and instructions are stored in the memory; when the instructions are executed by the processor, the method flow shown in fig. 4 is implemented.
Embodiments of the present application also provide a computer readable storage medium having instructions stored therein that, when executed on a processor, implement the method flow shown in fig. 4.
Embodiments of the present application also provide a computer program product for implementing the method flow shown in fig. 4 when the computer program product is run on a processor.
In summary, by implementing the embodiments of the application, the voxel sampling points used for training the deep learning network (also called a light field prediction model, used for predicting the three-dimensional light field) are obtained based on the depth information of the images; specifically, a cut-off distance is calculated based on the depth information and the distance from the voxel to the camera, and differential sampling is then performed according to the magnitude of the cut-off distance. On the one hand, this sampling mode quickly concentrates the sampling on the key region, improving the sampling efficiency; on the other hand, the spatial points sampled in this way are essentially concentrated near the surface of the object, so that when the deep learning network subsequently trained on them is used for image prediction, the texture detail information of the object can be better represented and blurring and structural errors can be reduced.
Those of ordinary skill in the art will appreciate that all or part of the above-described method embodiments may be implemented by a computer program instructing related hardware; the program may be stored in a computer readable storage medium, and when the program is executed, the flows of the above-described method embodiments may be included. The aforementioned storage medium includes: a ROM, a random access memory RAM, a magnetic disk, an optical disc, or the like.

Claims (16)

  1. A method of generating a light field prediction model, comprising:
    establishing a cube model surrounding a shot scene according to respective shooting orientations of a plurality of sample images, wherein the cube model comprises a plurality of small cubes voxel;
    calculating a plurality of cutoff distances of each small cube in the plurality of small cubes according to the plurality of sample images, wherein the one cutoff distance of each small cube calculated according to the first sample image comprises: determining the cut-off distance according to the distance from a camera to each small cube and the distance from the camera to the surface of an object in the scene when the first sample image is shot, wherein the first sample image is any one of the plurality of sample images;
    Sampling space points from each small cube according to the plurality of cut-off distances of each small cube, wherein each sampled space point corresponds to a space coordinate;
    training a light field prediction model according to the sampled spatial coordinates of the spatial points, wherein the light field prediction model is used for predicting a light field of the scene.
  2. The method of claim 1, wherein after training a light field prediction model from the sampled spatial coordinates of the spatial points, further comprising:
    predicting a light field of the scene by the light field prediction model.
  3. The method according to claim 1 or 2, wherein sampling spatial points from each small cube according to the plurality of cutoff distances of the small cubes comprises:
    performing fusion calculation on the plurality of cutoff distances of each small cube to obtain the fusion cutoff distance of each small cube;
    sampling space points from each small cube according to the fusion cut-off distance of each small cube.
  4. The method according to claim 1 or 2, wherein sampling spatial points from each small cube according to the plurality of cutoff distances of the small cubes comprises:
    determining a first small cube, among the small cubes, that has at least one cut-off distance whose absolute value is smaller than a preset threshold value;
    performing fusion calculation on the plurality of cutoff distances of the first small cube to obtain a fusion cutoff distance of the first small cube;
    and sampling space points from the first small cube according to the fusion cut-off distance of the first small cube.
  5. A method according to claim 3 or 4, characterized in that in sampling spatial points from the small cube, the smaller the fusion cut-off distance, the more spatial points the small cube samples.
  6. The method of any of claims 3-5, wherein the fusion calculation comprises a weighted average calculation.
  7. The method according to claim 6, wherein the weight value occupied, in the weighted-average calculation, by the cut-off distance of a second small cube calculated based on the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is taken, and/or positively related to a first included angle, the first included angle being the angle between the camera ray on which the second small cube is located and the normal vector of the object surface closest to the second small cube, and the second small cube being any small cube in the cube model.
  8. A light field prediction model generation apparatus, comprising:
    the device comprises a building unit, a storage unit and a display unit, wherein the building unit is used for building a cube model surrounding a shot scene according to shooting directions of a plurality of sample images, and the cube model comprises a plurality of small cubes voxel;
    a first calculating unit, configured to calculate, according to the plurality of sample images, a plurality of cutoff distances of each of the plurality of microcubes, where the one cutoff distance of each microcubes calculated according to the first sample image includes: determining the cut-off distance according to the distance from a camera to each small cube and the distance from the camera to the surface of an object in the scene when the first sample image is shot, wherein the first sample image is any one of the plurality of sample images;
    the sampling unit is used for sampling space points from the small cubes according to the plurality of cutoff distances of each small cube, wherein each sampled space point corresponds to a space coordinate;
    a second calculation unit for training a light field prediction model based on the spatial coordinates of the sampled spatial points, wherein the light field prediction model is used for predicting a light field of the scene.
  9. The apparatus of claim 8, wherein the apparatus further comprises:
    and the prediction unit is used for predicting the light field of the scene through the light field prediction model.
  10. The apparatus according to claim 8 or 9, wherein the sampling unit is specifically configured to, in sampling spatial points from the microcubes according to the plurality of cutoff distances of each microcubes:
    performing fusion calculation on the plurality of cutoff distances of each small cube to obtain the fusion cutoff distance of each small cube;
    sampling space points from each small cube according to the fusion cut-off distance of each small cube.
  11. The apparatus according to claim 8 or 9, wherein the sampling unit is specifically configured to, in sampling spatial points from the microcubes according to the plurality of cutoff distances of each microcubes:
    determining a first small cube, among the small cubes, that has at least one cut-off distance whose absolute value is smaller than a preset threshold value;
    performing fusion calculation on the plurality of cutoff distances of the first small cube to obtain the fusion cutoff distance of the first small cube;
    and sampling space points from the first small cube according to the fusion cut-off distance of the first small cube.
  12. The apparatus according to claim 10 or 11, characterized in that in sampling spatial points from the small cube, the smaller the fusion cut-off distance, the more spatial points the small cube samples.
  13. The apparatus of any of claims 10-12, wherein the fusion calculation comprises a weighted average calculation.
  14. The apparatus of claim 13, wherein the weight value occupied, in the weighted-average calculation, by the cut-off distance of a second small cube calculated based on the first sample image is inversely related to the distance from the second small cube to the camera when the first sample image is captured, and/or positively related to a first included angle, the first included angle being the angle between the camera ray on which the second small cube is located and the normal vector of the object surface closest to the second small cube, and the second small cube being any small cube in the cube model.
  15. A device for generating a light field prediction model, comprising a processor and a memory, wherein the memory is for storing a computer program which, when run on the processor, implements the method of any of claims 1-7.
  16. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a processor, implements the method of any of claims 1-7.
CN202180095331.6A 2021-03-15 2021-03-15 Method and related device for generating light field prediction model Pending CN117015966A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/080893 WO2022193104A1 (en) 2021-03-15 2021-03-15 Method for generating light field prediction model, and related apparatus

Publications (1)

Publication Number Publication Date
CN117015966A true CN117015966A (en) 2023-11-07

Family

ID=83321601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180095331.6A Pending CN117015966A (en) 2021-03-15 2021-03-15 Method and related device for generating light field prediction model

Country Status (2)

Country Link
CN (1) CN117015966A (en)
WO (1) WO2022193104A1 (en)


Also Published As

Publication number Publication date
WO2022193104A1 (en) 2022-09-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination