CN115063708A - Light source model parameter obtaining method, training method, device and medium - Google Patents

Light source model parameter obtaining method, training method, device and medium

Info

Publication number
CN115063708A
Authority
CN
China
Prior art keywords
light source
image
pixel
parameter
pixel point
Prior art date
Legal status
Pending
Application number
CN202210481668.7A
Other languages
Chinese (zh)
Inventor
齐越
郅西腾
王君义
高连生
李弘毅
Current Assignee
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Original Assignee
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Priority date
Filing date
Publication date
Application filed by Shenzhen Beihang Emerging Industrial Technology Research Institute, Beihang University filed Critical Shenzhen Beihang Emerging Industrial Technology Research Institute
Priority to CN202210481668.7A
Publication of CN115063708A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/60 Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The application provides a light source model parameter obtaining method, a training method, a device and a medium. The method comprises the following steps: converting an HDR image into a pixel matrix and performing region growing according to the gray value of each pixel point to obtain a light source region; converting the HDR image into a spherical coordinate representation to obtain the light source parameters of the light source region under the spherical coordinates; selecting a first partial image from the LDR image represented by the spherical coordinates and mapping the first partial image into the LDR image to obtain a second partial image; rotating the light source parameters and the pixel points of the HDR image according to the coordinates of the central point of the first partial image of the LDR image; and obtaining the light source model parameters according to the second partial image, the rotated light source parameters and the pixel points of the rotated HDR image. The method increases the expressiveness of the light source model parameters and ensures the accuracy of light source estimation.

Description

Light source model parameter obtaining method, training method, device and medium
Technical Field
The present application relates to the field of augmented reality, and in particular to a light source model parameter obtaining method, a training method, a device, and a medium.
Background
Augmented Reality (AR) is a technology that combines and lets a virtual world on a screen interact with a real-world scene by performing position and angle calculations on camera images together with image analysis techniques. With the development of computer graphics and rendering-engine technologies, the application scenes of AR have expanded from entertainment to industry, medicine, education and other fields, which places higher demands on the realism and immersion of applications. The realism and immersion of AR applications come from the precise positioning and realistic rendering of virtual objects. When rendering a virtual object, the rendering engine needs the geometry, color and illumination information of the object and computes the result with an illumination model. When the illumination parameters of the rendering model are consistent with the ambient illumination parameters, the realism and immersion of the augmented reality application can be improved.
In augmented reality applications, the illumination parameters are usually set manually and kept fixed: they are either a point light source at a fixed position and a directional light source in a fixed direction, or an HDR image obtained by photographing a mirror ball and converting the photograph. With the development and popularization of deep learning, deep convolutional neural networks built on CNNs and trained with large amounts of data can directly estimate scene light source parameters from indoor and outdoor scenes. However, most existing methods that estimate illumination with a deep neural network only generate a single directional light source or several directional light sources; they can hardly provide light source parameters that are highly consistent with a complex environment, and the expressiveness of directional or point light source models is limited.
Disclosure of Invention
The application provides a light source model parameter obtaining method, a training method, a device and a medium, which are used to solve the problem in the prior art that it is difficult to provide light source parameters highly consistent with a complex environment and that directional or point light source models have limited expressiveness.
In a first aspect, the present application provides a method for obtaining parameters of a light source model, including:
converting an HDR image into a pixel matrix, taking the pixel point with the largest gray value in the pixel matrix as a starting point, and performing region growing according to the gray value of each pixel point to obtain a light source region;
converting the HDR image into a spherical coordinate representation, and mapping the light source region to the spherical coordinate of the HDR image to obtain the light source parameter of the light source region under the spherical coordinate;
selecting a first partial image from the LDR image represented by the spherical coordinates, and mapping the first partial image to the LDR image to obtain a second partial image; the LDR image is obtained by converting an HDR image;
rotating the light source parameters and the pixel points of the HDR image according to the coordinates of the central point of the first partial image of the LDR image;
and acquiring light source model parameters according to the second local image, the rotated light source parameters and the pixels of the rotated HDR image.
In a possible implementation manner, the converting an HDR image into a pixel matrix and performing region growing according to the gray value of each pixel point, with the pixel point having the largest gray value in the pixel matrix as a starting point, to obtain a light source region includes:
S2-1, converting the HDR image into a pixel matrix;
S2-2, traversing the pixel matrix, marking the pixel point with the maximum gray value, and recording the maximum gray value;
S2-3, constructing a pixel stack and a pixel state matrix, putting the pixel point with the maximum gray value into the pixel stack, and traversing each pixel point in the pixel matrix with the pixel point having the maximum gray value as the starting point: judging whether the current pixel point is a non-light source and the gray value of its neighbor pixel point is greater than a gray threshold; if so, setting the state of the current pixel point to light source, setting the neighbor pixel point of the current pixel point to light source boundary, and continuing with the next pixel point until all pixel points have been processed; if not, setting the state of the current pixel point to non-light source and continuing with the next pixel point until all pixel points have been processed;
wherein the pixel stack is used for recording the row and column positions of the stacked pixel points, the pixel state matrix is used for recording the states of the pixel points, the states of the pixel points comprise light source, light source boundary and non-light source, and the initial states in the pixel state matrix are all non-light source;
S2-4, repeating S2-3 to obtain at least two different light source regions, a light source region being the region corresponding to the pixel points whose state is light source.
In one possible implementation, the mapping the light source region into the spherical coordinates of the HDR image to obtain the light source parameters of the light source region under the spherical coordinates includes:
fitting the light source region into an ellipse, averaging the colors of the light source region, and taking the average of the major and minor axes of the fitted ellipse as the size of the light source region;
mapping the direction vector pointing from the central point of the light source region to the central point of the spherical coordinates of the HDR image as the direction vector of the light source region;
acquiring the average depth of the light source region according to the depth label corresponding to the light source region;
and obtaining the light source parameters according to the direction vector, the average depth, the color, the size and the ambient light component of the light source region.
In a possible implementation manner, the mapping the first partial image into the LDR image to obtain a second partial image includes:
calculating the coordinates at which the pixel points of the first partial image map into the LDR image according to the coordinates of the central point of the first partial image and the coordinates of the pixel points in the first partial image;
calculating the cropping row and column values of each pixel of the first partial image in the LDR image according to the coordinates mapped into the LDR image;
and cropping the image according to the cropping row and column values to obtain the second partial image.
In a possible implementation manner, the rotating the light source parameters and the pixel points of the HDR image according to the coordinates of the central point of the first partial image of the LDR image includes:
rotating the direction vector of each light source parameter according to the rotation formulas, the first of which is θ′ = θ − φ_1, to obtain the rotated direction vector, wherein (λ_0, φ_1) are the radian coordinates of the central point of the partial image of the LDR image;
the pixel point coordinates of the HDR image are rotated in the same way as the direction vector of the light source parameters.
In a second aspect, the present application provides a light source model training method, including:
inputting light source model parameters into a light source model; the light source model comprises a deep neural network; the light source model parameters are model parameters obtained by the method described above;
and respectively carrying out first-stage training and second-stage training on the light source model to obtain the trained light source model, wherein the loss functions of the first-stage training and the second-stage training are different.
In one possible implementation, the deep neural network includes an encoder and a decoder; the encoder comprises a pre-trained DenseNet-121 model;
the decoder comprises two fully connected layers and five output layers;
the input end of the first full connection layer is connected with the output end of the DenseNet-121 model; the first output end of the first full connection layer is connected with the input end of the second full connection layer; the second output end of the first full connection layer is connected with the first input end of the first output layer;
the first output end of the second full connection layer and the input end of the second output layer; the output end of the second output layer is connected with the second input end of the first output layer; the second output end of the second full connection layer is connected with the input end of the third output layer; the third output end of the second full connection layer is connected with the input end of the fourth output layer; and the fourth output end of the second full connection layer is connected with the input end of the fifth output layer.
In one possible implementation, the loss function of the first-stage training compares the HDR panorama containing only the light sources, rendered from the predicted light source parameters, with the rotated HDR image R_w, compares the predicted ambient light component with its true value a, and further penalizes the predicted color value, direction vector and size of each of the N light source parameter samples; wherein ω_r and ω_a are the respective weights, l_2(·) is the mean square error, e is the natural constant, u is a function parameter, and π is 180 degrees.
The loss function of the second-stage training compares, for the i-th light source parameter, the predicted average depth with the true average depth d_i, the predicted size with the true size s_i, and the predicted color value with the true color value c_i.
In a third aspect, the present application provides a light source model parameter obtaining apparatus, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory, causing the at least one processor to perform the light source model parameter acquisition method or the light source model training method.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the light source model parameter obtaining method or the light source model training method as described above.
According to the light source model parameter obtaining method, training method, device and medium provided by the application, the HDR image is converted into a pixel matrix, the pixel point with the maximum gray value in the pixel matrix is taken as the starting point, and region growing is performed according to the gray value of each pixel point to obtain a light source region; the HDR image is converted into a spherical coordinate representation, and the light source region is mapped into the spherical coordinates of the HDR image to obtain the light source parameters of the light source region under the spherical coordinates. The light source parameters are spherical Gaussian parameters that comprise the direction, average depth, color and size of a light source, provide light source parameters highly consistent with a complex environment, and have strong expressive ability. Further, the light source parameters and the pixel points of the HDR image are rotated according to the coordinates of the central point of the first partial image of the LDR image, and the light source model parameters are obtained according to the second partial image, the rotated light source parameters and the pixel points of the rotated HDR image; the light source model parameters are selected on multiple levels on the basis of the spherical Gaussian function, which enriches the data representation. Furthermore, the light source model parameters are input into the light source model and trained in two stages, which solves the problem that the correspondence between the estimated values and the label values cannot be determined in the first stage, i.e. the early stage of training.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a system for model training based on virtual samples according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a framework of a method for obtaining parameters of a light source model according to an embodiment of the present invention;
fig. 3 is a first schematic flow chart of a method for obtaining parameters of a light source model according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a second method for obtaining parameters of a light source model according to an embodiment of the present invention;
FIG. 5a is a first view of a light source region provided in accordance with an embodiment of the present invention;
FIG. 5b is a second view of a light source area provided in an embodiment of the present invention;
fig. 6 is a third schematic flowchart of a method for obtaining parameters of a light source model according to an embodiment of the present invention;
fig. 7 is a fourth schematic flowchart of a method for obtaining parameters of a light source model according to an embodiment of the present invention;
FIG. 8 is a flowchart of a light source model training method according to an embodiment of the present invention;
FIG. 9a is a panoramic view before rendering according to an embodiment of the present invention;
FIG. 9b is a panoramic light source extraction diagram provided in accordance with an embodiment of the present invention;
fig. 9c is a schematic diagram illustrating rendering results of a panorama according to an embodiment of the present invention;
FIG. 10a is a partial input image provided by an embodiment of the present invention;
FIG. 10b is a panoramic view provided by an embodiment of the present invention;
fig. 11 is a hardware schematic diagram of a light source model parameter obtaining device according to an embodiment of the present invention.
Specific embodiments of the present application have been shown by way of example in the drawings and will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
Fig. 1 is a schematic diagram of a system for model training based on virtual samples according to an embodiment of the present invention. As shown in fig. 1, the system provided in this embodiment includes a terminal and a server. The terminal can be a mobile phone, a tablet, a computer or another device that can interact with a user. For example, the terminal may be the computer 101 and/or the mobile phone 102 shown in fig. 1. The computer 101 may communicate with the server 103 via a wired or wireless network. The handset 102 may communicate with the server 103 via a wireless network.
The server 103 may provide various services, such as a search service, to the user. The server 103 obtains access information of the terminal and generates training samples based on the access information. The server 103 may also send the obtained access information to other servers, which generate training samples based on the access information. The present embodiment does not particularly limit the server that generates the training samples.
In the prior art, the light source parameters are generally set manually and kept fixed: they are either a point light source at a fixed position and a directional light source in a fixed direction, or an HDR image obtained by photographing a mirror sphere and converting the photograph. After the light source parameters are acquired, they are input into a deep neural network for model training and prediction, but such light source parameters can only be used to estimate and generate a single directional light source or several directional light sources; it is difficult to provide light source parameters highly consistent with a complex environment, and the expressive ability is limited.
Therefore, the present invention provides a new model parameter obtaining and model training method. Fig. 2 is a schematic frame diagram of a light source model parameter obtaining method according to an embodiment of the present invention. As shown in fig. 2, the light source model parameters to be acquired in the embodiment of the present invention include a cropping parameter and rotation parameters. The cropping parameter is the second partial image, obtained by selecting a first partial image from the LDR image represented in spherical coordinates and mapping the first partial image into the LDR image. The rotation parameters comprise the rotated light source parameters and the pixel points of the rotated HDR image, the light source parameters being obtained by sequentially performing region growing and spherical coordinate transformation on the HDR image; the light source parameters further comprise the direction vector, average depth, color and size of the light source region. Compared with the prior art, the method fully considers the complexity of the environment and obtains multiple model parameters. Further, the light source model parameters are established as training samples and the deep neural network is iteratively trained, the iterative training comprising two stages of model training with a correspondingly designed model structure and two different loss functions, so that sufficient accuracy is achieved and the expressive capability of the light source model is enhanced overall.
Fig. 3 is a first flowchart of a method for obtaining parameters of a light source model according to an embodiment of the present invention. As shown in fig. 3, the method includes:
s301, converting the HDR image into a pixel matrix, taking a pixel point with the largest gray value in the pixel matrix as a starting point, and performing region growth according to the gray value of each pixel point to obtain a light source region.
For example, the HDR image is converted into a pixel matrix containing the gray value of each pixel of the HDR image, so the pixel point with the largest gray value can be selected from the pixel matrix; the larger the gray value, the brighter the pixel point, so this pixel point corresponds to a constituent point of the light source region. Taking this pixel point as the starting point, the gray values of the surrounding pixel points are examined, and the range of the light source region can be expanded step by step until the gray values of the found pixel points become too small; the smaller the gray value, the darker the pixel point.
S302, converting the HDR image into a spherical coordinate representation, and mapping the light source region to the spherical coordinate of the HDR image to obtain the light source parameters of the light source region under the spherical coordinate.
For example, a spherical Gaussian function is used to convert the HDR image into a spherical coordinate representation. The light source region is a rectangular matrix region, so a coordinate transformation needs to be performed with reference to the converted HDR image in spherical coordinates, thereby obtaining the light source parameters expressed in spherical coordinates.
S303, selecting a first local image from the LDR image expressed by the spherical coordinates, and mapping the first local image to the LDR image to obtain a second local image; the LDR image is converted from the HDR image.
For example, the HDR image is converted into an LDR image, and the LDR image is subjected to a coordinate transformation with a spherical Gaussian function to obtain an LDR image represented in spherical coordinates. A partial image is randomly selected from the LDR image represented in spherical coordinates to obtain the first partial image. This is not yet a parameter that is input directly into the subsequent model; a cropping process is still required, which projects the center coordinates, width, height, and the pixel point positions within the width and height of the first partial image represented in spherical coordinates onto the LDR image represented in non-spherical coordinates for cropping, thereby obtaining the second partial image.
S304, rotating the light source parameters and the pixel points of the HDR image according to the central point coordinates of the first local image of the LDR image.
For example, the coordinates of the central point of the first partial image of the LDR image in spherical coordinates have now been obtained, and the second partial image was obtained by a transformation that takes the first partial image as reference; the light source parameters and the HDR image were not obtained with this reference, so the respective data must be mapped with a uniform reference in order to construct the data set.
S305, obtaining light source model parameters according to the second local image, the rotated light source parameters and the pixels of the rotated HDR image.
For example, after data processing is performed under a unified reference, pixel points of the second partial image, the rotated light source parameters, and the rotated HDR image are obtained, and these data are combined to obtain light source model parameters, which are used as input of a subsequent light source model.
In this embodiment, a light source region is obtained in the HDR image by region growing; the HDR image is represented in spherical coordinates and the light source region is mapped into the HDR image under spherical coordinates to obtain the light source parameters; the HDR image is converted into an LDR image, the LDR image is represented in spherical coordinates, a first partial image is selected from it, and the first partial image is mapped into the LDR image to obtain a second partial image; the light source parameters and the pixel points of the HDR image are rotated according to the coordinates of the central point of the first partial image of the LDR image; and the light source model parameters, comprising the second partial image, the rotated light source parameters and the pixel points of the rotated HDR image, are obtained. The method fully considers the complexity of the environment, determines the light source model parameters with different parameters and different conversion modes, and has sufficient environmental adaptability.
Fig. 4 is a schematic flowchart of a second method for obtaining parameters of a light source model according to an embodiment of the present invention. As shown in fig. 4, the method includes:
s401, converting the HDR image into a pixel matrix;
Optionally, the HDR image is subjected to Gaussian blurring and converted into a floating-point number matrix, and the floating-point number matrix is the pixel matrix.
In one possible implementation, the three-dimensional pixel matrix is converted into a two-dimensional pixel matrix by taking the average value over the channel dimension; its size is H × W, where H is the matrix height and W is the matrix width, and since the HDR image is a panorama, W = 2H.
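As an illustration of this preprocessing, the sketch below blurs the HDR panorama and averages over the channel dimension to obtain the two-dimensional H × W gray-value matrix; the use of OpenCV and the 5 × 5 blur kernel are assumptions, since the patent does not fix them.

import cv2
import numpy as np

def hdr_to_pixel_matrix(hdr: np.ndarray, ksize: int = 5) -> np.ndarray:
    """hdr: float32 panorama of shape (H, W, 3) with W == 2 * H (assumed layout)."""
    blurred = cv2.GaussianBlur(hdr.astype(np.float32), (ksize, ksize), 0)
    # Average over the channel dimension -> two-dimensional gray-value matrix.
    gray = blurred.mean(axis=2)
    return gray  # shape (H, W), floating-point pixel matrix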
s402, traversing the pixel matrix, marking the pixel point with the maximum gray value, and correspondingly recording the maximum gray value.
Optionally, the pixel matrix is traversed, and the maximum gray value in the pixel matrix and the pixel point corresponding to it are recorded and updated; the maximum gray value is recorded as a floating-point number and the pixel point as its row and column position.
S403, constructing a pixel stack and a pixel state matrix, putting the pixel point with the maximum gray value into the pixel stack, and traversing each pixel point in the pixel matrix with the pixel point having the maximum gray value as the starting point: judging whether the current pixel point is a non-light source and the gray value of its neighbor pixel point is greater than a gray threshold; if so, setting the state of the current pixel point to light source, setting the neighbor pixel point of the current pixel point to light source boundary, and continuing with the next pixel point until all pixel points have been processed; if not, setting the state of the current pixel point to non-light source and continuing with the next pixel point until all pixel points have been processed;
the pixel stack is used for recording the row and column positions of the stacked pixel points, and the pixel state matrix is used for recording the states of the pixel points; the states of the pixel points comprise light source, light source boundary and non-light source, and the initial states in the pixel state matrix are all non-light source.
In one possible implementation, the non-light source state value is set to 0, the light source boundary state value is set to 1, the light source state value is set to 2, and the gray threshold is set to 1/5 of the maximum gray value.
An empty pixel stack is initialized (after initialization the pixel stack is all 0, which is equivalent to storing the data of a fully dark pixel map), the state value of the found pixel point with the maximum gray value is set to 1, and this pixel point is pushed onto the pixel stack; at least one pixel point with the maximum gray value is selected.
Further pixel points are then pushed onto the pixel stack: the neighbor pixel points of each pixel point with state value 1 are traversed, for example the 8 neighbor pixel points whose row and column values differ by ±1. If the gray value of a neighbor pixel point is greater than the gray threshold, its state value is set to 1 and it is kept in the pixel stack, while the state value of the current pixel point is set to 2 and it is popped from the pixel stack; if the gray values of the neighbor pixel points are less than or equal to the gray threshold, the state values of the neighbor pixel points and of the current central pixel point are set to 0. This continues until the pixel stack is empty.
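Purely for illustration, a minimal NumPy sketch of this stack-based region growing is given below. It uses the state values 0/1/2 and the 1/5 threshold stated above, while the exact marking rule for the popped pixel and the handling of image borders are assumptions that only follow the spirit of the description.

import numpy as np

NON_LIGHT, BOUNDARY, LIGHT = 0, 1, 2

def grow_light_source_region(gray: np.ndarray) -> np.ndarray:
    h, w = gray.shape
    state = np.full((h, w), NON_LIGHT, dtype=np.uint8)
    r0, c0 = np.unravel_index(np.argmax(gray), gray.shape)
    threshold = gray[r0, c0] / 5.0          # 1/5 of the maximum gray value
    stack = [(r0, c0)]
    state[r0, c0] = BOUNDARY
    while stack:
        r, c = stack.pop()
        grew = False
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and state[nr, nc] == NON_LIGHT:
                    if gray[nr, nc] > threshold:
                        state[nr, nc] = BOUNDARY      # bright neighbour joins the frontier
                        stack.append((nr, nc))
                        grew = True
        # Assumed marking rule: the popped pixel becomes light source if it is bright
        # or has pushed bright neighbours, otherwise it stays non-light.
        state[r, c] = LIGHT if grew or gray[r, c] > threshold else NON_LIGHT
    return state == LIGHT   # boolean mask of the light source region

To obtain further light source regions as in S404, the pixels of the returned mask can be set to zero in the gray-value matrix (or a shared state matrix can be reused) before the function is called again.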
S404, repeating the step S403 to obtain at least two different light source areas; the light source area is an area corresponding to the pixel point of the light source in the state of the pixel point.
A specific light source position can be obtained from the row and column positions of the light source pixel points popped from the pixel stack. Fig. 5a is a first light source region diagram provided by an embodiment of the invention, and fig. 5b is a second light source region diagram provided by an embodiment of the invention. As shown in fig. 5a, the gray-scale image is shown on the left and the light-colored part on the right is the light source region; this image has only one light source region. As shown in fig. 5b, the gray-scale image is shown on the left and the light-colored parts on the right are the light source regions; three light source regions are selected.
Optionally, obtaining at least two different light source regions means repeating the region growing to obtain the light source region of each light source in the image, so that accuracy can be ensured.
In this embodiment, the HDR image is converted into a pixel matrix, the maximum gray value is obtained by traversing the pixel matrix, region growing is performed with the pixel point corresponding to the maximum gray value as the starting point to find a light source region, and at least two light source regions are then determined by repeating the region growing, which ensures sufficient reliability in parameter selection.
Fig. 6 is a third schematic flowchart of a method for obtaining parameters of a light source model according to an embodiment of the present invention. As shown in fig. 6, the method includes:
S601, fitting the light source region into an ellipse, averaging the colors of the light source region, and taking the average of the major and minor axes of the fitted ellipse as the size of the light source region.
Optionally, after the region growing, a rectangular pixel matrix representing the light source region is obtained; the light source region is fitted to an ellipse, the mean of the major axis and the minor axis of the ellipse is calculated, and this mean is used as the size of the light source region; the RGB colors of the light source region are averaged to obtain the color value of the light source region.
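For illustration, the sketch below fits the region mask to an ellipse with OpenCV and averages the region's color; the use of cv2.fitEllipse and the mask-based averaging are one possible implementation assumed here, not mandated by the patent.

import cv2
import numpy as np

def region_size_and_color(hdr: np.ndarray, mask: np.ndarray):
    """mask: boolean light-source mask from the region growing (needs >= 5 pixels)."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(np.float32)    # (x, y) points for OpenCV
    (cx, cy), (ax1, ax2), _angle = cv2.fitEllipse(pts)    # ax1, ax2: fitted axis lengths
    size = 0.5 * (ax1 + ax2)           # mean of the two axes of the fitted ellipse
    color = hdr[mask].mean(axis=0)     # average RGB color over the region
    return (cx, cy), size, color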
And S602, mapping a direction vector of the central point of the light source area pointing to the central point of the spherical coordinate of the HDR image into the direction vector of the light source area.
Optionally, the HDR image is transformed to be represented in spherical coordinates, and the relative position relationship between the light source region and the HDR image is represented by the direction vector between the central point of the light source region and the central point of the spherical coordinates of the HDR image, the direction vector pointing from the central point of the light source region to the central point of the spherical coordinates of the HDR image.
S603, obtaining the average depth of the light source area according to the depth mark corresponding to the light source area.
Optionally, a depth value can be obtained for the light source region; for example, the depth labels of the HDR picture may be recorded in an open-source database, the depth values corresponding to the light source region may be selected from that database, and these depth values are averaged to obtain the average depth of the light source region.
S604, obtaining the light source parameters according to the direction vector, the average depth, the color, the size and the ambient light component of the light source area.
Optionally, the light source parameters include the direction vector, average depth, color, size and ambient light component of the light source region. For example, if the direction vector of the light source region is denoted l, the average depth of the light source region is denoted d, the (average) color of the light source region is denoted c, and the size of the light source region is denoted s, the light source region parameters can be expressed as P_i = {l, d, s, c}, where i = 1 denotes the first light source region parameter; averaging the HDR image gives the ambient light component a, and if three light source regions are selected in one HDR image, the light source parameters are expressed as P = {P_1, P_2, P_3, a}.
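As an illustration of how the parameter set P could be held in code, the following sketch defines a simple container; the field names, the dataclass layout and the global mean used for the ambient component a are illustrative assumptions rather than structures defined by the patent.

from dataclasses import dataclass
import numpy as np

@dataclass
class LightSourceParam:
    l: np.ndarray   # direction vector of the light source region
    d: float        # average depth of the region
    s: float        # size (mean of the fitted ellipse axes)
    c: np.ndarray   # average RGB color of the region

def assemble_parameters(regions, hdr: np.ndarray):
    """regions: iterable of (direction, depth, size, color) tuples, e.g. three of them."""
    params = [LightSourceParam(l, d, s, c) for (l, d, s, c) in regions]
    ambient = hdr.reshape(-1, 3).mean(axis=0)   # ambient light component a
    return {"lights": params, "ambient": ambient}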
The light source parameters are thus determined from the calculated parameters of the light source region: the light source region is fitted into an ellipse, the colors of the light source region are averaged, the average of the major and minor axes of the fitted ellipse is taken as the size of the light source region, the direction vector pointing from the central point of the light source region to the central point of the spherical coordinates of the HDR image is mapped as the direction vector of the light source region, and the average depth of the light source region is obtained from the depth label corresponding to the light source region. Compared with the prior art, these parameters add a new light source parameter representation and are more expressive.
Fig. 7 is a fourth schematic flowchart of a method for obtaining parameters of a light source model according to an embodiment of the present invention. As shown in fig. 7, the method includes:
S701, calculating the coordinates at which the pixel points of the first partial image map into the LDR image according to the coordinates of the central point of the first partial image and the coordinates of the pixel points in the first partial image.
Optionally, the HDR image is converted into an LDR image, the LDR image is converted into a representation in spherical coordinates, and the first partial image of the LDR image in spherical coordinates is randomly selected to obtain the position of its central point; the longitude and latitude coordinates of the central point are expressed as (x_cen, y_cen) and its radian coordinates as (λ_0, φ_1).
Taking a single coordinate as an example, the relationship between the longitude and latitude coordinates and the radian coordinates is: λ_0 = (2·x_cen − 1)·π/2.
From the row and column coordinates (row, col) of each non-central pixel point of the first partial image, the relative longitude and latitude values (x_{i-r}, y_{i-r}) of that pixel point are obtained, where W_p is the width of the first partial image and H_p is its height.
Then, using intermediate quantities ρ and μ = tan⁻¹(ρ) computed from the relative coordinates, the radian coordinates at which the pixel points of the first partial image map into the LDR image under non-spherical coordinates, i.e. the radian coordinates (φ, λ) of the second partial image, are acquired, with corresponding longitude and latitude coordinates (x_i, y_i); wherein sin⁻¹(·) is the inverse sine function, cos(·) is the cosine function, tan⁻¹(·) is the inverse tangent function, (λ_0, φ_1) are the radian coordinates of the central point of the partial image, and (x_{i-r}, y_{i-r}) are the longitude and latitude values of the relative coordinates of the pixel point in the partial image, whose radian form is used in the mapping.
S702, calculating the cropping row and column values of each pixel of the first partial image in the LDR image according to the coordinates mapped into the LDR image.
Optionally, the row and column values (row_sam, col_sam) of the pixels of the second partial image sampled from the LDR image under non-spherical coordinates are obtained from the longitude and latitude coordinates (x_i, y_i), i.e. the radian coordinates (φ, λ), of the second partial image, where H is the height of the LDR image under non-spherical coordinates and W is its width.
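The projection formulas of S701 and S702 appear in the patent only as equation figures. Purely as an illustration of the kind of mapping described here, the sketch below assumes a standard inverse gnomonic projection from the tangent-plane crop back to the equirectangular panorama, followed by an assumed radian-to-row/column conversion; none of these exact formulas should be read as the patent's own.

import numpy as np

def crop_to_panorama_rows_cols(xr, yr, lam0, phi1, H, W):
    """xr, yr: arrays of relative tangent-plane coordinates of the crop pixels;
    (lam0, phi1): radian coordinates of the crop centre; H, W: LDR panorama size."""
    rho = np.sqrt(xr**2 + yr**2)
    mu = np.arctan(rho)                       # mu = tan^-1(rho) as in the description
    rho = np.where(rho == 0, 1e-9, rho)       # avoid division by zero at the centre
    phi = np.arcsin(np.cos(mu) * np.sin(phi1) + yr * np.sin(mu) * np.cos(phi1) / rho)
    lam = lam0 + np.arctan2(xr * np.sin(mu),
                            rho * np.cos(phi1) * np.cos(mu) - yr * np.sin(phi1) * np.sin(mu))
    # Assumed equirectangular convention: phi in [-pi/2, pi/2], lam in [-pi, pi].
    row = ((0.5 - phi / np.pi) * H).astype(np.int64) % H
    col = ((lam / (2 * np.pi) + 0.5) * W).astype(np.int64) % W
    return row, col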
S703, cropping the image according to the cropping row and column values to obtain the second partial image.
After the image is cropped, the coordinates of the central point of the first partial image are used to rotate the light source parameters and the pixel points of the HDR image, which includes:
rotating the direction vector of each light source parameter according to the rotation formulas, the first of which is θ′ = θ − φ_1, to obtain the rotated direction vector, wherein (λ_0, φ_1) are the radian coordinates of the central point of the partial image of the LDR image;
the pixel point coordinates of the HDR image are rotated in the same way as the direction vector of the light source parameters.
Optionally, for each HDR image, 8 partial images are cropped; each partial image R_p corresponds to the rotated illumination parameters P, and the rotated original HDR panorama is denoted R_w, so the light source model parameters to be input into the light source model are expressed as {R_p, P, R_w}.
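Because the second rotation formula is likewise given only as an equation figure, the following sketch assumes that the rotation simply brings the crop centre to the front of the panorama: the equirectangular image is rolled along its columns and the light source azimuths are shifted by the same angle. The column roll and the plain azimuth subtraction are assumptions used for illustration only.

import numpy as np

def rotate_sample(hdr: np.ndarray, light_azimuths: np.ndarray, center_azimuth: float):
    """hdr: equirectangular panorama (H, W, 3); light_azimuths: azimuth of each light
    source in radians; center_azimuth: azimuth of the crop centre in radians."""
    H, W, _ = hdr.shape
    shift = int(round(center_azimuth / (2 * np.pi) * W))
    R_w = np.roll(hdr, -shift, axis=1)                       # rotated HDR panorama
    rotated = (light_azimuths - center_azimuth + np.pi) % (2 * np.pi) - np.pi
    return R_w, rotated        # together with the crop R_p they form a sample {R_p, P, R_w}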
In this embodiment, the HDR image is converted into an LDR image, the first partial image is obtained in spherical coordinates, the LDR image represented in non-spherical coordinates is subjected to a perspective transformation according to the first partial image and the second partial image is cropped, and the light source parameters and the original HDR panorama are rotated using the coordinates of the central point of the first partial image. The parameters of the LDR image and of the original HDR panorama are thus added on the basis of the light source parameters, and the data are unified, which facilitates training when they are later input into the model.
Fig. 8 is a flowchart of a light source model training method according to an embodiment of the present invention. As shown in fig. 8, the method includes:
s801, inputting light source model parameters into a light source model; the light source model comprises a deep neural network; the light source model parameters are model parameters obtained by the method described above.
Optionally, the deep neural network comprises an encoder and a decoder; the encoder comprises a pre-trained DenseNet-121 model;
the decoder comprises two fully connected layers and five output layers;
the input end of the first full connection layer is connected with the output end of the DenseNet-121 model; the first output end of the first full connection layer is connected with the input end of the second full connection layer; the second output end of the first full connection layer is connected with the first input end of the first output layer;
the first output end of the second full connection layer and the input end of the second output layer; the output end of the second output layer is connected with the second input end of the first output layer; the second output end of the second full connection layer is connected with the input end of the third output layer; the third output end of the second full connection layer is connected with the input end of the fourth output layer; and the fourth output end of the second full connection layer is connected with the input end of the fifth output layer.
Optionally, the input dimension of the first fully-connected layer is 64 and its output dimension is 3072; the input dimension of the second fully-connected layer is 3072 and its output dimension is 512. The light source model parameters are input into the deep neural network, whose first output layer outputs the average depth d, second output layer outputs the direction vector l, third output layer outputs the size s, fourth output layer outputs the color c, and fifth output layer outputs the ambient light component a.
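A PyTorch sketch of a network with the stated layout is shown below. The 1024-to-64 projection in front of the first fully-connected layer, the output dimensions of the five heads (assuming three light sources) and the concatenation that feeds the direction output into the depth head are assumptions chosen so that the stated 64/3072/512 dimensions fit together; they are not taken from the patent.

import torch
import torch.nn as nn
import torchvision

class LightSourceNet(nn.Module):
    def __init__(self, n_lights: int = 3):
        super().__init__()
        densenet = torchvision.models.densenet121(weights="DEFAULT")
        self.encoder = densenet.features                        # pre-trained DenseNet-121 encoder
        self.to_64 = nn.Linear(1024, 64)                         # assumed projection to the 64-dim input
        self.fc1 = nn.Linear(64, 3072)                           # first fully-connected layer
        self.fc2 = nn.Linear(3072, 512)                          # second fully-connected layer
        self.head_l = nn.Linear(512, n_lights * 3)               # second output layer: directions l
        self.head_d = nn.Linear(3072 + n_lights * 3, n_lights)   # first output layer: average depth d
        self.head_s = nn.Linear(512, n_lights)                   # third output layer: size s
        self.head_c = nn.Linear(512, n_lights * 3)               # fourth output layer: color c
        self.head_a = nn.Linear(512, 3)                          # fifth output layer: ambient light a

    def forward(self, x):
        f = self.encoder(x)                                      # (B, 1024, h, w)
        f = torch.relu(self.to_64(f.mean(dim=(2, 3))))           # global average pool, then project
        h1 = torch.relu(self.fc1(f))
        h2 = torch.relu(self.fc2(h1))
        l = self.head_l(h2)
        d = self.head_d(torch.cat([h1, l], dim=1))               # depth head also sees the direction output
        return d, l, self.head_s(h2), self.head_c(h2), self.head_a(h2)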
S802, respectively carrying out first-stage training and second-stage training on the light source model to obtain the trained light source model, wherein the loss functions of the first-stage training and the second-stage training are different.
In one possible implementation, during model training the network model performs parameter optimization by back-propagating the loss function; the training of the network is divided into two stages, for example 150 training rounds in the first stage and 50 in the second stage.
Fig. 9a is a panorama before rendering provided by an embodiment of the present invention, fig. 9b is a panoramic light source extraction diagram provided by an embodiment of the present invention, and fig. 9c is a schematic diagram of a panorama rendering result provided by an embodiment of the present invention. As shown in fig. 9a to 9c, a training data set is input into the deep neural network; the training data set includes a plurality of light source model parameters, with the light source parameters stored in a text file, and the panorama is rendered by the rendering function. Model training is performed with an Adam optimizer, the learning rate of the Adam optimizer is set to 0.001, the hyper-parameter is set to 0.9, and the batch size is set to 24. In the first-stage training, the network is trained under the supervision of the first-stage loss function, which is calculated from the estimate obtained after rendering the HDR panorama and the true value of the HDR image; in the second-stage training, the network is trained under the supervision of the second-stage loss function.
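The optimizer settings and the two-stage schedule described above could be wired together as in the following sketch; the data loader, the model and the two loss callables are supplied by the caller, and interpreting the 0.9 hyper-parameter as Adam's first momentum coefficient is an assumption.

import torch

def train_two_stage(model, loader, stage1_loss, stage2_loss,
                    stage1_epochs: int = 150, stage2_epochs: int = 50):
    """stage1_loss / stage2_loss: callables mapping (predictions, sample) to a scalar loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
    for loss_fn, epochs in [(stage1_loss, stage1_epochs), (stage2_loss, stage2_epochs)]:
        for _ in range(epochs):
            for sample in loader:            # batches of {R_p, P, R_w}, batch size 24
                optimizer.zero_grad()
                preds = model(sample["R_p"])
                loss = loss_fn(preds, sample)
                loss.backward()              # back-propagate the stage-specific loss
                optimizer.step()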
Optionally, the loss function used for the first-stage training compares the HDR panorama containing only the light sources, rendered from the predicted light source parameters, with the rotated HDR image R_w, compares the predicted ambient light component with its true value a, and further penalizes the predicted color value, direction vector and size of each of the N light source parameter samples; wherein ω_r and ω_a are the respective weights, l_2(·) is the mean square error, e is the natural constant, u is a function parameter, and π is 180 degrees.
The loss function used for the second-stage training compares, for the i-th light source parameter, the predicted average depth with the true average depth d_i, the predicted size with the true size s_i, and the predicted color value with the true color value c_i.
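The exact loss formulas exist in the patent only as equation figures. Purely to illustrate the quantities listed above, the sketch below assumes simple mean-square terms over the predicted average depth, size and color of each light source for the second stage; the real loss may differ.

import torch

def second_stage_loss(pred_d, pred_s, pred_c, true_d, true_s, true_c):
    """All tensors have one entry (or one RGB triple) per light source parameter i."""
    mse = torch.nn.functional.mse_loss
    return mse(pred_d, true_d) + mse(pred_s, true_s) + mse(pred_c, true_c)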
Optionally, after the light source model is trained, the trained model may be tested on a test set; the locally stored parameters of the trained model are loaded when the test is started, and during the test the estimates of the illumination parameters are recorded.
FIG. 10a is a partial input image provided by an embodiment of the present invention, and FIG. 10b is a panoramic view provided by an embodiment of the present invention; table 1 is a light source estimation table.
Table 1. Light source estimation table
As shown in fig. 10a, fig. 10b and table 1, the input image contains two light sources; the model identifies both light sources in the image, and the estimation result is accurate. Each estimated light source comprises three light source parameters, among which the size and the average depth are unchanged while the direction and the color each take three different values.
Table 2 is a table comparing the method of the present invention with the data of the prior art.
Table 2. Mean square error comparison table
As shown in table 2, the DeepLight method selects only one direction vector for estimation, so the mean square error (RMSE) between its estimate and the true value reaches 0.135, and the Gardner17 method estimates only from the HDR panorama, so its mean square error reaches 0.128. The present method selects a plurality of parameters under spherical coordinates; except for the average depth, whose mean square error is larger, the mean square errors are all smaller than 0.1, so the overall effect is improved.
DeepLight method: from Peter Kán and Hannes Kaufmann. 2019. DeepLight: light-source estimation for augmented reality using deep learning. The Visual Computer 35, 6-8 (June 2019), 873-883.
Gardner17 method: from Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian Gagné, and Jean-François Lalonde. Learning to predict indoor illumination from a single image. ACM Transactions on Graphics (SIGGRAPH Asia), 36(6), 2017.
In this embodiment, the light source model parameters are input into the light source model; the light source model comprises an encoder and a decoder, the encoder comprising a pre-trained DenseNet-121 model and the decoder comprising two fully-connected layers and five output layers, which process the light source model parameters during training. The model is trained in two stages, which solves the problem that the estimated values cannot be matched to the label values in the early stage of training; with model input data that are not single-sourced, direct estimation of the HDR image is avoided, the expressiveness of the light source parameters is improved, and the estimation running time is reduced.
The present application further provides a light source model parameter obtaining apparatus, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory, causing the at least one processor to perform a light source model parameter acquisition method or a light source model training method.
Fig. 11 is a hardware schematic diagram of a light source model parameter obtaining device according to an embodiment of the present invention. As shown in fig. 11, the light source model parameter acquiring apparatus 110 provided in the present embodiment includes: at least one processor 1101 and memory 1102. The device 110 further comprises a communication component 1103. The processor 1101, the memory 1102, and the communication unit 1103 are connected by a bus 1104.
In a specific implementation, the at least one processor 1101 executes the computer-executable instructions stored by the memory 1102, so that the at least one processor 1101 executes the light source model parameter obtaining method or the light source model training method as described above.
For a specific implementation process of the processor 1101, reference may be made to the above method embodiments, which implement principles and technical effects similar to each other, and details of this embodiment are not described herein again.
In the embodiment shown in fig. 11, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The Memory may include a high-speed Memory (RAM) and may also include a Non-volatile Memory (NVM), such as at least one disk Memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The present application further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the light source model parameter obtaining method or the light source model training method as described above is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
The division of the units is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The invention is intended to cover any variations, uses, or adaptations that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. The invention is not limited to the precise arrangements described herein and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A light source model parameter obtaining method is characterized by comprising the following steps:
converting an HDR image into a pixel matrix, taking the pixel point with the largest gray value in the pixel matrix as a starting point, and performing region growing according to the gray value of each pixel point to obtain a light source region;
converting the HDR image into a spherical coordinate representation, and mapping the light source region to the spherical coordinates of the HDR image to obtain a light source parameter of the light source region under the spherical coordinates;
selecting a first partial image from an LDR image represented by the spherical coordinates, and mapping the first partial image to the LDR image to obtain a second partial image, wherein the LDR image is obtained by converting the HDR image;
rotating the light source parameter and the pixel points of the HDR image according to the center point coordinates of the first partial image of the LDR image;
and acquiring light source model parameters according to the second partial image, the rotated light source parameter, and the pixel points of the rotated HDR image.
2. The method of claim 1, wherein the converting the HDR image into a pixel matrix, taking the pixel point with the largest gray value in the pixel matrix as a starting point, and performing region growing according to the gray value of each pixel point to obtain the light source region comprises:
s2-1, converting the HDR image into a pixel matrix;
s2-2, traversing the pixel matrix, marking the pixel point with the maximum gray value, and correspondingly recording the maximum gray value;
s2-3, constructing a pixel stack and a pixel state matrix, putting the pixel point with the maximum gray value into the pixel stack, traversing each pixel point in the pixel matrix by taking the pixel point with the maximum gray value as a starting point, judging whether the current pixel point is a non-light source and the gray value of the neighbor pixel point is greater than a gray threshold value, if so, setting the state of the current pixel point as a light source, setting the neighbor pixel point of the current pixel point as a light source boundary, and continuously processing the next pixel point until all pixel point processing is completed; if not, setting the state of the current pixel point as a non-light source, and continuously processing the next pixel point until all pixel point processing is finished;
the pixel stack is used for recording the row and column positions of the stacked pixels, the pixel state matrix is used for recording the states of the pixel points, the states of the pixel points comprise a light source, a light source boundary and a non-light source, and the initial states of the pixel state matrix are all non-light sources;
s2-4, repeating S2-3 to obtain at least two different light source areas; the light source area is an area corresponding to the pixel point of the light source in the state of the pixel point.
3. The method as claimed in claim 1, wherein the mapping the light source region to the spherical coordinates of the HDR image to obtain the light source parameter of the light source region under the spherical coordinates comprises:
fitting the light source region into an ellipse, averaging the colors of the light source region, and taking the size of the light source region as the average of the major axis and the minor axis of the fitted ellipse;
taking the direction vector of the center point of the light source region pointing to the center point of the spherical coordinates of the HDR image as the direction vector of the light source region;
acquiring the average depth of the light source region according to the depth label corresponding to the light source region; and obtaining the light source parameter according to the direction vector, the average depth, the color, the size, and the ambient light component of the light source region.
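As a rough illustration of claim 3 above (not part of the filing), the Python sketch below derives one light source parameter from a grown region. The use of OpenCV's fitEllipse, the equirectangular longitude/latitude convention for the direction vector, and the assumption that the ambient light component is supplied externally are all choices made for this sketch.

```python
import numpy as np
import cv2

def light_source_parameter(region, hdr_rgb, depth_map, ambient):
    """Summarize one grown region as a light source parameter.

    region:    list of (row, col) pixel indices of the region (at least 5
               points are required by cv2.fitEllipse).
    hdr_rgb:   H x W x 3 HDR image.
    depth_map: H x W depth labels aligned with the HDR image.
    ambient:   ambient light component, assumed to be supplied externally.
    """
    rows = np.array([p[0] for p in region])
    cols = np.array([p[1] for p in region])

    # Ellipse fit; the size is the mean of the major and minor axes.
    pts = np.stack([cols, rows], axis=1).astype(np.float32)  # OpenCV expects (x, y)
    _, (axis_a, axis_b), _ = cv2.fitEllipse(pts)
    size = 0.5 * (axis_a + axis_b)

    # Average color and average depth over the region.
    color = hdr_rgb[rows, cols].mean(axis=0)
    depth = float(depth_map[rows, cols].mean())

    # Direction vector: unit vector on the panorama sphere toward the region
    # center; this equirectangular latitude/longitude convention is an assumption.
    h, w = depth_map.shape
    phi = (0.5 - rows.mean() / h) * np.pi           # latitude
    lam = (cols.mean() / w - 0.5) * 2.0 * np.pi     # longitude
    direction = np.array([np.cos(phi) * np.sin(lam),
                          np.sin(phi),
                          np.cos(phi) * np.cos(lam)])

    return {"direction": direction, "depth": depth,
            "color": color, "size": size, "ambient": ambient}
```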
4. The method of claim 1, wherein mapping the first partial image into the LDR image to obtain a second partial image comprises:
calculating the coordinates of the pixel points of the first partial image mapped into the LDR image according to the coordinates of the center point of the first partial image and the coordinates of the pixel points in the first partial image;
calculating the clipping row and column values of each pixel of the first partial image in the LDR image according to the coordinates mapped into the LDR image;
and performing image clipping according to the clipping row and column values to obtain the second partial image.
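A minimal sketch of the clipping step in claim 4 above, assuming the mapping from spherical coordinates has already been reduced to a pixel-space center point and a known patch size (both assumptions of this sketch); clamping the crop to the image border is also an assumption.

```python
import numpy as np

def crop_second_partial_image(ldr, center_rc, patch_h, patch_w):
    """Cut the second partial image out of the LDR image.

    ldr:       H x W x 3 LDR image (converted from the HDR image).
    center_rc: (row, col) of the first partial image's center, already mapped
               into the LDR image (assumed given).
    patch_h, patch_w: patch height and width in pixels (assumed known).
    """
    h, w = ldr.shape[:2]
    cr, cc = center_rc
    # Clipping row and column values derived from the center and patch extent.
    r0 = int(np.clip(cr - patch_h // 2, 0, h - 1))
    r1 = int(np.clip(cr + patch_h // 2, 0, h))
    c0 = int(np.clip(cc - patch_w // 2, 0, w - 1))
    c1 = int(np.clip(cc + patch_w // 2, 0, w))
    return ldr[r0:r1, c0:c1]
```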
5. The method as claimed in claim 4, wherein the rotating the light source parameter and the pixel points of the HDR image according to the center point coordinates of the first partial image of the LDR image comprises:
according to the formula:
θ′ = θ - φ1
(the companion rotation formula and the direction-vector notation are given only as images in the original filing), rotating the direction vector in the light source parameter to obtain a rotated direction vector;
wherein (λ0, φ1) are the radian coordinates of the center point of the first partial image of the LDR image;
the rotation method of the pixel point coordinates of the HDR image is the same as the rotation method of the direction vector of the light source parameter.
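For illustration only, the sketch below mirrors the rotation in claim 5 above. Only θ′ = θ - φ1 is stated explicitly in the claim; treating the longitude as λ′ = λ - λ0 (wrapped to [-π, π)) and rolling the panorama columns by the corresponding pixel shift are assumptions, as is the equirectangular layout of the HDR image.

```python
import numpy as np

def rotate_direction(theta, lam, center):
    """Rotate a spherical direction so that the patch center becomes the origin.

    Only theta' = theta - phi_1 comes from the claim; handling the longitude
    the same way is an assumption of this sketch.
    """
    lam_0, phi_1 = center                 # radian coordinates of the patch center
    theta_r = theta - phi_1
    lam_r = (lam - lam_0 + np.pi) % (2.0 * np.pi) - np.pi
    return theta_r, lam_r

def rotate_panorama_columns(hdr, lam_0):
    """Rotate the HDR panorama the same way by rolling its columns; an
    equirectangular pixel layout is assumed."""
    w = hdr.shape[1]
    shift = int(round(lam_0 / (2.0 * np.pi) * w))
    return np.roll(hdr, -shift, axis=1)
```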
6. A light source model training method is characterized by comprising the following steps:
inputting light source model parameters into a light source model; the light source model comprises a deep neural network; the light source model parameters are model parameters obtained by the method of any one of claims 1 to 5;
and respectively carrying out first-stage training and second-stage training on the light source model to obtain the trained light source model, wherein the loss functions of the first-stage training and the second-stage training are different.
7. The method of claim 6, wherein the deep neural network comprises an encoder and a decoder; the encoder comprises a pre-trained DenseNet-121 model;
the decoder comprises two fully connected layers and five output layers;
the input end of the first full connection layer is connected with the output end of the DenseNet-121 model; the first output end of the first full connection layer is connected with the input end of the second full connection layer; the second output end of the first full connection layer is connected with the first input end of the first output layer;
the first output end of the second full connection layer is connected with the input end of the second output layer; the output end of the second output layer is connected with the second input end of the first output layer; the second output end of the second full connection layer is connected with the input end of the third output layer; the third output end of the second full connection layer is connected with the input end of the fourth output layer; and the fourth output end of the second full connection layer is connected with the input end of the fifth output layer.
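The wiring in claim 7 above can be pictured with the following PyTorch sketch (not part of the filing). Only the connection pattern follows the claim; the hidden sizes, activations, number of lights, and what each output head predicts are assumptions, and the torchvision weights argument shown requires a recent torchvision version.

```python
import torch
import torch.nn as nn
from torchvision import models

class LightSourceNet(nn.Module):
    """Sketch of the encoder/decoder wiring described in claim 7."""

    def __init__(self, feat_dim=1024, hidden=512, n_lights=3):
        super().__init__()
        # Pre-trained DenseNet-121 encoder.
        self.encoder = models.densenet121(weights="DEFAULT").features
        self.pool = nn.AdaptiveAvgPool2d(1)

        self.fc1 = nn.Linear(feat_dim, hidden)   # first fully connected layer
        self.fc2 = nn.Linear(hidden, hidden)     # second fully connected layer

        # Five output layers; head2 also feeds head1, mirroring "the output end
        # of the second output layer is connected with the second input end of
        # the first output layer".
        self.head2 = nn.Linear(hidden, n_lights * 3)              # e.g. directions
        self.head1 = nn.Linear(hidden + n_lights * 3, n_lights)   # e.g. sizes
        self.head3 = nn.Linear(hidden, n_lights * 3)              # e.g. colors
        self.head4 = nn.Linear(hidden, n_lights)                  # e.g. depths
        self.head5 = nn.Linear(hidden, 3)                         # e.g. ambient light

    def forward(self, x):
        f = torch.flatten(self.pool(torch.relu(self.encoder(x))), 1)
        h1 = torch.relu(self.fc1(f))             # fed to fc2 and to head1
        h2 = torch.relu(self.fc2(h1))            # fed to heads 2-5
        out2 = self.head2(h2)
        out1 = self.head1(torch.cat([h1, out2], dim=1))
        return out1, out2, self.head3(h2), self.head4(h2), self.head5(h2)
```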
8. The method of claim 6, wherein the loss function of the first-stage training is given by a formula filed as an image in the original application; in that formula, ωr and ωa are different weights, l2(·) is the mean square error, a rendering function renders the light source parameters into an HDR panorama containing only the light sources, Rw is the rotated HDR image, the predicted value of the ambient light component is compared with its true value a, N is the number of light source parameter samples, the predicted values of the color value, the direction vector, and the size of the i-th light source parameter appear as further image symbols, e is the natural constant, u is a function parameter, and π is 180 degrees;
the second stage training performs a training loss function
Figure FDA0003628111370000038
Comprises the following steps:
Figure FDA0003628111370000039
wherein the content of the first and second substances,
Figure FDA00036281113700000310
as a prediction of the mean depth in the ith illuminant parameter, d i Is the true value of the average depth in the ith light source parameter, s is the true value of the magnitude in the ith light source parameter, c i The true value of the color value in the ith light source parameter.
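Because the loss formulas themselves were filed as images, only their ingredients are recoverable here. Purely as a structural sketch under that limitation (the weights, the terms built from e, u, and π, and the exact combination are not reproduced; plain mean-square errors stand in for every term), a two-stage loss could be organized as follows, where pred and target are assumed to be dictionaries of tensors and render_fn a differentiable renderer:

```python
import torch
import torch.nn.functional as F

def stage_one_loss(pred, target, render_fn, R_w, w_r=1.0, w_a=1.0):
    """Structural sketch of the first-stage loss: a weighted re-rendering term
    against the rotated HDR image plus per-parameter regression terms."""
    loss = w_r * F.mse_loss(render_fn(pred), R_w)                      # rendered vs rotated HDR panorama
    loss = loss + w_a * F.mse_loss(pred["ambient"], target["ambient"])
    for key in ("color", "direction", "size"):                         # averaged over the N samples
        loss = loss + F.mse_loss(pred[key], target[key])
    return loss

def stage_two_loss(pred, target):
    """Structural sketch of the second-stage loss over average depth, size and color."""
    return (F.mse_loss(pred["depth"], target["depth"])
            + F.mse_loss(pred["size"], target["size"])
            + F.mse_loss(pred["color"], target["color"]))
```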
9. A light source model parameter acquisition apparatus, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
and the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the light source model parameter obtaining method of any one of claims 1-5 or the light source model training method of any one of claims 6-8.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the light source model parameter obtaining method of any one of claims 1 to 5 or the light source model training method of any one of claims 6 to 8.
CN202210481668.7A 2022-05-05 2022-05-05 Light source model parameter obtaining method, training method, device and medium Pending CN115063708A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210481668.7A CN115063708A (en) 2022-05-05 2022-05-05 Light source model parameter obtaining method, training method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210481668.7A CN115063708A (en) 2022-05-05 2022-05-05 Light source model parameter obtaining method, training method, device and medium

Publications (1)

Publication Number Publication Date
CN115063708A true CN115063708A (en) 2022-09-16

Family

ID=83196476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210481668.7A Pending CN115063708A (en) 2022-05-05 2022-05-05 Light source model parameter obtaining method, training method, device and medium

Country Status (1)

Country Link
CN (1) CN115063708A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333803A (en) * 2023-10-09 2024-01-02 广东科驭科技有限公司 Illumination operation and maintenance method and system based on image recognition
CN117333803B (en) * 2023-10-09 2024-03-12 广东科驭科技有限公司 Illumination operation and maintenance method and system based on image recognition

Similar Documents

Publication Publication Date Title
WO2020192568A1 (en) Facial image generation method and apparatus, device and storage medium
CN111275129A (en) Method and system for selecting image data augmentation strategy
CN109154973A (en) Execute the method and system of convolved image transformation estimation
US11823322B2 (en) Utilizing voxel feature transformations for view synthesis
WO2021253788A1 (en) Three-dimensional human body model construction method and apparatus
CN104537705B (en) Mobile platform three dimensional biological molecular display system and method based on augmented reality
WO2020048484A1 (en) Super-resolution image reconstruction method and apparatus, and terminal and storage medium
CN107644423B (en) Scene segmentation-based video data real-time processing method and device and computing equipment
CN108876716B (en) Super-resolution reconstruction method and device
CN111833457A (en) Image processing method, apparatus and storage medium
CN115984447B (en) Image rendering method, device, equipment and medium
Kim et al. Real-time panorama canvas of natural images
Artizzu et al. Omniflownet: a perspective neural network adaptation for optical flow estimation in omnidirectional images
CN112766215A (en) Face fusion method and device, electronic equipment and storage medium
CN115063708A (en) Light source model parameter obtaining method, training method, device and medium
CN114742940A (en) Method, device and equipment for constructing virtual image texture map and storage medium
CN114202615A (en) Facial expression reconstruction method, device, equipment and storage medium
CN108734712B (en) Background segmentation method and device and computer storage medium
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN111744197A (en) Data processing method, device and equipment and readable storage medium
CN115082322B (en) Image processing method and device, and training method and device of image reconstruction model
CN114862866B (en) Calibration plate detection method and device, computer equipment and storage medium
CN114913287A (en) Three-dimensional human body model reconstruction method and system
CN106570911B (en) Method for synthesizing facial cartoon based on daisy descriptor
Boss et al. Deep Dual Loss BRDF Parameter Estimation.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination