CN114764848A - Scene illumination distribution estimation method - Google Patents
- Publication number: CN114764848A
- Application number: CN202110042495.4A
- Authority: CN (China)
- Prior art keywords: image, map, environment map, matrix, feature vector
- Prior art date: 2021-01-13
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T19/006—Mixed reality (G06T19/00—Manipulating 3D models or images for computer graphics)
- G06N3/045—Combinations of networks (G06N3/04—Architecture, e.g. interconnection topology; G06N3/02—Neural networks)
- G06N3/08—Learning methods (G06N3/02—Neural networks)
- G06T15/50—Lighting effects (G06T15/00—3D [Three Dimensional] image rendering)
- G06T3/08—Projecting images onto non-planar surfaces, e.g. geodetic screens (G06T3/00—Geometric image transformations in the plane of the image)
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images (G06T3/40—Scaling of whole images or parts thereof)
Abstract
The application discloses an illumination distribution estimation method applied mainly in the fields of virtual reality and augmented reality. The method constructs a cube map, projects an image captured by a camera onto the cube map based on a device orientation matrix and a camera projection matrix, constructs a mirror sphere inside the space formed by the cube map, uses mirror-sphere mapping projection to obtain a scene environment map, and then determines a high-dynamic-range environment map based on a neural network. The environment map obtained in this way yields highly realistic illumination effects and improves the quality of illumination distribution estimation.
Description
Technical Field
The application relates to the technical field of virtual reality and augmented reality, and in particular to a scene illumination distribution estimation method for mobile devices.
Background
In recent years, with the rapid development of hardware and algorithms, Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR) technologies have been widely used in industries such as education and training, the military, medical treatment, entertainment and manufacturing. Highly realistic virtual-real fusion is a core requirement of these applications: only when the light source and illumination model of a virtual object are kept consistent with the real scene can a realistic rendering be obtained. Traditional scene illumination estimation methods use only a single image as input, and camera parameters differ between phones; in practice this makes rendering inconsistent across devices, and continuous scene input produces jitter, so the rendering effect is poor.
Disclosure of Invention
The invention aims to provide a method for estimating scene illumination distribution that improves the accuracy of scene illumination estimation, so that a virtual object superimposed on a real scene carries illumination information more consistent with its environment, improving the display of the virtual object and the user experience.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In a first aspect, a method for illumination estimation is provided, which includes the following steps: acquiring a first image; determining a device orientation matrix and a camera projection matrix corresponding to the first image; projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix; obtaining a scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map; and determining a high-dynamic-range environment map according to the first image and the scene environment map.
The technical solution of the first aspect obtains illumination information of a real scene by combining a cube map with mirror-sphere mapping; in the resulting high-dynamic-range environment map, illumination estimation supplements the illumination information of unobserved pixels, improving the quality of the illumination estimate.
In a possible implementation form according to the first aspect, the device orientation matrix is obtained by an inertial measurement unit.
According to the first aspect, in a possible implementation manner, the determining a device orientation matrix corresponding to the first image further includes: determining a first moment at which the first image is acquired; and taking, as the device orientation matrix corresponding to the first image, the inertial measurement unit data closest to and not later than the first moment.
According to the first aspect, in a possible implementation manner, the projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix further includes: transforming a first coordinate of a first pixel into a first direction vector through an inverse of the camera projection matrix, wherein the first pixel is a pixel in the first image; and transforming the first direction vector into a second coordinate through an inverse of the device orientation matrix.
According to the first aspect, in a possible implementation manner, the obtaining a scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map further includes: rendering a first mirror sphere within a cubic environment formed by the cube map; determining, from the device orientation matrix, an environment direction reflected by a second pixel on the surface of the first mirror sphere; acquiring the color or brightness of a corresponding pixel on the cube map according to the environment direction; and filling the color or brightness into the second pixel.
According to the first aspect, in a possible implementation manner, the determining an environment map with a high dynamic range according to the first image and the scene environment map further includes: determining the high dynamic range environment map using a deep convolutional neural network.
According to the first aspect, in one possible implementation manner, the deep convolutional neural network includes an image feature extraction network, an observed environment map feature extraction network, and a feature fusion and illumination generation network, where the image feature extraction network is configured to extract a first feature vector from the first image; the observed environment map feature extraction network is configured to extract a second feature vector from the scene environment map; and the feature fusion and illumination generation network is configured to generate the high-dynamic-range environment map according to the first feature vector and the second feature vector.
According to the first aspect, in a possible implementation manner, the extracting a second feature vector from the scene environment map further includes: marking observed pixel points and unobserved pixel points in the scene environment map; and determining the second feature vector according to the observed pixel points.
In a possible implementation manner, according to the first aspect, the generating the high-dynamic-range environment map according to the first feature vector and the second feature vector further includes: splicing the first feature vector and the second feature vector to generate a third feature vector; and generating the environment map with the high dynamic range according to the third feature vector.
According to the first aspect, in a possible implementation manner, after determining the environment map with a high dynamic range according to the first image and the scene environment map, the method further includes: rendering the virtual object according to the environment map with the high dynamic range; fusing the rendered virtual object into the first image.
In the illumination estimation method provided by the first aspect, a sensor in the electronic device provides the device orientation information, and the camera records the scene content observed by the user while the electronic device is in use, supplying more effective input and improving the quality of scene illumination distribution estimation. In addition, because the user usually changes the observation angle continuously while using such applications, the method keeps acquiring more scene information as the application session progresses, further improving the accuracy of the estimated illumination distribution. Compared with traditional illumination estimation methods that use only a single image, the method provides better continuity constraints and reduces flicker.
In a second aspect, an electronic device is provided that includes one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for causing the electronic device to perform the various implementation methods of the first aspect.
In a third aspect, a computer readable medium is provided for storing one or more programs, wherein the one or more programs are configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for causing the electronic device to perform the implementation methods of the first aspect.
It should be appreciated that the description of technical features, aspects, advantages, or similar language in the specification does not imply that all of the features and advantages may be realized in any single embodiment. Rather, it is to be understood that the description of a feature or advantage is intended to include a specific feature, aspect, or advantage in at least one embodiment. Thus, descriptions of technical features, technical solutions or advantages in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, aspects and advantages described in the following embodiments may be combined in any suitable manner. One skilled in the relevant art will recognize that an embodiment may be practiced without one or more of the specific features, aspects, or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
Fig. 1 is a flowchart of an illumination estimation method provided in the present application.
Fig. 2 is a schematic diagram of the data-reporting cycles of the camera and the IMU.
FIG. 3 is a schematic diagram of a cube map provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of a deep convolutional neural network provided in an embodiment of the present application.
Fig. 5 is a flowchart of illumination estimation by the deep convolutional neural network provided in the embodiment of the present application.
Fig. 6 is an SSIM comparison between the illumination estimation method of the present invention and prior-art methods, provided in an embodiment of the present application.
Fig. 7 is a PSNR comparison between the illumination estimation method of the present invention and prior-art methods, provided in an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items.
Embodiments of an electronic device, of a graphical user interface for such an electronic device, and of the use of such an electronic device are described below. In some embodiments, the electronic device may be a portable electronic device, such as a cell phone, a tablet or a wearable electronic device (e.g., virtual reality glasses, augmented reality glasses, etc.), that also incorporates other functionality, such as personal digital assistant and/or music player functionality. Exemplary embodiments of the portable electronic device include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface or touch panel. It should also be understood that in other embodiments the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface or touch panel.
The following embodiments of the present application provide a scene illumination estimation method, which can improve the quality of scene illumination distribution estimation by combining orientation information of a device and data acquired by a camera.
The scene illumination estimation method of the present application will be described in detail below with reference to specific embodiments. Fig. 1 shows a scene illumination estimation method of the present application.
S102: electronic device obtains image I at time ttAnd a device orientation matrix Mview,t。
Specifically, the electronic device may obtain the image I_t through the camera in response to a user operation. The camera can be a front or rear camera installed on the electronic device, so the acquired image is an image of the real world. Furthermore, the electronic device may obtain the device orientation matrix M_view,t through an Inertial Measurement Unit (IMU). This matrix, which may also be called a view-angle matrix, represents the pose of the electronic device relative to a reference coordinate system at time t; the reference coordinate system may be the terrestrial coordinate system. The device orientation matrix M_view,t may be a 3 x 3 or a 4 x 4 matrix; in particular, M_view,t can be expressed in either of these two forms.
The device orientation matrix M_view,t represents the attitude of the camera at the moment it captures the image I_t. In some embodiments, because the camera and the phone screen generally share the same orientation in an electronic device such as a mobile phone, the device orientation matrix M_view,t of the camera view relative to the terrestrial coordinate system can be obtained from the attitude of the phone when the image I_t is captured. The electronic device can obtain this attitude information through the relevant system API. On Android, for example, the attitude of the phone screen relative to the terrestrial coordinate system, such as an Euler angle expressed by the three variables roll, yaw and pitch, can be monitored with a rotation vector sensor (RV-sensor). In other embodiments, the three variables roll, pitch and azimuth may also be used to represent the pose of the electronic device. In other embodiments, if the camera and the phone screen have a fixed relative position, the device orientation matrix M_view,t may also be obtained by converting that relative positional relationship.
When the camera is used to acquire an image, shooting can be triggered by a user operation, for example touching a shutter button displayed in the graphical user interface of the phone or pressing a physical key. In other embodiments, the camera may be set to acquire multiple images automatically: it may capture images at a fixed period, or the user may be prompted on the graphical user interface to move the phone, in which case, once capture is triggered, the phone automatically acquires multiple images while the user moves it. The camera can be set to acquire a single image or an image sequence comprising multiple images. During acquisition, the electronic device may keep facing one direction or may face different directions. When it faces one direction, the device orientation matrix remains unchanged; when it faces different directions, the device orientation matrix changes with the orientation of the device.
When the camera is set to acquire images automatically, the camera and the inertial measurement unit each provide image data and device attitude data periodically. Since the camera and the inertial measurement unit are independent hardware devices, the two data streams need to be synchronized when the electronic device acquires an image and a device orientation matrix. In general, the inertial measurement unit has high sensitivity and low latency, whereas camera data involves a larger volume and some processing and therefore arrives with higher latency. In some embodiments, as shown in Fig. 2, the period at which the camera collects image data is longer than the period at which the inertial measurement unit reports data. Therefore, when the electronic device acquires a camera image, the latest IMU sample, the one closest in system time to the moment the camera image was acquired, is used as the device orientation matrix corresponding to that image. Specifically, the inertial measurement unit data may be collected through the rotation vector (Rotation_Vector) sensor exposed by the sensor management component SensorManager in the system API.
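A minimal sketch of this synchronization step, assuming IMU samples arrive as timestamped (time, orientation matrix) tuples; the buffering scheme and the function name are illustrative, not taken from the patent:

```python
from bisect import bisect_right

def latest_imu_before(imu_samples, frame_time):
    """Return the IMU orientation closest to, and not later than, frame_time.

    imu_samples: list of (timestamp, orientation_matrix) tuples, sorted by timestamp.
    frame_time:  system timestamp at which the camera frame was captured.
    """
    times = [t for t, _ in imu_samples]
    idx = bisect_right(times, frame_time) - 1   # last sample with timestamp <= frame_time
    if idx < 0:
        raise ValueError("no IMU sample precedes this camera frame")
    return imu_samples[idx][1]                  # used as M_view,t for this frame
```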
S104: electronic equipment obtains camera projection matrix MprojCombined device orientation matrix M view,tImage ItProjected onto a spatial cube map (cube map).
The camera projection matrix describes imaging properties of the camera hardware, including but not limited to parameters such as the field of view (FOV) and the aspect ratio. For a lens without optical zoom, the camera projection matrix M_proj is usually fixed and can be obtained directly or indirectly from the device parameters. For example, the electronic device may obtain the information required for the camera projection matrix by calling a third-party API such as AR Engine, and then generate the camera projection matrix M_proj. The physical size of the camera sensor can also be read from the SENSOR_INFO_PHYSICAL_SIZE parameter of the CameraCharacteristics interface in the system API, from which the camera projection matrix can be derived. When the camera of the electronic device supports optical zoom, the camera projection matrix M_proj may change as the user adjusts the zoom.
In some embodiments, the electronic device may acquire the information required for the camera projection matrix before step S102 or while step S102 is performed.
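For illustration, a projection matrix of this kind can be built from the field of view and the aspect ratio as follows; the OpenGL-style convention, the near/far planes and the function name are assumptions, since the patent does not fix a particular convention:

```python
import numpy as np

def perspective_projection(fov_y_deg, aspect, near=0.1, far=100.0):
    """OpenGL-style perspective projection matrix built from FOV and aspect ratio."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0,                          0.0],
        [0.0,        f,   0.0,                          0.0],
        [0.0,        0.0, (far + near) / (near - far),  2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                         0.0],
    ])
```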
Fig. 3 illustrates the cube map referred to in the present application. A cube map contains six two-dimensional texture maps, each representing one face of a regular hexahedron. In rendering, cube maps are often used to represent the content of a scene environment in all directions (e.g., the six directions front, back, left, right, top and bottom). Given the current view image, the device orientation matrix and the camera projection matrix, the spatial direction corresponding to each pixel of the two-dimensional image can be obtained, and the two-dimensional image can therefore be projected onto the cube map.
Specifically, the projection may be performed as follows:
For a pixel (xa, ya) in the image coordinate system with no depth information, a depth component may be appended to its coordinates, for example transforming it into (xa, ya, 1). The inverse of the projection matrix M_proj then transforms it into the camera coordinate system, giving the corresponding direction vector (xb, yb, zb); the inverse of the view-angle matrix M_view then transforms this into a direction vector (xc, yc, zc) in the world coordinate system, from which the corresponding position of the pixel on the cube map is obtained. The view-angle matrix is obtained in step S102 and the projection matrix in step S104. If the depth za of the pixel is available when the image is obtained, the transformation can instead use the coordinates (xa, ya, za), again yielding the corresponding position of the pixel on the cube map.
During this conversion, the pixels of the image I_t obtained in step S102 may all correspond to the same face of the cube map (e.g., the front face) or may correspond to multiple faces (e.g., the front and left faces). Furthermore, the more images corresponding to different device orientations are obtained in step S102, the more pixels of the cube map are filled. Unfilled pixels may be represented as black, i.e., with an RGB value of (0, 0, 0).
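A sketch of this unprojection in NumPy, assuming 4 x 4 matrices and normalized image coordinates; the helper names and the cube-face lookup are illustrative, not from the patent:

```python
import numpy as np

def pixel_to_world_direction(xa, ya, M_proj, M_view, depth=1.0):
    """Lift a normalized image coordinate (xa, ya) to a world-space direction (xc, yc, zc)."""
    p_clip = np.array([xa, ya, depth, 1.0])          # (xa, ya, 1) with a homogeneous coordinate
    p_cam = np.linalg.inv(M_proj) @ p_clip           # inverse projection -> camera space
    d_cam = p_cam[:3] / p_cam[3]                     # perspective divide -> (xb, yb, zb)
    d_world = np.linalg.inv(M_view)[:3, :3] @ d_cam  # inverse view rotation -> world space
    return d_world / np.linalg.norm(d_world)

def dominant_cube_face(d):
    """Name the cube-map face hit by world direction d: one of +x, -x, +y, -y, +z, -z."""
    axis = int(np.argmax(np.abs(d)))
    return ("+" if d[axis] >= 0 else "-") + "xyz"[axis]
```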
During continuous use, a continuous image sequence and the corresponding view-angle matrices can be acquired, and the observed scene content is maintained by combining this sequence information. In a sequence of consecutive images, adjacent frames may overlap. Considering the accuracy and error of the sensors on a real device, in some embodiments the colors of pixels in the overlapping region are computed as a weighted average of the historical data and the current frame, namely:
C_t = (1 - w) * C_{t-1} + w * C_current
In the above formula, t is the frame index, C_{t-1} is the color value of the pixel after merging the first t-1 frames, C_current is the color value of the corresponding pixel in the current (t-th) frame, C_t is the merged color value for frame t, and w is a weight with 0 ≤ w ≤ 1. The value of w may be fixed, for example 0.2 or 0.5; in other embodiments w may take other values, which the present application does not limit.
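A per-face sketch of this blend, where the array names, the coverage mask and the default weight are illustrative assumptions:

```python
import numpy as np

def blend_overlap(face_prev, face_new, covered, w=0.2):
    """Blend newly projected colors into a stored cube-map face.

    face_prev: HxWx3 float array, colors accumulated through frame t-1 (C_{t-1}).
    face_new:  HxWx3 float array, colors projected from the current frame (C_current).
    covered:   HxW bool array, True where the current frame actually covers this face.
    """
    face_t = face_prev.copy()
    face_t[covered] = (1.0 - w) * face_prev[covered] + w * face_new[covered]
    return face_t
```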
In some embodiments, steps S102 and S104 may be iterated to build a cube map with richer illumination information: after acquiring a new image in S102, the electronic device projects it into the historical cube map, further enriching the illumination information.
S106: device orientation matrix M based on current timeview,tCube map with observed scenetObtaining the scene environment map Envmap mapped by the mirror spheret。
In some embodiments, a mirror-ball mapping projection may be used to obtain the currently observed mirror-sphere-mapped scene environment map Envmap_t.
The mirror-sphere-mapped scene environment map is equivalent to placing a specular reflective sphere in the scene: on its surface, all of the surrounding environment can be observed except the part occluded by the sphere itself. Such a sphere is used to collect the illumination distribution of the scene environment, which is represented by the corresponding mirror-sphere map.
This step generates the mirror-sphere-mapped environment map observed so far at the current view angle by rendering from the cube environment map obtained in step S104. The specific steps are as follows:
s202: rendering a mirror surface sphere in a cubic environment formed by a cubic environment map; the rendering of the sphere may be by any rendering means known in the art.
S204: and determining the environment direction reflected by each pixel on the spherical surface according to the current visual angle matrix.
S206: and acquiring a corresponding pixel and the color or brightness thereof on the cube map according to the environment direction.
Specifically, according to the environment direction determined in step S204, one or more corresponding pixels are located on the cube map. If a single pixel is found, its color or brightness is taken; if multiple pixels are found, the average of their colors or brightnesses is used.
S208: the color or brightness determined at S206 is filled in the corresponding pixel.
The above steps yield the mirror-sphere-mapped scene environment map. The scene environment map may be a circular two-dimensional image in which each pixel corresponds to a direction in space.
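A sketch of this lookup, assuming an orthographic view of the sphere and a cube-map sampler such as the dominant_cube_face helper above; the function names and the returned mask are illustrative assumptions:

```python
import numpy as np

def render_mirror_ball(sample_cube, M_view, size=256):
    """Render the observed environment into a mirror-ball map (size x size x 3) plus a mask.

    sample_cube(direction) -> RGB triple for an observed direction, or None if that part
    of the cube map has not been filled yet.
    M_view: current 4x4 device orientation (view) matrix.
    """
    env = np.zeros((size, size, 3))
    mask = np.zeros((size, size), dtype=bool)        # marks observed pixels for step S108
    R_inv = np.linalg.inv(M_view)[:3, :3]            # camera-to-world rotation
    d_cam = np.array([0.0, 0.0, -1.0])               # orthographic view direction (assumption)
    for v in range(size):
        for u in range(size):
            x = 2.0 * (u + 0.5) / size - 1.0
            y = 1.0 - 2.0 * (v + 0.5) / size
            if x * x + y * y > 1.0:                  # outside the circular ball image
                continue
            n = np.array([x, y, np.sqrt(1.0 - x * x - y * y)])   # sphere normal, camera space
            r_cam = d_cam - 2.0 * np.dot(d_cam, n) * n           # reflected view direction
            color = sample_cube(R_inv @ r_cam)                   # cube-map lookup in world space
            if color is not None:
                env[v, u] = color
                mask[v, u] = True
    return env, mask
```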
S108: using a deep convolutional neural network with It、EnvmaptGenerating as input an environment map L of a complete High-Dynamic Range (HDR) specular sphere map at a current perspectivetAs the scene illumination distribution.
Specifically, features are extracted separately from the camera image I_t and the observed environment map Envmap_t, and the extracted features are fused to generate the HDR environment map. In some embodiments, as shown in Fig. 4, the deep convolutional neural network can be divided into three parts: an image feature extraction network, an observed environment map feature extraction network, and a feature fusion and illumination generation network. Fig. 5 illustrates how the deep convolutional neural network of the present application works.
The image feature extraction network extracts illumination-related feature vectors from the camera image of the current view angle. For example, a network structure based on MobileNetV2 may be used; it offers high computational performance on current mobile devices and effectively extracts the required features from the image. The input to the network may be an image with a resolution of 240 x 320 and the output may be a feature vector of length 512. In other embodiments, the format of the input image and the length of the output feature vector may be set according to specific requirements.
The observed environment map feature extraction network extracts features of the input environment map through several convolutional layers, while a visual attention module fuses in mask information so that feature extraction focuses on the observed, valid pixels. The mask marks observed and unobserved pixels: in the scene environment map obtained in S106, a pixel that received no color or brightness information from the cube map of S104 is an unobserved pixel, and otherwise it is an observed pixel. Accordingly, in some embodiments the mask information is generated in S106. The visual attention module uses the mask to determine the observed pixels so that the feature extraction network extracts image features from them. The network may receive a 4-channel (RGB + mask) observed environment map at a resolution of 256 x 256 and output a feature vector of length 512. In other embodiments, the format of the input image and the length of the output feature vector may be set according to specific requirements.
The feature fusion and illumination generation network first concatenates the feature vectors output by the image feature extraction network and the observed environment map feature extraction network, and then generates the HDR environment map through several fully connected layers, upsampling layers and convolutional layers. In some embodiments, the resolution of the HDR environment map may be 64 x 64.
In some embodiments, the lengths of the feature vectors output by the image feature extraction network and the observed environment map feature extraction network may be the same or different.
In some embodiments, when the feature vector A output by the image feature extraction network and the feature vector B output by the observed environment map feature extraction network are concatenated, the concatenation may follow a predetermined order: for example, feature vector B is appended after feature vector A, feature vector A is appended after feature vector B, or the two are interleaved according to some rule. The present application does not limit this.
In generating the HDR environment map, illumination estimation fills in the unobserved pixels of the scene environment map.
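A compact PyTorch-style sketch of such a three-branch network. The layer counts, channel widths and the simple masking used in place of the visual attention module are assumptions chosen for illustration; the patent specifies only the inputs, the two length-512 feature vectors, their concatenation, and the 64 x 64 HDR output:

```python
import torch
import torch.nn as nn
import torchvision

class IlluminationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Image branch: MobileNetV2 backbone -> length-512 feature vector (240x320 RGB input).
        backbone = torchvision.models.mobilenet_v2().features
        self.img_branch = nn.Sequential(backbone, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(1280, 512))
        # Observed-environment branch: RGB + mask (4 channels, 256x256) -> length-512 feature vector.
        self.env_branch = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 512))
        # Fusion + generation: concatenated 1024-d vector -> 64x64 HDR mirror-ball map.
        self.fuse = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(),
                                  nn.Linear(1024, 256 * 8 * 8), nn.ReLU())
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, image, env_rgb, env_mask):
        f_img = self.img_branch(image)                                             # first feature vector
        f_env = self.env_branch(torch.cat([env_rgb * env_mask, env_mask], dim=1))  # second feature vector
        f = self.fuse(torch.cat([f_img, f_env], dim=1))                            # concatenated vector
        return self.decode(f.view(-1, 256, 8, 8))                                  # 64x64x3 HDR map
```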
In some embodiments, the neural network may be trained before use to improve the accuracy of the illumination estimation. For example, environment images captured with a panoramic camera may be used as training data. During training, an L1 loss is applied to the HDR environment map output by the network, and an adversarial loss is used for additional supervision.
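A minimal sketch of such a training objective; the discriminator, the loss weighting and the function name are assumptions, since the patent only names the L1 and adversarial terms:

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_hdr, gt_hdr, discriminator, adv_weight=0.01):
    """L1 reconstruction loss on the HDR map plus a non-saturating adversarial term."""
    l1 = F.l1_loss(pred_hdr, gt_hdr)
    logits = discriminator(pred_hdr)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return l1 + adv_weight * adv
```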
S110: and rendering the virtual object according to the HDR environment image, and fusing the rendered virtual object into a real image. In the rendering process, the virtual object geometry and the surface material can be combined for rendering. Any rendering mode in the prior art can be used as the rendering mode, which is not limited in this application.
With the illumination estimation method provided by the present application, rendered virtual objects have a more realistic visual effect. To evaluate the method and compare it with prior-art illumination estimation methods, the present application uses 100 HDR environment maps to generate 400 sets of data, each containing a current-view image, an observed scene environment map and an HDR environment map. For rendering, a sphere (Sphere) and a Venus statue (Venus) are combined with three surface materials, rough, glossy and specular, to obtain six virtual object models. The six models are rendered with both the estimated illumination and the reference illumination and fused into the current-view image. Fig. 6 and Fig. 7 compare the renderings under estimated and reference illumination in terms of Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR). As the figures show, the illumination estimation method provided by the present application is visually more consistent with the reference results than the prior art.
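For reference, the two metrics can be computed between a rendering under estimated illumination and the rendering under reference illumination as sketched below with recent scikit-image; the evaluation pipeline itself is not detailed in the patent, so treat this as an assumption about tooling:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare_renderings(estimated, reference):
    """Compare two HxWx3 uint8 composites rendered under estimated vs. reference illumination."""
    ssim = structural_similarity(estimated, reference, channel_axis=-1)
    psnr = peak_signal_noise_ratio(reference, estimated)
    return ssim, psnr
```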
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
Those skilled in the art can understand that all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer readable storage medium and can include the processes of the method embodiments described above when executed. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
Claims (12)
1. A method of illumination estimation, comprising the steps of:
acquiring a first image;
determining a device orientation matrix and a camera projection matrix corresponding to the first image;
projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix;
obtaining a scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map;
and determining an environment map with a high dynamic range according to the first image and the scene environment map.
2. The method of claim 1, wherein the device orientation matrix is obtained by an inertial measurement unit.
3. The method of claim 1 or 2, wherein the determining a device orientation matrix corresponding to the first image further comprises:
determining a first moment at which the first image is acquired;
and taking, as the device orientation matrix corresponding to the first image, the inertial measurement unit data closest to and not later than the first moment.
4. The method of any of claims 1-3, wherein the projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix, further comprises:
transforming a first coordinate of a first pixel into a first direction vector through an inverse of the camera projection matrix, wherein the first pixel is a pixel in the first image;
transforming the first direction vector into a second coordinate through an inverse of the device orientation matrix.
5. The method of any one of claims 1-4, wherein the obtaining the scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map further comprises:
rendering a first mirror sphere within a cubic environment formed by the cube map;
determining, from the device orientation matrix, an environment direction reflected by a second pixel on the surface of the first mirror sphere;
acquiring the color or brightness of a corresponding pixel on the cube map according to the environment direction;
filling the color or brightness into the second pixel.
6. The method of any of claims 1-5, wherein determining a high dynamic range environment map from the first image and the scene environment map further comprises:
determining the high dynamic range environment map using a deep convolutional neural network.
7. The method of claim 6, wherein the deep convolutional neural network comprises an image feature extraction network, an observed environment map feature extraction network, and a feature fusion and illumination generation network, wherein,
the image feature extraction network is used for extracting a first feature vector from the first image;
the observed environment map feature extraction network is used for extracting a second feature vector from the scene environment map;
the feature fusion and illumination generation network is used for generating the environment map with the high dynamic range according to the first feature vector and the second feature vector.
8. The method of claim 7, wherein said extracting a second feature vector from said scene environment map further comprises:
marking observed pixel points and unobserved pixel points in the scene environment map;
and determining the second feature vector according to the observed pixel points.
9. The method of claim 7 or 8, wherein the generating the high dynamic range environment map from the first eigenvector and the second eigenvector, further comprises:
splicing the first feature vector and the second feature vector to generate a third feature vector;
and generating the environment map with the high dynamic range according to the third feature vector.
10. The method of any one of claims 1-9, further comprising,
rendering the virtual object according to the environment map with the high dynamic range;
fusing the rendered virtual object into the first image.
11. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for causing the electronic device to perform the method of any of claims 1-10.
12. A computer readable medium storing one or more programs, wherein the one or more programs are configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for causing the electronic device to perform the method of any of claims 1-10.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110042495.4A | 2021-01-13 | 2021-01-13 | Scene illumination distribution estimation method |
Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110042495.4A | 2021-01-13 | 2021-01-13 | Scene illumination distribution estimation method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114764848A | 2022-07-19 |
Family ID: 82363837
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110042495.4A | Scene illumination distribution estimation method | 2021-01-13 | 2021-01-13 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114764848A (en) |

- 2021-01-13: Application CN202110042495.4A filed; published as CN114764848A, status Pending.
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |