CN114764848A - Scene illumination distribution estimation method - Google Patents
- Publication number: CN114764848A
- Application number: CN202110042495.4A
- Authority: CN (China)
- Prior art keywords: image, map, environment map, matrix, feature vector
- Prior art date: 2021-01-13
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T19/006—Mixed reality (G06T19/00—Manipulating 3D models or images for computer graphics)
- G06N3/045—Combinations of networks (G06N3/04—Architecture, e.g. interconnection topology; G06N3/02—Neural networks)
- G06N3/08—Learning methods (G06N3/02—Neural networks)
- G06T15/50—Lighting effects (G06T15/00—3D [Three Dimensional] image rendering)
- G06T3/08—Projecting images onto non-planar surfaces, e.g. geodetic screens (G06T3/00—Geometric image transformations in the plane of the image)
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images (G06T3/40—Scaling of whole images or parts thereof)
Abstract
The application discloses an illumination distribution estimation method applied mainly in the fields of virtual reality and augmented reality. The method constructs a cube map, projects an image captured by a camera onto the cube map based on a device orientation matrix and a camera projection matrix, constructs a mirror sphere inside the space formed by the cube map, uses mirror-sphere mapping projection to obtain a scene environment map, and then determines a high-dynamic-range environment map based on a neural network. The environment map obtained in this way yields highly realistic illumination effects and improves the quality of illumination distribution estimation.
Description
Technical Field
The application relates to the technical field of virtual reality and augmented reality, and in particular to a scene illumination distribution estimation method for mobile devices.
Background
In recent years, with the rapid development of hardware and algorithms, Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR) technologies have been widely used in industries such as education and training, the military, medical treatment, entertainment and manufacturing. Highly realistic virtual-real fusion is a core requirement of these applications: only when the light source and illumination model of a virtual object are kept consistent with the real scene can a realistic rendering be obtained. Traditional scene illumination estimation methods use only a single image as input, and camera parameters differ between phones; in practice this makes rendering inconsistent across devices, and continuous scene input produces jitter, so the rendering effect is poor.
Disclosure of Invention
The invention aims to provide a method for estimating scene illumination distribution that improves the accuracy of scene illumination estimation, so that a virtual object superimposed on a real scene carries illumination information more consistent with its environment, improving the display of the virtual object and the user experience.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In a first aspect, a method for illumination estimation is provided, which includes the following steps: acquiring a first image; determining a device orientation matrix and a camera projection matrix corresponding to the first image; projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix; obtaining a scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map; and determining a high-dynamic-range environment map according to the first image and the scene environment map.
The technical solution of the first aspect obtains illumination information of a real scene by combining a cube map with mirror-sphere mapping; in the resulting high-dynamic-range environment map, illumination estimation supplements the illumination information of unobserved pixels, improving the quality of the illumination estimate.
In a possible implementation form according to the first aspect, the device orientation matrix is obtained by an inertial measurement unit.
According to the first aspect, in a possible implementation manner, the determining a device orientation matrix corresponding to the first image further includes: determining a first moment at which the first image is acquired; and taking, as the device orientation matrix corresponding to the first image, the inertial measurement unit data closest to and not later than the first moment.
According to the first aspect, in a possible implementation manner, the projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix further includes: transforming a first coordinate of a first pixel into a first direction vector through an inverse of the camera projection matrix, wherein the first pixel is a pixel in the first image; and transforming the first direction vector into a second coordinate through an inverse of the device orientation matrix.
According to the first aspect, in a possible implementation manner, the obtaining a scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map further includes: rendering a first mirror sphere within a cubic environment formed by the cube map; determining, from the device orientation matrix, an environment direction reflected by a second pixel on the surface of the first mirror sphere; acquiring the color or brightness of a corresponding pixel on the cube map according to the environment direction; and filling the color or brightness into the second pixel.
According to the first aspect, in a possible implementation manner, the determining an environment map with a high dynamic range according to the first image and the scene environment map further includes: determining the high dynamic range environment map using a deep convolutional neural network.
According to the first aspect, in one possible implementation manner, the deep convolutional neural network includes an image feature extraction network, an observed environment map feature extraction network, and a feature fusion and illumination generation network, where the image feature extraction network is configured to extract a first feature vector from the first image; the observed environment map feature extraction network is configured to extract a second feature vector from the scene environment map; and the feature fusion and illumination generation network is configured to generate the high-dynamic-range environment map according to the first feature vector and the second feature vector.
According to the first aspect, in a possible implementation manner, the extracting a second feature vector from the scene environment map further includes: marking observed pixel points and unobserved pixel points in the scene environment map; and determining the second feature vector according to the observed pixel points.
In a possible implementation manner, according to the first aspect, the generating the high-dynamic-range environment map according to the first feature vector and the second feature vector further includes: splicing the first feature vector and the second feature vector to generate a third feature vector; and generating the environment map with the high dynamic range according to the third feature vector.
According to the first aspect, in a possible implementation manner, after determining the environment map with a high dynamic range according to the first image and the scene environment map, the method further includes: rendering the virtual object according to the environment map with the high dynamic range; fusing the rendered virtual object into the first image.
In the illumination estimation method provided by the first aspect, a sensor in the electronic device provides the device orientation information, and the camera records the scene content observed by the user while the electronic device is in use, supplying more effective input and improving the quality of scene illumination distribution estimation. In addition, because the user usually changes the observation angle continuously while using such applications, the method keeps acquiring more scene information as the application session progresses, further improving the accuracy of the estimated illumination distribution. Compared with traditional illumination estimation methods that use only a single image, the method provides better continuity constraints and reduces flicker.
In a second aspect, an electronic device is provided that includes one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for causing the electronic device to perform the various implementation methods of the first aspect.
In a third aspect, a computer readable medium is provided for storing one or more programs, wherein the one or more programs are configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for causing the electronic device to perform the implementation methods of the first aspect.
It should be appreciated that the description of technical features, aspects, advantages, or similar language in the specification does not imply that all of the features and advantages may be realized in any single embodiment. Rather, it is to be understood that the description of a feature or advantage is intended to include a specific feature, aspect, or advantage in at least one embodiment. Thus, descriptions of technical features, technical solutions or advantages in this specification do not necessarily refer to the same embodiment. Furthermore, the technical features, aspects and advantages described in the following embodiments may be combined in any suitable manner. One skilled in the relevant art will recognize that an embodiment may be practiced without one or more of the specific features, aspects, or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
Fig. 1 is a flowchart of an illumination estimation method provided in the present application.
Fig. 2 is a schematic diagram of the data-reporting cycles of the camera and the IMU.
FIG. 3 is a schematic diagram of a cube map provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of a deep convolutional neural network provided in an embodiment of the present application.
Fig. 5 is a flowchart of illumination estimation by the deep convolutional neural network provided in the embodiment of the present application.
Fig. 6 is an SSIM comparison between the illumination estimation method of the present invention and prior-art methods, provided in an embodiment of the present application.
Fig. 7 is a PSNR comparison between the illumination estimation method of the present invention and prior-art methods, provided in an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the listed items.
Embodiments of an electronic device, of a graphical user interface for such an electronic device, and of the use of such an electronic device are described below. In some embodiments, the electronic device may be a portable electronic device, such as a cell phone, a tablet or a wearable electronic device (e.g., virtual reality glasses, augmented reality glasses, etc.), that also incorporates other functionality, such as personal digital assistant and/or music player functionality. Exemplary embodiments of the portable electronic device include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface or touch panel. It should also be understood that in other embodiments the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface or touch panel.
The following embodiments of the present application provide a scene illumination estimation method, which can improve the quality of scene illumination distribution estimation by combining orientation information of a device and data acquired by a camera.
The scene illumination estimation method of the present application will be described in detail below with reference to specific embodiments. Fig. 1 shows a scene illumination estimation method of the present application.
S102: electronic device obtains image I at time ttAnd a device orientation matrix Mview,t。
Specifically, the electronic device may obtain the image I_t through the camera in response to a user operation. The camera can be a front or rear camera installed on the electronic device, so the acquired image is an image of the real world. Furthermore, the electronic device may obtain the device orientation matrix M_view,t through an Inertial Measurement Unit (IMU). This matrix, which may also be called a view-angle matrix, represents the pose of the electronic device relative to a reference coordinate system at time t; the reference coordinate system may be the terrestrial coordinate system. The device orientation matrix M_view,t may be a 3 x 3 or a 4 x 4 matrix; in particular, M_view,t can be expressed in either of these two forms.
The device orientation matrix M_view,t represents the attitude of the camera at the moment it captures the image I_t. In some embodiments, because the camera and the phone screen generally share the same orientation in an electronic device such as a mobile phone, the device orientation matrix M_view,t of the camera view relative to the terrestrial coordinate system can be obtained from the attitude of the phone when the image I_t is captured. The electronic device can obtain this attitude information through the relevant system API. On Android, for example, the attitude of the phone screen relative to the terrestrial coordinate system, such as an Euler angle expressed by the three variables roll, yaw and pitch, can be monitored with a rotation vector sensor (RV-sensor). In other embodiments, the three variables roll, pitch and azimuth may also be used to represent the pose of the electronic device. In other embodiments, if the camera and the phone screen have a fixed relative position, the device orientation matrix M_view,t may also be obtained by converting that relative positional relationship.
When the camera is used to acquire an image, shooting can be triggered by a user operation, for example touching a shutter button displayed in the graphical user interface of the phone or pressing a physical key. In other embodiments, the camera may be set to acquire multiple images automatically: it may capture images at a fixed period, or the user may be prompted on the graphical user interface to move the phone, in which case, once capture is triggered, the phone automatically acquires multiple images while the user moves it. The camera can be set to acquire a single image or an image sequence comprising multiple images. During acquisition, the electronic device may keep facing one direction or may face different directions. When it faces one direction, the device orientation matrix remains unchanged; when it faces different directions, the device orientation matrix changes with the orientation of the device.
When the camera is set to acquire images automatically, the camera and the inertial measurement unit each provide image data and device attitude data periodically. Since the camera and the inertial measurement unit are independent hardware devices, the two data streams need to be synchronized when the electronic device acquires an image and a device orientation matrix. In general, the inertial measurement unit has high sensitivity and low latency, whereas camera data involves a larger volume and some processing and therefore arrives with higher latency. In some embodiments, as shown in Fig. 2, the period at which the camera collects image data is longer than the period at which the inertial measurement unit reports data. Therefore, when the electronic device acquires a camera image, the latest IMU sample, the one closest in system time to the moment the camera image was acquired, is used as the device orientation matrix corresponding to that image. Specifically, the inertial measurement unit data may be collected through the rotation vector (Rotation_Vector) sensor exposed by the sensor management component SensorManager in the system API.
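A minimal sketch of this synchronization step, assuming IMU samples arrive as timestamped (time, orientation matrix) tuples; the buffering scheme and the function name are illustrative, not taken from the patent:

```python
from bisect import bisect_right

def latest_imu_before(imu_samples, frame_time):
    """Return the IMU orientation closest to, and not later than, frame_time.

    imu_samples: list of (timestamp, orientation_matrix) tuples, sorted by timestamp.
    frame_time:  system timestamp at which the camera frame was captured.
    """
    times = [t for t, _ in imu_samples]
    idx = bisect_right(times, frame_time) - 1   # last sample with timestamp <= frame_time
    if idx < 0:
        raise ValueError("no IMU sample precedes this camera frame")
    return imu_samples[idx][1]                  # used as M_view,t for this frame
```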
S104: electronic equipment obtains camera projection matrix MprojCombined device orientation matrix M view,tImage ItProjected onto a spatial cube map (cube map).
The camera projection matrix describes imaging properties of the camera hardware, including but not limited to parameters such as the field of view (FOV) and the aspect ratio. For a lens without optical zoom, the camera projection matrix M_proj is usually fixed and can be obtained directly or indirectly from the device parameters. For example, the electronic device may obtain the information required for the camera projection matrix by calling a third-party API such as AR Engine, and then generate the camera projection matrix M_proj. The physical size of the camera sensor can also be read from the SENSOR_INFO_PHYSICAL_SIZE parameter of the CameraCharacteristics interface in the system API, from which the camera projection matrix can be derived. When the camera of the electronic device supports optical zoom, the camera projection matrix M_proj may change as the user adjusts the zoom.
In some embodiments, the electronic device may acquire the information required for the camera projection matrix before step S102 or while step S102 is performed.
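For illustration, a projection matrix of this kind can be built from the field of view and the aspect ratio as follows; the OpenGL-style convention, the near/far planes and the function name are assumptions, since the patent does not fix a particular convention:

```python
import numpy as np

def perspective_projection(fov_y_deg, aspect, near=0.1, far=100.0):
    """OpenGL-style perspective projection matrix built from FOV and aspect ratio."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0,                          0.0],
        [0.0,        f,   0.0,                          0.0],
        [0.0,        0.0, (far + near) / (near - far),  2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                         0.0],
    ])
```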
Fig. 3 illustrates the cube map referred to in the present application. A cube map contains six two-dimensional texture maps, each representing one face of a regular hexahedron. In rendering, cube maps are often used to represent the content of a scene environment in all directions (e.g., the six directions front, back, left, right, top and bottom). Given the current view image, the device orientation matrix and the camera projection matrix, the spatial direction corresponding to each pixel of the two-dimensional image can be obtained, and the two-dimensional image can therefore be projected onto the cube map.
Specifically, the projection may be performed as follows:
For a pixel (xa, ya) in the image coordinate system with no depth information, a depth component may be appended to its coordinates, for example transforming it into (xa, ya, 1). The inverse of the projection matrix M_proj then transforms it into the camera coordinate system, giving the corresponding direction vector (xb, yb, zb); the inverse of the view-angle matrix M_view then transforms this into a direction vector (xc, yc, zc) in the world coordinate system, from which the corresponding position of the pixel on the cube map is obtained. The view-angle matrix is obtained in step S102 and the projection matrix in step S104. If the depth za of the pixel is available when the image is obtained, the transformation can instead use the coordinates (xa, ya, za), again yielding the corresponding position of the pixel on the cube map.
During this conversion, the pixels of the image I_t obtained in step S102 may all correspond to the same face of the cube map (e.g., the front face) or may correspond to multiple faces (e.g., the front and left faces). Furthermore, the more images corresponding to different device orientations are obtained in step S102, the more pixels of the cube map are filled. Unfilled pixels may be represented as black, i.e., with an RGB value of (0, 0, 0).
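A sketch of this unprojection in NumPy, assuming 4 x 4 matrices and normalized image coordinates; the helper names and the cube-face lookup are illustrative, not from the patent:

```python
import numpy as np

def pixel_to_world_direction(xa, ya, M_proj, M_view, depth=1.0):
    """Lift a normalized image coordinate (xa, ya) to a world-space direction (xc, yc, zc)."""
    p_clip = np.array([xa, ya, depth, 1.0])          # (xa, ya, 1) with a homogeneous coordinate
    p_cam = np.linalg.inv(M_proj) @ p_clip           # inverse projection -> camera space
    d_cam = p_cam[:3] / p_cam[3]                     # perspective divide -> (xb, yb, zb)
    d_world = np.linalg.inv(M_view)[:3, :3] @ d_cam  # inverse view rotation -> world space
    return d_world / np.linalg.norm(d_world)

def dominant_cube_face(d):
    """Name the cube-map face hit by world direction d: one of +x, -x, +y, -y, +z, -z."""
    axis = int(np.argmax(np.abs(d)))
    return ("+" if d[axis] >= 0 else "-") + "xyz"[axis]
```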
During continuous use, a continuous image sequence and the corresponding view-angle matrices can be acquired, and the observed scene content is maintained by combining this sequence information. In a sequence of consecutive images, adjacent frames may overlap. Considering the accuracy and error of the sensors on a real device, in some embodiments the colors of pixels in the overlapping region are computed as a weighted average of the historical data and the current frame, namely:
C_t = (1 - w) * C_{t-1} + w * C_current
In the above formula, t is the frame index, C_{t-1} is the color value of the pixel after merging the first t-1 frames, C_current is the color value of the corresponding pixel in the current (t-th) frame, C_t is the merged color value for frame t, and w is a weight with 0 ≤ w ≤ 1. The value of w may be fixed, for example 0.2 or 0.5; in other embodiments w may take other values, which the present application does not limit.
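A per-face sketch of this blend, where the array names, the coverage mask and the default weight are illustrative assumptions:

```python
import numpy as np

def blend_overlap(face_prev, face_new, covered, w=0.2):
    """Blend newly projected colors into a stored cube-map face.

    face_prev: HxWx3 float array, colors accumulated through frame t-1 (C_{t-1}).
    face_new:  HxWx3 float array, colors projected from the current frame (C_current).
    covered:   HxW bool array, True where the current frame actually covers this face.
    """
    face_t = face_prev.copy()
    face_t[covered] = (1.0 - w) * face_prev[covered] + w * face_new[covered]
    return face_t
```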
In some embodiments, steps S102 and S104 may be iterated to build a cube map with richer illumination information: after acquiring a new image in S102, the electronic device projects it into the historical cube map, further enriching the illumination information.
S106: device orientation matrix M based on current timeview,tCube map with observed scenetObtaining the scene environment map Envmap mapped by the mirror spheret。
In some embodiments, a mirror-ball mapping projection may be used to obtain the currently observed mirror-sphere-mapped scene environment map Envmap_t.
The mirror-sphere-mapped scene environment map is equivalent to placing a specular reflective sphere in the scene: on its surface, all of the surrounding environment can be observed except the part occluded by the sphere itself. Such a sphere is used to collect the illumination distribution of the scene environment, which is represented by the corresponding mirror-sphere map.
This step generates the mirror-sphere-mapped environment map observed so far at the current view angle by rendering from the cube environment map obtained in step S104. The specific steps are as follows:
s202: rendering a mirror surface sphere in a cubic environment formed by a cubic environment map; the rendering of the sphere may be by any rendering means known in the art.
S204: and determining the environment direction reflected by each pixel on the spherical surface according to the current visual angle matrix.
S206: and acquiring a corresponding pixel and the color or brightness thereof on the cube map according to the environment direction.
Specifically, according to the environment direction determined in step S204, one or more corresponding pixels are located on the cube map. If a single pixel is found, its color or brightness is taken; if multiple pixels are found, the average of their colors or brightnesses is used.
S208: the color or brightness determined at S206 is filled in the corresponding pixel.
The above steps yield the mirror-sphere-mapped scene environment map. The scene environment map may be a circular two-dimensional image in which each pixel corresponds to a direction in space.
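A sketch of this lookup, assuming an orthographic view of the sphere and a cube-map sampler such as the dominant_cube_face helper above; the function names and the returned mask are illustrative assumptions:

```python
import numpy as np

def render_mirror_ball(sample_cube, M_view, size=256):
    """Render the observed environment into a mirror-ball map (size x size x 3) plus a mask.

    sample_cube(direction) -> RGB triple for an observed direction, or None if that part
    of the cube map has not been filled yet.
    M_view: current 4x4 device orientation (view) matrix.
    """
    env = np.zeros((size, size, 3))
    mask = np.zeros((size, size), dtype=bool)        # marks observed pixels for step S108
    R_inv = np.linalg.inv(M_view)[:3, :3]            # camera-to-world rotation
    d_cam = np.array([0.0, 0.0, -1.0])               # orthographic view direction (assumption)
    for v in range(size):
        for u in range(size):
            x = 2.0 * (u + 0.5) / size - 1.0
            y = 1.0 - 2.0 * (v + 0.5) / size
            if x * x + y * y > 1.0:                  # outside the circular ball image
                continue
            n = np.array([x, y, np.sqrt(1.0 - x * x - y * y)])   # sphere normal, camera space
            r_cam = d_cam - 2.0 * np.dot(d_cam, n) * n           # reflected view direction
            color = sample_cube(R_inv @ r_cam)                   # cube-map lookup in world space
            if color is not None:
                env[v, u] = color
                mask[v, u] = True
    return env, mask
```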
S108: using a deep convolutional neural network with It、EnvmaptGenerating as input an environment map L of a complete High-Dynamic Range (HDR) specular sphere map at a current perspectivetAs the scene illumination distribution.
Specifically, features are extracted separately from the camera image I_t and the observed environment map Envmap_t, and the extracted features are fused to generate the HDR environment map. In some embodiments, as shown in Fig. 4, the deep convolutional neural network can be divided into three parts: an image feature extraction network, an observed environment map feature extraction network, and a feature fusion and illumination generation network. Fig. 5 illustrates how the deep convolutional neural network of the present application works.
The image feature extraction network extracts illumination-related feature vectors from the camera image of the current view angle. For example, a network structure based on MobileNetV2 may be used; it offers high computational performance on current mobile devices and effectively extracts the required features from the image. The input to the network may be an image with a resolution of 240 x 320 and the output may be a feature vector of length 512. In other embodiments, the format of the input image and the length of the output feature vector may be set according to specific requirements.
The observed environment map feature extraction network extracts features of the input environment map through several convolutional layers, while a visual attention module fuses in mask information so that feature extraction focuses on the observed, valid pixels. The mask marks observed and unobserved pixels: in the scene environment map obtained in S106, a pixel that received no color or brightness information from the cube map of S104 is an unobserved pixel, and otherwise it is an observed pixel. Accordingly, in some embodiments the mask information is generated in S106. The visual attention module uses the mask to determine the observed pixels so that the feature extraction network extracts image features from them. The network may receive a 4-channel (RGB + mask) observed environment map at a resolution of 256 x 256 and output a feature vector of length 512. In other embodiments, the format of the input image and the length of the output feature vector may be set according to specific requirements.
The feature fusion and illumination generation network first concatenates the feature vectors output by the image feature extraction network and the observed environment map feature extraction network, and then generates the HDR environment map through several fully connected layers, upsampling layers and convolutional layers. In some embodiments, the resolution of the HDR environment map may be 64 x 64.
In some embodiments, the lengths of the feature vectors output by the image feature extraction network and the observed environment map feature extraction network may be the same or different.
In some embodiments, when the feature vector A output by the image feature extraction network and the feature vector B output by the observed environment map feature extraction network are concatenated, the concatenation may follow a predetermined order: for example, feature vector B is appended after feature vector A, feature vector A is appended after feature vector B, or the two are interleaved according to some rule. The present application does not limit this.
In generating the HDR environment map, illumination estimation fills in the unobserved pixels of the scene environment map.
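A compact PyTorch-style sketch of such a three-branch network. The layer counts, channel widths and the simple masking used in place of the visual attention module are assumptions chosen for illustration; the patent specifies only the inputs, the two length-512 feature vectors, their concatenation, and the 64 x 64 HDR output:

```python
import torch
import torch.nn as nn
import torchvision

class IlluminationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Image branch: MobileNetV2 backbone -> length-512 feature vector (240x320 RGB input).
        backbone = torchvision.models.mobilenet_v2().features
        self.img_branch = nn.Sequential(backbone, nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(1280, 512))
        # Observed-environment branch: RGB + mask (4 channels, 256x256) -> length-512 feature vector.
        self.env_branch = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 512))
        # Fusion + generation: concatenated 1024-d vector -> 64x64 HDR mirror-ball map.
        self.fuse = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(),
                                  nn.Linear(1024, 256 * 8 * 8), nn.ReLU())
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, image, env_rgb, env_mask):
        f_img = self.img_branch(image)                                             # first feature vector
        f_env = self.env_branch(torch.cat([env_rgb * env_mask, env_mask], dim=1))  # second feature vector
        f = self.fuse(torch.cat([f_img, f_env], dim=1))                            # concatenated vector
        return self.decode(f.view(-1, 256, 8, 8))                                  # 64x64x3 HDR map
```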
In some embodiments, the neural network may be trained before use to improve the accuracy of the illumination estimation. For example, environment images captured with a panoramic camera may be used as training data. During training, an L1 loss is applied to the HDR environment map output by the network, and an adversarial loss is used for additional supervision.
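A minimal sketch of such a training objective; the discriminator, the loss weighting and the function name are assumptions, since the patent only names the L1 and adversarial terms:

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_hdr, gt_hdr, discriminator, adv_weight=0.01):
    """L1 reconstruction loss on the HDR map plus a non-saturating adversarial term."""
    l1 = F.l1_loss(pred_hdr, gt_hdr)
    logits = discriminator(pred_hdr)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return l1 + adv_weight * adv
```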
S110: and rendering the virtual object according to the HDR environment image, and fusing the rendered virtual object into a real image. In the rendering process, the virtual object geometry and the surface material can be combined for rendering. Any rendering mode in the prior art can be used as the rendering mode, which is not limited in this application.
With the illumination estimation method provided by the present application, rendered virtual objects have a more realistic visual effect. To evaluate the method and compare it with prior-art illumination estimation methods, the present application uses 100 HDR environment maps to generate 400 sets of data, each containing a current-view image, an observed scene environment map and an HDR environment map. For rendering, a sphere (Sphere) and a Venus statue (Venus) are combined with three surface materials, rough, glossy and specular, to obtain six virtual object models. The six models are rendered with both the estimated illumination and the reference illumination and fused into the current-view image. Fig. 6 and Fig. 7 compare the renderings under estimated and reference illumination in terms of Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR). As the figures show, the illumination estimation method provided by the present application is visually more consistent with the reference results than the prior art.
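For reference, the two metrics can be computed between a rendering under estimated illumination and the rendering under reference illumination as sketched below with recent scikit-image; the evaluation pipeline itself is not detailed in the patent, so treat this as an assumption about tooling:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare_renderings(estimated, reference):
    """Compare two HxWx3 uint8 composites rendered under estimated vs. reference illumination."""
    ssim = structural_similarity(estimated, reference, channel_axis=-1)
    psnr = peak_signal_noise_ratio(reference, estimated)
    return ssim, psnr
```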
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
Those skilled in the art can understand that all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer readable storage medium and can include the processes of the method embodiments described above when executed. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
Claims (12)
1. A method of illumination estimation, comprising the steps of:
acquiring a first image;
determining a device orientation matrix and a camera projection matrix corresponding to the first image;
projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix;
obtaining a scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map;
and determining an environment map with a high dynamic range according to the first image and the scene environment map.
2. The method of claim 1, wherein the device orientation matrix is obtained by an inertial measurement unit.
3. The method of claim 1 or 2, wherein the determining a device orientation matrix corresponding to the first image further comprises:
determining a first moment at which the first image is acquired;
and taking, as the device orientation matrix corresponding to the first image, the inertial measurement unit data closest to and not later than the first moment.
4. The method of any of claims 1-3, wherein the projecting the first image onto a spatial cube map according to the device orientation matrix and the camera projection matrix, further comprises:
transforming a first coordinate of a first pixel into a first direction vector through an inverse of the camera projection matrix, wherein the first pixel is a pixel in the first image;
transforming the first direction vector into a second coordinate through an inverse of the device orientation matrix.
5. The method of any one of claims 1-4, wherein the obtaining the scene environment map using a mirror sphere mapping projection based on the device orientation matrix and the spatial cube map further comprises:
rendering a first mirror sphere within a cubic environment formed by the cube map;
determining, from the device orientation matrix, an environment direction reflected by a second pixel on the surface of the first mirror sphere;
acquiring the color or brightness of a corresponding pixel on the cube map according to the environment direction;
filling the color or brightness into the second pixel.
6. The method of any of claims 1-5, wherein determining a high dynamic range environment map from the first image and the scene environment map further comprises:
determining the high dynamic range environment map using a deep convolutional neural network.
7. The method of claim 6, wherein the deep convolutional neural network comprises an image feature extraction network, an observed environment map feature extraction network, and a feature fusion and illumination generation network, wherein,
the image feature extraction network is used for extracting a first feature vector from the first image;
the observed environment map feature extraction network is used for extracting a second feature vector from the scene environment map;
the feature fusion and illumination generation network is used for generating the environment map with the high dynamic range according to the first feature vector and the second feature vector.
8. The method of claim 7, wherein said extracting a second feature vector from said scene environment map further comprises:
marking observed pixel points and unobserved pixel points in the scene environment map;
and determining the second feature vector according to the observed pixel points.
9. The method of claim 7 or 8, wherein the generating the high dynamic range environment map from the first eigenvector and the second eigenvector, further comprises:
splicing the first feature vector and the second feature vector to generate a third feature vector;
and generating the environment map with the high dynamic range according to the third feature vector.
10. The method of any one of claims 1-9, further comprising,
rendering the virtual object according to the environment map with the high dynamic range;
fusing the rendered virtual object into the first image.
11. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for causing the electronic device to perform the method of any of claims 1-10.
12. A computer readable medium storing one or more programs, wherein the one or more programs are configured to be executed by one or more processors of an electronic device, the one or more programs comprising instructions for causing the electronic device to perform the method of any of claims 1-10.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110042495.4A | 2021-01-13 | 2021-01-13 | Scene illumination distribution estimation method |
Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110042495.4A | 2021-01-13 | 2021-01-13 | Scene illumination distribution estimation method |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114764848A | 2022-07-19 |
Family ID: 82363837
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110042495.4A | Scene illumination distribution estimation method | 2021-01-13 | 2021-01-13 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114764848A (en) |

- 2021-01-13: Application CN202110042495.4A filed; published as CN114764848A, status Pending.
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |