WO2024119997A1 - Illumination estimation method and apparatus - Google Patents

Illumination estimation method and apparatus

Info

Publication number
WO2024119997A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
illumination
lighting
dimensional model
light source
Prior art date
Application number
PCT/CN2023/123617
Other languages
French (fr)
Chinese (zh)
Inventor
Zhang Huan (张欢)
Matthias Niessner (尼斯纳马蒂亚斯)
Peter Kocsis (彼得柯西斯)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2024119997A1


Definitions

  • the embodiments of the present application relate to the field of media technology, and in particular to a method and device for illumination estimation.
  • augmented reality (AR) and other image technologies have gradually entered people's daily lives.
  • the popularity of AR and other image technologies has greatly improved people's efficiency and experience in obtaining information.
  • AR and other image technologies are widely used in education and training, military, medical, entertainment, and manufacturing industries.
  • Highly realistic virtual-real fusion is the core requirement of AR and other image technologies.
  • the light source of the actual scene is often unknown, and it is necessary to estimate the light source in the scene based on information such as scene pictures to obtain a more realistic AR experience. Accurate lighting estimation can improve the similarity between the lighting model and the real scene.
  • the embodiments of the present application provide a method and device for illumination estimation, which can improve the accuracy of illumination estimation. To achieve the above-mentioned purpose, the embodiments of the present application adopt the following technical solutions:
  • an embodiment of the present application provides a method for estimating illumination, the method comprising: obtaining a three-dimensional model of a scene; determining target information of the scene based on the three-dimensional model; and determining the illumination intensity of the scene based on the target information.
  • the target information includes light source position information, and the light source position information is used to indicate the position of the light source in the scene.
  • the lighting estimation method provided in the embodiment of the present application additionally introduces light source position information describing the position of the light source in the scene when performing lighting estimation, because lights often have similar geometric features in different scenes.
  • the three-dimensional model may be input into a first network model to obtain the light source position information, wherein the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
  • the method provided in the embodiment of the present application can obtain light source position information describing the position of the light source in the scene by inputting the three-dimensional model of the scene into the first network model. Since lights often have similar geometric features in different scenes, introducing light source position information during illumination estimation imposes more constraints on the solution space, improving the accuracy and robustness of illumination estimation and narrowing the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
  • the method may further include: inputting the three-dimensional model into a first coding network to obtain a first coding block, wherein the first coding block includes coding point features; inputting the three-dimensional model into a second coding network to obtain a second coding block, wherein the second coding block includes target text features; and training a first network model based on the coding point features and the target text features, wherein the first network model is used to output corresponding light source position information based on the input three-dimensional model.
  • the three-dimensional model can be input into a U-Net (Unet) network for encoding to obtain a first encoding block including encoded point features.
  • the three-dimensional model is input into a target text encoder (such as a contrastive language-image pre-training (CLIP) encoder) for encoding to obtain a second encoding block including target text features (class features).
  • the encoded point features and the target text features are then placed in a joint space, and the Euclidean distance (L2 distance) between the two features is optimized by gradient descent to train the first network model.
  • the target text can include light sources and non-light sources.
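  • For illustration, the following is a minimal sketch of this alignment step, assuming per-point features from a point encoder and the two class prompts ("light source" / "non-light source") embedded by a CLIP text encoder; the encoder names, feature dimensions, and training loop here are illustrative assumptions, not the implementation of this application.

```python
import torch

def alignment_loss(point_feats, text_feats, point_labels):
    # point_feats:  (N, D) encoded point features projected into the joint space
    # text_feats:   (2, D) target text features, row 0 = "light source", row 1 = "non-light source"
    # point_labels: (N,) class index (0 or 1) for each point
    targets = text_feats[point_labels]                                # text feature each point should match
    return torch.linalg.norm(point_feats - targets, dim=-1).mean()   # mean Euclidean (L2) distance

# One gradient-descent step over the joint space (sketch):
# optimizer = torch.optim.Adam(point_encoder.parameters(), lr=1e-4)
# loss = alignment_loss(point_encoder(points), text_encoder(prompts), labels)
# loss.backward(); optimizer.step()
```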
  • through the coding point features and the target text features of the scene's three-dimensional model, the method provided in the embodiment of the present application can train a first network model for outputting corresponding light source position information according to the input three-dimensional model.
  • the three-dimensional model of the scene is input into the first network model to obtain light source position information describing the position of the light source in the scene. Since lights often have similar geometric features in different scenes, introducing light source position information during illumination estimation imposes more constraints on the solution space, improving the accuracy and robustness of illumination estimation and narrowing the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
  • the light source position information, the position code and the illumination feature vector may be input into a second network model to obtain the illumination intensity of the scene; and the illumination expression may be determined according to the illumination intensity and the illumination color.
  • the illumination feature vector of the scene describes the distribution of illumination in the scene.
  • the method provided in the embodiment of the present application introduces position coding during illumination estimation, which supplements more high-frequency components during illumination estimation, so that illumination estimation can depict illumination conditions with high-frequency changes.
  • the introduction of light source position information during illumination estimation imposes more constraints on the solution space, improves the accuracy and robustness of illumination estimation, and reduces the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
  • the target information further includes position coding information, and the position coding information is used to indicate a representation of the scene in a high-dimensional space.
  • the lighting estimation method introduces position coding information describing the representation of the scene in high-dimensional space when performing lighting estimation, supplements more high-frequency components, so that the lighting estimation process will not be too smooth, and it is easier to characterize the lighting conditions of the scene under high-frequency changes, thereby improving the accuracy of lighting estimation.
  • the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
  • the illumination estimation method provided in the embodiment of the present application introduces an illumination feature vector that describes the distribution of illumination in the above-mentioned scene when performing illumination estimation, which gives more constraints to the solution space, thereby improving the accuracy of illumination estimation.
  • the target information further includes the lighting color of the scene.
  • the lighting expression of the scene can be determined according to the lighting intensity and the lighting color, and the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
  • the illumination estimation method provided in the embodiment of the present application additionally introduces light source position information describing the position of the light source in the scene when performing illumination estimation, because lights often have similar geometric features in different scenes.
  • the target information may be input into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
  • the illumination estimation method provided in the embodiment of the present application additionally introduces the light source position information describing the position of the light source in the scene when performing illumination estimation, and then inputs the light source position information into the second network model to obtain the illumination intensity of the scene. Since lights often have similar geometric features in different scenes, by introducing the light source position information during illumination estimation, more constraints are imposed on the solution space, the accuracy and robustness of illumination estimation are improved, and the domain gap between the rendered scene CG data and the actual scene data is reduced, thereby improving the accuracy of illumination estimation.
  • the method may further include: determining a rendered image of the scene according to a lighting expression of the scene, the lighting expression being used to indicate the lighting color and lighting intensity of the scene. Training a second network model according to the rendered image of the scene and a reference image of the scene, the second network model being used to output a corresponding lighting intensity according to the input target information.
  • the second network model can be trained by gradient descent based on the difference between the rendered image of the scene and the reference image of the scene as a loss function.
  • the light intensity output by the second network model can be closer to the light intensity of the real scene, thereby further improving the accuracy of light estimation.
  • the method may further include: dividing the three-dimensional model into a plurality of voxels, and determining illumination feature vectors of the plurality of voxels according to an illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate the distribution of illumination in the scene.
  • the method may further include: segmenting the three-dimensional model into a plurality of voxels; and determining the illumination colors of the plurality of voxels according to the illumination color of the scene.
  • the three-dimensional model cannot express the complex, spatially varying lighting model well. After the three-dimensional model is voxelized, it can better express the complex, spatially varying lighting model. By voxelizing the three-dimensional model, the lighting color of each voxel of the three-dimensional model can be obtained, so that the obtained voxelized lighting model can support lighting editing and relighting.
  • an embodiment of the present application provides a lighting estimation device, the device comprising: a transceiver unit and a processing unit.
  • the transceiver unit is used to obtain a three-dimensional model of a scene.
  • the processing unit is used to determine target information of the scene based on the three-dimensional model, the target information includes light source position information, and the light source position information is used to indicate the position of the light source in the scene.
  • the processing unit is also used to determine the lighting intensity of the scene based on the target information.
  • the processing unit is specifically used to: input the three-dimensional model into a first network model to obtain the light source position information, wherein the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
  • the processing unit is also used to: input the three-dimensional model into a first coding network to obtain a first coding block, wherein the first coding block includes coding point features; input the three-dimensional model into a second coding network to obtain a second coding block, wherein the second coding block includes target text features; train a first network model based on the coding point features and the target text features, wherein the first network model is used to output corresponding light source position information based on the input three-dimensional model.
  • the target information further includes position coding information, and the position coding information is used to indicate a representation of the scene in a high-dimensional space.
  • the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
  • the target information also includes the lighting color of the scene.
  • the processing unit is further used to determine the lighting expression of the scene according to the lighting intensity and the lighting color, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
  • the processing unit is specifically used to: input the target information into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
  • the processing unit is also used to: determine a rendered image of the scene based on a lighting expression of the scene, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene; train a second network model based on the rendered image of the scene and a reference image of the scene, wherein the second network model is used to output corresponding lighting intensity based on input target information.
  • the processing unit is further used to: segment the three-dimensional model into multiple voxels; determine the illumination feature vectors of the multiple voxels based on the illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate the distribution of illumination in the scene.
  • the processing unit is further configured to: segment the three-dimensional model into a plurality of voxels; and determine the illumination colors of the plurality of voxels according to the illumination color of the scene.
  • an embodiment of the present application further provides a lighting estimation device, which includes at least one processor; when the at least one processor executes program code or instructions, the method described in the above first aspect or any possible implementation thereof is implemented.
  • the illumination estimation device may further include at least one memory, and the at least one memory is used to store the program code or instructions.
  • an embodiment of the present application further provides a chip, comprising: an input interface, an output interface, and at least one processor.
  • the chip further comprises a memory.
  • the at least one processor is used to execute the code in the memory, and when the at least one processor executes the code, the chip implements the method described in the first aspect or any possible implementation thereof.
  • the above chip may also be an integrated circuit.
  • an embodiment of the present application further provides a computer-readable storage medium for storing a computer program, wherein the computer program includes instructions for implementing the method described in the above-mentioned first aspect or any possible implementation thereof.
  • an embodiment of the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to implement the method described in the first aspect or any possible implementation thereof.
  • the illumination estimation device, computer storage medium, computer program product and chip provided in this embodiment are all used to execute the method provided above. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the method provided above, which will not be repeated here.
  • FIG1 is a schematic diagram of the structure of an image processing system provided in an embodiment of the present application.
  • FIG2 is a schematic flow chart of a method for estimating illumination provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of a process for determining a lighting expression of a scene provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a training process of a light source position information extraction network provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of the structure of a lighting estimation device provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the structure of another illumination estimation device provided in an embodiment of the present application.
  • "A and/or B" in this document merely describes an association relationship between associated objects, indicating that three relationships may exist.
  • "A and/or B" can mean: A exists alone, both A and B exist, or B exists alone.
  • "first", "second" and the like in the description and drawings of the embodiments of the present application are used to distinguish different objects, or to distinguish different processing of the same object, rather than to describe a specific order of objects.
  • Voxel is the abbreviation of volume pixel.
  • as the name suggests, a voxel is the smallest unit of digital data in the segmentation of three-dimensional space, and voxels are used for three-dimensional imaging. A volume containing voxels can be displayed by volume rendering or by extracting polygonal isosurfaces with a given threshold contour.
  • Three-dimensional reconstruction refers to the process of using computer and/or mathematical methods to establish, from collected data of three-dimensional objects in the real world, a three-dimensional mathematical model suitable for computer representation and processing.
  • 3D model is a polygonal representation of an object, usually displayed by a computer or other video device.
  • the displayed object can be a real-world entity or a fictional object. Anything that exists in the physical world can be represented by a 3D model.
  • the pose is the position of the camera in space and the attitude of the camera, which can be regarded as the translation and rotation transformation of the camera from the original reference position to the current position.
  • the pose of an object in this application is the position of the object in space and the attitude of the object.
  • Position encoding refers to mapping the original low-dimensional position coordinates through high-frequency functions to obtain high-dimensional position coordinates, so that the position coordinates carry more high-frequency information.
  • Camera extrinsic parameters are the external parameters of the camera: the conversion relationship between the world coordinate system and the camera coordinate system, including displacement (translation) parameters and rotation parameters.
  • the camera pose can be determined based on the camera extrinsic parameters.
  • Trilinear interpolation is a method of linear interpolation on a tensor product grid of three-dimensional discrete sampled data.
  • This tensor product grid may have arbitrary, non-overlapping grid points in each dimension, but it is not a triangulated grid as used in finite element analysis.
  • This method approximates the value of a point (x, y, z) linearly on a local rectangular prism by using the data points on the grid.
  • Trilinear interpolation is often used in numerical analysis, data analysis, and computer graphics.
  • Semantic segmentation is a basic research direction in the field of computer vision (CV). It can give a specific category to each pixel in an image. For example, it can analyze objects in a picture or a video stream and mark their categories pixel by pixel. Semantic segmentation is widely used in many fields such as autonomous driving, smart cities and medical image processing.
  • Deep learning can be used to perform image recognition and identify the categories of objects in the image, that is, object classification.
  • Object categories can be, for example: table, chair, cat, dog, car, etc.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes x_s (i.e., input data) and an intercept of 1 as input; the output of the operation unit can be h_{W,b}(x) = f(Σ_{s=1}^{n} W_s·x_s + b), where:
  • n is a natural number greater than 1;
  • W_s is the weight of x_s;
  • b is the bias of the neural unit;
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting multiple single neural units mentioned above, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • Convolutional neural network contains a feature extractor consisting of a convolution layer and a subsampling layer, which can be regarded as a filter.
  • Convolutional layer refers to the neuron layer in the convolutional neural network that performs convolution processing on the input signal.
  • a neuron can only be connected to some neurons in the adjacent layers.
  • a convolutional layer usually contains several feature planes, each of which can be composed of some rectangularly arranged neural units.
  • the neural units in the same feature plane share weights, and the shared weights here are the convolution kernels. Sharing weights can be understood as meaning that the way features are extracted is independent of position.
  • convolution kernels can be initialized in the form of matrices of random size, and convolution kernels can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • Loss function: in the process of training a deep neural network, because we hope that the output of the deep neural network is as close as possible to the value we really want to predict, the predicted value of the current network can be compared with the target value we really want, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make it predict a lower value, and the adjustment continues until the deep neural network can predict the target value we really want, or a value very close to it.
  • a multi-layer perceptron is a feed-forward artificial neural network model that maps multiple input data sets to a single output data set. Its main feature is that it has multiple layers of neurons, so it is also called a deep neural network (DNN). A perceptron is a single-neuron model and is the predecessor of larger neural networks. The power of neural networks lies in their ability to learn representations of the training data and to relate them to the output variables to be predicted. Mathematically, they are able to learn any mapping function and have been proven to be universal approximators. The predictive power of neural networks comes from the hierarchical or multi-layered structure of the network.
  • a multi-layer perceptron is a neural network with at least three layers of nodes, an input layer, some intermediate layers, and an output layer. Each node in a given layer is connected to every node in the adjacent layer.
  • the input layer receives data
  • the intermediate layers calculate the data
  • the output layer outputs the results.
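  • As a concrete illustration, the following is a minimal sketch of such a multi-layer perceptron in PyTorch; the layer sizes and activation are arbitrary choices for the sketch, not values taken from this application.

```python
import torch.nn as nn

# A minimal multi-layer perceptron: an input layer, two intermediate (hidden) layers,
# and an output layer; every node in one layer is connected to every node in the next.
mlp = nn.Sequential(
    nn.Linear(3, 64),   # input layer -> first intermediate layer
    nn.ReLU(),
    nn.Linear(64, 64),  # second intermediate layer
    nn.ReLU(),
    nn.Linear(64, 1),   # output layer
)
```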
  • RGB red, green, blue
  • augmented reality (AR) and other image technologies have gradually entered people's daily lives.
  • the popularity of AR and other image technologies has greatly improved people's efficiency and experience in obtaining information.
  • AR and other image technologies are widely used in education and training, military, medical, entertainment, and manufacturing industries.
  • Highly realistic virtual-real fusion is the core requirement of AR and other image technologies. Keeping the light source and lighting model of virtual objects consistent with the real scene can obtain a realistic rendering effect.
  • the light source of the actual scene is often unknown, and it is necessary to estimate the light source in the scene based on information such as scene pictures to obtain a more realistic AR experience. Accurate lighting estimation can improve the similarity between the lighting model and the real scene.
  • an embodiment of the present application provides a method for illumination estimation, which can improve the accuracy of illumination estimation.
  • the method is applicable to an image processing system, and FIG1 shows a possible existence form of the image processing system.
  • the image processing system includes a terminal 10 and a server 20 .
  • the terminal 10 in the embodiment of the present application can be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc., and the embodiment of the present application does not impose any limitation on this.
  • the terminal 10 may include a sensor unit 11, a computing unit 12, a storage unit 13, a network transmission unit 14, and an interaction unit 15.
  • the sensor unit 11 usually includes a visual sensor (such as a camera) for acquiring two-dimensional (2D) image information of the scene, and an inertial navigation module (such as an inertial measurement unit (IMU)) for acquiring the relative pose relationship of the mobile device at different times, which is subsequently used for acquiring an initial geometric model.
  • the computing unit 12 may include a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU) and other processors as well as buffers and registers, and is mainly used to run the mobile terminal system and process the various algorithm modules designed in the embodiments of the present application, such as the illumination estimation method.
  • the storage unit 13 mainly includes a memory and an external storage module (such as a hard disk, etc.), and is mainly used for storing and reading and writing algorithm data, user local and temporary data, etc.
  • the network transmission unit 14 mainly includes upload and download modules, encoding and decoding modules for images, videos, three-dimensional models and lighting information, etc.
  • the interaction unit 15 mainly includes a display, a touch panel, a vibrator, an audio device (such as a speaker, a microphone and other audio devices), etc., and is mainly used to interact with the user, obtain user input, and present the algorithm effect to the user.
  • the server 20 may include a computing unit 21, a storage unit 22, and a network transmission unit 23.
  • the computing unit 21 may include various processors such as a central processing unit, an application processor, a modem processor, a graphics processor, an image signal processor, a video codec, a digital signal processor, a baseband processor, and/or a neural network processor, as well as buffers and registers, and is mainly used to run a server operating system and process various algorithm modules designed in the embodiments of the present application, such as an illumination estimation method.
  • the storage unit 22 mainly includes a memory and an external storage module (such as a hard disk, etc.), and is mainly used to store network models and parameters.
  • the network transmission unit 23 mainly includes upload and download modules, encoding and decoding of images, videos, three-dimensional models and lighting information, etc.
  • FIG2 shows a method for estimating illumination provided by an embodiment of the present application, the method comprising:
  • S201 Obtain a three-dimensional model of a scene.
  • the scene may include one or more objects, such as people, plants, animals, buildings, etc.
  • a three-dimensional model of a scene can be obtained based on the image of the scene.
  • a three-dimensional reconstruction application can be installed on the terminal 10, or a webpage related to three-dimensional reconstruction or downstream tasks based on three-dimensional reconstruction results can be opened.
  • the above application and webpage can provide an interface, and the terminal 10 can receive the relevant parameters entered by the user on the three-dimensional reconstruction or downstream task interface based on the three-dimensional reconstruction results, and send the above parameters to the server 20.
  • the server 20 can obtain the processing results based on the received parameters and return the processing results to the terminal 10.
  • the terminal 10 can also complete the data processing results based on the received parameters by itself without the need for the server to cooperate in the implementation, and the embodiments of the present application are not limited.
  • the data can be collected by a collection device (terminal, SLR camera, surveillance camera, stereo vision camera or other collection device).
  • the scene is photographed to obtain multi-view images of the scene (but not limited to RGB images of various formats).
  • the scene can be photographed around to obtain multi-view images.
  • the captured images need to have a certain common viewing area between perspectives, and blind spots should be avoided as much as possible.
  • the camera should be kept stable during shooting to avoid image blur.
  • the multi-view images of the scene are used as input, and a 3D model of the scene is reconstructed through a 3D reconstruction algorithm (such as the structure from motion (SFM) algorithm).
  • a multi-view image of a room may be collected by a collection device, and then a three-dimensional reconstruction algorithm is used to perform three-dimensional reconstruction using the collected multi-view images of the room as reference images to obtain a three-dimensional model of the room.
  • S202 Determine target information of the scene according to the three-dimensional model.
  • the target information includes light source position information, and the light source position information is used to indicate the position of the light source in the scene.
  • the light sources indicated by the light source position information include activated light sources and inactivated light sources.
  • activated light sources include but are not limited to light sources that are emitting light (such as turned-on lights, the sun, and turned-on displays)
  • inactivated light sources include but are not limited to light sources that are not emitting light (such as turned-off lights and turned-off displays) and objects with the geometric shape of light sources (such as lamp-shaped ornaments, lamp-shaped sculptures, etc.).
  • the three-dimensional model may be input into the first network model to obtain the light source position information.
  • the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
  • the first network model can be trained by three-dimensional models of multiple scenes and coding features corresponding to each three-dimensional model.
  • the coding features include coding point features and target text features.
  • the target information may further include position coding information, where the position coding information is used to indicate a representation of the scene in a high-dimensional space.
  • the lighting estimation method introduces position coding information describing the representation of the scene in high-dimensional space when performing lighting estimation, supplements more high-frequency components, so that the lighting estimation process will not be too smooth, and it is easier to characterize the lighting conditions of the scene under high-frequency changes, thereby improving the accuracy of lighting estimation.
  • the position coding of the scene may be determined according to the position of the scene and a position coding coefficient.
  • the position coding of the scene may be determined according to the three-dimensional representation of the points in the scene and the position coding coefficients according to the position coding formula.
  • the pe(x, y, z) component represents the position of the point used to represent the scene in the high-dimensional space
  • the x, y, z components represent the position of the point used to represent the scene in the three-dimensional space.
  • n is a positive number, and the value of n can be related to the resolution of the voxel grid that divides the scene (that is, it can be related to the number of voxels in the scene). In the embodiment of the present application, n can be 9.
  • w_n is a position coding coefficient; for example, w_n can be 2 to the power of n, so that w_0 is 2 to the power of 0 and w_5 is 2 to the power of 5.
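  • The exact position coding formula is not reproduced here; as an illustration, the following sketch shows a typical frequency-based position encoding consistent with the components described above (coefficients w_n = 2 to the power of n). The sine/cosine form and the number of frequency bands are assumptions borrowed from common practice, not necessarily the formula used in this application.

```python
import numpy as np

def position_encoding(xyz, n_freqs=10):
    """Map a 3D point (x, y, z) to a higher-dimensional representation.

    xyz: array-like of shape (3,); returns an array of shape (3 * 2 * n_freqs,).
    Uses coefficients w_k = 2**k for k = 0..n_freqs-1, as described above;
    the sin/cos form is an illustrative assumption.
    """
    xyz = np.asarray(xyz, dtype=np.float32)
    freqs = 2.0 ** np.arange(n_freqs)          # w_0 = 1, w_1 = 2, ..., w_9 = 512
    scaled = xyz[None, :] * freqs[:, None]     # (n_freqs, 3)
    return np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1).reshape(-1)

# Example: encode a point of the scene to obtain its position code z_x
z_x = position_encoding([0.3, -1.2, 0.8])
```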
  • the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
  • the illumination estimation method provided in the embodiment of the present application introduces an illumination feature vector that describes the distribution of illumination in the above-mentioned scene when performing illumination estimation, which gives more constraints to the solution space, thereby improving the accuracy of illumination estimation.
  • the three-dimensional model may be input into a third network model to obtain the illumination feature vector.
  • the third network model can be obtained by training three-dimensional models of multiple scenes and the illumination feature vector corresponding to each three-dimensional model.
  • the target information further includes the lighting color of the scene.
  • the lighting color is used to indicate the color of the light source in the scene, that is, the material information of the scene.
  • the three-dimensional model may be input into a third network model to obtain the lighting color.
  • the third network model can be obtained by training three-dimensional models of multiple scenes and the lighting color corresponding to each three-dimensional model.
  • the three-dimensional model may be input into a third network model to obtain the illumination feature vector and illumination color.
  • the third network model can be obtained by training three-dimensional models of multiple scenes and the illumination feature vector and illumination color corresponding to each three-dimensional model.
  • the three-dimensional model may be segmented into a plurality of voxels; and the illumination feature vectors of the plurality of voxels may be determined according to the illumination feature vectors.
  • the three-dimensional model of the scene can be input into the third network model to obtain the illumination feature vector of the three-dimensional model.
  • the three-dimensional model of the scene is divided into a plurality of voxels (for example, the three-dimensional model of the scene can be divided into a plurality of voxels at a resolution of 10 cm).
  • the illumination feature vector stored in each voxel is determined according to the illumination feature vector of the three-dimensional model.
  • the illumination feature vector of each voxel can be determined by trilinear interpolation: for each voxel, the illumination feature vectors of the eight surrounding voxels are obtained and trilinearly interpolated, as sketched below.
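  • As an illustration, the following is a minimal sketch of trilinear interpolation of per-voxel illumination feature vectors at a query point, assuming the voxel grid is stored as a dense array; the grid layout and names are hypothetical. The same interpolation can be applied to per-voxel illumination colors.

```python
import numpy as np

def interpolate_voxel_feature(grid, point, voxel_size=0.1):
    """Trilinearly interpolate a feature vector at `point` from a voxel grid.

    grid: array of shape (X, Y, Z, D) holding one feature vector per voxel.
    point: (3,) position in the same units as voxel_size (e.g. meters, 10 cm voxels).
    """
    idx = np.asarray(point) / voxel_size        # continuous voxel coordinates
    i0 = np.floor(idx).astype(int)              # index of the lower corner voxel
    t = idx - i0                                # fractional offsets in [0, 1)
    out = np.zeros(grid.shape[-1])
    # accumulate the weighted contributions of the eight surrounding voxels
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((1 - t[0]) if dx == 0 else t[0]) \
                  * ((1 - t[1]) if dy == 0 else t[1]) \
                  * ((1 - t[2]) if dz == 0 else t[2])
                out += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out
```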
  • the three-dimensional model may be segmented into a plurality of voxels; and the illumination colors of the plurality of voxels may be determined according to the illumination colors.
  • the three-dimensional model of the scene can be input into the third network model to obtain the illumination color of the three-dimensional model.
  • the three-dimensional model of the scene is divided into a plurality of voxels (e.g., the three-dimensional model of the scene can be divided into a plurality of voxels at a resolution of 10 cm).
  • the illumination color stored in each voxel is determined according to the illumination color (diffuse) of the three-dimensional model.
  • the illumination color of each voxel can likewise be determined by trilinear interpolation: the illumination colors of the eight surrounding voxels are obtained and trilinearly interpolated.
  • S203 Determine the illumination intensity of the scene according to the target information of the scene.
  • the light intensity is used to indicate the intensity of the light in the scene, and the range of the light intensity may be between 0 and 1. The higher the light intensity of the scene, the stronger the light in the scene.
  • the target information may be input into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
  • the second network model may be trained by a rendered image of a scene and a reference image of the corresponding scene.
  • the illumination intensity can be written as I_e = Φ_l(w_0, z_s, z_l, z_x), where:
  • I_e is the illumination intensity of the scene or voxel;
  • Φ_l is the second network model (also called the neural field or illumination neural field);
  • w_0 is the illumination emission direction, which can be determined by the rendering angle;
  • z_s is the light source position information of the scene;
  • z_l is the illumination feature vector of the scene or scene voxel;
  • z_x is the position code of the scene or scene voxel.
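  • For illustration, a minimal sketch of such an illumination neural field Φ_l as a small MLP is shown below; the feature dimensions, layer sizes, and the sigmoid used to keep the intensity within [0, 1] are assumptions of the sketch, not details taken from this application.

```python
import torch
import torch.nn as nn

class IlluminationField(nn.Module):
    """Sketch of the second network model: (w_0, z_s, z_l, z_x) -> I_e."""
    def __init__(self, dim_zs=32, dim_zl=32, dim_zx=60):
        super().__init__()
        in_dim = 3 + dim_zs + dim_zl + dim_zx       # emission direction + the three features
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),        # intensity in [0, 1], as described above
        )

    def forward(self, w0, zs, zl, zx):
        return self.mlp(torch.cat([w0, zs, zl, zx], dim=-1))
```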
  • the lighting expression of the scene may also be determined according to the lighting intensity and the lighting color, where the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
  • in the lighting expression, L_e is the lighting expression of the scene or scene voxel, I_e is the lighting intensity of the scene or scene voxel, and c_e is the lighting color of the scene or scene voxel.
  • the three-dimensional model of the scene is input into the third network model to obtain the illumination information f_l of the scene, which includes the illumination feature vector z_l and the illumination color c_e of the scene.
  • the three-dimensional model of the scene is input into the first network model to obtain the light source position information z_s of the scene.
  • the three-dimensional model of the scene can obtain the position code z_x of the scene through position coding.
  • the illumination intensity I_e of the scene (not shown in the figure) can then be obtained, and the obtained illumination intensity I_e and the illumination color c_e of the scene determine the lighting expression L_e of the scene.
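  • Putting the pieces together, the following sketch shows how the quantities above could be combined end to end, reusing the illumination field sketched earlier; treating the lighting expression as the product of intensity and color is an assumption of this sketch, and the network names are placeholders.

```python
def estimate_lighting(scene_model, scene_points, w0,
                      third_network, first_network, illumination_field,
                      position_encoding):
    """Sketch of the flow described above: model -> (z_l, c_e, z_s, z_x) -> I_e -> L_e."""
    z_l, c_e = third_network(scene_model)        # illumination feature vector and illumination color
    z_s = first_network(scene_model)             # light source position information
    z_x = position_encoding(scene_points)        # position code of the scene
    I_e = illumination_field(w0, z_s, z_l, z_x)  # second network model (see sketch above)
    L_e = I_e * c_e                              # lighting expression (product form assumed)
    return L_e
```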
  • when performing illumination estimation, the illumination estimation method provided in the embodiment of the present application additionally introduces light source position information describing the position of the light source in the scene, because lights often have similar geometric features in different scenes. By introducing the light source position information during illumination estimation, more constraints are imposed on the solution space, improving the accuracy and robustness of illumination estimation and narrowing the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
  • the method provided in the embodiment of the present application may further include:
  • S204 Input the three-dimensional model into a first coding network to obtain a first coding block.
  • the above-mentioned first coding block includes coding point features.
  • the above three-dimensional model can be input into the Unet network for encoding to obtain a first encoding block containing encoded point features.
  • S205 Input the above three-dimensional model into a second coding network to obtain a second coding block.
  • the second encoding block includes target text features.
  • the three-dimensional model may be input into a target text encoder (such as a CLIP encoder) for encoding to obtain a second encoding block including target text features (class features).
  • the target text may include a light source and a non-light source.
  • S206 Train a first network model based on the coding point features and the target text features.
  • the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
  • the first network model can also be called a light source position information extraction network.
  • the three-dimensional model can be input into Unet for encoding to obtain a first encoding block including encoded point features.
  • the three-dimensional model is input into the target text encoder for encoding to obtain a second encoding block including target text features.
  • the encoded point features and the target text features are then placed in a joint space, and the Euclidean distance (L2 distance) between the two features is optimized by gradient descent to train the first network model.
  • the first network model mentioned above can adopt a residual network (deep residual network, ResNet) or UNet network structure (such as Res16UNet34D).
  • the pre-training data set of the first network model may be a ScanNet or S3DIS data set.
  • the pre-training data set of the first network model may be fine-tuned to reduce the classification categories in the pre-training data set to two categories: light source and non-light source.
  • the method provided in the embodiment of the present application can obtain, through the coding point features and target text features of the scene three-dimensional model, a first network model for outputting corresponding light source position information according to the input three-dimensional model.
  • the three-dimensional model of the scene is input into the first network model to obtain light source position information describing the position of the light source in the scene. Since lights often have similar geometric features in different scenes, introducing light source position information during illumination estimation gives more constraints to the solution space, improves the accuracy and robustness of illumination estimation, and narrows the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
  • a rendered image of a scene may be obtained according to a lighting expression through a rendering equation.
  • taking any pixel in a rendered image as an example, its color value is determined by the light incident on the pixel; through ray tracing, the light source corresponding to the incident light (which may have been reflected multiple times) can be found. Then, the color and brightness of the light source can be obtained from the model through the above lighting expression; combined with the material information, the color value of this ray after being projected onto the pixel can be calculated through the rendering equation.
  • by sampling, that is, by sampling multiple rays, repeating the above tracing process for each ray, and superimposing the obtained color values, the final color value of the pixel can be obtained.
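  • As an illustration of this sampling step, the following minimal sketch averages the contributions of several traced rays for one pixel; sample_ray, trace_ray and shade are placeholder callables standing in for the ray generation, ray tracing and rendering-equation evaluation described above.

```python
def pixel_color(pixel, num_samples, sample_ray, trace_ray, shade):
    """Average the shaded contributions of several rays through one pixel."""
    color = 0.0
    for _ in range(num_samples):
        ray = sample_ray(pixel)     # a camera ray through the pixel
        hit = trace_ray(ray)        # light source / hit information for that ray
        color += shade(hit, ray)    # contribution via the lighting expression and rendering equation
    return color / num_samples
```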
  • the above-mentioned second network model is used to output the corresponding light intensity according to the light source position information, position coding and lighting feature vector of the scene.
  • the second network model can be trained by gradient descent based on the difference between the rendered image of the scene and the reference image of the scene as a loss function.
  • the light intensity output by the second network model can be closer to the light intensity of the real scene, thereby further improving the accuracy of light estimation.
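  • A minimal sketch of this training step is shown below, assuming a differentiable render function and using the mean squared difference between the rendered image and the reference image as the loss; the optimizer choice and function names are assumptions of the sketch.

```python
import torch

def train_step(second_network, render, target_info, reference_image, optimizer):
    """One gradient-descent step on the second network model.

    render(intensity, target_info) -> rendered image of the scene (differentiable placeholder);
    reference_image                -> captured reference image of the same scene.
    """
    intensity = second_network(*target_info)                # predicted illumination intensity
    rendered = render(intensity, target_info)               # rendered image of the scene
    loss = torch.mean((rendered - reference_image) ** 2)    # image difference as the loss function
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```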
  • the following will introduce an illumination estimation device for executing the above illumination estimation method in conjunction with FIG. 5 .
  • the illumination estimation device includes hardware and/or software modules that perform the corresponding functions. Whether a function is executed in hardware or in a computer software-driven hardware manner depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered to be beyond the scope of the embodiments of this application.
  • the embodiment of the present application can divide the functional modules of the illumination estimation device according to the above method example.
  • each functional module can be divided according to each function, or two or more functions can be integrated into one processing module.
  • the above integrated module can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. There may be other division methods in actual implementation.
  • FIG5 shows a possible composition diagram of the illumination estimation device involved in the above embodiment.
  • the illumination estimation device 500 may include: a transceiver unit 501 and a processing unit 502 .
  • the above-mentioned transceiver unit 501 is used to obtain a three-dimensional model of the scene.
  • the processing unit 502 is used to determine target information of the scene according to the three-dimensional model.
  • the target information includes light source position information.
  • the light source position information is used to indicate the position of the light source in the scene.
  • the processing unit 502 is further configured to determine the illumination intensity of the scene according to the target information.
  • the processing unit 502 is specifically used to: input the three-dimensional model into a first network model to obtain the light source position information, wherein the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
  • the processing unit 502 is also used to: input the three-dimensional model into a first coding network to obtain a first coding block, wherein the first coding block includes coding point features; input the three-dimensional model into a second coding network to obtain a second coding block, wherein the second coding block includes target text features; train a first network model based on the coding point features and the target text features, wherein the first network model is used to output corresponding light source position information based on the input three-dimensional model.
  • the target information further includes position coding information, and the position coding information is used to indicate a representation of the scene in a high-dimensional space.
  • the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
  • the target information also includes the lighting color of the scene.
  • the processing unit is further used to determine the lighting expression of the scene according to the lighting intensity and the lighting color, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
  • the processing unit 502 is specifically used to: input the target information into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
  • the processing unit 502 is further used to: determine a rendered image of the scene based on a lighting expression of the scene, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene; train a second network model based on the rendered image of the scene and a reference image of the scene, wherein the second network model is used to output corresponding lighting intensity based on input target information.
  • the processing unit 502 is further used to: divide the three-dimensional model into multiple voxels; determine the illumination feature vectors of the multiple voxels based on the illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate the distribution of illumination in the scene.
  • the processing unit 502 is further configured to: segment the three-dimensional model into a plurality of voxels; and determine the illumination colors of the plurality of voxels according to the illumination color of the scene.
  • FIG6 shows a schematic diagram of the structure of a chip 600.
  • the chip 600 includes one or more processors 601 and an interface circuit 602.
  • the chip 600 may also include a bus 603.
  • the processor 601 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned illumination estimation method may be completed by an integrated logic circuit of hardware in the processor 601 or by instructions in the form of software.
  • the processor 601 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the interface circuit 602 can be used to send or receive data, instructions or information.
  • the processor 601 can use the data, instructions or other information received by the interface circuit 602 to process, and can send the processing completion information through the interface circuit 602.
  • the chip also includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include a non-volatile random access memory (NVRAM).
  • the memory stores executable software modules or data structures
  • the processor can perform corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
  • the chip can be used in the illumination estimation device involved in the embodiment of the present application.
  • the interface circuit 602 can be used to output the execution result of the processor 601.
  • for the illumination estimation method provided by one or more embodiments of the present application, reference can be made to the aforementioned embodiments, which will not be repeated here.
  • the processor 601 and the interface circuit 602 can be implemented through hardware design, software design, or a combination of hardware and software, and there is no limitation here.
  • the electronic device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an illumination estimation device, or a chip or functional module in an illumination estimation device.
  • FIG7 is a schematic diagram of the structure of an electronic device 100 provided in an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • Different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus.
  • the processor 110 can couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate through the CSI interface to realize the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100.
  • the electronic device 100 may also adopt an interface connection mode different from that in the above embodiments, or a combination of multiple interface connection modes.
  • the charging management module 140 is used to receive charging input from a charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini LED, a micro LED, a micro OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through ISP, camera 193, touch sensor, video codec, GPU, display screen 194 and application processor.
  • ISP is used to process the data fed back by camera 193.
  • the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens.
  • the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • ISP can be set in camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. It should be understood that in the description of the embodiments of the present application, an image in RGB format is used as an example for introduction, and the embodiments of the present application do not limit the image format.
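To make the format conversion above concrete, the following is a minimal sketch of one common YUV-to-RGB mapping (full-range BT.601 coefficients). The embodiments do not specify which conversion the DSP uses, so the coefficients, the full-range assumption and the function name are illustrative only.

```python
import numpy as np

def yuv_to_rgb(y, u, v):
    """Illustrative full-range BT.601 YUV -> RGB conversion for one pixel.

    y, u, v are 8-bit values (0-255); u and v are centered around 128.
    """
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(int(np.clip(c, 0, 255)) for c in (r, g, b))

print(yuv_to_rgb(128, 128, 128))  # mid-gray -> (128, 128, 128)
```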
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
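As a hedged illustration of the frequency-point energy computation attributed to the digital signal processor above, the sketch below computes the energy spectrum of a made-up signal with an FFT; the signal, sampling rate and variable names are assumptions and not taken from the embodiments.

```python
import numpy as np

rate = 1000.0                                   # samples per second (assumed)
t = np.arange(0, 1.0, 1.0 / rate)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1.0 / rate)
energy = np.abs(spectrum) ** 2                  # energy at each frequency point

print(freqs[np.argmax(energy)])                 # strongest frequency point, ~50 Hz
```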
  • Video codecs are used to compress or decompress digital videos.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the internal memory 121 can be used to store computer executable program codes, which include instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the button 190 includes a power button, a volume button, etc.
  • the button 190 can be a mechanical button. It can also be a touch button.
  • the electronic device 100 can receive button input and generate key signal input related to the user settings and function control of the electronic device 100.
  • the motor 191 can generate a vibration prompt.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging status, power changes, and can also be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the electronic device 100 may be a chip system or a device with a similar structure as shown in FIG. 7.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the actions, terms, etc. involved in the various embodiments of the present application may refer to each other without limitation.
  • the message name or parameter name in the message exchanged between the various devices in the embodiments of the present application is only an example, and other names may be used in the specific implementation without limitation.
  • the composition structure shown in FIG. 7 does not constitute a specific limitation on the electronic device 100; the electronic device 100 may include more or fewer components than those shown in FIG. 7, or combine certain components, or arrange the components differently.
  • the processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
  • the processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
  • FIG8 is a schematic diagram of the structure of a lighting estimation device provided in an embodiment of the present application.
  • the lighting estimation device can be applied to the scenario shown in the above method embodiment.
  • FIG8 only shows the main components of the lighting estimation device, including a processor 801, a memory 802, a control circuit 803, and an input-output device 804.
  • the processor 801 is mainly used to process communication protocols and communication data, execute software programs, and process data of software programs.
  • the memory 802 is mainly used to store software programs and data.
  • the control circuit 803 is mainly used for power supply and transmission of various electrical signals.
  • the input-output device 804 is mainly used to receive data input by a user and output data to the user.
  • the control circuit 803 can be a mainboard; the memory 802 includes a hard disk, RAM, ROM and other media with storage functions; the processor 801 can include a baseband processor and a central processing unit, where the baseband processor is mainly used to process the communication protocol and communication data, and the central processing unit is mainly used to control the entire illumination estimation device, execute software programs, and process the data of the software programs; and the input-output device 804 includes a display screen, a keyboard, a mouse, etc. The control circuit 803 can further include or be connected to a transceiver circuit or a transceiver, such as a network cable interface, for sending or receiving data or signals, for example for data transmission and communication with other devices. Further, it can also include an antenna for sending and receiving wireless signals and for data/signal transmission with other devices.
  • An embodiment of the present application also provides a lighting estimation device, which includes: at least one processor, when the at least one processor executes program code or instructions, the above-mentioned related method steps are implemented to implement the lighting estimation method in the above-mentioned embodiment.
  • the device may further include at least one memory, and the at least one memory is used to store the program code or instruction.
  • An embodiment of the present application further provides a computer storage medium, in which computer instructions are stored.
  • when the computer instructions are run on the lighting estimation device, the lighting estimation device executes the above-mentioned related method steps to implement the lighting estimation method in the above-mentioned embodiment.
  • the embodiment of the present application also provides a computer program product.
  • when the computer program product is run on a computer, the computer is enabled to execute the above-mentioned related steps to implement the illumination estimation method in the above-mentioned embodiment.
  • the embodiment of the present application also provides a lighting estimation device, which can be a chip, an integrated circuit, a component or a module.
  • the device may include a connected processor and a memory for storing instructions, or the device includes at least one processor for obtaining instructions from an external memory.
  • the processor can execute instructions so that the chip executes the lighting estimation method in the above-mentioned method embodiments.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the above units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the above functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, or the part thereof that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the steps of the above methods in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application relate to the technical field of media, and can improve the accuracy of illumination estimation. Disclosed are an illumination estimation method and apparatus. The method comprises: acquiring a three-dimensional model of a scene; determining target information of the scene according to the three-dimensional model; and determining illumination intensity in the scene according to the target information, wherein the target information comprises light source position information, which is used for indicating the position of a light source in the scene. By means of introducing light source position information during illumination estimation, more constraints are provided to a solution space, the accuracy and robustness of illumination estimation are improved, and the domain difference between scene CG data obtained by means of rendering and actual scene data is reduced, thereby improving the accuracy of illumination estimation.

Description

光照估计方法和装置Lighting estimation method and device
本申请要求于2022年12月09日提交中国专利局、申请号为202211581949.6、申请名称为“一种光照估计方法”的中国专利申请的优先权,以及于2023年06月26日提交中国知识产权局、申请号为202310767724.8、申请名称为“光照估计方法和装置”的中国专利申请的优先权,它们的全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed with the China Patent Office on December 9, 2022, with application number 202211581949.6 and application name “A method for illumination estimation”, and the priority of the Chinese patent application filed with the China Intellectual Property Office on June 26, 2023, with application number 202310767724.8 and application name “Illumination estimation method and device”, the entire contents of which are incorporated by reference in this application.
技术领域Technical Field
本申请实施例涉及媒体技术领域,尤其涉及光照估计方法和装置。The embodiments of the present application relate to the field of media technology, and in particular to a method and device for illumination estimation.
Background
随着科技的进步,增强现实(augmented reality,AR)等图像技术已经逐渐走进人们的日常生活。AR等图像技术的普及,极大提升了人们获取信息的效率和体验。AR等图像技术在教育培训、军事、医疗、娱乐和生产制造等行业中被广泛应用。高度真实感的虚实融合是AR等图像技术的核心需求。将虚拟物体的光源和光照模型与真实场景保持一致,可以获得具有真实感的渲染效果。With the advancement of science and technology, augmented reality (AR) and other image technologies have gradually entered people's daily lives. The popularity of AR and other image technologies has greatly improved people's efficiency and experience in obtaining information. AR and other image technologies are widely used in education and training, military, medical, entertainment, and manufacturing industries. Highly realistic virtual-real fusion is the core requirement of AR and other image technologies. By keeping the light source and lighting model of virtual objects consistent with the real scene, a realistic rendering effect can be obtained.
然而,实际场景的光源往往是未知的,需要根据场景图片等信息来估计场景中的光源,进而获得较为真实的AR体验,准确的光照估计可以提高光照模型与真实场景的相似度。However, the light source of the actual scene is often unknown, and it is necessary to estimate the light source in the scene based on information such as scene pictures to obtain a more realistic AR experience. Accurate lighting estimation can improve the similarity between the lighting model and the real scene.
为此,如何提升光照估计的准确性是本领域技术人员亟需解决的问题之一。Therefore, how to improve the accuracy of illumination estimation is one of the problems that technical personnel in this field need to solve urgently.
发明内容Summary of the invention
本申请实施例提供了光照估计方法和装置,能够提升光照估计的准确性。为达到上述目的,本申请实施例采用如下技术方案:The embodiments of the present application provide a method and device for illumination estimation, which can improve the accuracy of illumination estimation. To achieve the above-mentioned purpose, the embodiments of the present application adopt the following technical solutions:
第一方面,本申请实施例提供了一种光照估计方法,该方法包括:获取场景的三维模型。根据上述三维模型确定上述场景的目标信息。根据上述目标信息确定上述场景的光照强度。其中,上述目标信息包括光源位置信息,上述光源位置信息用于指示上述场景中光源的位置。In a first aspect, an embodiment of the present application provides a method for estimating illumination, the method comprising: obtaining a three-dimensional model of a scene. Determining target information of the scene based on the three-dimensional model. Determining illumination intensity of the scene based on the target information. The target information includes light source position information, and the light source position information is used to indicate the position of the light source in the scene.
相比于仅通过场景的图像进行光照估计,本申请实施例提供的光照估计方法,在进行光照估计时额外引入了描述场景中光源位置的光源位置信息,由于灯光在不同场景下往往具有类似的几何特征。通过在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景计算机图形学(computer graphics,CG)数据和场景实际数据之间的域差距(domain gap),由此提高了光照估计的准确性。Compared with lighting estimation only through the image of the scene, the lighting estimation method provided in the embodiment of the present application additionally introduces light source position information describing the position of the light source in the scene when performing lighting estimation, because lights often have similar geometric features in different scenes. By introducing light source position information during lighting estimation, more constraints are imposed on the solution space, the accuracy and robustness of lighting estimation are improved, and the domain gap between the rendered scene computer graphics (CG) data and the actual scene data is reduced, thereby improving the accuracy of lighting estimation.
在一种可能的实现方式中,可以将上述三维模型输入第一网络模型以得到上述光源位置信息。其中,上述第一网络模型用于根据输入的三维模型输入对应的光源位置信息。In a possible implementation, the three-dimensional model may be input into a first network model to obtain the light source position information, wherein the first network model is used to input the corresponding light source position information according to the input three-dimensional model.
可以看出,本申请实施例提供的方法可以通过将场景的三维模型输入第一网络模型得到描述场景中光源位置的光源位置信息。由于灯光在不同场景下往往具有类似的几何特征。通过在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景CG数据和场景实际数据之间的domain gap,由此提高了光照估计的准确性。It can be seen that the method provided in the embodiment of the present application can obtain light source position information describing the position of the light source in the scene by inputting the three-dimensional model of the scene into the first network model. Since lights often have similar geometric features in different scenes. By introducing light source position information during illumination estimation, more constraints are imposed on the solution space, improving the accuracy and robustness of illumination estimation, and narrowing the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
在一种可能的实现方式中,该方法还可以包括:将上述三维模型输入第一编码网络得到第一编码块,上述第一编码块包括编码点特征。将上述三维模型输入第二编码网络得到第二编码块,上述第二编码块包括目标文本特征。根据上述编码点特征和上述目标文本特征训练第一网络模型,上述第一网络模型用于根据输入的三维模型输入对应的光源位置信息。In a possible implementation, the method may further include: inputting the three-dimensional model into a first coding network to obtain a first coding block, wherein the first coding block includes coding point features. Inputting the three-dimensional model into a second coding network to obtain a second coding block, wherein the second coding block includes target text features. Training a first network model based on the coding point features and the target text features, wherein the first network model is used to input corresponding light source position information based on the input three-dimensional model.
示例性地,可以将上述三维模型输入通用网络(Unet)进行编码以得到包含编码点特征(encoded point features)的第一编码块。将上述三维模型输入目标文本编码器(如语言图像预训练(contrastive language-image pre-training,CLIP)编码器)进行编码以得到包括目标文本特征(class features)的第二编码块。之后编码点特征和目标文本特征将放在一个联合空间中,通过梯度下降优化两种特征之间的欧氏距离(L2距离),以训练第一网络模型。其中,上述目标文本可以包括光源和非光源。Exemplarily, the three-dimensional model can be input into a universal network (Unet) for encoding to obtain a first encoding block including encoded point features. The three-dimensional model is input into a target text encoder (such as a contrastive language-image pre-training (CLIP) encoder) for encoding to obtain a second encoding block including target text features (class features). The encoded point features and the target text features are then placed in a joint space, and the Euclidean distance (L2 distance) between the two features is optimized by gradient descent to train the first network model. The target text can include light sources and non-light sources.
It can be seen that the method provided in the embodiment of the present application can train, from the encoded point features of the scene's three-dimensional model and the target text features, a first network model for outputting corresponding light source position information according to the input three-dimensional model. The three-dimensional model of the scene is input into the first network model to obtain light source position information describing the position of the light source in the scene. Since lights often have similar geometric features in different scenes, introducing light source position information during illumination estimation imposes more constraints on the solution space, improves the accuracy and robustness of illumination estimation, and narrows the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
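The following is a minimal sketch, under stated assumptions, of the joint-space training objective described above: encoded point features are pulled toward the text feature of their class ("light source" or "non-light source") by minimizing the Euclidean (L2) distance with gradient descent. The placeholder PointEncoder stands in for the Unet-style point encoder, the random text_features stand in for frozen CLIP text embeddings, and all names, shapes and per-point labels are illustrative rather than the embodiments' actual implementation.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Placeholder for the U-Net-style point feature encoder (assumption)."""
    def __init__(self, in_dim=3, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, points):                  # points: (N, 3)
        return self.net(points)                 # encoded point features: (N, 512)

encoder = PointEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

text_features = torch.randn(2, 512)             # [non-light-source, light-source] text embeddings (placeholder)
points = torch.randn(1024, 3)                    # points of the scene's 3-D model (placeholder)
labels = torch.randint(0, 2, (1024,))            # 1 where a point belongs to a light source (placeholder)

for _ in range(100):
    point_feats = encoder(points)
    target = text_features[labels]                # matching text feature for each point
    loss = ((point_feats - target) ** 2).sum(dim=-1).mean()   # squared L2 distance in the joint space
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```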
在一种可能的实现方式中,可以将上述光源位置信息、上述位置编码和上述光照特征向量输入第二网络模型以得到上述场景的光照强度;根据上述光照强度和上述光照颜色确定上述光照表达。其中,场景的光照特征向量描述了光照在场景中的分布情况。In a possible implementation, the light source position information, the position code and the illumination feature vector may be input into a second network model to obtain the illumination intensity of the scene; and the illumination expression may be determined according to the illumination intensity and the illumination color. The illumination feature vector of the scene describes the distribution of illumination in the scene.
可以理解的是,本申请实施例提供的方法一方面在光照估计时引入了位置编码,位置编码在光照估计过程中补充了更多的高频分量,使得光照估计时能够刻画高频变化的光照情况。另一方面,在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景CG数据和场景实际数据之间的domain gap,由此提高了光照估计的准确性。It can be understood that the method provided in the embodiment of the present application introduces position coding during illumination estimation, which supplements more high-frequency components during illumination estimation, so that illumination estimation can depict illumination conditions with high-frequency changes. On the other hand, the introduction of light source position information during illumination estimation imposes more constraints on the solution space, improves the accuracy and robustness of illumination estimation, and reduces the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
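As a sketch only, the second network model can be pictured as a small multilayer perceptron that concatenates the three inputs named above (position encoding, light source position information and lighting feature vector) and regresses a non-negative illumination intensity; the architecture, layer sizes and input dimensions below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class IntensityNet(nn.Module):
    """Illustrative stand-in for the second network model (assumed to be an MLP)."""
    def __init__(self, pe_dim=63, light_dim=1, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pe_dim + light_dim + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Softplus(),     # illumination intensity is non-negative
        )

    def forward(self, pos_enc, light_pos, light_feat):
        x = torch.cat([pos_enc, light_pos, light_feat], dim=-1)
        return self.mlp(x)

net = IntensityNet()
pos_enc = torch.randn(4096, 63)      # position encoding of sampled scene points (placeholder)
light_pos = torch.rand(4096, 1)      # per-point light source position information (placeholder)
light_feat = torch.randn(4096, 32)   # lighting feature vectors of the scene (placeholder)
intensity = net(pos_enc, light_pos, light_feat)   # (4096, 1)
```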
在一种可能的实现方式中,上述目标信息还包括位置编码信息,上述位置编码信息用于指示上述场景在高维空间中的表示。In a possible implementation, the target information further includes position coding information, and the position coding information is used to indicate a representation of the scene in a high-dimensional space.
可以理解的是,由于场景中的点在三维空间中仅需要x、y、z三个分量表示,其无法表达高频分量。而位置编码信息指示了场景在高维空间中的表示,具备表达高频分量能力。因此相比于仅通过场景的三维位置进行光照估计,本申请实施例提供的光照估计方法,通过在进行光照估计时引入了描述场景中在高维空间中的表示的位置编码信息,补充了更多的高频分量,使得光照估计过程不会过于平滑,更易于刻画场景在高频变化情况下的光照情况,由此提高了光照估计的准确性。It is understandable that since the points in the scene only need three components x, y, and z to represent in three-dimensional space, it is unable to express high-frequency components. The position coding information indicates the representation of the scene in high-dimensional space and has the ability to express high-frequency components. Therefore, compared to lighting estimation only through the three-dimensional position of the scene, the lighting estimation method provided in the embodiment of the present application introduces position coding information describing the representation of the scene in high-dimensional space when performing lighting estimation, supplements more high-frequency components, so that the lighting estimation process will not be too smooth, and it is easier to characterize the lighting conditions of the scene under high-frequency changes, thereby improving the accuracy of lighting estimation.
在一种可能的实现方式中,上述目标信息还包括光照特征向量,上述光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation manner, the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
可以理解的是,本申请实施例提供的光照估计方法,通过在进行光照估计时引入了描述光照在上述场景中的分布情况的光照特征向量,这对解空间给出了更多约束,由此提高了光照估计的准确性。It can be understood that the illumination estimation method provided in the embodiment of the present application introduces an illumination feature vector that describes the distribution of illumination in the above-mentioned scene when performing illumination estimation, which gives more constraints to the solution space, thereby improving the accuracy of illumination estimation.
在一种可能的实现方式中,上述目标信息还包括上述场景的光照颜色。可以根据上述光照强度和上述光照颜色确定上述场景的光照表达,上述光照表达用于指示上述场景的光照颜色和光照强度。In a possible implementation, the target information further includes the lighting color of the scene. The lighting expression of the scene can be determined according to the lighting intensity and the lighting color, and the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
可以理解的是,相比于仅通过场景的图像进行光照估计,本申请实施例提供的光照估计方法,在进行光照估计时额外引入了描述场景中光源位置的光源位置信息,由于灯光在不同场景下往往具有类似的几何特征。通过在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景CG数据和场景实际数据之间的domain gap,由此提高了光照估计的准确性。It is understandable that, compared to performing illumination estimation only through the image of the scene, the illumination estimation method provided in the embodiment of the present application additionally introduces light source position information describing the position of the light source in the scene when performing illumination estimation, because lights often have similar geometric features in different scenes. By introducing light source position information during illumination estimation, more constraints are imposed on the solution space, the accuracy and robustness of illumination estimation are improved, and the domain gap between the rendered scene CG data and the actual scene data is reduced, thereby improving the accuracy of illumination estimation.
在一种可能的实现方式中,可以将上述目标信息输入第二网络模型以得到上述光照强度,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。In a possible implementation, the target information may be input into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
可以理解的是,相比于仅通过场景的图像进行光照估计,本申请实施例提供的光照估计方法,在进行光照估计时额外引入了描述场景中光源位置的光源位置信息,然后通过将光源位置信息输入第二网络模型以得到场景的光照强度,由于灯光在不同场景下往往具有类似的几何特征。通过在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景CG数据和场景实际数据之间的domain gap,由此提高了光照估计的准确性。It can be understood that, compared with the illumination estimation only through the image of the scene, the illumination estimation method provided in the embodiment of the present application additionally introduces the light source position information describing the position of the light source in the scene when performing illumination estimation, and then inputs the light source position information into the second network model to obtain the illumination intensity of the scene. Since lights often have similar geometric features in different scenes, by introducing the light source position information during illumination estimation, more constraints are imposed on the solution space, the accuracy and robustness of illumination estimation are improved, and the domain gap between the rendered scene CG data and the actual scene data is reduced, thereby improving the accuracy of illumination estimation.
在一种可能的实现方式中,该方法还可以包括:根据上述场景的光照表达确定上述场景的渲染图像,上述光照表达用于指示上述场景的光照颜色和光照强度。根据上述场景的渲染图像和上述场景的参考图像训练第二网络模型,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。In a possible implementation, the method may further include: determining a rendered image of the scene according to a lighting expression of the scene, the lighting expression being used to indicate the lighting color and lighting intensity of the scene. Training a second network model according to the rendered image of the scene and a reference image of the scene, the second network model being used to output a corresponding lighting intensity according to the input target information.
具体地,可以根据上述场景的渲染图像和上述场景的参考图像之间的差值作为损失函数,通过梯度下降以训练第二网络模型。Specifically, the second network model can be trained by gradient descent based on the difference between the rendered image of the scene and the reference image of the scene as a loss function.
可以理解的是,通过根据上述场景的渲染图像和上述场景的参考图像训练第二网络模型,可以使得第二网络模型输出的光照强度更加接近真实场景的光照强度,由此进一步提高了光照估计的准确性。It can be understood that by training the second network model based on the rendered image of the above scene and the reference image of the above scene, the light intensity output by the second network model can be closer to the light intensity of the real scene, thereby further improving the accuracy of light estimation.
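A minimal sketch of that training step, assuming a hypothetical differentiable render() function, is given below; only the loss and the gradient-descent update are shown, and the mean squared per-pixel difference is one possible choice of the "difference between the rendered image and the reference image".

```python
import torch

def training_step(render, lighting_params, reference_image, optimizer):
    """One gradient-descent step on the rendered-vs-reference difference.

    `render` is a hypothetical differentiable renderer taking the current
    lighting parameters and returning an (H, W, 3) image tensor.
    """
    rendered = render(lighting_params)
    loss = torch.mean((rendered - reference_image) ** 2)   # difference used as the loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```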
在一种可能的实现方式中,该方法还可以包括:将上述三维模型分割为多个体素。根据上述场景的光照特征向量确定上述多个体素的光照特征向量,上述场景的光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation, the method may further include: dividing the three-dimensional model into a plurality of voxels, and determining illumination feature vectors of the plurality of voxels according to an illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate the distribution of illumination in the scene.
It is understandable that a three-dimensional model alone cannot express a complex, spatially varying lighting model well, whereas a voxelized three-dimensional model can express such a lighting model better. By voxelizing the three-dimensional model, the lighting feature vector of each voxel is obtained, so that the resulting voxelized lighting model can support lighting editing and relighting.
在一种可能的实现方式中,该方法还可以包括:将上述三维模型分割为多个体素;根据上述场景的光照颜色确定上述多个体素的光照颜色。In a possible implementation manner, the method may further include: segmenting the three-dimensional model into a plurality of voxels; and determining the illumination colors of the plurality of voxels according to the illumination color of the scene.
可以理解的,三维模型无法很好表达复杂、空间变化的光照模型,而将三维模型体素化后,可以较好的表达复杂、空间变化的光照模型,通过将三维模型体素化可以得到三维模型体素化的各个体素的光照颜色,从而可以使得到的体素化的光照模型支持光照编辑和重打光。It is understandable that the three-dimensional model cannot express the complex, spatially varying lighting model well. After the three-dimensional model is voxelized, it can better express the complex, spatially varying lighting model. By voxelizing the three-dimensional model, the lighting color of each voxel of the three-dimensional model can be obtained, so that the obtained voxelized lighting model can support lighting editing and relighting.
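The sketch below illustrates, under assumed grid bounds and sizes, how the scene's three-dimensional model can be split into voxels that each carry a lighting feature vector and a lighting color; how the scene-level quantities are actually distributed to the voxels is not specified here, so only the voxel grid and the point-to-voxel lookup are shown.

```python
import numpy as np

resolution, feat_dim = 32, 16                                 # assumed grid resolution and feature size
grid_min = np.array([-1.0, -1.0, -1.0])                       # assumed scene bounds
grid_max = np.array([1.0, 1.0, 1.0])

voxel_features = np.zeros((resolution, resolution, resolution, feat_dim))  # per-voxel lighting feature vectors
voxel_colors = np.zeros((resolution, resolution, resolution, 3))           # per-voxel lighting colors (RGB)

def voxel_index(point):
    """Map a 3-D point of the model to the index of the voxel containing it."""
    rel = (point - grid_min) / (grid_max - grid_min)          # normalize to [0, 1]
    idx = np.clip((rel * resolution).astype(int), 0, resolution - 1)
    return tuple(int(v) for v in idx)

print(voxel_index(np.array([0.0, 0.0, 0.0])))                 # -> (16, 16, 16)
```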
第二方面,本申请实施例提供了一种光照估计装置,上述装置包括:收发单元和处理单元。上述收发单元,用于获取场景的三维模型。上述处理单元,用于根据上述三维模型确定上述场景的目标信息,上述目标信息包括光源位置信息,上述光源位置信息用于指示上述场景中光源的位置。上述处理单元,还用于根据上述目标信息确定上述场景的光照强度。In a second aspect, an embodiment of the present application provides a lighting estimation device, the device comprising: a transceiver unit and a processing unit. The transceiver unit is used to obtain a three-dimensional model of a scene. The processing unit is used to determine target information of the scene based on the three-dimensional model, the target information includes light source position information, and the light source position information is used to indicate the position of the light source in the scene. The processing unit is also used to determine the lighting intensity of the scene based on the target information.
在一种可能的实现方式中,上述处理单元具体用于:将上述三维模型输入第一网络模型以得到上述光源位置信息,上述第一网络模型用于根据输入的三维模型输入对应的光源位置信息。In a possible implementation, the processing unit is specifically used to: input the three-dimensional model into a first network model to obtain the light source position information, and the first network model is used to input the corresponding light source position information according to the input three-dimensional model.
在一种可能的实现方式中,上述处理单元还用于:将上述三维模型输入第一编码网络得到第一编码块,上述第一编码块包括编码点特征;将上述三维模型输入第二编码网络得到第二编码块,上述第二编码块包括目标文本特征;根据上述编码点特征和上述目标文本特征训练第一网络模型,上述第一网络模型用于根据输入的三维模型输入对应的光源位置信息。In a possible implementation, the processing unit is also used to: input the three-dimensional model into a first coding network to obtain a first coding block, wherein the first coding block includes coding point features; input the three-dimensional model into a second coding network to obtain a second coding block, wherein the second coding block includes target text features; train a first network model based on the coding point features and the target text features, wherein the first network model is used to input corresponding light source position information based on the input three-dimensional model.
在一种可能的实现方式中,上述目标信息还包括位置编码信息,上述位置编码信息用于指示上述场景在高维空间中的表示。In a possible implementation, the target information further includes position coding information, and the position coding information is used to indicate a representation of the scene in a high-dimensional space.
在一种可能的实现方式中,上述目标信息还包括光照特征向量,上述光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation, the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
在一种可能的实现方式中,上述目标信息还包括上述场景的光照颜色,上述处理单元还用于:根据上述光照强度和上述光照颜色确定上述场景的光照表达,上述光照表达用于指示上述场景的光照颜色和光照强度。In a possible implementation, the target information also includes the lighting color of the scene, and the processing unit is further used to determine the lighting expression of the scene according to the lighting intensity and the lighting color, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
在一种可能的实现方式中,上述处理单元具体用于:将上述目标信息输入第二网络模型以得到上述光照强度,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。In a possible implementation, the processing unit is specifically used to: input the target information into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
在一种可能的实现方式中,上述处理单元还用于:根据上述场景的光照表达确定上述场景的渲染图像,上述光照表达用于指示上述场景的光照颜色和光照强度;根据上述场景的渲染图像和上述场景的参考图像训练第二网络模型,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。In a possible implementation, the processing unit is also used to: determine a rendered image of the scene based on a lighting expression of the scene, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene; train a second network model based on the rendered image of the scene and a reference image of the scene, wherein the second network model is used to output corresponding lighting intensity based on input target information.
在一种可能的实现方式中,上述处理单元还用于:将上述三维模型分割为多个体素;根据上述场景的光照特征向量确定上述多个体素的光照特征向量,上述场景的光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation, the processing unit is further used to: segment the three-dimensional model into multiple voxels; determine the illumination feature vectors of the multiple voxels based on the illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate the distribution of illumination in the scene.
在一种可能的实现方式中,上述处理单元还用于:将上述三维模型分割为多个体素;根据上述场景的光照颜色确定上述多个体素的光照颜色。In a possible implementation, the processing unit is further configured to: segment the three-dimensional model into a plurality of voxels; and determine the illumination colors of the plurality of voxels according to the illumination color of the scene.
第三方面,本申请实施例还提供一种光照估计装置,该光照估计装置包括:至少一个处理器,当所述至少一个处理器执行程序代码或指令时,实现上述第一方面或其任意可能的实现方式中所述的方法。In a third aspect, an embodiment of the present application further provides a lighting estimation device, which includes: at least one processor, when the at least one processor executes program code or instructions, implements the method described in the above first aspect or any possible implementation method thereof.
可选地,该光照估计装置还可以包括至少一个存储器,该至少一个存储器用于存储该程序代码或指令。Optionally, the illumination estimation device may further include at least one memory, and the at least one memory is used to store the program code or instruction.
第四方面,本申请实施例还提供一种芯片,包括:输入接口、输出接口、至少一个处理器。可选的,该芯片还包括存储器。该至少一个处理器用于执行该存储器中的代码,当该至少一个处理器执行该代码时,该芯片实现上述第一方面或其任意可能的实现方式中所述的方法。In a fourth aspect, an embodiment of the present application further provides a chip, comprising: an input interface, an output interface, and at least one processor. Optionally, the chip further comprises a memory. The at least one processor is used to execute the code in the memory, and when the at least one processor executes the code, the chip implements the method described in the first aspect or any possible implementation thereof.
可选地,上述芯片还可以为集成电路。Optionally, the above chip may also be an integrated circuit.
第五方面,本申请实施例还提供一种计算机可读存储介质,用于存储计算机程序,该计算机程序包括用于实现上述第一方面或其任意可能的实现方式中所述的方法。In a fifth aspect, an embodiment of the present application further provides a computer-readable storage medium for storing a computer program, wherein the computer program includes methods for implementing the method described in the above-mentioned first aspect or any possible implementation thereof.
第六方面,本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机实现上述第一方面或其任意可能的实现方式中所述的方法。In a sixth aspect, an embodiment of the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to implement the method described in the first aspect or any possible implementation thereof.
The illumination estimation device, computer storage medium, computer program product and chip provided in this embodiment are all used to execute the method provided above; therefore, for the beneficial effects that they can achieve, reference may be made to the beneficial effects of the method provided above, which will not be repeated here.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单的介绍,显而易见地,下面描述中的附图仅仅是本申请实施例的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the following briefly introduces the drawings required for use in the description of the embodiments. Obviously, the drawings described below are only some embodiments of the embodiments of the present application. For ordinary technicians in this field, other drawings can be obtained based on these drawings without creative work.
图1为本申请实施例提供的一种图像处理系统的结构示意图;FIG1 is a schematic diagram of the structure of an image processing system provided in an embodiment of the present application;
图2为本申请实施例提供的一种光照估计方法的流程示意图;FIG2 is a schematic diagram of a flow chart of a method for estimating illumination provided in an embodiment of the present application;
图3为本申请实施例提供的一种确定场景的光照表达的流程示意图;FIG3 is a schematic diagram of a process for determining a lighting expression of a scene provided in an embodiment of the present application;
图4为本申请实施例提供的一种光源位置信息提取网络的训练流程示意图;FIG4 is a schematic diagram of a training process of a light source position information extraction network provided in an embodiment of the present application;
图5为本申请实施例提供的一种光照估计装置的结构示意图;FIG5 is a schematic diagram of the structure of a lighting estimation device provided in an embodiment of the present application;
图6为本申请实施例提供的一种芯片的结构示意图;FIG6 is a schematic diagram of the structure of a chip provided in an embodiment of the present application;
图7为本申请实施例提供的一种电子设备的结构示意图;FIG7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application;
图8为本申请实施例提供的另一种光照估计装置的结构示意图。FIG. 8 is a schematic diagram of the structure of another illumination estimation device provided in an embodiment of the present application.
Detailed Description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整的描述,显然,所描述的实施例仅仅是本申请实施例一部分实施例,而不是全部的实施例。基于本申请实施例中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请实施例保护的范围。The following will be combined with the drawings in the embodiments of the present application to clearly and completely describe the technical solutions in the embodiments of the present application. Obviously, the described embodiments are only part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the embodiments of the present application, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the embodiments of the present application.
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。The term "and/or" in this article is merely a description of the association relationship of associated objects, indicating that three relationships may exist. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone.
本申请实施例的说明书以及附图中的术语“第一”和“第二”等是用于区别不同的对象,或者用于区别对同一对象的不同处理,而不是用于描述对象的特定顺序。The terms "first" and "second" and the like in the description and drawings of the embodiments of the present application are used to distinguish different objects, or to distinguish different processing of the same object, rather than to describe a specific order of objects.
此外,本申请实施例的描述中所提到的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选的还包括其他没有列出的步骤或单元,或可选的还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。In addition, the terms "including" and "having" and any variations thereof mentioned in the description of the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device including a series of steps or units is not limited to the listed steps or units, but may optionally include other steps or units that are not listed, or may optionally include other steps or units that are inherent to these processes, methods, products or devices.
需要说明的是,本申请实施例的描述中,“示例性地”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性地”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优先或更具优势。确切而言,使用“示例性地”或者“例如”等词旨在以具体方式呈现相关概念。It should be noted that, in the description of the embodiments of the present application, words such as "exemplarily" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of the present application should not be interpreted as having priority or advantage over other embodiments or designs. Specifically, the use of words such as "exemplarily" or "for example" is intended to present related concepts in a specific way.
首先对本申请实施例涉及的一些术语进行解释说明。First, some terms involved in the embodiments of the present application are explained.
体素,是体积元素(volume pixel)的简称,包含体素的立体可以通过立体渲染或者提取给定阈值轮廓的多边形等值面表现出来。一如其名,是数字数据于三维空间分割上的最小单位,体素用于三维成像。Voxel is the abbreviation of volume pixel. A volume containing voxels can be displayed by stereo rendering or extracting polygonal isosurfaces with a given threshold contour. As the name suggests, it is the smallest unit of digital data in three-dimensional space segmentation. Voxel is used for three-dimensional imaging.
三维重建,是指基于采集到的地理世界中三维物体的数据,利用计算机方法和/或数学方法将该三维物体建立为适合计算机表示和处理的三维数学模型的过程。Three-dimensional reconstruction refers to the process of using computer and/or mathematical methods to establish a three-dimensional mathematical model suitable for computer representation and processing based on the data of three-dimensional objects in the geographical world collected.
三维模型:三维模型是物体的多边形表示,通常用计算机或者其它视频设备进行显示。显示的物体可以是现实世界的实体,也可以是虚构的物体。任何物理自然界存在的东西都可以用三维模型表示。三维模型的数据存储形式有多种,例如以三维点云、网格或体元等形式表示,具体此处不做限定3D model: A 3D model is a polygonal representation of an object, usually displayed by a computer or other video device. The displayed object can be a real-world entity or a fictional object. Anything that exists in the physical world can be represented by a 3D model. There are many forms of data storage for 3D models, such as 3D point clouds, meshes, or voxels, which are not limited here.
相机位姿:位姿即相机在空间中的位置和相机的姿态,可以分别看作相机从原始参考位置到当前位置的平移变换和旋转变换。类似的,本申请中物体的位姿即,物体在空间中的位置和物体的姿态。Camera pose: The pose is the position of the camera in space and the attitude of the camera, which can be regarded as the translation and rotation transformation of the camera from the original reference position to the current position. Similarly, the pose of an object in this application is the position of the object in space and the attitude of the object.
位置编码(position embedding,PE):指通过高频函数对原始的低维位置坐标进行映射得到高维位置坐标,从而使位置坐标带有更多的高频信息。Position encoding (PE): refers to mapping the original low-dimensional position coordinates through high-frequency functions to obtain high-dimensional position coordinates, so that the position coordinates carry more high-frequency information.
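A minimal sketch of one common sinusoidal position encoding follows; the number of frequency bands (10, giving a 63-dimensional encoding for a 3-D point) is an assumption for illustration, not a value taken from the embodiments.

```python
import numpy as np

def position_encoding(p, num_bands=10):
    """Map a low-dimensional point p (shape (3,)) to a higher-dimensional
    vector of shape (3 + 3 * 2 * num_bands,) carrying high-frequency terms."""
    out = [p]
    for i in range(num_bands):
        freq = (2.0 ** i) * np.pi
        out.append(np.sin(freq * p))
        out.append(np.cos(freq * p))
    return np.concatenate(out)

print(position_encoding(np.array([0.1, -0.2, 0.3])).shape)   # (63,)
```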
相机外参:即相机的外参数,世界坐标系与相机坐标系之间的转换关系,包括位移参数和旋转参数,根据相机外参可以确定相机位姿。Camera extrinsic parameters: the external parameters of the camera, the conversion relationship between the world coordinate system and the camera coordinate system, including displacement parameters and rotation parameters. The camera pose can be determined based on the camera extrinsic parameters.
Trilinear interpolation is a method of linear interpolation on a tensor product grid of three-dimensional discrete sampled data. This tensor product grid may have arbitrary, non-overlapping grid points in each dimension, but it is not a triangulated finite element analysis mesh. The method linearly approximates the value of a point (x, y, z) within a local rectangular prism from the data points on the grid. Trilinear interpolation is often used in numerical analysis, data analysis, and computer graphics.
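The following sketch shows trilinear interpolation of a scalar stored at the eight grid corners surrounding a query point; the same weights apply unchanged to vector-valued data such as per-voxel lighting feature vectors.

```python
import numpy as np

def trilinear(grid, x, y, z):
    """grid: (X, Y, Z) array; x, y, z: continuous coordinates in grid units."""
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    dx, dy, dz = x - x0, y - y0, z - z0
    value = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                w = ((1 - dx) if i == 0 else dx) * \
                    ((1 - dy) if j == 0 else dy) * \
                    ((1 - dz) if k == 0 else dz)
                value += w * grid[x0 + i, y0 + j, z0 + k]
    return value

grid = np.arange(27, dtype=float).reshape(3, 3, 3)
print(trilinear(grid, 0.5, 0.5, 0.5))   # average of the 8 surrounding corners -> 6.5
```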
语义分割(semantic segmentation),是计算机视觉(computer vision,CV)领域的一个基础研究方向,它可以对图像中的每个像素给出具体的类别,例如它可以分析一张图片或者一段视频流中的物体,并逐像素标记出来其所属类别。语义分割被广泛应用于自动驾驶、智慧城市和医疗图像处理等诸多领域。Semantic segmentation is a basic research direction in the field of computer vision (CV). It can give a specific category to each pixel in an image. For example, it can analyze objects in a picture or a video stream and mark their categories pixel by pixel. Semantic segmentation is widely used in many fields such as autonomous driving, smart cities and medical image processing.
类别:通过深度学习可以进行图像识别,识别图像中物体的类别,即物体分类,物体类别例如可以是:桌子、椅子、猫、狗、汽车等等。Category: Deep learning can be used to perform image recognition and identify the categories of objects in the image, that is, object classification. Object categories can be, for example: table, chair, cat, dog, car, etc.
Neural network: a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes xs (i.e., input data) and an intercept 1 as inputs; the output of the operation unit can be:

h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Where s=1, 2, ...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into the output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple single neural units mentioned above, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the characteristics of the local receptive field. The local receptive field can be an area composed of several neural units.
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取特征的方式与位置无关。卷积核可以以随机大小的矩阵的形式化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。Convolutional neural network (CNN) is a deep neural network with a convolutional structure. Convolutional neural network contains a feature extractor consisting of a convolution layer and a subsampling layer, which can be regarded as a filter. Convolutional layer refers to the neuron layer in the convolutional neural network that performs convolution processing on the input signal. In the convolutional layer of the convolutional neural network, a neuron can only be connected to some neurons in the adjacent layers. A convolutional layer usually contains several feature planes, each of which can be composed of some rectangularly arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract features is independent of position. Convolution kernels can be formalized as matrices of random sizes, and convolution kernels can obtain reasonable weights through learning during the training process of convolutional neural networks. In addition, the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
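A one-layer illustration of the weight sharing described above (one small kernel, the shared weights, slid over the whole image to produce feature planes), with arbitrary channel counts chosen only for this sketch:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
image = torch.randn(1, 3, 64, 64)        # one RGB image (placeholder data)
features = conv(image)                   # feature planes: (1, 8, 64, 64)
print(features.shape)
```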
损失函数:在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。Loss function: In the process of training a deep neural network, because we hope that the output of the deep neural network is as close as possible to the value we really want to predict, we can compare the predicted value of the current network with the target value we really want, and then update the weight vector of each layer of the neural network according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the deep neural network). For example, if the predicted value of the network is high, adjust the weight vector to make it predict a lower value, and keep adjusting until the deep neural network can predict the target value we really want or a value very close to the target value we really want. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function or objective function, which are important equations used to measure the difference between the predicted value and the target value. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, so the training of the deep neural network becomes a process of minimizing this loss as much as possible.
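A tiny worked example of a loss function, using mean squared error as one possible choice: the larger the difference between the predicted values and the target values, the larger the loss that training then tries to shrink.

```python
import torch

target = torch.tensor([1.0, 2.0, 3.0])
loss_fn = torch.nn.MSELoss()
print(loss_fn(torch.tensor([1.1, 2.1, 2.9]), target))   # small difference -> small loss (0.01)
print(loss_fn(torch.tensor([3.0, 0.0, 6.0]), target))   # large difference -> large loss (~5.67)
```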
多层感知器(multi layer perceptron,MLP)是一种前馈人工神经网络模型,其将输入的多个数据集映射到单一的输出的数据集上。它最主要的特点是有多个神经元层,因此也叫深度神经网络(deep neural networks,DNN)。感知机是单个神经元模型,是较大神经网络的前身。神经网络的强大之处在于它们能够学习训练数据中的表示,以及如何将其与想要预测的输出变量联系起来。从数学上讲,它们能够学习任何映射函数,并且已经被证明是一种通用的近似算法。神经网络的预测能力来自网络的分层或多层结构。而多层感知机是指具有至少三层节点,输入层,一些中间层和输出层的神经网络。给定层中的每个节点都连接到相邻层中的每个节点。输入层接收数据,中间层计算数据,输出层输出结果。A multi-layer perceptron (MLP) is a feed-forward artificial neural network model that maps multiple input data sets to a single output data set. Its main feature is that it has multiple layers of neurons, so it is also called deep neural networks (DNN). Perceptrons are single neuron models and are the predecessors of larger neural networks. The power of neural networks lies in their ability to learn representations in the training data and how to relate them to the output variables that you want to predict. Mathematically, they are able to learn any mapping function and have been proven to be a universal approximation algorithm. The predictive power of neural networks comes from the hierarchical or multi-layered structure of the network. A multi-layer perceptron is a neural network with at least three layers of nodes, an input layer, some intermediate layers, and an output layer. Each node in a given layer is connected to every node in the adjacent layer. The input layer receives data, the intermediate layers calculate the data, and the output layer outputs the results.
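A minimal sketch (for illustration only) of the multi-layer perceptron structure described above, with an input layer, one hidden layer and an output layer; the layer sizes and random weight initialization are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, params):
    # Each node of a layer is connected to every node of the adjacent layer,
    # i.e. a sequence of fully connected (dense) layers.
    h = x
    for w, b in params[:-1]:
        h = np.maximum(0.0, h @ w + b)   # hidden layers with ReLU activation
    w, b = params[-1]
    return h @ w + b                     # output layer (linear)

# Illustrative MLP: 4 inputs -> 8 hidden units -> 2 outputs.
params = [
    (rng.standard_normal((4, 8)) * 0.1, np.zeros(8)),
    (rng.standard_normal((8, 2)) * 0.1, np.zeros(2)),
]
x = rng.standard_normal(4)
print(mlp_forward(x, params))
```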
红绿蓝(red green blue,RGB)色彩模式是工业界的一种颜色标准,是通过对红(R)、绿(G)、蓝(B)三个颜色通道的变化以及它们相互之间的叠加来得到各式各样的颜色的,RGB即是代表红、绿、蓝三个通道的颜色。The red, green, blue (RGB) color model is a color standard in the industry. A variety of colors are obtained by changing the three color channels of red (R), green (G), and blue (B) and superimposing them on each other. RGB represents the colors of the three channels: red, green, and blue.
三线性插值,是在三维离散采样数据的张量积网格上进行线性插值的方法。这个张量积网格可能在每一维度上都有任意不重叠的网格点,但并不是三角化的有限元分析网格。这种方法通过网格上数据点在局部的矩形棱柱上线性地近似计算点(x,y,z)的值。三线性插值经常用于数值分析、数据分析以及计算机图形学等领域。Trilinear interpolation is a method of linear interpolation on a tensor product grid of three-dimensional discrete sampled data. This tensor product grid may have arbitrary, non-overlapping grid points in each dimension, but it is not a triangulated finite element analysis grid. This method linearly approximates the value of a point (x, y, z) on a local rectangular prism from the data points on the grid. Trilinear interpolation is often used in numerical analysis, data analysis, and computer graphics.
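A small illustrative sketch of trilinear interpolation as described above: the value at a query point (x, y, z) is a weighted combination of the eight surrounding grid points; the example grid values are arbitrary.

```python
import numpy as np

def trilinear_interp(grid, x, y, z):
    # grid: values sampled on an integer-indexed 3D grid (shape [X, Y, Z]).
    # (x, y, z): query point in grid coordinates; the result is linearly
    # interpolated from the 8 surrounding grid points.
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    x1, y1, z1 = x0 + 1, y0 + 1, z0 + 1
    dx, dy, dz = x - x0, y - y0, z - z0
    c = 0.0
    for (i, wx) in ((x0, 1 - dx), (x1, dx)):
        for (j, wy) in ((y0, 1 - dy), (y1, dy)):
            for (k, wz) in ((z0, 1 - dz), (z1, dz)):
                c = c + wx * wy * wz * grid[i, j, k]
    return c

# Example: a 2x2x2 grid with arbitrary values, queried at its centre.
grid = np.arange(8, dtype=float).reshape(2, 2, 2)
print(trilinear_interp(grid, 0.5, 0.5, 0.5))  # 3.5, the average of the 8 corners
```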
随着科技的进步,增强现实AR等图像技术已经逐渐走进人们的日常生活。AR等图像技术的普及,极大提升了人们获取信息的效率和体验。AR等图像技术在教育培训、军事、医疗、娱乐和生产制造等行业中被广泛应用。高度真实感的虚实融合是AR等图像技术的核心需求。将虚拟物体的光源和光照模型与真实场景保持一致,可以获得具有真实感的渲染效果。With the advancement of science and technology, augmented reality (AR) and other image technologies have gradually entered people's daily lives. The popularity of AR and other image technologies has greatly improved people's efficiency and experience in obtaining information. AR and other image technologies are widely used in education and training, military, medical, entertainment, and manufacturing industries. Highly realistic virtual-real fusion is the core requirement of AR and other image technologies. Keeping the light source and lighting model of virtual objects consistent with the real scene can obtain a realistic rendering effect.
然而,实际场景的光源往往是未知的,需要根据场景图片等信息来估计场景中的光源,进而获得较为真实的AR体验,准确的光照估计可以提高光照模型与真实场景的相似度。However, the light source of the actual scene is often unknown, and it is necessary to estimate the light source in the scene based on information such as scene pictures to obtain a more realistic AR experience. Accurate lighting estimation can improve the similarity between the lighting model and the real scene.
为此,本申请实施例提供了一种光照估计方法,能够提高光照估计的准确性。该方法适用于图像处理系统,图1示出了该图像处理系统的一种可能的存在形式。To this end, an embodiment of the present application provides an illumination estimation method, which can improve the accuracy of illumination estimation. The method is applicable to an image processing system, and FIG. 1 shows a possible form of the image processing system.
如图1所示该图像处理系统包括:终端10和服务器20。As shown in FIG. 1 , the image processing system includes a terminal 10 and a server 20 .
本申请实施例中的终端10可以为手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)等,本申请实施例对此不作任何限制。The terminal 10 in the embodiment of the present application can be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc., and the embodiment of the present application does not impose any limitation on this.
在一种可能的实现方式中,终端10可以包括传感器单元11、计算单元12、存储单元13、网络传输单元14和交互单元15。In a possible implementation, the terminal 10 may include a sensor unit 11 , a computing unit 12 , a storage unit 13 , a network transmission unit 14 , and an interaction unit 15 .
传感器单元11,通常包括视觉传感器(如相机等),用于获取场景的二维(2D)图像信息;惯性导航模块(如惯性测量单元(inertial measurement unit,IMU)等),用于获取移动设备不同时刻的相对位姿关系,用于后续获取初始几何模型。The sensor unit 11 usually includes a visual sensor (such as a camera) for acquiring two-dimensional (2D) image information of the scene, and an inertial navigation module (such as an inertial measurement unit (IMU)) for acquiring the relative pose relationship of the mobile device at different times, which is used for subsequently acquiring an initial geometric model.
计算单元12,可以包括中央处理器(central processing unit,CPU),应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等各类处理器以及缓冲和寄存器,主要用于运行移动端操作系统,并处理本申请实施例所设计的各算法模块,如光照估计方法等。The computing unit 12 may include a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU) and other processors as well as buffers and registers, and is mainly used to run the mobile operating system and process the various algorithm modules designed in the embodiments of the present application, such as the illumination estimation method.
存储单元13,主要包括内存和外部存储模块(如硬盘等),主要用于算法数据、用户本地和临时数据的存储和读写等。The storage unit 13 mainly includes a memory and an external storage module (such as a hard disk, etc.), and is mainly used for storing and reading and writing algorithm data, user local and temporary data, etc.
网络传输单元14,主要包括上传和下载模块,图像、视频、三维模型和光照信息的编解码模块等。The network transmission unit 14 mainly includes upload and download modules, encoding and decoding modules for images, videos, three-dimensional models and lighting information, etc.
交互单元15,主要包括显示器、触摸板、振动器、音频设备(如扬声器、麦克风等音频设备)等,主要用于和用户进行交互,获取用户输入,并向用户呈现算法效果等。The interaction unit 15 mainly includes a display, a touch panel, a vibrator, an audio device (such as a speaker, a microphone and other audio devices), etc., and is mainly used to interact with the user, obtain user input, and present the algorithm effect to the user.
在一种可能的实现方式中,服务器20可以包括计算单元21、存储单元22和网络传输单元23。In a possible implementation, the server 20 may include a computing unit 21 , a storage unit 22 , and a network transmission unit 23 .
计算单元21,可以包括中央处理器,应用处理器,调制解调处理器,图形处理器,图像信号处理器,视频编解码器,数字信号处理器,基带处理器,和/或神经网络处理器等各类处理器以及缓冲和寄存器,主要用于运行服务器操作系统,并处理本申请实施例所设计的各算法模块,如光照估计方法等。The computing unit 21 may include various processors such as a central processing unit, an application processor, a modem processor, a graphics processor, an image signal processor, a video codec, a digital signal processor, a baseband processor, and/or a neural network processor, as well as buffers and registers, and is mainly used to run a server operating system and process various algorithm modules designed in the embodiments of the present application, such as an illumination estimation method.
存储单元22,主要包括内存和外部存储模块(如硬盘等),主要用于存储网络模型和参数。The storage unit 22 mainly includes a memory and an external storage module (such as a hard disk, etc.), and is mainly used to store network models and parameters.
网络传输单元23,主要包括上传和下传模块,图像、视频、三维模型和光照信息的编解码等。The network transmission unit 23 mainly includes upload and download modules, encoding and decoding of images, videos, three-dimensional models and lighting information, etc.
图2示出了本申请实施例提供的一种光照估计方法,该方法包括:FIG2 shows a method for estimating illumination provided by an embodiment of the present application, the method comprising:
S201、获取场景的三维模型。S201: Obtain a three-dimensional model of a scene.
其中,场景可以包括一种或多种对象。例如,场景可以包括人物、植物、动物、建筑物等一种或多种对象。The scene may include one or more objects, such as people, plants, animals, buildings, etc.
在一种可能的实现方式中,可以根据场景的图像得到场景的三维模型。例如,终端10上可以安装有三维重建类应用程序,或者打开与三维重建或者基于三维重建结果进行下游任务相关的网页,上述应用程序和网页可以提供一个界面,终端10可以接收用户在三维重建或者基于三维重建结果进行下游任务界面上输入的相关参数,并将上述参数发送至服务器20,服务器20可以基于接收到的参数,得到处理结果,并将处理结果返回至终端10。应理解,在一些可选的实现中,终端10也可以由自身完成基于接收到的参数,得到数据处理结果,而不需要服务器配合实现,本申请实施例并不限定。In a possible implementation, a three-dimensional model of a scene can be obtained based on the image of the scene. For example, a three-dimensional reconstruction application can be installed on the terminal 10, or a webpage related to three-dimensional reconstruction or downstream tasks based on three-dimensional reconstruction results can be opened. The above application and webpage can provide an interface, and the terminal 10 can receive the relevant parameters entered by the user on the three-dimensional reconstruction or downstream task interface based on the three-dimensional reconstruction results, and send the above parameters to the server 20. The server 20 can obtain the processing results based on the received parameters and return the processing results to the terminal 10. It should be understood that in some optional implementations, the terminal 10 can also complete the data processing results based on the received parameters by itself without the need for the server to cooperate in the implementation, and the embodiments of the present application are not limited.
示例性地,可以通过采集设备(终端、单反相机、监控相机、立体视觉相机或其他采集设备)对场景进行拍摄以得到场景的多视角图像(但不限于各种格式的RGB图像)。具体可以对场景四周进行环视拍摄,获得多视角图像。为保证后续的三维重建质量,拍摄的图像需要各视角之间具有一定的共视区域,且尽量没有死角。拍摄时保证相机平稳,避免图像产生模糊等现象。将场景的多视角图像作为输入,通过三维重建算法(如运动恢复结构(structure from motion,SFM)算法),重建得到场景的三维模型。Exemplarily, the scene can be photographed by a collection device (a terminal, an SLR camera, a surveillance camera, a stereo vision camera, or another collection device) to obtain multi-view images of the scene (but not limited to RGB images in various formats). Specifically, the scene can be photographed all around to obtain the multi-view images. To ensure the quality of subsequent 3D reconstruction, the captured images need to have a certain common viewing area between the viewpoints, and blind spots should be avoided as much as possible. The camera should be kept stable during shooting to avoid image blur. The multi-view images of the scene are used as input, and a 3D model of the scene is reconstructed through a 3D reconstruction algorithm (such as the structure from motion (SFM) algorithm).
例如,可以通过采集设备采集房间的多视角图像。然后通过三维重建算法以采集的房间多视角图像作为参考图像进行三维重建以得到房间的三维模型。For example, a multi-view image of a room may be collected by a collection device, and then a three-dimensional reconstruction algorithm is used to perform three-dimensional reconstruction using the collected multi-view images of the room as reference images to obtain a three-dimensional model of the room.
S202、根据上述三维模型确定上述场景的目标信息。S202: Determine target information of the scene according to the three-dimensional model.
其中,上述目标信息包括光源位置信息,上述光源位置信息用于指示上述场景中光源的位置。The target information includes light source position information, and the light source position information is used to indicate the position of the light source in the scene.
需要说明的是,光源位置信息指示的光源包括已激活光源和未激活光源。其中,已激活光源包括但不限于正在发光的光源(如打开的灯、太阳、打开的显示器),未激活光源包括但不限于未发光的光源(如关闭的灯、关闭的显示器)和具备光源几何形状的物体(如灯形状的装饰品、灯形状的雕塑等)。It should be noted that the light sources indicated by the light source position information include activated light sources and inactivated light sources. Activated light sources include, but are not limited to, light sources that are emitting light (such as turned-on lights, the sun, and turned-on displays); inactivated light sources include, but are not limited to, light sources that are not emitting light (such as turned-off lights and turned-off displays) and objects with the geometric shape of a light source (such as lamp-shaped ornaments, lamp-shaped sculptures, etc.).
在一种可能的实现方式中,可以将上述三维模型输入第一网络模型以得到上述光源位置信息。In a possible implementation, the three-dimensional model may be input into the first network model to obtain the light source position information.
其中,上述第一网络模型用于根据输入的三维模型输出对应的光源位置信息。第一网络模型可以通过多个场景的三维模型和每个三维模型对应的编码特征训练得到。编码特征包括编码点特征和目标文本特征。The first network model is used to output the corresponding light source position information according to the input three-dimensional model. The first network model can be obtained by training with three-dimensional models of multiple scenes and the coding features corresponding to each three-dimensional model. The coding features include encoded point features and target text features.
在一种可能的实现方式中,上述目标信息还可以包括位置编码信息,上述位置编码信息用于指示上述场景在高维空间中的表示。In a possible implementation, the target information may further include position coding information, where the position coding information is used to indicate a representation of the scene in a high-dimensional space.
可以理解的是,由于场景中的点在三维空间中仅需要x、y、z三个分量表示,其无法表达高频分量。而位置编码信息指示了场景在高维空间中的表示,具备表达高频分量能力。因此相比于仅通过场景的三维位置进行光照估计,本申请实施例提供的光照估计方法,通过在进行光照估计时引入了描述场景中在高维空间中的表示的位置编码信息,补充了更多的高频分量,使得光照估计过程不会过于平滑,更易于刻画场景在高频变化情况下的光照情况,由此提高了光照估计的准确性。It is understandable that since the points in the scene only need three components x, y, and z to represent in three-dimensional space, it is unable to express high-frequency components. The position coding information indicates the representation of the scene in high-dimensional space and has the ability to express high-frequency components. Therefore, compared to lighting estimation only through the three-dimensional position of the scene, the lighting estimation method provided in the embodiment of the present application introduces position coding information describing the representation of the scene in high-dimensional space when performing lighting estimation, supplements more high-frequency components, so that the lighting estimation process will not be too smooth, and it is easier to characterize the lighting conditions of the scene under high-frequency changes, thereby improving the accuracy of lighting estimation.
在一种可能的实现方式中,可以根据场景的位置和位置编码系数确定场景的位置编码。In a possible implementation manner, the position coding of the scene may be determined according to the position of the scene and a position coding coefficient.
示例性地,可以根据场景中的点的三维表示和位置编码系数根据位置编码公式确定场景的位置编码。Exemplarily, the position coding of the scene may be determined according to the three-dimensional representation of the points in the scene and the position coding coefficients according to the position coding formula.
例如,位置编码公式可以满足:For example, the position encoding formula can satisfy:

pe(x,y,z) = [sin(w0·x), cos(w0·x), …, sin(wn·x), cos(wn·x),
             sin(w0·y), cos(w0·y), …, sin(wn·y), cos(wn·y),
             sin(w0·z), cos(w0·z), …, sin(wn·z), cos(wn·z)]
其中,pe(x,y,z)分量表示用于表征场景的点在高维空间中的位置,x,y,z分量表示用于表征场景的点在三维空间中的位置。n为正数,n的取值可以与划分场景的体素网格分辨率相关(即可以与场景的体素数量相关)。在本申请实施例中,n可以为9。The pe(x, y, z) components represent the position, in the high-dimensional space, of a point used to characterize the scene, and the x, y, z components represent the position of that point in three-dimensional space. n is a positive number, and the value of n can be related to the resolution of the voxel grid into which the scene is divided (that is, to the number of voxels of the scene). In the embodiment of the present application, n can be 9.
wn为位置编码系数,在本申请实施例中wn可以为2的n次方,相应的,w0可以为2的0次方,w5可以为2的5次方。w n is a position coding coefficient. In the embodiment of the present application, w n can be 2 to the power of n. Correspondingly, w 0 can be 2 to the power of 0, and w 5 can be 2 to the power of 5.
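An illustrative sketch of the position encoding formula above, using n = 9 and wk = 2^k as stated in this embodiment; the exact function name and vector layout are assumptions made only for the example.

```python
import numpy as np

def positional_encoding(p, n=9):
    # p: (x, y, z) coordinates of a point of the scene.
    # Returns pe(x, y, z) as defined above, with w_k = 2**k for k = 0..n.
    x, y, z = p
    feats = []
    for coord in (x, y, z):
        for k in range(n + 1):
            w = 2.0 ** k
            feats.append(np.sin(w * coord))
            feats.append(np.cos(w * coord))
    return np.array(feats)

print(positional_encoding((0.1, 0.2, 0.3)).shape)  # (60,) for n = 9
```

For n = 9 this maps the three-dimensional point to a 60-dimensional representation, which is the high-dimensional representation zx referred to below.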
在一种可能的实现方式中,上述目标信息还包括光照特征向量,上述光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation, the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
可以理解的是,本申请实施例提供的光照估计方法,通过在进行光照估计时引入了描述光照在上述场景中的分布情况的光照特征向量,这对解空间给出了更多约束,由此提高了光照估计的准确性。It can be understood that the illumination estimation method provided in the embodiment of the present application introduces an illumination feature vector that describes the distribution of illumination in the above-mentioned scene when performing illumination estimation, which gives more constraints to the solution space, thereby improving the accuracy of illumination estimation.
在一种可能的实现方式中,可以将上述三维模型输入第三网络模型以得到上述光照特征向量。In a possible implementation, the three-dimensional model may be input into a third network model to obtain the illumination feature vector.
其中,第三网络模型可以通过多个场景的三维模型和每个三维模型对应的光照特征向量训练得到的。The third network model can be obtained by training three-dimensional models of multiple scenes and the illumination feature vector corresponding to each three-dimensional model.
在一种可能的实现方式中,上述目标信息还包括上述场景的光照颜色。光照颜色用于指示场景中光源的颜色,即场景的材质信息。 In a possible implementation, the target information further includes the lighting color of the scene. The lighting color is used to indicate the color of the light source in the scene, that is, the material information of the scene.
在一种可能的实现方式中,可以将上述三维模型输入第三网络模型以得到上述光照颜色。In a possible implementation, the three-dimensional model may be input into a third network model to obtain the lighting color.
其中,第三网络模型可以通过多个场景的三维模型和每个三维模型对应的光照颜色训练得到的。Among them, the third network model can be obtained by training three-dimensional models of multiple scenes and the lighting color corresponding to each three-dimensional model.
在一种可能的实现方式中,可以将上述三维模型输入第三网络模型以得到上述光照特征向量和光照颜色。In a possible implementation, the three-dimensional model may be input into a third network model to obtain the illumination feature vector and illumination color.
其中,第三网络模型可以通过多个场景的三维模型和每个三维模型对应的光照特征向量和光照颜色训练得到的。Among them, the third network model can be obtained by training three-dimensional models of multiple scenes and the illumination feature vector and illumination color corresponding to each three-dimensional model.
在一种可能的实现方式中,可以将上述三维模型分割为多个体素;根据上述光照特征向量确定上述多个体素的光照特征向量。In a possible implementation manner, the three-dimensional model may be segmented into a plurality of voxels; and the illumination feature vectors of the plurality of voxels may be determined according to the illumination feature vectors.
示例性地,可以将场景的三维模型输入第三网络模型以得到三维模型的光照特征向量。将场景的三维模型分割为多个体素(例如,可以按照10厘米的分辨率将场景的三维模型划分为多个体素)。根据三维模型的光照特征向量确定每个体素中存储的光照特征向量。通过三线性插值确定每个体素的光照特征向量。对于每一体素的光照特征向量,可以通过获取其周围的8个体素的光照特征向量进行三线性插值获得。Exemplarily, the three-dimensional model of the scene can be input into the third network model to obtain the illumination feature vector of the three-dimensional model. The three-dimensional model of the scene is divided into a plurality of voxels (for example, the three-dimensional model of the scene can be divided into a plurality of voxels at a resolution of 10 cm). The illumination feature vector stored in each voxel is determined according to the illumination feature vector of the three-dimensional model. The illumination feature vector of each voxel is determined by trilinear interpolation. For each illumination feature vector of a voxel, it can be obtained by obtaining the illumination feature vectors of the eight voxels around it and performing trilinear interpolation.
在一种可能的实现方式中,可以将上述三维模型分割为多个体素;根据上述光照颜色确定上述多个体素的光照颜色。In a possible implementation manner, the three-dimensional model may be segmented into a plurality of voxels; and the illumination colors of the plurality of voxels may be determined according to the illumination colors.
示例性地,可以将场景的三维模型输入第三网络模型以得到三维模型的光照颜色。将场景的三维模型分割为多个体素(如可以按照10厘米分辨率将场景的三维模型划分为多个体素)。根据三维模型的光照颜色(diffuse)确定每个体素中存储的光照颜色。通过三线性插值确定每个体素的光照颜色。对于每一体素的光照颜色,可以通过获取其周围的8个体素的光照颜色进行三线性插值获得。Exemplarily, the three-dimensional model of the scene can be input into the third network model to obtain the illumination color of the three-dimensional model. The three-dimensional model of the scene is divided into a plurality of voxels (e.g., the three-dimensional model of the scene can be divided into a plurality of voxels at a resolution of 10 cm). The illumination color stored in each voxel is determined according to the illumination color (diffuse) of the three-dimensional model. The illumination color of each voxel is determined by trilinear interpolation. For the illumination color of each voxel, the illumination color can be obtained by obtaining the illumination colors of the eight voxels around it and performing trilinear interpolation.
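The following is a simplified sketch (for illustration only) of how the illumination feature vectors and illumination colors stored in the voxel grid might be queried for an arbitrary point by trilinear interpolation over the eight surrounding voxels; the 10 cm resolution follows the example above, while the grid contents, origin handling and helper name are assumptions.

```python
import numpy as np

VOXEL_SIZE = 0.1  # 10 cm voxel resolution, as in the example above

def query_voxel_grid(grid, point, origin):
    # grid: per-voxel data of shape [X, Y, Z, C], e.g. C-dimensional illumination
    # feature vectors z_l, or C = 3 for the illumination color c_e.
    # point/origin: 3D positions in metres; the value at `point` is obtained by
    # trilinear interpolation over the 8 surrounding voxels.
    g = (np.asarray(point, dtype=float) - np.asarray(origin, dtype=float)) / VOXEL_SIZE
    i0 = np.floor(g).astype(int)
    d = g - i0
    out = np.zeros(grid.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((dx * d[0] + (1 - dx) * (1 - d[0])) *
                     (dy * d[1] + (1 - dy) * (1 - d[1])) *
                     (dz * d[2] + (1 - dz) * (1 - d[2])))
                out += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out

# Example: a tiny 4x4x4 grid of 8-dimensional illumination feature vectors.
features = np.random.rand(4, 4, 4, 8)
print(query_voxel_grid(features, point=(0.15, 0.22, 0.08), origin=(0, 0, 0)).shape)
```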
S203、根据场景的目标信息确定场景的光照强度。S203: Determine the illumination intensity of the scene according to the target information of the scene.
其中,光照强度用于指示场景中光照的强弱,光照强度的范围可以为0~1之间。场景的光照强度越高则场景中光照越强。The light intensity is used to indicate the intensity of the light in the scene, and the range of the light intensity may be between 0 and 1. The higher the light intensity of the scene, the stronger the light in the scene.
在一种可能的实现方式中,可以将上述目标信息输入第二网络模型以得到上述光照强度,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。第二网络模型可以由场景的渲染图像和对应场景的参考图像训练得到。In a possible implementation, the target information may be input into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information. The second network model may be trained by a rendered image of a scene and a reference image of the corresponding scene.
在一种可能的实现方式中,场景的光照强度可以满足:In one possible implementation, the illumination intensity of the scene can satisfy:

Ie = Θl(w0, zs, zl, zx)
其中,Ie为场景或体素的光照强度,Θl为第二网络模型(也可称为神经场或光照神经场),w0为光照发射方向,w0可以由渲染角度确定,zs为场景的场景光源位置信息,zl为场景或场景体素的光照特征向量,zx为场景或场景体素的位置编码。Among them, Ie is the illumination intensity of the scene or voxel, Θl is the second network model (also called the neural field or illumination neural field), w0 is the illumination emission direction, w0 can be determined by the rendering angle, zs is the scene light source position information of the scene, zl is the illumination feature vector of the scene or scene voxel, and zx is the position code of the scene or scene voxel.
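A minimal sketch of how the second network model Θl could be realized as a small MLP mapping the concatenated inputs (w0, zs, zl, zx) to an illumination intensity Ie in [0, 1]; the use of PyTorch, the layer sizes and the feature dimensions (zx assumed to be the 60-dimensional encoding from the earlier sketch) are illustrative assumptions rather than the claimed implementation.

```python
import torch
import torch.nn as nn

class IlluminationField(nn.Module):
    # Θ_l: maps (w_0, z_s, z_l, z_x) to the illumination intensity I_e.
    def __init__(self, dim_w0=3, dim_zs=16, dim_zl=8, dim_zx=60, hidden=128):
        super().__init__()
        in_dim = dim_w0 + dim_zs + dim_zl + dim_zx
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # intensity constrained to [0, 1]
        )

    def forward(self, w0, zs, zl, zx):
        return self.net(torch.cat([w0, zs, zl, zx], dim=-1)).squeeze(-1)

# Example query with assumed feature dimensions.
model = IlluminationField()
I_e = model(torch.rand(1, 3), torch.rand(1, 16), torch.rand(1, 8), torch.rand(1, 60))
print(I_e)  # illumination intensity in [0, 1]
```

The final Sigmoid keeps the predicted intensity within the 0~1 range mentioned above for the illumination intensity.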
在一种可能的实现方式中,还可以根据上述光照强度和上述光照颜色确定上述场景的光照表达,上述光照表达用于指示上述场景的光照颜色和光照强度。In a possible implementation manner, the lighting expression of the scene may also be determined according to the lighting intensity and the lighting color, where the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
在一种可能的实现方式中,光照表达可以满足:In one possible implementation, the lighting expression can satisfy:

Le = Ie * ce
其中,Le为场景或场景体素的光照表达,Ie为场景或场景体素的光照强度,ce为场景或场景体素的光照颜色。Among them, Le is the lighting expression of the scene or scene voxel, Ie is the lighting intensity of the scene or scene voxel, and ce is the lighting color of the scene or scene voxel.
示例性地,如图3所示,场景的三维模型输入第三网络模型可以得到场景的光照信息fl,场景的光照信息包括场景的光照特征向量zl和光照颜色ce。场景的三维模型输入第一网络模型可以得到场景的光源位置信息zs。场景的三维模型通过位置编码可以得到场景的位置编码zx。通过将得到的场景的光照特征向量zl、光源位置信息zs和位置编码zx,以及由渲染角度确定光照发射方向w0输入第二网络模型(光照神经场)可以得到场景的光照强度Ie(图中未示出),得到的场景的光照强度Ie与场景的光照颜色ce可以确定场景的光照表达Le。Exemplarily, as shown in FIG. 3, inputting the three-dimensional model of the scene into the third network model yields the illumination information fl of the scene, and the illumination information of the scene includes the illumination feature vector zl and the illumination color ce of the scene. Inputting the three-dimensional model of the scene into the first network model yields the light source position information zs of the scene. Position encoding of the three-dimensional model of the scene yields the position code zx of the scene. By inputting the obtained illumination feature vector zl, light source position information zs and position code zx of the scene, together with the illumination emission direction w0 determined by the rendering angle, into the second network model (illumination neural field), the illumination intensity Ie of the scene (not shown in the figure) can be obtained; the obtained illumination intensity Ie and the illumination color ce of the scene then determine the lighting expression Le of the scene.
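Putting the pieces of FIG. 3 together, a highly simplified, hypothetical driver could look like the following; it reuses the positional_encoding and query_voxel_grid sketches given earlier, theta_l stands in for whatever trained second network model is used, zs stands in for the output of the first network model, and only the data flow (zl, ce, zs, zx, w0 → Ie → Le) mirrors the description above.

```python
def estimate_lighting(point, w0, zs, feature_grid, color_grid, origin, theta_l):
    # Composition of the earlier sketches; all inputs except `point` and `w0`
    # are assumed to be precomputed (trained models, voxel grids, z_s).
    zx = positional_encoding(point)                      # position code z_x
    zl = query_voxel_grid(feature_grid, point, origin)   # illumination feature vector z_l
    ce = query_voxel_grid(color_grid, point, origin)     # illumination color c_e
    Ie = theta_l(w0, zs, zl, zx)                         # I_e = Θ_l(w_0, z_s, z_l, z_x)
    return Ie * ce                                       # lighting expression L_e = I_e * c_e
```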
相比于仅通过场景的图像进行光照估计,本申请实施例提供的光照估计方法在进行光照估计时额外引入了描述场景中光源位置的光源位置信息。由于灯光在不同场景下往往具有类似的几何特征,通过在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景CG数据和场景实际数据之间的域差距(domain gap),由此提高了光照估计的准确性。Compared with performing illumination estimation only from an image of the scene, the illumination estimation method provided in the embodiment of the present application additionally introduces, during illumination estimation, light source position information describing the position of the light source in the scene. Since lights often have similar geometric features in different scenes, introducing the light source position information during illumination estimation imposes more constraints on the solution space, improves the accuracy and robustness of illumination estimation, and narrows the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
可选地,本申请实施例提供的方法还可以包括:Optionally, the method provided in the embodiment of the present application may further include:
S204、将上述三维模型输入第一编码网络得到第一编码块。S204: Input the three-dimensional model into a first coding network to obtain a first coding block.
其中,上述第一编码块包括编码点特征。Wherein, the above-mentioned first coding block includes coding point features.
示例性地,可以将上述三维模型输入Unet网络进行编码以得到包含编码点特征(encoded point features)的第一编码块。Exemplarily, the above three-dimensional model can be input into the Unet network for encoding to obtain a first encoding block containing encoded point features.
S205、将上述三维模型输入第二编码网络得到第二编码块。S205: Input the above three-dimensional model into a second coding network to obtain a second coding block.
其中,上述第二编码块包括目标文本特征。Wherein, the second encoding block includes target text features.
示例性地,可以将上述三维模型输入目标文本编码器(如CLIP编码器)进行编码以得到包括目标文本特征(class features)的第二编码块。其中,上述目标文本可以包括光源和非光源。Exemplarily, the three-dimensional model may be input into a target text encoder (such as a CLIP encoder) for encoding to obtain a second encoding block including target text features (class features). The target text may include a light source and a non-light source.
S206、根据上述编码点特征和上述目标文本特征训练第一网络模型。S206: Train a first network model according to the encoded point features and the target text features.
其中,上述第一网络模型用于根据输入的三维模型输出对应的光源位置信息。第一网络模型也可称为光源位置信息提取网络。The first network model is used to output the corresponding light source position information according to the input three-dimensional model. The first network model can also be called a light source position information extraction network.
示例性地,如图4所示,可以将上述三维模型输入Unet进行编码以得到包含编码点特征(encoded point features)的第一编码块。将上述三维模型输入目标文本编码器进行编码以得到包括目标文本特征(class features)的第二编码块。之后编码点特征和目标文本特征将放在一个联合空间中,通过梯度下降优化两种特征之间的欧氏距离(L2距离),以训练第一网络模型。Exemplarily, as shown in FIG4 , the three-dimensional model can be input into Unet for encoding to obtain a first encoding block including encoded point features. The three-dimensional model is input into the target text encoder for encoding to obtain a second encoding block including target text features. The encoded point features and the target text features are then placed in a joint space, and the Euclidean distance (L2 distance) between the two features is optimized by gradient descent to train the first network model.
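The joint-space alignment described above (encoded point features versus target text class features, optimized with an L2 distance by gradient descent) might be sketched roughly as follows; the two encoders are simplified stand-ins rather than the actual UNet-style point encoder and text encoder, and all dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Stand-ins for the two encoders; in the embodiment these would be a UNet-style
# point encoder and a frozen text encoder producing the "light source" /
# "non light source" class features.
point_encoder = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 32))
text_features = torch.randn(2, 32)  # frozen class features: [light source, non light source]

optimizer = torch.optim.Adam(point_encoder.parameters(), lr=1e-3)

def training_step(points, labels):
    # points: [N, 6] per-point attributes (e.g. position + color); labels: [N] in {0, 1}.
    encoded = point_encoder(points)              # encoded point features
    target = text_features[labels]               # matching target text features
    loss = ((encoded - target) ** 2).sum(dim=-1).mean()  # Euclidean (L2) distance
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(training_step(torch.randn(16, 6), torch.randint(0, 2, (16,))))
```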
可选地,上述第一网络模型可以采用残差网络(deep residual network,ResNet)与UNet相结合的网络结构(如Res16UNet34D)。Optionally, the first network model may adopt a network structure combining a deep residual network (ResNet) with UNet (such as Res16UNet34D).
可选地,上述第一网络模型的预训练数据集可以为ScanNet或S3DIS数据集。Optionally, the pre-training data set of the first network model may be a ScanNet or S3DIS data set.
在一种可能的实现方式中,可以对上述第一网络模型的预训练数据集进行微调(finetune),将预训练数据集中的分类类别减少为光源和非光源两类。In a possible implementation, the pre-training data set of the first network model may be fine-tuned to reduce the classification categories in the pre-training data set to two categories: light source and non-light source.
可以看出,本申请实施例提供的方法可以通过场景三维模型的编码点特征和目标文本特征训练得到用于根据输入的三维模型输出对应的光源位置信息的第一网络模型。将场景的三维模型输入第一网络模型得到描述场景中光源位置的光源位置信息。由于灯光在不同场景下往往具有类似的几何特征,通过在光照估计时引入光源位置信息,这对解空间给出了更多约束,提高了光照估计的准确性和鲁棒性,并且缩小了渲染得到的场景CG数据和场景实际数据之间的domain gap,由此提高了光照估计的准确性。It can be seen that the method provided in the embodiment of the present application can train, from the encoded point features and target text features of the three-dimensional model of a scene, a first network model that outputs the corresponding light source position information according to the input three-dimensional model. Inputting the three-dimensional model of the scene into the first network model yields light source position information describing the position of the light source in the scene. Since lights often have similar geometric features in different scenes, introducing the light source position information during illumination estimation imposes more constraints on the solution space, improves the accuracy and robustness of illumination estimation, and narrows the domain gap between the rendered scene CG data and the actual scene data, thereby improving the accuracy of illumination estimation.
S207、根据上述光照表达确定上述场景的渲染图像。S207: Determine a rendered image of the scene according to the lighting expression.
示例性地,可以通过渲染方程根据光照表达得到场景的渲染图像。Exemplarily, a rendered image of a scene may be obtained according to a lighting expression through a rendering equation.
例如,以渲染图像中的任一像素为例,它的颜色值是由入射到该像素上的光线决定的;而通过光线追踪,我们可以找到入射光线对应的光源(有可能会经过多次反射)。然后,可以通过上述光照表达从模型中获得该光源的颜色和亮度;再结合材质信息,通过渲染方程就可以计算这根光线投射到像素点后的颜色值。实际上,入射到某一点的光线会有无数条,因此我们通过采样的方式,即采样多条光线,每条光线重复上述追踪过程,把得到的颜色值进行叠加,就可以得到该像素最终的颜色值。For example, taking any pixel in a rendered image as an example, its color value is determined by the light incident on the pixel; and through ray tracing, we can find the light source corresponding to the incident light (which may have been reflected multiple times). Then, the color and brightness of the light source can be obtained from the model through the above lighting expression; combined with the material information, the color value of this ray after being projected onto the pixel can be calculated through the rendering equation. In fact, there are countless rays incident on a certain point, so we use sampling, that is, sampling multiple rays, repeating the above tracing process for each ray, and superimposing the obtained color values to obtain the final color value of the pixel.
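A toy illustration (only) of the sampling idea described above: the color of a pixel is estimated by averaging the contributions of several sampled rays, each of which would return a radiance value derived from the lighting expression; the ray tracer itself is abstracted into a placeholder function, so the numbers here are meaningless dummies.

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_ray(pixel, sample_index):
    # Placeholder for a real ray tracer: it would follow the ray (possibly through
    # several reflections) to a light source and return the radiance obtained from
    # the lighting expression L_e = I_e * c_e, modulated by the material information.
    return rng.random(3)  # dummy RGB radiance for illustration

def shade_pixel(pixel, num_samples=64):
    # Monte-Carlo-style accumulation: sample several rays per pixel and average.
    color = np.zeros(3)
    for s in range(num_samples):
        color += trace_ray(pixel, s)
    return color / num_samples

print(shade_pixel(pixel=(120, 45)))
```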
S208、根据上述场景的渲染图像和上述场景的参考图像训练第二网络模型。S208: Train a second network model according to the rendered image of the scene and the reference image of the scene.
其中,上述第二网络模型用于根据场景的光源位置信息、位置编码和光照特征向量输出对应的光照强度。Among them, the above-mentioned second network model is used to output the corresponding light intensity according to the light source position information, position coding and lighting feature vector of the scene.
示例性地,可以根据上述场景的渲染图像和上述场景的参考图像之间的差值作为损失函数,通过梯度下降以训练第二网络模型。Exemplarily, the second network model can be trained by gradient descent based on the difference between the rendered image of the scene and the reference image of the scene as a loss function.
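The training objective described above (the difference between the rendered image and the reference image, minimized by gradient descent) could be sketched as follows; `render_image` is a placeholder for a differentiable renderer driven by the second network model, and the smoke test at the end uses arbitrary stand-in shapes.

```python
import torch
import torch.nn.functional as F

def train_illumination_field(model, render_image, reference_images, scene_batches, steps=1000):
    # model: the second network model (illumination neural field).
    # render_image(scene, model): differentiable rendering using the predicted intensities.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step in range(steps):
        scene = scene_batches[step % len(scene_batches)]
        rendered = render_image(scene, model)
        reference = reference_images[step % len(reference_images)]
        loss = F.mse_loss(rendered, reference)  # image difference used as the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model

# Tiny smoke test with dummy stand-ins (all shapes are arbitrary).
dummy_model = torch.nn.Linear(4, 3)
dummy_render = lambda scene, m: m(scene).view(1, 3)
trained = train_illumination_field(dummy_model, dummy_render,
                                   reference_images=[torch.rand(1, 3)],
                                   scene_batches=[torch.rand(1, 4)], steps=10)
```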
可以理解的是,通过根据上述场景的渲染图像和上述场景的参考图像训练第二网络模型,可以使得第二网络模型输出的光照强度更加接近真实场景的光照强度,由此进一步提高了光照估计的准确性。It can be understood that by training the second network model based on the rendered image of the above scene and the reference image of the above scene, the light intensity output by the second network model can be closer to the light intensity of the real scene, thereby further improving the accuracy of light estimation.
下面将结合图5介绍用于执行上述光照估计方法的光照估计装置。The following will introduce an illumination estimation device for executing the above illumination estimation method in conjunction with FIG. 5 .
可以理解的是,光照估计装置为了实现上述功能,其包含了执行各个功能相应的硬件和/或软件模块。结合本文中所公开的实施例描述的各示例的算法步骤,本申请实施例能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以结合实施例对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。It is understandable that, in order to realize the above functions, the illumination estimation device includes hardware and/or software modules that perform the corresponding functions. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in hardware or in a computer software-driven hardware manner depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered to be beyond the scope of the embodiments of this application.
本申请实施例可以根据上述方法示例对光照估计装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块可以采用硬件的形式实现。需要说明的是,本实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。The embodiment of the present application can divide the functional modules of the illumination estimation device according to the above method example. For example, each functional module can be divided according to each function, or two or more functions can be integrated into one processing module. The above integrated module can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. There may be other division methods in actual implementation.
在采用对应各个功能划分各个功能模块的情况下,图5示出了上述实施例中涉及的光照估计装置的一种可能的组成示意图,如图5所示,该光照估计装置500可以包括:收发单元501和处理单元502。In the case of dividing each functional module according to each function, FIG5 shows a possible composition diagram of the illumination estimation device involved in the above embodiment. As shown in FIG5 , the illumination estimation device 500 may include: a transceiver unit 501 and a processing unit 502 .
上述收发单元501,用于获取场景的三维模型。The above-mentioned transceiver unit 501 is used to obtain a three-dimensional model of the scene.
上述处理单元502,用于根据上述三维模型确定上述场景的目标信息,上述目标信息包括光源位置信息,上述光源位置信息用于指示上述场景中光源的位置。The processing unit 502 is used to determine target information of the scene according to the three-dimensional model. The target information includes light source position information. The light source position information is used to indicate the position of the light source in the scene.
上述处理单元502,还用于根据上述目标信息确定上述场景的光照强度。The processing unit 502 is further configured to determine the illumination intensity of the scene according to the target information.
在一种可能的实现方式中,上述处理单元502具体用于:将上述三维模型输入第一网络模型以得到上述光源位置信息,上述第一网络模型用于根据输入的三维模型输出对应的光源位置信息。In a possible implementation, the processing unit 502 is specifically used to: input the three-dimensional model into a first network model to obtain the light source position information, where the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
在一种可能的实现方式中,上述处理单元502还用于:将上述三维模型输入第一编码网络得到第一编码块,上述第一编码块包括编码点特征;将上述三维模型输入第二编码网络得到第二编码块,上述第二编码块包括目标文本特征;根据上述编码点特征和上述目标文本特征训练第一网络模型,上述第一网络模型用于根据输入的三维模型输出对应的光源位置信息。In a possible implementation, the processing unit 502 is further used to: input the three-dimensional model into a first coding network to obtain a first coding block, where the first coding block includes encoded point features; input the three-dimensional model into a second coding network to obtain a second coding block, where the second coding block includes target text features; and train a first network model according to the encoded point features and the target text features, where the first network model is used to output the corresponding light source position information according to the input three-dimensional model.
在一种可能的实现方式中,上述目标信息还包括位置编码信息,上述位置编码信息用于指示上述场景在高维空间中的表示。In a possible implementation, the target information further includes position coding information, and the position coding information is used to indicate a representation of the scene in a high-dimensional space.
在一种可能的实现方式中,上述目标信息还包括光照特征向量,上述光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation, the target information further includes a lighting feature vector, and the lighting feature vector is used to indicate the distribution of lighting in the scene.
在一种可能的实现方式中,上述目标信息还包括上述场景的光照颜色,上述处理单元还用于:根据上述光照强度和上述光照颜色确定上述场景的光照表达,上述光照表达用于指示上述场景的光照颜色和光照强度。In a possible implementation, the target information also includes the lighting color of the scene, and the processing unit is further used to determine the lighting expression of the scene according to the lighting intensity and the lighting color, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene.
在一种可能的实现方式中,上述处理单元502具体用于:将上述目标信息输入第二网络模型以得到上述光照强度,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。In a possible implementation, the processing unit 502 is specifically used to: input the target information into a second network model to obtain the light intensity, and the second network model is used to output the corresponding light intensity according to the input target information.
在一种可能的实现方式中,上述处理单元502还用于:根据上述场景的光照表达确定上述场景的渲染图像,上述光照表达用于指示上述场景的光照颜色和光照强度;根据上述场景的渲染图像和上述场景的参考图像训练第二网络模型,上述第二网络模型用于根据输入的目标信息输出对应的光照强度。In a possible implementation, the processing unit 502 is further used to: determine a rendered image of the scene based on a lighting expression of the scene, wherein the lighting expression is used to indicate the lighting color and lighting intensity of the scene; train a second network model based on the rendered image of the scene and a reference image of the scene, wherein the second network model is used to output corresponding lighting intensity based on input target information.
在一种可能的实现方式中,上述处理单元502还用于:将上述三维模型分割为多个体素;根据上述场景的光照特征向量确定上述多个体素的光照特征向量,上述场景的光照特征向量用于指示光照在上述场景中的分布情况。In a possible implementation, the processing unit 502 is further used to: divide the three-dimensional model into multiple voxels; determine the illumination feature vectors of the multiple voxels based on the illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate the distribution of illumination in the scene.
在一种可能的实现方式中,上述处理单元502还用于:将上述三维模型分割为多个体素;根据上述场景的光照颜色确定上述多个体素的光照颜色。In a possible implementation, the processing unit 502 is further configured to: segment the three-dimensional model into a plurality of voxels; and determine the illumination colors of the plurality of voxels according to the illumination color of the scene.
本申请实施例还提供了一种芯片。图6示出了一种芯片600的结构示意图。芯片600包括一个或多个处理器601以及接口电路602。可选的,上述芯片600还可以包含总线603。The embodiment of the present application further provides a chip. FIG6 shows a schematic diagram of the structure of a chip 600. The chip 600 includes one or more processors 601 and an interface circuit 602. Optionally, the chip 600 may also include a bus 603.
处理器601可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述光照估计方法的各步骤可以通过处理器601中的硬件的集成逻辑电路或者软件形式的指令完成。The processor 601 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned illumination estimation method may be completed by an integrated logic circuit of hardware in the processor 601 or by instructions in the form of software.
可选地,上述的处理器601可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中公开的各方法、步骤。通用处理器可以是微处理器,或者该处理器也可以是任何常规的处理器等。Optionally, the processor 601 may be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods and steps disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
接口电路602可以用于数据、指令或者信息的发送或者接收,处理器601可以利用接口电路602接收的数据、指令或者其他信息,进行加工,可以将加工完成信息通过接口电路602发送出去。 The interface circuit 602 can be used to send or receive data, instructions or information. The processor 601 can use the data, instructions or other information received by the interface circuit 602 to process, and can send the processing completion information through the interface circuit 602.
可选的,芯片还包括存储器,存储器可以包括只读存储器和随机存取存储器,并向处理器提供操作指令和数据。存储器的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。Optionally, the chip also includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory (NVRAM).
可选的,存储器存储了可执行软件模块或者数据结构,处理器可以通过调用存储器存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。Optionally, the memory stores executable software modules or data structures, and the processor can perform corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
可选的,芯片可以使用在本申请实施例涉及的光照估计装置中。可选地,接口电路602可用于输出处理器601的执行结果。关于本申请实施例的一个或多个实施例提供的光照估计方法可参考前述各个实施例,这里不再赘述。Optionally, the chip can be used in the illumination estimation device involved in the embodiment of the present application. Optionally, the interface circuit 602 can be used to output the execution result of the processor 601. The illumination estimation method provided by one or more embodiments of the embodiment of the present application can refer to the aforementioned embodiments, which will not be repeated here.
需要说明的,处理器601、接口电路602各自对应的功能既可以通过硬件设计实现,也可以通过软件设计来实现,还可以通过软硬件结合的方式来实现,这里不做限制。It should be noted that the functions corresponding to the processor 601 and the interface circuit 602 can be implemented through hardware design, software design, or a combination of hardware and software, and there is no limitation here.
图7为本申请实施例提供的一种电子设备的结构示意图,电子设备100可以为手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、光照估计装置或者光照估计装置中的芯片或者功能模块。7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application. The electronic device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an illumination estimation device, or a chip or functional module in an illumination estimation device.
示例性的,图7是本申请实施例提供的一例电子设备100的结构示意图。电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。Exemplarily, FIG7 is a schematic diagram of the structure of an electronic device 100 provided in an embodiment of the present application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以硬件,软件或软件和硬件的组合实现。It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or integrated in one or more processors.
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The controller may be the nerve center and command center of the electronic device 100. The controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
其中,I2C接口是一种双向同步串行总线,处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。Among them, the I2C interface is a bidirectional synchronous serial bus. The processor 110 can couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100. The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc. In some embodiments, the processor 110 and the camera 193 communicate through the CSI interface to realize the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。It is understandable that the interface connection relationships between the modules illustrated in the embodiments of the present application are only schematic illustrations and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt an interface connection manner different from those in the above embodiments, or a combination of multiple interface connection manners.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。The charging management module 140 is used to receive charging input from a charger. The charger can be a wireless charger or a wired charger. The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。The display screen 194 is used to display images, videos, etc. The display screen 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
电子设备100可以通过ISP,摄像头193,触摸传感器、视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 can realize the shooting function through ISP, camera 193, touch sensor, video codec, GPU, display screen 194 and application processor.
其中,ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将上述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。Among them, ISP is used to process the data fed back by camera 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the above electrical signal to ISP for processing and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, ISP can be set in camera 193.
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号,应理解,在本申请实施例的描述中,以RGB格式的图像为例进行介绍,本申请实施例对图像格式不作限定。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。The camera 193 is used to capture still images or videos. The object generates an optical image through the lens and projects it onto the photosensitive element. The photosensitive element can be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. It should be understood that in the description of the embodiments of the present application, an image in RGB format is used as an example for introduction, and the embodiments of the present application do not limit the image format. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。The digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。Video codecs are used to compress or decompress digital videos. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The internal memory 121 can be used to store computer executable program codes, which include instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。SIM卡接口195用于连接SIM卡。The button 190 includes a power button, a volume button, etc. The button 190 can be a mechanical button. It can also be a touch button. The electronic device 100 can receive button input and generate key signal input related to the user settings and function control of the electronic device 100. The motor 191 can generate a vibration prompt. The motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects. The indicator 192 can be an indicator light, which can be used to indicate the charging status, power changes, and can also be used to indicate messages, missed calls, notifications, etc. The SIM card interface 195 is used to connect a SIM card.
需要指出的是,电子设备100可以是芯片系统或具有图7中类似结构的设备。其中,芯片系统可以由芯片构成,也可以包括芯片和其他分立器件。本申请的各实施例之间涉及的动作、术语等均可以相互参考,不予限制。本申请的实施例中各个设备之间交互的消息名称或消息中的参数名称等只是一个示例,具体实现中也可以采用其他的名称,不予限制。此外,图7中示出的组成结构并不构成对该电子设备100的限定,除图7所示部件之外,该电子设备100可以包括比图7所示更多或更少的部件,或者组合某些部件,或者不同的部件布置。It should be noted that the electronic device 100 may be a chip system or a device with a structure similar to that in FIG. 7. The chip system may be composed of chips, or may include chips and other discrete devices. The actions, terms, etc. involved in the various embodiments of the present application may refer to each other without limitation. The message names or the parameter names in the messages exchanged between the devices in the embodiments of the present application are only examples, and other names may also be used in specific implementations without limitation. In addition, the composition structure shown in FIG. 7 does not constitute a limitation on the electronic device 100; in addition to the components shown in FIG. 7, the electronic device 100 may include more or fewer components than those shown in FIG. 7, or combine certain components, or have a different arrangement of components.
本申请中描述的处理器和收发器可实现在集成电路(integrated circuit,IC)、模拟IC、射频集成电路、混合信号IC、专用集成电路(application specific integrated circuit,ASIC)、印刷电路板(printed circuit board,PCB)、电子设备等上。该处理器和收发器也可以用各种IC工艺技术来制造,例如互补金属氧化物半导体(complementary metal oxide semiconductor,CMOS)、N型金属氧化物半导体(nMetal-oxide-semiconductor,NMOS)、P型金属氧化物半导体(positive channel metal oxide semiconductor,PMOS)、双极结型晶体管(Bipolar Junction Transistor,BJT)、双极CMOS(BiCMOS)、硅锗(SiGe)、砷化镓(GaAs)等。The processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc. The processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
FIG. 8 is a schematic diagram of the structure of an illumination estimation apparatus provided in an embodiment of the present application. The illumination estimation apparatus is applicable to the scenarios shown in the foregoing method embodiments. For ease of description, FIG. 8 shows only the main components of the illumination estimation apparatus: a processor 801, a memory 802, a control circuit 803, and an input-output device 804. The processor 801 is mainly used to process communication protocols and communication data, execute software programs, and process the data of the software programs. The memory 802 is mainly used to store software programs and data. The control circuit 803 is mainly used for power supply and the transfer of various electrical signals. The input-output device 804 is mainly used to receive data input by a user and to output data to the user.
When the illumination estimation apparatus is the processor 801, the control circuit 803 may be a mainboard; the memory 802 includes media with a storage function such as a hard disk, RAM, and ROM; the processor 801 may include a baseband processor and a central processing unit, where the baseband processor is mainly used to process communication protocols and communication data, and the central processing unit is mainly used to control the entire illumination estimation apparatus, execute software programs, and process the data of the software programs; and the input-output device 804 includes a display screen, a keyboard, a mouse, and the like. The control circuit 803 may further include or be connected to a transceiver circuit or a transceiver, for example a network cable interface, for sending or receiving data or signals, for example for data transmission and communication with other devices. Further, an antenna may be included for sending and receiving wireless signals and for data/signal transmission with other devices.
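Purely as an illustration of this division of responsibilities, the following sketch models the blocks of FIG. 8 in Python. The class names, method names, and the decision to give the control circuit no software-visible role are assumptions made for readability only; they do not correspond to any software interface disclosed in this application.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class Memory802:
    """Stands in for memory 802: stores software programs and their data."""
    programs: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    data: Dict[str, Any] = field(default_factory=dict)

class Processor801:
    """Stands in for processor 801: runs stored programs and processes their data."""
    def __init__(self, memory: Memory802) -> None:
        self.memory = memory

    def run(self, program_name: str, *args: Any) -> Any:
        result = self.memory.programs[program_name](*args)
        self.memory.data[program_name] = result
        return result

class IlluminationEstimationApparatus:
    """Wires processor 801 and memory 802 together; input-output device 804 is
    modelled as the public entry point, and control circuit 803 (power supply
    and signal routing) is omitted because it has no software-visible behaviour."""
    def __init__(self) -> None:
        self.memory = Memory802()
        self.processor = Processor801(self.memory)

    def install(self, name: str, program: Callable[..., Any]) -> None:
        self.memory.programs[name] = program

    def handle_user_input(self, name: str, *user_input: Any) -> Any:
        return self.processor.run(name, *user_input)
```

An illumination estimation routine, such as the method set out in the claims below, could then be installed with `install` and invoked through `handle_user_input` in this toy model.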
An embodiment of the present application further provides an illumination estimation apparatus, including at least one processor. When the at least one processor executes program code or instructions, the related method steps above are carried out to implement the illumination estimation method in the foregoing embodiments.
Optionally, the apparatus may further include at least one memory, and the at least one memory is used to store the program code or instructions.
An embodiment of the present application further provides a computer storage medium storing computer instructions. When the computer instructions are run on an illumination estimation apparatus, the apparatus is caused to perform the related method steps above to implement the illumination estimation method in the foregoing embodiments.
An embodiment of the present application further provides a computer program product. When the computer program product is run on a computer, the computer is caused to perform the related steps above to implement the illumination estimation method in the foregoing embodiments.
An embodiment of the present application further provides an illumination estimation apparatus, which may specifically be a chip, an integrated circuit, a component, or a module. Specifically, the apparatus may include a processor and a connected memory for storing instructions, or the apparatus may include at least one processor that obtains instructions from an external memory. When the apparatus runs, the processor may execute the instructions so that the chip performs the illumination estimation method in the foregoing method embodiments.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the foregoing functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above describes only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (23)

  1. An illumination estimation method, characterized in that the method comprises:
    obtaining a three-dimensional model of a scene;
    determining target information of the scene according to the three-dimensional model, wherein the target information comprises light source position information, and the light source position information is used to indicate a position of a light source in the scene;
    determining an illumination intensity of the scene according to the target information.
  2. The method according to claim 1, characterized in that determining the target information of the scene according to the three-dimensional model comprises:
    inputting the three-dimensional model into a first network model to obtain the light source position information, wherein the first network model is used to output corresponding light source position information according to an input three-dimensional model.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    inputting the three-dimensional model into a first encoding network to obtain a first encoding block, wherein the first encoding block comprises encoded point features;
    inputting the three-dimensional model into a second encoding network to obtain a second encoding block, wherein the second encoding block comprises target text features;
    training a first network model according to the encoded point features and the target text features, wherein the first network model is used to output corresponding light source position information according to an input three-dimensional model.
  4. The method according to any one of claims 1 to 3, characterized in that the target information further comprises position encoding information, and the position encoding information is used to indicate a representation of the scene in a high-dimensional space.
  5. The method according to any one of claims 1 to 4, characterized in that the target information further comprises an illumination feature vector, and the illumination feature vector is used to indicate a distribution of illumination in the scene.
  6. The method according to any one of claims 1 to 5, characterized in that the target information further comprises an illumination color of the scene, and the method further comprises:
    determining an illumination expression of the scene according to the illumination intensity and the illumination color, wherein the illumination expression is used to indicate the illumination color and the illumination intensity of the scene.
  7. The method according to any one of claims 1 to 6, characterized in that determining the illumination intensity of the scene according to the target information comprises:
    inputting the target information into a second network model to obtain the illumination intensity, wherein the second network model is used to output corresponding illumination intensity according to input target information.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    determining a rendered image of the scene according to an illumination expression of the scene, wherein the illumination expression is used to indicate an illumination color and the illumination intensity of the scene;
    training a second network model according to the rendered image of the scene and a reference image of the scene, wherein the second network model is used to output corresponding illumination intensity according to input target information.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    segmenting the three-dimensional model into a plurality of voxels;
    determining illumination feature vectors of the plurality of voxels according to an illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate a distribution of illumination in the scene.
  10. The method according to any one of claims 1 to 9, characterized in that the method further comprises:
    segmenting the three-dimensional model into a plurality of voxels;
    determining illumination colors of the plurality of voxels according to an illumination color of the scene.
  11. An illumination estimation apparatus, characterized in that the apparatus comprises a transceiver unit and a processing unit, wherein:
    the transceiver unit is configured to obtain a three-dimensional model of a scene;
    the processing unit is configured to determine target information of the scene according to the three-dimensional model, wherein the target information comprises light source position information, and the light source position information is used to indicate a position of a light source in the scene;
    the processing unit is further configured to determine an illumination intensity of the scene according to the target information.
  12. The apparatus according to claim 11, characterized in that the processing unit is specifically configured to:
    input the three-dimensional model into a first network model to obtain the light source position information, wherein the first network model is used to output corresponding light source position information according to an input three-dimensional model.
  13. The apparatus according to claim 11 or 12, characterized in that the processing unit is further configured to:
    input the three-dimensional model into a first encoding network to obtain a first encoding block, wherein the first encoding block comprises encoded point features;
    input the three-dimensional model into a second encoding network to obtain a second encoding block, wherein the second encoding block comprises target text features;
    train a first network model according to the encoded point features and the target text features, wherein the first network model is used to output corresponding light source position information according to an input three-dimensional model.
  14. The apparatus according to any one of claims 11 to 13, characterized in that the target information further comprises position encoding information, and the position encoding information is used to indicate a representation of the scene in a high-dimensional space.
  15. The apparatus according to any one of claims 11 to 14, characterized in that the target information further comprises an illumination feature vector, and the illumination feature vector is used to indicate a distribution of illumination in the scene.
  16. The apparatus according to any one of claims 11 to 15, characterized in that the target information further comprises an illumination color of the scene, and the processing unit is further configured to:
    determine an illumination expression of the scene according to the illumination intensity and the illumination color, wherein the illumination expression is used to indicate the illumination color and the illumination intensity of the scene.
  17. The apparatus according to any one of claims 11 to 16, characterized in that the processing unit is specifically configured to:
    input the target information into a second network model to obtain the illumination intensity, wherein the second network model is used to output corresponding illumination intensity according to input target information.
  18. The apparatus according to any one of claims 11 to 17, characterized in that the processing unit is further configured to:
    determine a rendered image of the scene according to an illumination expression of the scene, wherein the illumination expression is used to indicate an illumination color and the illumination intensity of the scene;
    train a second network model according to the rendered image of the scene and a reference image of the scene, wherein the second network model is used to output corresponding illumination intensity according to input target information.
  19. The apparatus according to any one of claims 11 to 18, characterized in that the processing unit is further configured to:
    segment the three-dimensional model into a plurality of voxels;
    determine illumination feature vectors of the plurality of voxels according to an illumination feature vector of the scene, wherein the illumination feature vector of the scene is used to indicate a distribution of illumination in the scene.
  20. The apparatus according to any one of claims 11 to 19, characterized in that the processing unit is further configured to:
    segment the three-dimensional model into a plurality of voxels;
    determine illumination colors of the plurality of voxels according to an illumination color of the scene.
  21. An illumination estimation apparatus, comprising at least one processor and a memory, characterized in that the at least one processor executes a program or instructions stored in the memory, so that the illumination estimation apparatus implements the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, configured to store a computer program, characterized in that, when the computer program is run on a computer or a processor, the computer or the processor is enabled to implement the method according to any one of claims 1 to 10.
  23. A computer program product, comprising instructions, characterized in that, when the instructions are run on a computer or a processor, the computer or the processor is enabled to implement the method according to any one of claims 1 to 10.
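For readers who want to connect the claims to a concrete data flow, the following is a minimal, non-authoritative sketch of the two-network pipeline of claims 1 to 10 in PyTorch. The class names, tensor shapes, and layer choices (a PointNet-style point encoder, small MLP heads, sinusoidal positional encoding) are illustrative assumptions rather than the disclosed implementation; the sketch only mirrors the claimed flow in which a three-dimensional model of the scene yields light source position information, the target information is mapped to an illumination intensity, and the intensity is combined with an illumination color into an illumination expression.

```python
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, num_freqs: int = 4) -> torch.Tensor:
    """Map 3D coordinates to a higher-dimensional representation (cf. claim 4)."""
    freqs = (2.0 ** torch.arange(num_freqs, device=x.device)) * torch.pi
    angles = x.unsqueeze(-1) * freqs                    # (..., 3, num_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class FirstNetwork(nn.Module):
    """Predicts light source positions from the scene's 3D model (cf. claims 1 and 2)."""
    def __init__(self, feat_dim: int = 128, max_lights: int = 4) -> None:
        super().__init__()
        # Per-point MLP followed by max pooling, PointNet-style (an assumption).
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, max_lights * 3)
        self.max_lights = max_lights

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) point cloud sampled from the 3D model of the scene
        scene_feat = self.point_mlp(points).max(dim=1).values      # (B, feat_dim)
        return self.head(scene_feat).view(-1, self.max_lights, 3)  # (B, L, 3) positions

class SecondNetwork(nn.Module):
    """Maps target information to illumination intensity (cf. claims 1 and 7)."""
    def __init__(self, target_dim: int, light_feat_dim: int = 32) -> None:
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(target_dim + light_feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1), nn.Softplus())  # non-negative intensity

    def forward(self, target_info: torch.Tensor, light_feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([target_info, light_feat], dim=-1))

def estimate_lighting(points, query_point, light_feat, light_color, first_net, second_net):
    """Data flow of claims 1 and 4-7: positions -> target information -> intensity -> expression."""
    light_pos = first_net(points)                                  # light source positions
    target_info = torch.cat([light_pos.flatten(1),                 # light source position information
                             positional_encoding(query_point)],    # high-dimensional scene representation
                            dim=-1)
    intensity = second_net(target_info, light_feat)                # (B, 1) illumination intensity
    return {"intensity": intensity,
            "color": light_color,
            "expression": intensity * light_color}                 # illumination expression (claim 6)
```

With `num_freqs = 4` and `max_lights = 4`, `target_info` has 4·3 + 3·2·4 = 36 dimensions, so `SecondNetwork(target_dim=36)` would be the matching instantiation. Training, per claims 3 and 8, would supervise the first network using encoded point features and target text features, and the second network with a loss between an image rendered under the predicted illumination expression and a reference image of the scene; those encoders and the renderer are omitted from this sketch.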
PCT/CN2023/123617 2022-12-09 2023-10-09 Illumination estimation method and apparatus WO2024119997A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202211581949 2022-12-09
CN202211581949.6 2022-12-09
CN202310767724.8 2023-06-26
CN202310767724.8A CN118172476A (en) 2022-12-09 2023-06-26 Illumination estimation method and device

Publications (1)

Publication Number Publication Date
WO2024119997A1 true WO2024119997A1 (en) 2024-06-13

Family

ID=91345987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/123617 WO2024119997A1 (en) 2022-12-09 2023-10-09 Illumination estimation method and apparatus

Country Status (2)

Country Link
CN (1) CN118172476A (en)
WO (1) WO2024119997A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833430A (en) * 2019-04-10 2020-10-27 上海科技大学 Illumination data prediction method, system, terminal and medium based on neural network
US10665011B1 (en) * 2019-05-31 2020-05-26 Adobe Inc. Dynamically estimating lighting parameters for positions within augmented-reality scenes based on global and local features
CN110598333A (en) * 2019-09-16 2019-12-20 广东三维家信息科技有限公司 Method and device for determining light source position and electronic equipment
CN115088019A (en) * 2020-02-25 2022-09-20 Oppo广东移动通信有限公司 System and method for visualizing rays in a scene
CN115499576A (en) * 2021-06-18 2022-12-20 华为技术有限公司 Light source estimation method, device and system
CN115439595A (en) * 2022-11-07 2022-12-06 四川大学 AR-oriented indoor scene dynamic illumination online estimation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, ZIAN ET AL.: "Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting", 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 17 October 2021 (2021-10-17), pages 12518 - 12527, XP034093112, DOI: 10.1109/ICCV48922.2021.01231 *

Also Published As

Publication number Publication date
CN118172476A (en) 2024-06-11

Similar Documents

Publication Publication Date Title
WO2020010979A1 (en) Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
US20200265234A1 (en) Electronic device for providing shooting mode based on virtual character and operation method thereof
US10681287B2 (en) Apparatus and method for displaying AR object
CN112562019A (en) Image color adjusting method and device, computer readable medium and electronic equipment
US11526704B2 (en) Method and system of neural network object recognition for image processing
CN116569213A (en) Semantical refinement of image regions
CN108701355B (en) GPU optimization and online single Gaussian-based skin likelihood estimation
US10929961B2 (en) Electronic device and method for correcting images using external electronic device
CN113688907B (en) A model training and video processing method, which comprises the following steps, apparatus, device, and storage medium
CN115699082A (en) Defect detection method and device, storage medium and electronic equipment
WO2022073282A1 (en) Motion recognition method based on feature interactive learning, and terminal device
CN111950570B (en) Target image extraction method, neural network training method and device
US11144197B2 (en) Electronic device performing function according to gesture input and operation method thereof
US11748913B2 (en) Modeling objects from monocular camera outputs
CN114445562A (en) Three-dimensional reconstruction method and device, electronic device and storage medium
WO2022133944A1 (en) Image processing method and image processing apparatus
CN113538227A (en) Image processing method based on semantic segmentation and related equipment
WO2022083118A1 (en) Data processing method and related device
WO2023216957A1 (en) Target positioning method and system, and electronic device
WO2024119997A1 (en) Illumination estimation method and apparatus
WO2023045724A1 (en) Image processing method, electronic device, storage medium, and program product
WO2022143314A1 (en) Object registration method and apparatus
WO2023003642A1 (en) Adaptive bounding for three-dimensional morphable models
CN113196279B (en) Facial attribute identification method and electronic equipment
CN112036487A (en) Image processing method and device, electronic equipment and storage medium