WO2023142035A1 - Virtual image processing method and device - Google Patents

Virtual image processing method and device

Info

Publication number
WO2023142035A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
image frame
environment
illumination
parameters
Application number
PCT/CN2022/074974
Other languages
English (en)
French (fr)
Inventor
胡溪玮
刘杨
李腾
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2022/074974 (WO2023142035A1)
Priority to CN202280003423.1A (CN117813630A)
Publication of WO2023142035A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/50 - Lighting effects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics

Definitions

  • the embodiments of the present application relate to the technical field of virtual reality, and more specifically, to a virtual image processing method and device.
  • the embodiments of the present application provide a virtual image processing method and device, which can make the virtual image more coordinated with the surrounding environment, make the visual effect more realistic, and improve user experience.
  • In a first aspect, a virtual image processing method is provided, including: acquiring an image frame of a first environment, where the first environment includes the cockpit of a vehicle or the environment in which the vehicle is located, and the image frame is used to determine illumination parameters of the first environment; obtaining the illumination parameters; and acquiring the avatar based on the illumination parameters.
  • the illumination parameter is a parameter of a differentiable illumination model, for example, the illumination parameter may be a spherical harmonic illumination parameter.
  • By acquiring the avatar based on the illumination parameters, a light effect rendering result close to or consistent with the light field of the scene captured by the image frame (that is, the first environment) can be obtained, so that the avatar is better coordinated with the surrounding environment, looks more realistic, and improves user experience.
  • the image frame includes a face image, and the face image is used to determine an illumination parameter of the first environment.
  • the scheme of the embodiment of the present application uses the face image to determine the lighting parameters, and obtains the avatar based on the lighting parameters, which is beneficial to improve the coordination between the light effect of the avatar and the light field at the face, and obtain a better rendering effect.
  • the image frame may include an image of the user's head in the cockpit of the vehicle, and the image of the user's head is used to determine the lighting parameter.
  • In some implementations, the image frame is obtained from an image sensor, and the method further includes: when the first environment is in a first lighting condition, controlling the image sensor to collect images at a first frequency; and when the first environment is in a second lighting condition, controlling the image sensor to collect images at a second frequency, where the frequency of change of the light intensity under the first lighting condition is greater than that under the second lighting condition, and the first frequency is greater than the second frequency.
  • When the light intensity in the first environment changes slowly, images can be collected at a lower frequency.
  • With a lower image collection frequency, the lighting parameters are obtained and light effect rendering is performed correspondingly less often, reducing computing overhead and resource consumption as a whole.
  • When the light intensity in the first environment changes quickly, images can be collected at a higher frequency.
  • In that case the lighting parameters can be obtained, and light effect rendering performed, at a higher frequency, which helps keep the light effect rendering real-time; even when the light field changes drastically, the coordination between the avatar and the surrounding environment can still be guaranteed.
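  • As a minimal illustration of this adaptive capture strategy (the function name, threshold and frequency values below are assumptions for illustration only and are not taken from this application), the capture frequency could be selected from the measured rate of change of the light intensity:

```python
# Sketch of the adaptive capture-frequency idea described above.
# The threshold and the two frequencies are illustrative values only.

def select_capture_frequency(intensity_samples, sample_interval_s,
                             change_threshold=50.0,
                             first_frequency_hz=10.0,
                             second_frequency_hz=1.0):
    """Return a capture frequency based on how fast light intensity changes.

    intensity_samples: recent light-intensity readings (arbitrary units).
    sample_interval_s: time between readings, in seconds.
    """
    if len(intensity_samples) < 2:
        return second_frequency_hz
    # Average absolute change of intensity per second.
    diffs = [abs(b - a) for a, b in zip(intensity_samples, intensity_samples[1:])]
    change_rate = sum(diffs) / (len(diffs) * sample_interval_s)
    if change_rate > change_threshold:
        # First lighting condition: light changes quickly -> sample more often.
        return first_frequency_hz
    # Second lighting condition: light is stable -> sample less often.
    return second_frequency_hz
```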
  • the method further includes: controlling the projection of the avatar in the projection area.
  • the projected area includes an area inside a cockpit of the vehicle.
  • In some implementations, when the projection area is a first area in the vehicle cockpit, the illumination parameters are determined based on a first target area in the image frame;
  • when the projection area is a second area in the vehicle cockpit, the illumination parameters are determined based on a second target area in the image frame.
  • That is, when the projection area changes, different target areas in the image frame can be used to determine the illumination parameters; in other words, the target area in the image frame can be adjusted according to the projection area, for example so that the scene shown in the target area is as close as possible to the projection area. This helps obtain the illumination parameters that best match the lighting conditions of the projection area, improves the light effect rendering, and makes the avatar more harmonious with the surrounding environment and more realistic in visual effect.
  • the method further includes: sending the image frame to the server; and acquiring the lighting parameters, including: acquiring the lighting parameters from the server.
  • obtaining the illumination parameter includes: inputting an image frame into an illumination parameter model to obtain the illumination parameter.
  • Determining the illumination parameters through the illumination parameter model requires no complex preprocessing of the image frames, which reduces the complexity of the overall calculation process, demands few computing resources, and is fast, so it can be applied in scenarios with high real-time requirements.
  • the method further includes: adjusting the projection area in response to a user operation.
  • For example, the user operation may be a voice command issued by the user, and the voice command may be used to instruct adjustment of the projection area.
  • In response to the voice command, the projection area is adjusted. For example, the current avatar is projected on the center console, and the voice command may be "project the avatar on the passenger seat"; in response, the avatar is controlled to be projected on the passenger seat.
  • Alternatively, the user operation may be a voice command issued by the user; after the location of the user who issued the voice command is detected, the projection area is adjusted according to that location.
  • the projected area is related to the location of the user who issued the voice command.
  • the voice instruction may be an instruction not related to adjusting the projection area.
  • For example, the current avatar is projected on the passenger seat; when it is detected that the user currently issuing a voice command is in a rear seat, in response to the voice command the projection area is transferred from the passenger seat to an area facing that user at the rear seat.
  • the user operation may also be other actions that affect the projection area.
  • For example, the current avatar is projected on the passenger seat; when it is detected that the door on the passenger-seat side is opened, in response to this operation the projection area is transferred from the passenger seat to the center console.
  • In a second aspect, a virtual image processing method is provided, including: acquiring an image frame of a first environment, where the first environment includes the cockpit of a vehicle or the environment in which the vehicle is located, and the image frame is used to determine the lighting parameters of the first environment; determining the lighting parameters based on the image frame; and sending the lighting parameters to the vehicle, where the lighting parameters are used for processing of the avatar.
  • The lighting parameters are sent to the vehicle so that the vehicle can obtain the avatar based on them and achieve a light effect rendering result close to or consistent with the light field of the scene captured by the image frame, making the avatar more coordinated with the surrounding environment, more realistic in visual effect, and improving user experience.
  • determining the lighting parameters includes: inputting image frames into a lighting parameter model to obtain the lighting parameters, and the lighting parameter model is obtained by training based on sample image frames.
  • Determining the illumination parameters through the illumination parameter model requires no complex preprocessing of the image frames, which reduces the complexity of the overall calculation process, demands few computing resources, and is fast, so it can be applied in scenarios with high real-time requirements.
  • Optionally, the illumination parameter model is obtained by training an initial model with the sample image frames as input data, with the goal of reducing the difference between a reconstructed image and the sample image frame,
  • where the reconstructed image is obtained by rendering based on the illumination parameters predicted by the initial model for the sample image frame and the 3D model parameters of the target object in the sample image frame.
  • Since the real illumination parameters of the sample image frames are not needed during training, the illumination parameter model can be trained on any given data set, for example a data set of face images. Specifically, a loss function is established based on the difference between the reconstructed image obtained after rendering and the input image, so the requirements on training samples are relatively low.
  • the image frame includes a face image, and the face image is used to determine an illumination parameter of the first environment.
  • In a third aspect, a method for training an illumination parameter model is provided, including: acquiring sample image frames; and training the illumination parameter model based on the sample image frames.
  • In some implementations, training the illumination parameter model based on the sample image frames includes: using the sample image frames as the input images of an initial model, and training the initial model with the goal of reducing the difference between the reconstructed image and the sample image frame, to obtain the trained model, that is, the illumination parameter model.
  • The reconstructed image is obtained by rendering based on the illumination parameters predicted by the initial model for the sample image frame and the 3D model parameters of the target object in the sample image frame.
  • Since the real illumination parameters of the sample image frames are not needed during training, the illumination parameter model can be trained on any given data set, for example a data set of face images. Specifically, a loss function is established based on the difference between the reconstructed image obtained after rendering and the input image, so the requirements on training samples are relatively low.
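  • The application describes this training only at the level above. Purely as an illustration, the following PyTorch-style sketch assumes a small CNN that predicts 27 spherical harmonic coefficients and a simplified Lambertian "reconstruction" computed from precomputed per-pixel normals and albedo of the target object (standing in for its 3D model parameters); the architecture, renderer and data layout are assumptions, not details taken from this application.

```python
# Schematic training sketch: predict illumination parameters from an image
# frame, re-render the target object with them, and minimise the difference
# to the input frame (no ground-truth illumination labels needed).
import torch
import torch.nn as nn

class SHLightNet(nn.Module):
    """Toy CNN that predicts 9 spherical-harmonic coefficients per RGB channel."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 27)           # 27 illumination parameters

    def forward(self, x):
        return self.head(self.features(x)).view(-1, 3, 9)

def sh_basis(normals):
    """Order-2 spherical-harmonic basis (9 terms) for unit normals [N, 3]."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    ones = torch.ones_like(x)
    return torch.stack([
        0.2820948 * ones,
        0.4886025 * y, 0.4886025 * z, 0.4886025 * x,
        1.0925484 * x * y, 1.0925484 * y * z,
        0.3153916 * (3 * z * z - 1),
        1.0925484 * x * z, 0.5462742 * (x * x - y * y),
    ], dim=1)                                    # [N, 9]

def reconstruct(sh_coeffs, normals, albedo):
    """Lambertian shading: per-point irradiance from SH coefficients times albedo."""
    basis = sh_basis(normals)                                  # [N, 9]
    shading = torch.einsum('nk,bck->bnc', basis, sh_coeffs)    # [B, N, 3]
    return albedo.unsqueeze(0) * shading                       # [B, N, 3]

model = SHLightNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(image, pixels, normals, albedo):
    """image: [1, 3, H, W] sample frame; pixels/normals/albedo: [N, 3] values
    for the target object's visible surface points (from its 3D model)."""
    sh = model(image)                            # predicted SH coefficients
    recon = reconstruct(sh, normals, albedo)     # [1, N, 3] reconstructed colours
    loss = nn.functional.mse_loss(recon, pixels.unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```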
  • the sample image frame includes a face image
  • the target object includes a head
  • In a fourth aspect, a virtual image processing device is provided, which includes units for performing the method in any one of the implementations of the first aspect.
  • In a fifth aspect, a virtual image processing device is provided, which includes units for performing the method in any one of the implementations of the second aspect.
  • In a sixth aspect, a training device for an illumination parameter model is provided, which includes units for performing the method in any one of the implementations of the third aspect.
  • In a seventh aspect, an avatar processing device is provided, which includes: a memory for storing a program; and a processor for executing the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementations of the first aspect or the second aspect.
  • the device may be a server, for example, a cloud server.
  • the device may also be an electronic device at the vehicle end, for example, a vehicle-mounted device or a vehicle-mounted chip.
  • the device may also be a vehicle.
  • In an eighth aspect, a training device for an illumination parameter model is provided, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to perform the method in any one of the implementations of the third aspect.
  • the training device can be a computer, a mainframe or a server and other devices with computing capabilities.
  • the training device can also be a chip.
  • In a ninth aspect, a computer-readable medium is provided, which stores program code for execution by a device, where the program code includes instructions for performing the method in any one of the implementations of the first aspect, the second aspect, or the third aspect.
  • In a tenth aspect, a computer program product containing instructions is provided; when the computer program product is run on a computer, it causes the computer to perform the method in any one of the implementations of the first aspect, the second aspect, or the third aspect.
  • In an eleventh aspect, a system is provided, which includes the device in any one of the implementations of the fourth aspect, the fifth aspect, or the sixth aspect and an image sensor, where the image sensor is used to collect image frames of the first environment.
  • the system may be a vehicle.
  • the system may be an onboard system.
  • the in-vehicle system includes a server and electronic equipment on the vehicle side.
  • The electronic equipment at the vehicle end may be a vehicle-mounted chip or a vehicle-mounted device (such as an in-vehicle infotainment unit or an on-board computer), or the like.
  • In a twelfth aspect, an electronic device is provided, which includes the apparatus in any one of the implementations of the fourth aspect, the fifth aspect, or the sixth aspect.
  • The electronic device may specifically be a vehicle, a vehicle-mounted chip, or a vehicle-mounted device (such as an in-vehicle infotainment unit or an on-board computer), or the like.
  • the avatar is obtained based on the illumination parameters, and the light effect rendering effect close to or consistent with the light field of the scene captured by the image frame can be obtained, so that the avatar is more coordinated with the surrounding environment, and the visual effect is better.
  • determining the illumination parameters based on the face image is beneficial to improve the coordination between the light effect of the virtual image and the light field at the face, and obtain better rendering effect.
  • Using the trained illumination parameter model to determine the lighting parameters requires no complex preprocessing of the image frame, which reduces the complexity of the overall calculation process, demands few computing resources, and is fast, so it can be applied in scenarios with high real-time requirements.
  • When the light intensity in the first environment changes quickly, images can be collected at a higher frequency.
  • In that case the lighting parameters can be obtained, and light effect rendering performed, at a higher frequency, which helps keep the light effect rendering real-time; even when the light field changes drastically, the coordination between the avatar and the surrounding environment can still be guaranteed.
  • The target area in the image frame can be adjusted according to the projection area, which helps obtain the illumination parameters closest to the lighting conditions of the projection area, improves the light effect rendering, and makes the avatar more harmonious with the surrounding environment and more visually realistic.
  • FIG. 1 is a schematic structural diagram of a system architecture of the present application
  • FIG. 2 is a schematic diagram of another system architecture of the present application.
  • FIG. 3 is a schematic flow chart of a virtual image processing method of the present application.
  • FIG. 4 is a schematic flowchart of a training method for an illumination parameter model of the present application.
  • FIG. 5 is a schematic diagram of a training process of an illumination parameter model of the present application.
  • FIG. 6 is a schematic diagram of the light effect rendering process of the present application.
  • FIG. 7 is a schematic flowchart of another virtual image processing method of the present application.
  • FIG. 8 is a schematic diagram of a camera position of the present application.
  • FIG. 9 is a schematic diagram of a scene of the present application.
  • FIG. 10 is a schematic diagram of a light effect rendering effect of the present application.
  • FIG. 11 is a schematic diagram of a virtual image processing device of the present application.
  • FIG. 12 is a schematic diagram of another virtual image processing device of the present application.
  • FIG. 13 is a schematic diagram of a hardware device of the present application.
  • the solution of the embodiment of the present application can be applied in the human-computer interaction scene in the cockpit of the vehicle to improve the intelligent service experience of the vehicle.
  • an augmented reality device can be used to project an interactive virtual image in the cockpit to improve the service experience of human-computer interaction in the cockpit.
  • Avatars can provide users with various types of services such as voice assistants or driving companions.
  • the avatar can realize the function of the voice assistant.
  • the virtual image can be used as a visual image of the voice assistant.
  • the avatar can provide the user with vehicle control services, entertainment communication services, and the like.
  • users can control hardware such as windows, lights, doors, sunroofs, and air conditioners by interacting with avatars.
  • the user can perform operations such as playing music, playing movies, and sending and receiving emails by interacting with the avatar.
  • the avatar can be used as a driving companion to provide accompanying services for the user.
  • a virtual image such as a virtual partner can be projected on the passenger seat to provide the driver with accompanying services such as chatting, singing, dancing, etc., so that the driver has more fun while driving and improves the user experience.
  • the vehicle is in a driving state and the environment is changing, which may cause the avatar to be uncoordinated with the surrounding environment, thus affecting the user experience.
  • The scheme of the embodiments of the present application acquires an image frame of the vehicle cockpit or of the environment in which the vehicle is located, obtains the illumination parameters of the cockpit or of that environment in real time, and uses the illumination parameters to acquire the avatar, for example by performing light effect rendering on the avatar to be projected.
  • This helps keep the light effect of the avatar consistent with the light field of the surrounding environment, improves the coordination between the avatar and the surrounding environment, makes the avatar appear more real, and improves user experience.
  • the embodiment of the present application provides a system 100 .
  • the system may include a vehicle 110 and a server 120 .
  • the server 120 can provide services for the vehicle 110 in the form of cloud, and the communication link between the server 120 and the vehicle 110 is bidirectional, that is, the server 120 can transmit information to the vehicle 110 , and the vehicle 110 can also transmit information to the server 120 .
  • the communication between the server 120 and the vehicle 110 can be realized through wireless communication and/or wired communication.
  • the vehicle 110 accesses the wireless network through a base station, and the server 120 may transmit information to the vehicle 110 through the base station, or may transmit information to the vehicle 110 through a roadside device.
  • the server 120 and the base station may be connected wirelessly or via a wired connection; the roadside device and the server 120 may be connected wirelessly or via a wired connection; in addition, the roadside device and the server 120 may communicate through a base station.
  • The aforementioned wireless networks include but are not limited to: 2G cellular communication, such as global system for mobile communication (GSM) and general packet radio service (GPRS); 3G cellular communication, such as wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA) and code division multiple access (CDMA); 4G cellular communication, such as long term evolution (LTE); 5G cellular communication; or other evolved cellular communication technologies.
  • the vehicle 110 can upload the image frame captured by the camera to the server 120 , and the server 120 can process the image frame to obtain the illumination parameter, and send the illumination parameter to the vehicle 110 .
  • an illumination parameter model may be deployed on the server 120, where the illumination parameter model is a neural network model obtained through training.
  • the server 120 obtains the image frame to be processed from the vehicle 110 , and processes the image frame by using the illumination parameter model to obtain the illumination parameter, and sends the illumination parameter to the vehicle 110 .
  • the vehicle 110 can acquire the avatar based on the lighting parameters, and project the avatar.
  • the vehicle 110 may process image frames collected by the camera to obtain illumination parameters.
  • the vehicle 110 is deployed with an illumination parameter model, and the image frames acquired from the camera are input into the illumination parameter model to obtain the illumination parameters.
  • the vehicle 110 can acquire the avatar based on the illumination parameters, and project the avatar.
  • the illumination parameter model is a neural network model obtained through training, and its training can be implemented on the cloud where the server 120 is located, and the vehicle 110 can obtain relevant parameters of the illumination parameter model from the server 120 to update the illumination parameter model.
  • The vehicle 110 can transmit the collected image frames as training samples to the server 120, so that the server 120 can obtain richer and more realistic sample data from the vehicle 110, thereby improving the accuracy of the illumination parameter model and continuously improving and updating it.
  • the embodiment of the present application provides a system architecture 200 for model training, and the illumination parameter model of the embodiment of the present application can be obtained through the system training.
  • the data collection device 260 is used to collect training data.
  • the training data may be sample image frames.
  • After the training data is collected, it is stored in the database 230; the data stored in the database 230 may be the original data obtained from the data collection device 260, or data obtained after processing the original data.
  • the training device 220 trains the target model/rules 101 based on the training data maintained in the database 230 .
  • processing the raw data may include target image acquisition, image size adjustment, data screening, and the like.
  • raw images can be processed by the following procedure.
  • Face detection is performed on the original image to obtain a face image (an example of a target image).
  • The following describes how the training device 220 obtains the target model/rule 101 based on the training data.
  • The training device 220 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 220 and the target value is less than a certain threshold, thus completing the training of the target model/rule 101.
  • the target model/rule 101 in the embodiment of the present application may specifically be a neural network model.
  • the training data maintained in the database 230 may not all be collected by the data collection device 260, but may also be received from other devices.
  • The training device 220 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 230; it may also obtain training data from elsewhere for model training, which is not limited in this embodiment of the present application.
  • the data stored in the database may be data obtained from multiple vehicles in a crowdsourcing manner, such as image frame data collected by the vehicles. The training process of an illumination parameter model will be described later in combination with FIG. 5 .
  • The target model/rule 101 obtained by the training device 220 can be applied to different systems or devices, such as the execution device 210 shown in FIG. 2; the execution device 210 may be a terminal, a server, or the like.
  • the execution device 210 may be configured with an input/output (I/O) interface 212 for data interaction with external devices, and the data collection device 240 may transmit data to the execution device 210 .
  • the data collection device 240 may be a terminal, and the execution device 210 may be a server.
  • the data collection device 240 may be an image sensor, and the executing device 210 may be a terminal.
  • When the execution device 210 preprocesses the input data, or when the calculation module 211 of the execution device 210 performs calculation or other related processing, the execution device 210 can call data, code, and the like in the data storage system 250 for the corresponding processing, and the data and instructions obtained from that processing may also be stored in the data storage system 250.
  • the data collection device 240 may be a terminal
  • the execution device 210 may be a server
  • the server may return the processing result to the terminal.
  • The training device 220 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
  • In FIG. 2, the data collection device 240 may automatically send input data to the execution device 210.
  • It is worth noting that FIG. 2 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in FIG. 2 the data storage system 250 is a memory external to the execution device 210, while in other cases the data storage system 250 may also be placed inside the execution device 210.
  • As shown in FIG. 2, the target model/rule 101 is obtained through training by the training device 220, and the target model/rule 101 may be the illumination parameter model in the embodiment of the present application.
  • FIG. 3 shows a virtual image processing method 300 provided by an embodiment of the present application.
  • The method shown in FIG. 3 can be executed by electronic equipment at the vehicle end; the electronic equipment may specifically include one or more of a vehicle, a vehicle-mounted chip, or a vehicle-mounted device (e.g., an in-vehicle infotainment unit or an on-board computer).
  • the method shown in FIG. 3 may also be executed by a cloud service device.
  • the method shown in FIG. 3 may also be executed by a system composed of a cloud service device and an electronic device at the vehicle end. For example, some of the steps are performed by the electronic device at the vehicle end, and some of the steps are performed by the cloud service device on the cloud.
  • the electronic device at the car end may also be called a car end device, or a car end; the cloud service device may also be called a cloud server, a cloud device, or a cloud.
  • the method shown in FIG. 3 can be executed by the vehicle 110 or the cloud server 120 in FIG. 1 .
  • the method shown in FIG. 3 may also be executed by the system 100 in FIG. 1 .
  • S310 to S330 are executed by the cloud, and S340 is executed by the vehicle.
  • the cloud device can send the processed data or the control instruction determined according to the above data to the electronic device at the vehicle end, and the electronic device at the vehicle end controls the projection.
  • the above method 300 includes at least step S310 to step S330. Step S310 to step S330 will be described in detail below.
  • When the method is applied to a vehicle, the first environment is the cockpit of the vehicle or the environment in which the vehicle is located, and the image frame may be an image frame collected by an image sensor; that is, the vehicle end (i.e., the vehicle) may obtain the image frame from the image sensor.
  • the image sensor may be an image sensor inside the cockpit, or an image sensor outside the cockpit.
  • For example, the image sensor in the cockpit may include a cabin monitoring system (CMS) camera or a driver monitoring system (DMS) camera.
  • the image sensor in the cockpit may include one or more of cameras such as a color camera, a depth camera or an infrared camera. It should be understood that this is only an example, and other types of image sensors may also be used to collect image frames, which is not limited in this embodiment of the present application.
  • the image sensor can be arranged at the position of the rearview mirror in the cockpit.
  • the viewing angle of the image sensor can cover areas such as the main driver's seat, the passenger's seat, and the rear seats.
  • the lighting parameters of the first environment may be understood as lighting parameters in the scene captured by the image frame.
  • the lighting parameter can reflect the lighting conditions in the scene captured by the image frame.
  • a lighting situation may also be referred to as a light field situation. Lighting parameters are used for avatar processing.
  • the first environment is the environment in the cockpit of the vehicle or the environment where the vehicle is located.
  • the lighting parameters refer to the parameters of the lighting model.
  • For example, the lighting parameters can be an n_light × 6 matrix, where n_light represents the number of point light sources in the environment and each point light source is represented by 6 rational numbers: the first three represent the position of the light source (light position), and the last three represent the intensities of its three colors, red, blue and green.
  • Alternatively, the lighting parameters can be an n_light × 6 matrix, where n_light represents the number of directional light sources in the environment and each directional light source is represented by 6 rational numbers: the first three represent the direction of the light source (light direction), and the last three represent the intensities of its three colors, red, blue and green (light intensities).
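  • As an illustrative sketch of this parameterization (the numeric values below are made up and not taken from this application), the n_light × 6 matrix can simply be stored as one row per light source:

```python
# Illustrative layout of the n_light x 6 lighting-parameter matrix described
# above: each row is one light source; for a point light the first three
# numbers are its position, for a directional light its direction, and the
# last three are its three color intensities. All values are made up.
import numpy as np

lights = np.array([
    #  x,    y,    z,   and three color intensities
    [ 0.2,  1.1, -0.4,  0.9,  0.8,  0.7],   # e.g. sunlight through a window
    [-0.5,  0.9,  0.3,  0.2,  0.2,  0.3],   # e.g. an interior cabin light
])
n_light = lights.shape[0]                    # number of light sources
positions_or_directions = lights[:, :3]
color_intensities = lights[:, 3:]
```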
  • the illumination parameter is a parameter of a differentiable illumination model.
  • the illumination parameter may be a spherical harmonic illumination parameter.
  • For example, the expression of the spherical harmonic illumination function may satisfy the following formula:
  • f(θ, φ) = Σ_{l=0}^{n} Σ_{m=-l}^{l} c_l^m · Y_l^m(θ, φ)
  • where l and m are the eigenvalue indices of the Laplace equation in spherical coordinates, n indicates the order of the spherical harmonic basis functions (the spherical harmonic order), m is the degree of the spherical harmonic basis function, Y_l^m(θ, φ) is the spherical harmonic basis function, and c_l^m is the coefficient of the spherical harmonic basis direction corresponding to that basis function.
  • The spherical harmonic illumination parameters are the coefficients of the spherical harmonic basis directions.
  • For example, when the spherical harmonic order is 3, n is 2 and l may take the values 0, 1, 2: when l is 0, m may take the value 0; when l is 1, m may take the values -1, 0, 1; when l is 2, m may take the values -2, -1, 0, 1, 2.
  • An RGB image includes three channels, and each channel has nine spherical harmonic illumination parameters; therefore, for an RGB image there are 27 spherical harmonic illumination parameters in total, that is, the number of spherical harmonic illumination parameters to be predicted is 27.
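  • The counting above can be made concrete with a short sketch (the reshaping below is only one convenient way to organise the 27 predicted values and is not prescribed by the application):

```python
# How the 27 spherical-harmonic illumination parameters are organised: for a
# spherical-harmonic order of 3 (n = 2, so l in {0, 1, 2}) there are
# 1 + 3 + 5 = 9 basis directions, and an RGB image frame has one coefficient
# per basis direction per color channel.
import numpy as np

n = 2
num_basis = sum(2 * l + 1 for l in range(n + 1))   # 9 basis directions
num_channels = 3                                   # R, G, B
num_sh_params = num_channels * num_basis           # 27 parameters to predict

# A predicted parameter vector can therefore be viewed as a 3 x 9 matrix,
# one row of 9 coefficients per color channel.
sh_params = np.zeros(num_sh_params).reshape(num_channels, num_basis)
```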
  • the vehicle end may acquire illumination parameters locally.
  • the car end can determine the lighting parameter by itself.
  • the vehicle end is provided with an illumination parameter model, and the acquired image frame is input into the illumination parameter model to obtain the illumination parameter.
  • the illumination parameter model can be pre-set on the vehicle end or sent to the vehicle end by the server.
  • the illumination parameter model may be obtained through training on the server side. Further, the server side may further update the illumination parameter model through training, and send the updated model or model parameters to the vehicle end.
  • the vehicle end may also obtain lighting parameters from other devices, that is, receive lighting parameters sent by other devices.
  • the car end sends the acquired image frame to the server, and the server obtains the image frame from the car end, and after determining the lighting parameters based on the image frame, sends the lighting parameters to the car end, that is, the car end can obtain the lighting parameters from the server.
  • the server is provided with an illumination parameter model, and the image frames obtained from the vehicle are input into the illumination parameter model to obtain the illumination parameters.
  • The illumination parameter model can be obtained through training on the server side. Further, the server side can update the illumination parameter model through training to improve the accuracy of the illumination parameters, thereby further improving the consistency between the light effect of the avatar and the light field conditions of the surrounding environment.
  • the image frame includes a face image.
  • the face image is used to determine the lighting parameters of the first environment.
  • the virtual image is a character image.
  • The scheme of the embodiment of the present application uses the illumination parameters obtained from the face image information to perform light effect rendering on the avatar (for example, a virtual character), which helps improve the coordination between the light effect of the avatar and the light field in the scene shown in the target area, and thus obtain a better rendering effect.
  • the image frame includes a target area, and the target area of the image frame is used to determine the illumination parameter.
  • the target area may be a face area.
  • the obtained illumination parameter may reflect the illumination condition at the face captured by the image frame.
  • the target area may be the face area closest to the projection area of the avatar among the plurality of face areas.
  • the target area may be the face area with the largest proportion in the image frame.
  • For example, the target area may be the face area located in the driver's region of the image frame.
  • the target area may include the user's head area.
  • the target area may also include the user's upper body area.
  • the target area may include the user's hand area.
  • the target area may include the user's arm area.
  • the target area may also include the area where the user is located.
  • the image frame may be an image frame of the environment where the vehicle is located, and the area where the scene outside the vehicle is located in the image frame may also be used as the target area. Furthermore, in the image frame, the area where the scene outside the vehicle that is relatively close to the vehicle is located may be used as the target area.
  • the image frame may be an image frame in the cockpit, and the area where the decoration on the center console in the image frame is located may also be used as the target area.
  • the target area in the image frame may include a partial area in the image frame, or may include all areas in the image frame.
  • the lighting parameters of the first environment may also be defined in other ways, for example, the lighting parameters of the first environment may be an average value of lighting parameters of multiple regions in the image frame.
  • the lighting parameter of the first environment may be the lighting parameter of the target area in the image frame.
  • determining the illumination parameter based on the image frame may include inputting the image frame into an illumination parameter model to obtain the illumination parameter.
  • the illumination parameter model may be obtained through training based on sample image frames.
  • the illumination parameter model can be updated based on data update or algorithm update.
  • the process of determining the illumination parameters may be performed by the vehicle end.
  • the car end can acquire the lighting parameter model, for example, receiving parameters of the lighting parameter model sent by the server.
  • the vehicle end may store the illumination parameter model.
  • the car end processes the image frame through the illumination parameter model to obtain the illumination parameters.
  • the above processing process may also be performed by other devices, for example, the server processes the image frames through the illumination parameter model to obtain the illumination parameters.
  • The illumination parameter model can be deployed on the server, at the vehicle end, or on another server or terminal; the device performing the processing obtains the illumination parameter model when processing.
  • inputting the image frame into the illumination parameter model may include inputting image information of the target area in the image frame into the illumination parameter model.
  • For example, the face in the image frame can be detected, captured and segmented by a face detection method to obtain the face image information, and the face image information is then input into the illumination parameter model for processing.
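  • Purely as an illustration of this pipeline (the face detector, the model file name "illumination_model.pt" and the input size are assumptions made for this sketch, not details specified in this application), the steps could look as follows:

```python
# Schematic inference pipeline: detect the face in the image frame, crop it,
# and feed it to the illumination parameter model.
import cv2
import torch

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
illumination_model = torch.jit.load("illumination_model.pt")  # hypothetical file
illumination_model.eval()

def estimate_illumination(frame_bgr):
    """Return predicted illumination parameters for one image frame,
    or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Use the largest detected face as the target area.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    face = cv2.resize(frame_bgr[y:y + h, x:x + w], (128, 128))
    tensor = torch.from_numpy(face[:, :, ::-1].copy()).float()   # BGR -> RGB
    tensor = tensor.permute(2, 0, 1).unsqueeze(0) / 255.0        # [1, 3, H, W]
    with torch.no_grad():
        return illumination_model(tensor)   # e.g. 27 spherical-harmonic values
```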
  • the illumination parameter model may be a neural network model.
  • The illumination parameter model can be a neural-network-based model obtained through training; the neural network here can be a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), a bidirectional long short-term memory network (BLSTM), a deep convolutional neural network (DCNN), or the like.
  • A neural network may be composed of neural units. A neural unit may refer to an operation unit that takes inputs x_s and an intercept of 1, and the output of the operation unit may, for example, be as shown in formula (1):
  • h_{W,b}(x) = f(W^T x + b) = f( Σ_s W_s · x_s + b )    (1)
  • where W_s is the weight of x_s, which can also be called a parameter or coefficient of the neural network, x_s is an input of the neural unit, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to perform nonlinear transformation on the features in the neural network, thereby converting the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • DNN is divided according to the position of different layers, and the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • connections between some neurons may be disconnected.
  • Although a DNN looks complicated, the work of each layer is actually not complicated.
  • In simple terms, each layer is the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • Each layer simply performs this operation on an input vector x to obtain an output vector y. Since a DNN has many layers, the numbers of coefficient matrices W and offset vectors b are correspondingly large.
  • The definition of these parameters in a DNN is as follows, taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer number of the coefficient W, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as W^L_{jk}.
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
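  • The per-layer relationship described above can be illustrated numerically with a tiny sketch (shapes and values below are made up for illustration only):

```python
# Minimal numerical illustration of the per-layer relationship
# y = alpha(W x + b) described above, with made-up shapes and values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # input vector (4 features)
W = rng.normal(size=(3, 4))     # weight matrix of one layer (3 outputs)
b = rng.normal(size=3)          # offset (bias) vector
y = sigmoid(W @ x + b)          # output vector of this layer
# Stacking several such layers, each feeding the next, gives a DNN.
```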
  • a convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to some adjacent neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information that is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the training method of the illumination parameter model will be introduced later.
  • Determining the illumination parameters through the illumination parameter model requires no complex preprocessing of the image frames, which reduces the complexity of the overall calculation process, demands few computing resources, and is fast, so it can be applied in scenarios with high real-time requirements.
  • the lighting parameters may also be understood as lighting parameters in the cockpit.
  • other methods may also be used to predict the lighting parameters in the cockpit.
  • the illumination parameters in the cockpit can be predicted according to the environment in which the vehicle is located.
  • For example, when the vehicle is about to enter a tunnel, the lighting parameters in the cockpit can be predicted based on the lighting parameters in the tunnel.
  • the lighting parameters in the tunnel may be pre-saved lighting parameters.
  • the pre-saved illumination parameter may be pre-measured in the tunnel by an optical device, or may be an illumination parameter determined by using the solution of the present application when other vehicles pass through the tunnel before. This embodiment of the present application does not limit it.
  • the environment in which the vehicle is located may be determined by sensors on the vehicle. For example, it can be determined that the front of the vehicle is about to enter a tunnel through the image frame outside the vehicle collected by the image sensor. For another example, the data collected by the radar can be used to determine that the vehicle ahead is about to enter a tunnel. It should be understood that this is only an example, and the environment in which the vehicle is located may also be determined in other ways, which is not limited in this embodiment of the present application.
  • the lighting parameters in the cockpit can be predicted according to the location of the vehicle and weather information.
  • the lighting parameters in the cabin are predicted based on the weather conditions at said location of the vehicle. For example, according to the weather forecast, the weather conditions of the areas that the vehicle will pass through on the driving route are determined, and the illumination parameters in the cockpit are predicted according to the weather conditions.
  • In this way, the lighting parameters in the cockpit can be predicted in advance, which helps keep the light effect rendering real-time. For example, when the vehicle travels to position A, the image frame in the cockpit is acquired; before the current lighting parameters have been calculated, light effect rendering can be performed based on the previously predicted lighting parameters for the cockpit at position A, so that the avatar remains coordinated with the surrounding environment and user experience is improved.
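  • A minimal sketch of this fallback idea follows; the cache layout, location identifiers and parameter values are assumptions made purely for illustration:

```python
# While the illumination parameters for the newly captured frame are still
# being computed, render with parameters that were predicted or pre-saved for
# the current location (e.g. a tunnel the vehicle is about to enter).
predicted_params_by_location = {
    # location identifier -> previously predicted / pre-saved parameters
    "tunnel_entrance_A": [0.6, 0.1, 0.1] + [0.0] * 24,   # dim, warm light
    "open_road_B":       [1.8, 0.9, 0.8] + [0.0] * 24,   # bright daylight
}

def illumination_for_rendering(current_location, freshly_computed=None):
    """Prefer freshly computed parameters; otherwise fall back to the
    parameters predicted in advance for the current location, if any."""
    if freshly_computed is not None:
        return freshly_computed
    return predicted_params_by_location.get(current_location)
```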
  • Performing light effect rendering on the original avatar based on the lighting parameters means that the lighting parameters are used during rendering.
  • Step S330 can also be understood as performing rendering based on lighting parameters and parameters of the original avatar.
  • the avatar may include various avatars such as a virtual partner, a virtual prop, a virtual cartoon character, or a virtual pet.
  • the virtual image can be obtained by using an existing scheme.
  • For example, the three-dimensional (3D) mesh information and texture information of the avatar can be obtained, and the 3D avatar can be obtained by performing 3D rendering on the 3D mesh information and texture information.
  • the lighting parameters obtained in step S320 are used during rendering to obtain the avatar rendered with light effects using the lighting parameters.
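  • The application does not fix a particular rendering engine for this step. As one simple illustrative possibility (not the method claimed here), the lighting parameters in the n_light × 6 directional-light form described earlier could be used to shade the avatar's mesh vertices before ordinary 3D rendering of the mesh and texture:

```python
# Illustrative vertex shading with the n_light x 6 directional-light
# parameters: the lit vertex colors are then handed to ordinary 3D rendering.
import numpy as np

def shade_vertices(vertex_normals, vertex_albedo, lights):
    """vertex_normals: [V, 3] unit normals from the avatar's 3D mesh;
    vertex_albedo: [V, 3] colors sampled from its texture;
    lights: [n_light, 6] rows of (direction xyz, three color intensities)."""
    directions = lights[:, :3]
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    intensities = lights[:, 3:]                                   # [n_light, 3]
    # Lambertian term: clamp negative cosines (light behind the surface).
    cosines = np.clip(vertex_normals @ directions.T, 0.0, None)   # [V, n_light]
    irradiance = cosines @ intensities                            # [V, 3]
    return np.clip(vertex_albedo * irradiance, 0.0, 1.0)          # lit colors
```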
  • The avatar is obtained based on the lighting parameters of the environment, for example by performing light effect rendering on the original avatar with those parameters; a rendering result close to or consistent with the light field of the scene (that is, the environment) captured by the image frame can thus be obtained, making the avatar more coordinated with the surrounding environment, more realistic in visual effect, and improving user experience.
  • the augmented reality device may be controlled to project the avatar in the projection area.
  • the projected area includes an area within the vehicle's cabin.
  • the target area in the image frame may not be related to the projection area, or may be related to the projection area.
  • the target area in the image frame has nothing to do with the projection area, that is, the target area in the image frame and the projection area do not affect each other, and the target areas may be the same for different projection areas.
  • the projection area is regarded as an influencing factor of the target area in the image frame.
  • the target area can be different for different projection areas. Or it can be understood that the target area in the image frame may be determined according to the projection area.
  • For example, when the projection area is a first area in the vehicle cockpit, the target area is a first target area in the image frame; when the projection area is a second area in the vehicle cockpit, the target area is a second target area in the image frame.
  • In other words, when the projection area is the first area in the vehicle cockpit, the illumination parameters are determined based on the first target area in the image frame; when the projection area is the second area in the vehicle cockpit, the illumination parameters are determined based on the second target area in the image frame.
  • the first area in the cockpit and the second area in the cockpit are different areas.
  • the first target area in the target image frame and the second target area in the target image frame are different areas.
  • For example, the image frame is an image frame of the interior of the vehicle; the first area in the cockpit can be the passenger seat, and the driver's face area in the image frame (an example of the first target area) can be used as the target area; the second area in the cockpit can be the center console, and the area where an ornament on the center console is located in the image frame (an example of the second target area) can be used as the target area.
  • The illumination parameters obtained when the driver's face area in the image frame is used as the target area are the illumination parameters around the driver's face. That is to say, when the avatar needs to be projected on the passenger seat, the illumination parameters around the driver's face can be used to perform light effect rendering on the original avatar, and the avatar is then projected on the passenger seat.
  • The lighting parameters obtained when the area where the ornament on the center console is located is used as the target area are the lighting parameters around the ornament. That is to say, when the avatar needs to be projected on the center console, those lighting parameters can be used to perform light effect rendering on the original avatar, and the avatar is then projected on the center console.
  • In this way, the target area in the image frame can be adjusted according to the projection area, for example so that the scene shown in the target area is as close as possible to the projection area. This helps obtain the illumination parameters closest to the lighting conditions of the projection area, improves the light effect rendering, and makes the avatar more harmonious with the surrounding environment and more realistic in visual effect.
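  • A minimal sketch of this projection-area-to-target-area selection follows; the mapping and the pixel regions are illustrative assumptions and would in practice depend on camera placement and cockpit geometry:

```python
# Select the target area (crop) of the image frame according to the
# projection area. Region coordinates are made up for illustration.
TARGET_AREA_BY_PROJECTION_AREA = {
    # projection area -> (x, y, width, height) crop in the image frame
    "passenger_seat": (620, 180, 320, 320),   # e.g. driver's face region
    "center_console": (450, 420, 260, 200),   # e.g. ornament on the console
}

def crop_target_area(image_frame, projection_area):
    """Crop the region of the image frame used to determine the illumination
    parameters for the given projection area."""
    x, y, w, h = TARGET_AREA_BY_PROJECTION_AREA[projection_area]
    return image_frame[y:y + h, x:x + w]
```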
  • the target area in the image frame is related to the projection area, which does not mean that the target area must be different when the projection area is different.
  • Multiple projection areas can also correspond to the same target area.
  • the target areas corresponding to the two projection areas may also be the same.
  • When the distance between two projection areas is small, the difference between their light intensities is usually small, so the two projection areas can also correspond to the same target area.
  • For example, when the distance between two projection areas is greater than or equal to a first threshold, and the projection area is the first of the two areas, the target area is the first target area in the image frame;
  • when the projection area is the second of the two areas, the target area is the second target area in the image frame.
  • When the distance between two projection areas is large, the difference between their illumination intensities may be large, and it is difficult for the illumination parameters of a single target area to match the lighting conditions of both projection areas, which in turn degrades the light effect rendering.
  • In that case, different areas of the image frame are used as the target area, for example so that the scene shown in the target area is as close as possible to the projection area; this helps obtain the illumination parameters closest to the lighting conditions of the projection area, improves the light effect rendering, and makes the avatar more harmonious with the surrounding environment and more realistic in visual effect.
  • the area captured by the image frame may not be related to the projection area, or may be related to the projection area.
  • the area captured by the image frame refers to the area captured by the image sensor that captures the image frame.
  • the area captured by the image frame may also be referred to as the area displayed by the image frame.
  • the area captured by the image frame has nothing to do with the projection area, that is, the area captured by the image frame and the projection area do not affect each other.
  • In that case, for different projection areas, the area captured by the image frame may be the same.
  • the projection area is regarded as an influencing factor of the area captured by the image frame.
  • In that case, for different projection areas, the areas captured by the image frames may be different.
  • the area captured by the image frame may be determined according to the projection area.
  • For example, when the projection area is a third area in the vehicle cockpit, the image sensor is controlled to capture image frames at a first angle; when the projection area is a fourth area in the vehicle cockpit, the image sensor is controlled to capture image frames at a second angle.
  • the third area in the cockpit and the fourth area in the cockpit are different areas.
  • the first angle is different from the second angle.
  • the area captured by the image frame captured by the image sensor at the first angle is different from the area captured by the image frame captured by the image sensor at the second angle. That is, control the image sensor to capture different areas.
  • Alternatively, when the projection area is the third area in the vehicle cockpit, a first image sensor is controlled to acquire image frames; when the projection area is the fourth area in the vehicle cockpit, a second image sensor is controlled to acquire image frames.
  • the positions of the first image sensor and the second image sensor are different.
  • the area captured by the image frame captured by the first image sensor is different from the area captured by the image frame captured by the second image sensor. That is, a plurality of image sensors may be installed in the vehicle, for example, image sensors may be installed in front of each seat of the vehicle.
  • For example, the image frame is an image frame of the interior of the vehicle. The third area in the cockpit can be the passenger seat, and the area captured by the image frame can be the area where the driver is; in this case, the image sensor is controlled to capture image frames at an angle facing the driver (an example of the first angle), or an image sensor in front of the driver (an example of the first image sensor) is controlled to capture image frames.
  • The fourth area in the cockpit can be a rear seat, and the area captured by the image frame can be the area where a rear-seat passenger is; in this case, the image sensor is controlled to capture image frames at an angle facing the rear-seat passenger (an example of the second angle), or an image sensor in front of the rear-seat passenger (an example of the second image sensor) is controlled to capture image frames.
  • the lighting parameters around the driver can be used to perform light effect rendering on the original virtual image, and then the virtual image can be projected on the passenger seat.
  • the lighting parameters around the rear passengers can be used to perform light effect rendering on the original virtual image, and then the virtual image can be projected on the rear seats.
  • the area captured by the image frame can be adjusted according to the projection area.
  • Making the area captured by the image frame as close as possible to the projection area helps obtain the illumination parameters closest to the illumination conditions of the projection area, improves the light effect rendering, makes the virtual image more coordinated with the surrounding environment, and makes the visual effect more realistic.
  • the area captured by the image frame is related to the projection area, which does not mean that the area captured by the image frame must be different when the projection area is different.
  • the areas captured by the image frames corresponding to the multiple projection areas may also be the same.
  • the areas captured by the image frames corresponding to the two projection areas may also be the same.
  • When the distance between two projection areas is small, the difference between their light intensities is usually small, so the areas captured by the image frames corresponding to the two projection areas may also be the same.
  • For example, when the distance between the third area and the fourth area is greater than or equal to the second threshold: when the projection area is the third area in the vehicle cockpit, the image sensor is controlled to collect image frames at the first angle; when the projection area is the fourth area in the vehicle cockpit, the image sensor is controlled to collect image frames at the second angle. Alternatively, when the projection area is the third area in the vehicle cockpit, the first image sensor is controlled to collect image frames; when the projection area is the fourth area in the vehicle cockpit, the second image sensor is controlled to collect image frames.
  • the first threshold and the second threshold may be the same or different.
  • the first threshold may be 1 meter
  • the second threshold may be 1.3 m.
  • In this way, the image sensor can be controlled to capture different areas, for example so that the area captured by the image sensor is as close as possible to the projection area, which helps obtain the illumination parameters closest to the illumination conditions of the projection area, improves the light effect rendering, makes the virtual image more coordinated with the surrounding environment, and makes the visual effect more realistic. A rough sketch of this selection logic is given below.
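As a rough illustration of this dispatch logic, the following Python sketch shows how a controller might pick the camera used for illumination estimation once the distance between the previous and the new projection area exceeds a threshold. The area coordinates, sensor names, threshold value and helper mapping are all hypothetical assumptions introduced for illustration; they are not part of this disclosure.

```python
from math import dist

# Hypothetical cockpit coordinates (metres), for illustration only.
AREA_CENTERS = {
    "passenger_seat": (0.5, 0.9, 0.6),
    "rear_seat": (0.5, -0.4, 0.6),
    "center_console": (0.0, 0.7, 0.4),
}

# Hypothetical mapping from projection area to the camera best facing it.
SENSOR_FOR_AREA = {
    "passenger_seat": "cam_front_row",
    "rear_seat": "cam_rear_row",
    "center_console": "cam_front_row",
}

DISTANCE_THRESHOLD_M = 1.3  # mirrors the "second threshold" example above


def select_capture_source(prev_area: str, new_area: str, current_sensor: str) -> str:
    """Return the sensor to use for the new projection area.

    Only switch sensors when the new projection area is far enough from the
    previous one that the local light fields are likely to differ.
    """
    if dist(AREA_CENTERS[prev_area], AREA_CENTERS[new_area]) < DISTANCE_THRESHOLD_M:
        return current_sensor  # nearby areas: keep the same sensor / angle
    return SENSOR_FOR_AREA[new_area]


# Example: moving the projection from the passenger seat to a rear seat.
print(select_capture_source("passenger_seat", "rear_seat", "cam_front_row"))
```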
  • the method 300 further includes: adjusting the projection area (not shown in the figure) in response to a user operation.
  • the projection area is area A in the cockpit, and after the user's operation, the projection area is area B in the cockpit. Area A and area B are different areas.
  • Exemplarily, the user operation may be the user issuing a voice command, and the voice command may be used to instruct adjustment of the projection area. In response to the voice command, the projection area is adjusted.
  • For example, the current avatar is projected on the center console, and the voice instruction may be "project the avatar on the passenger seat". In response to the voice instruction, the avatar is projected on the passenger seat.
  • Alternatively, the user operation may be the user issuing a voice command; after the location of the user who issued the voice command is detected, the projection area is adjusted according to that location. In other words, the projection area is related to the location of the user who issued the voice command. In this case, the voice command may be a command unrelated to adjusting the projection area. For example, the current avatar is projected on the passenger seat; when it is detected that the user currently issuing a voice command is located in a rear seat, in response to the voice command the projection area is transferred from the passenger seat to an area facing the rear-seat user.
  • the user operation may also be other actions that affect the projection area.
  • For example, the current avatar is projected on the passenger seat; when it is detected that the door on the passenger-seat side is opened, in response to this operation the projection area is transferred from the passenger seat to the center console.
  • For another example, the current avatar is projected on the center console; when it is detected that the user on the passenger seat leaves, in response to this operation the projection area is transferred from the center console to the passenger seat.
  • For another example, the current avatar is projected on the passenger seat; when it is detected that a user appears on a rear seat, in response to this operation the projection area is transferred from the passenger seat to the rear seat.
  • different projection areas can project different avatars.
  • the avatar can also change.
  • the size of the avatar may change.
  • the size of the avatar projected on the seat is larger than that projected on the center console.
  • the type of avatar may change when the projection area changes.
  • the type of the virtual image projected on the seat is a virtual partner
  • the virtual image projected on the center console is a virtual cartoon character, a virtual prop, or a virtual pet.
  • an avatar selection interface can also be provided for the user, and the avatar in the projection area can be set according to the user's selection. In this way, a projection image that meets user needs can be projected in the projection area to improve user experience.
  • the avatars displayed in the projection area are all avatars rendered with light effects.
  • the image sensor when the first environment (for example, in the cockpit of the vehicle or the environment where the vehicle is located) is in the first lighting condition, the image sensor is controlled to collect images at the first frequency; when the first environment is in the second lighting condition, the image sensor is controlled to The sensor collects images at a second frequency, the change frequency of the light intensity under the first light condition is greater than the change frequency of the light intensity under the second light condition, and the first frequency is greater than the second frequency.
  • the image acquisition frequency when the light field condition is relatively stable is smaller than the image acquisition frequency when the light field condition changes drastically.
  • When the change frequency of the light intensity in the first environment is large, images can be collected at a higher frequency, for example, 10 images every 2 seconds; when the change frequency of the light intensity in the first environment is small, images can be collected at a lower frequency, for example, 1 image every 2 seconds.
  • the collection frequency here is only an example, and other collection frequencies may also be set, as long as the first frequency is greater than the second frequency.
  • Light field information refers to the outgoing light intensity information of each single point in space.
  • the image acquisition frequency can be set as required.
  • the maximum image acquisition frequency can be set to 30 times per second, and the minimum image acquisition frequency is once every 15 seconds. As the light intensity changes, the image acquisition frequency can be adjusted between minimum and maximum values.
  • For example, when the vehicle is driving on an unobstructed road, the external light source is relatively stable and the light intensity in the vehicle cabin is relatively stable, that is, the change in light intensity is small.
  • images may be collected at a lower frequency.
  • When the external light source changes drastically, for example when the vehicle drives into or out of a tunnel, the light intensity in the vehicle cabin also changes drastically. In this case, images can be collected at a higher frequency.
  • When the change frequency of the light intensity in the first environment is small, images can be collected at a lower frequency. When the image collection frequency is reduced, the frequency of obtaining lighting parameters and the frequency of light effect rendering are correspondingly reduced, reducing computing overhead and resource consumption as a whole.
  • When the change frequency of the light intensity in the first environment is relatively high, images can be collected at a higher frequency. In this case, the lighting parameters can be obtained at a higher frequency and light effect rendering can be performed accordingly, which helps ensure the real-time performance of the rendering. In this way, even when the light field changes drastically, the coordination between the virtual image and the surrounding environment can still be guaranteed. A sketch of such an adaptive acquisition strategy is given below.
  • Optionally, when the first environment is in the first lighting condition, the collected image frames are processed at a third frequency; when the first environment is in the second lighting condition, the collected image frames are processed at a fourth frequency, where the change frequency of the light intensity under the first lighting condition is greater than that under the second lighting condition, and the third frequency is greater than the fourth frequency.
  • When the change frequency of the light intensity in the first environment is large, the images can be processed at a higher frequency, for example, 10 times every 2 seconds; when the change frequency of the light intensity in the first environment is small, the images can be processed at a lower frequency, for example, once every 2 seconds.
  • the processing frequency here is only an example, and other processing frequencies may also be set, as long as the third frequency is greater than the fourth frequency. That is to say, in the case of different changing frequencies of the light intensity, the number of times of image processing may be different even if the image acquisition frequency is the same.
  • For example, 20 images are collected every 2 seconds, and currently 10 of these images are processed every 2 seconds to obtain lighting parameters. When the frequency of light intensity changes decreases, only one image may be processed every 2 seconds, that is, the image acquisition frequency remains unchanged while the number of processed images is reduced; when the frequency of light intensity changes increases, more of the collected images are processed every 2 seconds, that is, the acquisition frequency remains unchanged while the number of processed images is increased.
  • When the change frequency of the light intensity in the first environment is small, the frequency of obtaining lighting parameters and the frequency of light effect rendering are also reduced accordingly, reducing computing overhead and resource consumption as a whole. When the change frequency of the light intensity in the first environment is relatively high, obtaining the lighting parameters and performing light effect rendering at a higher frequency helps ensure the real-time performance of the rendering; in this way, even when the light source changes drastically, the coordination between the virtual image and the surrounding environment can still be guaranteed.
  • Optionally, when the first environment is in the first lighting condition, light effect rendering is performed on the avatar at a fifth frequency; when the first environment is in the second lighting condition, light effect rendering is performed at a sixth frequency. When the change frequency of the light intensity in the first environment is relatively high, light effect rendering can be performed on the avatar at a higher frequency, for example, rendering 10 times with different lighting parameters every 2 seconds; when the change frequency of the light intensity in the first environment is small, light effect rendering may be performed on the avatar at a lower frequency, for example, once every 2 seconds. It should be understood that the rendering frequencies here are only examples, and other rendering frequencies may also be set, as long as the fifth frequency is greater than the sixth frequency. That is to say, for different change frequencies of the light intensity, even if the image acquisition frequency and processing frequency are the same, the number of light effect renderings may differ.
  • For example, 10 frames of images are collected and processed every 2 seconds to obtain the lighting parameters of those 10 frames. When the change frequency of the light intensity decreases, the avatar may be rendered only once every 2 seconds, for example using one group of lighting parameters obtained from the 10 frames, that is, the image acquisition and processing frequencies remain unchanged while the number of light effect renderings is reduced; when the change frequency of the light intensity increases, the avatar may be rendered 5 times every 2 seconds, that is, the acquisition and processing frequencies remain unchanged while the number of light effect renderings is increased.
  • When the change frequency of the light intensity in the first environment is small, the frequency of light effect rendering is reduced accordingly, which reduces the computational overhead required for rendering and lowers resource consumption. When the change frequency of the light intensity in the first environment is relatively high, rendering light effects at a higher frequency helps ensure the real-time performance of the rendering; in this way, even when the light source changes drastically, the coordination between the avatar and the surrounding environment can still be guaranteed. A small scheduling sketch illustrating this decoupling follows.
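To make the decoupling of acquisition, parameter estimation and rendering rates concrete, here is a small scheduling sketch in Python. The tier boundaries and frame counts are assumptions chosen to mirror the examples above, not values fixed by the embodiment.

```python
def schedule(change_freq: float,
             frames_per_window: int = 20,
             window_s: float = 2.0) -> dict:
    """Decide how many of the frames collected in one window are processed
    (to estimate lighting parameters) and how many renderings are refreshed,
    while the acquisition rate itself stays unchanged."""
    if change_freq < 0.5:        # light field nearly static (assumed threshold)
        n_process, n_render = 1, 1
    elif change_freq < 2.0:      # moderate variation (assumed threshold)
        n_process, n_render = 10, 10
    else:                        # drastic variation, e.g. tunnel entry/exit
        n_process, n_render = frames_per_window, frames_per_window
    return {
        "acquired": frames_per_window,
        "processed_per_window": n_process,
        "renders_per_window": n_render,
        "window_seconds": window_s,
    }


print(schedule(change_freq=0.2))   # stable: 1 parameter update / render per 2 s
print(schedule(change_freq=5.0))   # drastic: update for every collected frame
```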
  • the changing frequency of the light intensity can be determined by the vehicle end, or by the server.
  • the car end can obtain the change frequency of the light intensity locally.
  • the vehicle end may also obtain the frequency of change of the light intensity from the server.
  • the change frequency of the light intensity may be determined according to the comparison of light parameters used in two consecutive light effect renderings.
  • the current lighting parameter is obtained, and the lighting parameter is compared with the lighting parameter obtained last time, so as to determine the change frequency of the current lighting intensity.
  • the change frequency of the light intensity can be determined by a light sensor. That is, the change frequency of the light intensity is determined by the change of the intensity of the electrical signal output by the light sensor.
  • Exemplarily, the change frequency of the illumination intensity may be determined from the illumination intensities in different regions of a single image frame. If the difference between the illumination intensities of different regions in a single image frame is large, the change frequency of the illumination intensity is relatively large; if the difference is small, the change frequency is relatively small.
  • For example, when the vehicle is driving into a tunnel, part of a captured single image frame shows the part of the cockpit that has not yet entered the tunnel, where the light intensity is relatively large, while another part shows the part of the cockpit that has entered the tunnel, where the light intensity is relatively small. The large difference between the light intensities of these two parts reflects that the change frequency of the current light intensity is relatively large.
  • the change frequency of the illumination intensity may be determined by the illumination intensity of multiple image frames.
  • the multiple image frames may be continuously collected multiple image frames, or the multiple image frames may also be discontinuous multiple image frames.
  • If the difference between the illumination intensities of the multiple image frames is large and the interval between their acquisition moments is small, the change frequency of the illumination intensity within the acquisition period of those image frames is relatively large. If the difference between the illumination intensities of the multiple image frames is small and the interval between their acquisition moments is small, the change frequency of the illumination intensity within the acquisition period is relatively small.
  • For example, when the intervals between the acquisition moments of the multiple image frames are relatively small, some of the image frames are captured before the vehicle enters the tunnel and have relatively large light intensity, while others are captured after the vehicle drives into the tunnel and have relatively small light intensity. The large difference between these image frames reflects that the change frequency of the current light intensity is relatively large. A simplified way to approximate this criterion is sketched below.
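The multi-frame criterion above can be approximated by comparing mean brightness across recent frames. The sketch below is a simplified, assumption-laden illustration: mean gray level is used as a proxy for illumination intensity, and the caller supplies the frames and their timestamps.

```python
import numpy as np


def mean_intensity(frame_bgr: np.ndarray) -> float:
    """Proxy for the illumination intensity of one frame: mean gray level."""
    gray = frame_bgr.mean(axis=2) if frame_bgr.ndim == 3 else frame_bgr
    return float(gray.mean())


def intensity_change_rate(frames: list[np.ndarray],
                          timestamps: list[float]) -> float:
    """Approximate change frequency of the light intensity: total brightness
    change per second over the acquisition period of the given frames."""
    if len(frames) < 2:
        return 0.0
    intensities = [mean_intensity(f) for f in frames]
    span = max(timestamps[-1] - timestamps[0], 1e-6)
    return float(np.abs(np.diff(intensities)).sum() / span)


# Large brightness jumps over a short interval (e.g. driving into a tunnel)
# yield a high rate; a stable cabin light field yields a low rate.
```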
  • the virtual image processing method is described above by taking the cockpit of the vehicle or the environment where the vehicle is located as the first environment as an example.
  • This method can also be used in other virtual reality application scenarios, for example for virtual objects in film and television live broadcasts or movie special effects, such as 3D props and avatars. Adding 3D props and special effects to a video stream requires obtaining the light field information of the characters in the video stream in real time and with low energy consumption, so the above method can also be used in this scenario. In this case, the first environment is the environment of the characters in the video stream. For example, in a live broadcast, the face image is captured and input into the illumination parameter model to predict the illumination parameters of the real-time scene, and light effect rendering is then performed on the 3D props required in the live broadcast according to the predicted illumination parameters, completing the task of rendering realistic 3D prop special effects in the live video stream.
  • Fig. 4 shows a schematic flowchart of a method for training an illumination parameter model provided by an embodiment of the present application.
  • the lighting parameter model can be used to determine lighting parameters in method 300 .
  • the method shown in FIG. 4 may be executed by a server, such as a cloud server.
  • Alternatively, the method may be performed by electronic equipment at the vehicle end, and the electronic equipment may specifically include one or more of an on-board chip or an on-board device (e.g., a vehicle machine or an on-board computer).
  • the method 400 shown in FIG. 4 may also be executed by a cloud service device.
  • the method 400 shown in FIG. 4 may also be executed by a system composed of a cloud service device and an electronic device at the vehicle end.
  • the method 400 may be executed by the server 120 shown in FIG. 1 or the training device 220 shown in FIG. 2 .
  • the method 400 includes step S410 to step S420. Steps S410 to S420 will be described below.
  • the method 400 may be executed by the training device 220 shown in FIG. 2 , and the sample image frames may be data stored in the database 230 .
  • the source of data in the database 230 may be collected by the data collection device 260, or crowdsourcing may be adopted, in which multiple vehicles send the image frames collected during driving to the server, and the server stores them in the database.
  • the illumination parameter model may be a neural network model.
  • step S420 may include: training an illumination parameter model based on the sample image frame and the illumination parameter of the sample image frame.
  • the sample image frame is used as the input data of the initial model, and the illumination parameters of the sample image frame are used as the target output of the initial model to train the initial model to obtain a trained model, that is, the illumination parameter model.
  • the sample image frame is input into the initial model for processing, and the predicted illumination parameters of the sample image frame output by the initial model are obtained, with the goal of reducing the difference between the predicted illumination parameters of the sample image frame and the illumination parameters of the sample image frame Adjust the parameters of the initial model to obtain the trained model, that is, the illumination parameter model.
  • the illumination parameter of the sample image frame may be the illumination parameter obtained by labeling, which is used as the real illumination parameter of the sample image frame during the training process, or in other words, as the target output of the model during the training process.
  • the predicted lighting parameters of the sample image frame are the lighting parameters output by the model.
  • the lighting parameters may be spherical harmonic lighting parameters.
  • the illumination parameter of the sample image frame may be obtained by using a light metering device to measure the light field of the shooting environment of the sample image frame.
  • Alternatively, the illumination parameters of the sample image frame may be calculated by methods such as solving an optimization problem.
  • the image frame is input into the illumination parameter model for processing, and the illumination parameter can be obtained.
  • the above training method completes the training through the illumination parameters of the sample image frames.
  • the illumination parameters obtained by using photometry equipment are relatively accurate, which is conducive to ensuring the training effect of the model.
  • Methods such as solving an optimization problem require a large amount of calculation, have high requirements on the processor, and take a long time to compute, but they can save the cost of photometry equipment.
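A minimal supervised training sketch is given below, assuming PyTorch, a pre-built dataset of (face image, measured spherical-harmonic coefficients) pairs, and a simple CNN backbone. None of these choices are dictated by the embodiment, which only requires that the model regress the labeled illumination parameters of the sample image frames.

```python
import torch
import torch.nn as nn


class SHRegressor(nn.Module):
    """Small CNN that maps a face crop to 27 spherical-harmonic coefficients
    (order-3 SH, 9 coefficients per RGB channel)."""
    def __init__(self, n_coeffs: int = 27):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_coeffs)

    def forward(self, x):
        return self.head(self.features(x))


def train_epoch(model, loader, optimizer):
    """One pass over (image, labeled SH coefficients) pairs with an MSE loss."""
    criterion = nn.MSELoss()
    for images, sh_labels in loader:          # labels e.g. from a light probe
        optimizer.zero_grad()
        loss = criterion(model(images), sh_labels)
        loss.backward()
        optimizer.step()
```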
  • step S420 may include: taking the sample image frame as the input image of the initial model, and training the initial model with the goal of reducing the difference between the reconstructed image and the sample image frame, so as to obtain a trained model, that is, the illumination parameter Model.
  • the reconstructed image is obtained by rendering based on the predicted illumination parameters in the sample image frame output by the initial model and the 3D model parameters of the target object in the sample image frame.
  • the predicted lighting parameters in the sample image frame are the lighting parameters output by the model.
  • Rendering based on the predicted lighting parameters and the 3D model parameters of the target object output by the initial model can also be understood as performing light effect rendering, according to the predicted lighting parameters output by the initial model, on the 3D model determined by the 3D model parameters output by the initial model.
  • Specifically, the sample image frame is used as the input data of the initial model; the initial model outputs the predicted lighting parameters of the sample image frame and the 3D model parameters of the target object in the sample image frame, and rendering is performed based on these predicted lighting parameters and 3D model parameters to obtain a reconstructed image.
  • the parameters of the initial model are adjusted with the goal of reducing the difference between the reconstructed image and the input image until the training is completed, and the trained model, that is, the illumination parameter model, is obtained.
  • Adjusting the parameters of the initial model with the goal of reducing the difference between the reconstructed image and the input image means using a loss function to optimize the parameters of the initial model so that the reconstructed image keeps approaching the input image.
  • the loss function is used to indicate the difference between the reconstructed image and the input image. In other words, the loss function is built based on the difference between the reconstructed image and the input image.
  • the loss function may adopt an L1 norm loss function.
  • the L1 norm loss function can also be called the minimum absolute value deviation.
  • the L1 norm-based loss function measures the difference between the pixel values of the reconstructed image and the pixel values of the input image.
  • the loss function may also adopt other types of loss functions, for example, an L2 norm loss function. This embodiment of the present application does not limit it.
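The self-supervised objective can be written compactly. The sketch below assumes a differentiable renderer `render(sh_params, head_params)` is available as an external component (for example one built on a differentiable rasterizer), which this disclosure does not specify, and uses the L1 norm between the reconstructed and input images mentioned above.

```python
import torch.nn.functional as F


def reconstruction_loss(model, images, render):
    """L1 photometric loss between the input face images and the images
    re-rendered from the predicted lighting and 3D head parameters.

    `model`  : network returning (sh_params, head_params) for a batch
    `render` : differentiable renderer, an assumed external component
    """
    sh_params, head_params = model(images)
    reconstructed = render(sh_params, head_params)
    return F.l1_loss(reconstructed, images)


# Training then simply backpropagates this loss; no ground-truth lighting
# labels are required, only the face images themselves.
```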
  • the sample image frame includes a face image
  • the target object includes a head.
  • the face area may also be referred to as a face image.
  • the illumination parameter model trained in the above manner can be used to determine illumination parameters.
  • the lighting parameter model can be used to determine the lighting parameters of the face area in the image frame.
  • the illumination parameter model is trained based on a data set of face images, that is, the sample image frame is a face image.
  • the illumination parameter model may be a neural network model.
  • the face image is input into the neural network model for processing, and two sets of parameters are output, namely a set of spherical harmonic illumination parameters and a set of 3D head model parameters. 3D differentiable rendering is performed based on these two sets of parameters to obtain a 3D reconstructed head image, that is, a reconstructed image.
  • the image frame is input into the trained illumination parameter model for processing, and the illumination parameters of the face area in the image frame can be output. Furthermore, the avatar can be rendered based on the lighting parameters.
  • the lighting parameter model outputs the lighting parameters, and the lighting parameters are used to perform light effect rendering on the head model to obtain the rendered head model. It can be seen from FIG. 6 that the right side of the human face in the image frame has strong illumination and the left side has weak illumination, and the rendered human head has strong right illumination and weak left illumination. It should be understood that, in FIG. 6 , only the human head model is taken as an example to illustrate the light effect rendering effect, which does not limit the embodiment of the present application.
  • the real illumination parameters of the sample image frames are not needed during the training process, given any data set, for example, a data set of human face images, the illumination parameter model can be trained. Specifically, by establishing a loss function based on the difference between the reconstructed image obtained after rendering and the input image, the illumination parameter model can be trained, and the requirements for training samples are relatively low.
  • Fig. 7 shows a schematic flowchart of a virtual image processing method provided by an embodiment of the present application.
  • the method 700 shown in FIG. 7 may be regarded as a specific implementation manner of the method 300 shown in FIG. 3 .
  • the method shown in FIG. 7 can be executed by the vehicle end, for example, by one or more of the vehicle, the vehicle-mounted chip, or the vehicle-mounted device (eg, vehicle machine, vehicle computer). Alternatively, the method shown in FIG. 7 may also be executed by a cloud service device. Alternatively, the method shown in FIG. 7 may also be executed by a system composed of a cloud service device and an electronic device at the vehicle end. In the embodiment of the present application, the electronic device at the car end may also be called a car end device, or a car end; the cloud service device may also be called a cloud server, a cloud device, or a cloud.
  • the method shown in FIG. 7 can be applied to the system 100 in FIG. 1 and executed by the vehicle 110 or the on-board chip or on-board device on the vehicle 110 .
  • the face area in the image frame is used as the target area.
  • the face image may be an RGB image.
  • the face image may be collected by an image sensor.
  • For example, when the first environment is in the first lighting condition, the image sensor is controlled to collect images at the first frequency; when the first environment is in the second lighting condition, the image sensor is controlled to collect images at the second frequency, where the change frequency of the light intensity under the first lighting condition is greater than that under the second lighting condition, and the first frequency is greater than the second frequency.
  • the image sensor can be a CMS camera, which can collect images in the cockpit in real time.
  • the CMS camera can be set at the position of the rearview mirror in the cockpit, so that its viewing angle can cover the main driver's seat, the passenger seat and the rear seats.
  • The CMS camera can use a lower frequency to collect images when the light field information is relatively stable, for example, capturing an image every 2 seconds.
  • The range of the face area in the image collected by the image sensor may be small. In that case, the image collected by the image sensor may be processed, for example cropped, to obtain the face image, so that the illumination parameters of the face area can be obtained more accurately.
  • For example, as shown in FIG. 9, the range of the driver's face area is small, and the face in the image collected by the CMS camera can be detected and segmented by a face detection method to obtain the driver's face image, that is, the area in the rectangular frame of FIG. 9; one possible implementation is sketched below.
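Cropping the driver's face region before estimating lighting parameters can be done with any face detector; the embodiment does not name one. As one hedged example, the following uses OpenCV's bundled Haar cascade; the margin value is an arbitrary illustrative choice.

```python
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def crop_face(frame_bgr, margin: float = 0.2):
    """Return the largest detected face crop (with a margin), or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest box
    dx, dy = int(w * margin), int(h * margin)
    h_img, w_img = frame_bgr.shape[:2]
    return frame_bgr[max(0, y - dy):min(h_img, y + h + dy),
                     max(0, x - dx):min(w_img, x + w + dx)]
```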
  • the illumination parameter model may be a trained neural network model.
  • the illumination parameter is a spherical harmonic illumination parameter.
  • For example, when the spherical harmonic order is set to 3, there are 9 coefficients per color channel, so for an RGB image the illumination parameter model outputs 27 illumination parameters, as illustrated below.
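To make the parameter count concrete: an order-3 spherical-harmonic lighting model has 9 basis functions, so an RGB image yields 9 x 3 = 27 coefficients. The sketch below evaluates the shading contribution for a surface normal using the standard real SH basis; this is a common formulation given as an illustration, not necessarily the exact basis or convolution weighting used in this embodiment, and the example coefficients are invented.

```python
import numpy as np


def sh_basis_order3(n):
    """9 real spherical-harmonic basis values for unit normal n = (x, y, z)."""
    x, y, z = n
    return np.array([
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ])


def shade(normal, sh_coeffs):
    """Per-channel shading from 27 coefficients shaped (3, 9) for RGB."""
    basis = sh_basis_order3(normal / np.linalg.norm(normal))
    return sh_coeffs.reshape(3, 9) @ basis   # -> (R, G, B)


# Example: light arriving mostly from the upper right brightens normals
# facing that direction, matching the effect described for Fig. 10.
coeffs = np.zeros((3, 9))
coeffs[:, 0] = 0.8       # ambient term
coeffs[:, 2] = 0.4       # light from above (+z)
coeffs[:, 3] = 0.4       # light from the right (+x)
print(shade(np.array([1.0, 0.0, 1.0]), coeffs))
```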
  • the original virtual image can be obtained by performing 3D rendering through its 3D mesh information and texture information.
  • the method 700 may be executed in real time, that is, light effect rendering is performed on the original avatar in real time.
  • the rendered avatar is projected onto a projection area, for example, onto the passenger seat.
  • Optionally, step S720 can be executed on the server: the vehicle sends the acquired face image to the server, the server inputs the face image into the illumination parameter model to predict the illumination parameters, and the server sends the illumination parameters back to the vehicle. A hypothetical sketch of this exchange is given below.
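When step S720 runs on the server side, the exchange reduces to uploading the face crop and receiving the predicted coefficients. The HTTP sketch below is purely hypothetical: the endpoint URL, payload layout and response field names are invented for illustration, since the actual vehicle-cloud protocol is not specified in this embodiment.

```python
import cv2
import requests


def fetch_lighting_params(face_bgr, url="https://example.com/illumination"):
    """Send a JPEG-encoded face crop to the (hypothetical) cloud service and
    return the 27 spherical-harmonic coefficients it predicts."""
    ok, jpeg = cv2.imencode(".jpg", face_bgr)
    if not ok:
        raise ValueError("could not encode face image")
    resp = requests.post(
        url,
        files={"image": ("face.jpg", jpeg.tobytes(), "image/jpeg")},
        timeout=1.0)
    resp.raise_for_status()
    return resp.json()["sh_coefficients"]   # assumed response field
```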
  • FIG. 10 shows a schematic diagram of a model rendered with multiple sets of light effects.
  • the face image in FIG. 10 may be an image from the area in the rectangular box in FIG. 9 .
  • Lighting parameters can be used to perform light effect rendering on different objects.
  • In FIG. 10, light effect rendering is performed on the ball, teapot and rabbit models based on the lighting parameters of the face image, so that the light effect of the rendered models is close to or consistent with the light field environment where the face is located.
  • In (a) of FIG. 10, the right and left sides of the face are strongly illuminated; after light effect rendering is performed on each model based on the lighting parameters obtained from the face image, the light on the right and left sides of the rendered models is correspondingly stronger.
  • In (b) and (c) of FIG. 10, the upper right of the face is strongly illuminated; after light effect rendering is performed on each model based on the illumination parameters obtained from the face image, the upper right of each rendered model is correspondingly bright. Furthermore, the illumination on the upper right of the face in (b) of FIG. 10 is stronger than that in (c) of FIG. 10, and accordingly the illumination on the upper right of the rendered models in (b) of FIG. 10 is stronger than that of the rendered models in (c) of FIG. 10.
  • FIG. 10 is only an example of light effect rendering, and does not limit objects of light effect rendering in this embodiment of the present application.
  • the device of the embodiment of the present application will be described below with reference to FIG. 11 to FIG. 13 . It should be understood that the device described below can execute the method of the aforementioned embodiment of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the device of the embodiment of the present application below.
  • Fig. 11 is a schematic block diagram of an avatar processing device according to an embodiment of the present application.
  • the avatar processing device 3000 shown in FIG. 11 includes a first acquisition unit 3010, a second acquisition unit 3020 and a processing unit 3030.
  • the apparatus 3000 can be used to execute the avatar processing method of the embodiment of the present application, specifically, can be used to execute the method 300 or the method 700 .
  • the first obtaining unit 3010 may perform the above step S310
  • the second obtaining unit 3020 may perform the above step S320
  • the processing unit 3030 may perform the above step S330.
  • the first obtaining unit 3010 may perform the above step S710
  • the second obtaining unit 3020 may perform the above step S720
  • the processing unit 3030 may perform the above step S730.
  • the first acquiring unit 3010 is configured to acquire image frames of a first environment
  • the first environment includes a cockpit of a vehicle or an environment where the vehicle is located
  • the image frames are used to determine illumination parameters of the first environment.
  • the second acquiring unit 3020 may be configured to acquire the illumination parameters.
  • the processing unit 3030 is used to acquire the avatar based on the lighting parameters.
  • the illumination parameter may be a parameter of a differentiable illumination model.
  • the lighting parameter may be a spherical harmonic lighting parameter.
  • the image frame includes a face image, and the face image is used to determine an illumination parameter of the first environment.
  • the image frame may include an image of the user's head in the cockpit of the vehicle, and the image of the user's head is used to determine the lighting parameter.
  • the image frame is obtained from the image sensor
  • the device 3000 further includes a control unit 3040 configured to: when the first environment is in the first lighting condition, control the image sensor to collect images at the first frequency; when the first environment is in the second In light conditions, the image sensor is controlled to collect images at a second frequency, the change frequency of the light intensity under the first light condition is greater than the change frequency of the light intensity under the second light condition, and the first frequency is greater than the second frequency.
  • control unit 3040 is further configured to: control the projection of the avatar in the projection area.
  • the projected area includes an area within the vehicle's cabin.
  • Optionally, when the projection area is the first area in the vehicle cockpit, the illumination parameters are determined based on the first target area in the image frame; when the projection area is the second area in the vehicle cockpit, the illumination parameters are determined based on the second target area in the image frame.
  • the apparatus 3000 further includes a sending unit, configured to send the image frame to the server; a second obtaining unit 3020, specifically configured to obtain the illumination parameter from the server.
  • the second acquiring unit 3020 is specifically configured to input the image frame into the illumination parameter model to obtain the illumination parameter.
  • the apparatus 3000 includes an adjustment unit, configured to adjust the projection area in response to user operations.
  • Fig. 12 is a schematic block diagram of an avatar processing device according to an embodiment of the present application.
  • the avatar processing device 4000 shown in FIG. 12 includes an acquisition unit 4010 , a processing unit 4020 and a sending unit 4030 .
  • The acquiring unit 4010 and the processing unit 4020 may be used to execute the avatar processing method of the embodiments of the present application, specifically, the method 300 or the method 700.
  • the acquisition unit 4010 is configured to acquire an image frame of a first environment, the first environment includes the cockpit of the vehicle or the environment in which the vehicle is located, and the image frame is used to determine the lighting parameters of the first environment;
  • a processing unit 4020 configured to determine an illumination parameter based on the image frame
  • The sending unit 4030 is configured to send the lighting parameters to the vehicle, where the lighting parameters are used for avatar processing.
  • the processing unit 4020 is specifically configured to: input the image frame into the illumination parameter model to obtain the illumination parameter, and the illumination parameter model is obtained through training based on the sample image frame.
  • Optionally, the illumination parameter model is obtained by training the initial model with the sample image frame as input data, with the goal of reducing the difference between the reconstructed image and the sample image frame, where the reconstructed image is obtained by rendering based on the predicted illumination parameters of the sample image frame output by the initial model and the 3D model parameters of the target object in the sample image frame.
  • the image frame includes a face image, and the face image is used to determine an illumination parameter of the first environment.
  • The processing apparatus 3000 and the apparatus 4000 described above are embodied in the form of functional units.
  • the term "unit” here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit or a combination of both to realize the above functions.
  • For example, the hardware circuitry may include application specific integrated circuits (ASICs), electronic circuits, processors (such as shared processors, dedicated processors, or group processors) for executing one or more software or firmware programs, memory, merged logic circuits, and/or other suitable components that support the described functionality.
  • the units of each example described in the embodiments of the present application can be realized by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • Fig. 13 is a schematic diagram of the hardware structure of the device provided by the embodiment of the present application.
  • the apparatus 5000 shown in FIG. 13 includes a memory 5001 , a processor 5002 , a communication interface 5003 and a bus 5004 .
  • the memory 5001 , the processor 5002 , and the communication interface 5003 are connected to each other through a bus 5004 .
  • the memory 5001 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM).
  • the memory 5001 may store programs, and when the programs stored in the memory 5001 are executed by the processor 5002, the processor 5002 is used to execute various steps of the avatar processing method of the embodiment of the present application.
  • the processor 5002 may execute the method 300 shown in FIG. 3 or the method 700 shown in FIG. 7 above.
  • the processor 5002 is configured to execute each step of the method for training an illumination parameter model in the embodiment of the present application.
  • the processor 5002 may execute the method 400 shown in FIG. 4 .
  • The processor 5002 may be a general-purpose central processing unit (CPU), a microprocessor, an ASIC, a graphics processing unit (GPU), or one or more integrated circuits for executing related programs, so as to implement the virtual image processing method or the illumination parameter model training method of the method embodiments of the present application.
  • the processor 5002 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the avatar processing method of the present application may be completed by an integrated logic circuit of hardware in the processor 5002 or instructions in the form of software.
  • The processor 5002 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • The storage medium is located in the memory 5001; the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required by the units included in the processing apparatus shown in FIG. 11 or FIG. 12, or executes the virtual image processing method or the illumination parameter model training method of the method embodiments of the present application.
  • the communication interface 5003 implements communication between the apparatus 5000 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver. For example, a face image can be obtained through the communication interface 5003 .
  • the bus 5004 may include a pathway for transferring information between various components of the device 5000 (eg, memory 5001, processor 5002, communication interface 5003).
  • The embodiment of the present application also provides a computer-readable medium that stores program code for execution by a device, where the program code includes instructions for executing the avatar processing method or the illumination parameter model training method in the embodiments of the present application.
  • The embodiment of the present application also provides a computer program product containing instructions. When the computer program product is run on a computer, the computer is caused to execute the avatar processing method or the illumination parameter model training method in the embodiments of the present application.
  • the embodiment of the present application also provides a chip, the chip includes a processor and a data interface, the processor reads the instructions stored in the memory through the data interface, and executes the avatar processing method or the illumination parameter model in the embodiment of the present application. training method.
  • Optionally, the chip may further include a memory storing instructions; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the avatar processing method or the illumination parameter model training method in the embodiments of the present application.
  • An embodiment of the present application further provides a system, the system includes the avatar processing device in the embodiment of the present application and an image sensor, wherein the image sensor is used to collect image frames of the first environment.
  • the system may be a vehicle.
  • the system may be an onboard system.
  • the in-vehicle system includes a server and electronic equipment on the vehicle side.
  • the electronic equipment at the vehicle end may be any one of a vehicle-mounted chip or a vehicle-mounted device (such as a vehicle machine, a vehicle-mounted computer), and the like.
  • It should be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be ROM, programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM) , EEPROM) or flash memory.
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product comprises one or more computer instructions or computer programs.
  • When the computer instructions or computer programs are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "multiple" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one item (piece) of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • It should be understood that the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions described above are realized in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solution of the present application is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: various media capable of storing program codes such as U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk.

Abstract

The embodiments of the present application provide a virtual image processing method and device in the technical field of virtual reality. The method includes: acquiring an image frame of a first environment, where the first environment includes the cockpit of a vehicle or the environment in which the vehicle is located, and the image frame is used to determine illumination parameters of the first environment; acquiring the illumination parameters; and acquiring a virtual image based on the illumination parameters. The method of the embodiments of the present application can make the virtual image more coordinated with the surrounding environment, make the visual effect more realistic, and improve user experience.

Description

Virtual image processing method and device. Technical field
The embodiments of the present application relate to the technical field of virtual reality, and more specifically, to a virtual image processing method and device.
Background
With the development of vehicle intelligence, the multimedia experience that vehicles can bring to users is becoming more and more rich and colorful. However, a vehicle is usually in a driving state during use, and the environment changes in a relatively complex way, which brings certain challenges to the intelligent experience. Therefore, how to improve the intelligent service experience in complex environments has gradually become an important research direction.
Summary
The embodiments of the present application provide a virtual image processing method and device, which can make the virtual image more coordinated with the surrounding environment, make the visual effect more realistic, and improve user experience.
In a first aspect, a virtual image processing method is provided, including: acquiring an image frame of a first environment, where the first environment includes the cockpit of a vehicle or the environment in which the vehicle is located, and the image frame is used to determine illumination parameters of the first environment; acquiring the illumination parameters; and acquiring a virtual image based on the illumination parameters.
The illumination parameters are parameters of a differentiable illumination model; for example, the illumination parameters may be spherical harmonic illumination parameters.
In the solution of the embodiments of the present application, the virtual image is acquired based on the illumination parameters, so that a light effect rendering result close to or consistent with the light field of the scene captured by the image frame (that is, the first environment) can be obtained, making the virtual image more coordinated with the surrounding environment, more realistic in visual effect, and improving user experience.
With reference to the first aspect, in some implementations of the first aspect, the image frame includes a face image, and the face image is used to determine the illumination parameters of the first environment.
The solution of the embodiments of the present application uses the face image to determine the illumination parameters and acquires the virtual image based on these illumination parameters, which helps improve the coordination between the light effect of the virtual image and the light field at the face and obtain a better rendering effect.
Exemplarily, the image frame may include an image of the head of a user in the cockpit of the vehicle, and the image of the user's head is used to determine the illumination parameters.
With reference to the first aspect, in some implementations of the first aspect, the image frame is obtained from an image sensor, and the method further includes: when the first environment is in a first lighting condition, controlling the image sensor to collect images at a first frequency; when the first environment is in a second lighting condition, controlling the image sensor to collect images at a second frequency, where the change frequency of the light intensity under the first lighting condition is greater than the change frequency of the light intensity under the second lighting condition, and the first frequency is greater than the second frequency.
When the change frequency of the light intensity in the first environment is small, images can be collected at a lower frequency; when the image collection frequency is reduced, the frequency of acquiring illumination parameters and the frequency of light effect rendering are correspondingly reduced, reducing computing overhead and resource consumption as a whole. When the change frequency of the light intensity in the first environment is large, images can be collected at a higher frequency; in this case, the illumination parameters can be acquired and light effect rendering can be performed at a higher frequency, which helps ensure the real-time performance of the rendering. In this way, even when the light field changes drastically, the coordination between the virtual image and the surrounding environment can still be guaranteed.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: controlling projection of the virtual image in a projection area.
With reference to the first aspect, in some implementations of the first aspect, the projection area includes an area in the cockpit of the vehicle.
With reference to the first aspect, in some implementations of the first aspect, when the projection area is a first area in the vehicle cockpit, the illumination parameters are determined based on a first target area in the image frame; when the projection area is a second area in the vehicle cockpit, the illumination parameters are determined based on a second target area in the image frame.
In the solution of the embodiments of the present application, when the projection area changes, different target areas in the image frame are used to determine the illumination parameters; that is, the target area in the image frame can be adjusted according to the projection area, for example so that the area shown by the target area is as close as possible to the projection area. This helps obtain the illumination parameters closest to the illumination conditions of the projection area, improves the light effect rendering, makes the virtual image more coordinated with the surrounding environment, and makes the visual effect more realistic.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: sending the image frame to a server; and acquiring the illumination parameters includes: acquiring the illumination parameters from the server.
With reference to the first aspect, in some implementations of the first aspect, acquiring the illumination parameters includes: inputting the image frame into an illumination parameter model to obtain the illumination parameters.
Determining the illumination parameters through the illumination parameter model does not require complex preprocessing of the image frame, which reduces the complexity of the overall computation process, requires fewer computing resources, and is fast, so it can be applied in scenarios with high real-time requirements.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: adjusting the projection area in response to a user operation.
Exemplarily, the user operation may be the user issuing a voice command, and the voice command may be used to instruct adjustment of the projection area. In response to the voice command, the projection area is adjusted.
For example, the current virtual image is projected on the center console, and the voice command may be "project the avatar on the passenger seat". In response to the voice command, the virtual image is projected on the passenger seat.
Exemplarily, the user operation may be the user issuing a voice command. After the location of the user who issued the voice command is detected, the projection area is adjusted according to the location of that user.
In other words, the projection area is related to the location of the user who issued the voice command. In this case, the voice command may be a command unrelated to adjusting the projection area.
For example, the current virtual image is projected on the passenger seat; when it is detected that the user currently issuing a voice command is located in a rear seat, in response to the voice command the projection area is transferred from the passenger seat to an area facing the rear-seat user.
Exemplarily, the user operation may also be another action that affects the projection area.
For example, the current virtual image is projected on the passenger seat; when it is detected that the door on the passenger-seat side is opened, in response to this operation the projection area is transferred from the passenger seat to the center console.
In a second aspect, a virtual image processing method is provided, including: acquiring an image frame of a first environment, where the first environment includes the cockpit of a vehicle or the environment in which the vehicle is located, and the image frame is used to determine illumination parameters of the first environment; determining the illumination parameters based on the image frame; and sending the illumination parameters to the vehicle, where the illumination parameters are used for virtual image processing.
According to the solution of the embodiments of the present application, the illumination parameters are sent to the vehicle so that the vehicle can acquire the virtual image based on the illumination parameters and obtain a light effect rendering result close to or consistent with the light field of the scene captured by the image frame, making the virtual image more coordinated with the surrounding environment, more realistic in visual effect, and improving user experience.
With reference to the second aspect, in some implementations of the second aspect, determining the illumination parameters includes: inputting the image frame into an illumination parameter model to obtain the illumination parameters, where the illumination parameter model is obtained through training based on sample image frames.
Determining the illumination parameters through the illumination parameter model does not require complex preprocessing of the image frame, which reduces the complexity of the overall computation process, requires fewer computing resources, and is fast, so it can be applied in scenarios with high real-time requirements.
With reference to the second aspect, in some implementations of the second aspect, the illumination parameter model is obtained by training an initial model with the sample image frame as input data and with the goal of reducing the difference between a reconstructed image and the sample image frame, where the reconstructed image is obtained by rendering based on the predicted illumination parameters of the sample image frame output by the initial model and the three-dimensional model parameters of a target object in the sample image frame.
According to the solution of the embodiments of the present application, the real illumination parameters of the sample image frames are not needed during training; given any data set, for example a data set of face images, the illumination parameter model can be trained. Specifically, the illumination parameter model can be trained by building a loss function from the difference between the reconstructed image obtained after rendering and the input image, so the requirements on training samples are relatively low.
With reference to the second aspect, in some implementations of the second aspect, the image frame includes a face image, and the face image is used to determine the illumination parameters of the first environment.
In a third aspect, a method for training an illumination parameter model is provided, including: acquiring sample image frames; and training based on the sample image frames to obtain the illumination parameter model.
With reference to the third aspect, in some implementations of the third aspect, training based on the sample image frames to obtain the illumination parameter model includes: using the sample image frame as the input image of an initial model, and training the initial model with the goal of reducing the difference between a reconstructed image and the sample image frame, so as to obtain a trained model, that is, the illumination parameter model. The reconstructed image is obtained by rendering based on the predicted illumination parameters of the sample image frame output by the initial model and the three-dimensional model parameters of a target object in the sample image frame.
According to the solution of the embodiments of the present application, the real illumination parameters of the sample image frames are not needed during training; given any data set, for example a data set of face images, the illumination parameter model can be trained. Specifically, the illumination parameter model can be trained by building a loss function from the difference between the reconstructed image obtained after rendering and the input image, so the requirements on training samples are relatively low.
With reference to the third aspect, in some implementations of the third aspect, the sample image frame includes a face image, and the target object includes a head.
第四方面,提供了一种虚拟形象处理装置,该装置包括用于执行上述第一方面的任意一种实现方式的方法的单元。
第五方面,提供了一种虚拟形象处理装置,该装置包括用于执行上述第二方面的任意一种实现方式的方法的单元。
第六方面,提供了一种光照参数模型的训练装置,该装置包括用于执行上述第三方面的任意一种实现方式的方法的单元。
应理解,在上述第一方面中对相关内容的扩展、限定、解释和说明也适用于第二方面、第三方面、第四方面、第五方面以及第六方面中相同的内容。
第七方面,提供了一种虚拟形象处理装置,该装置包括:存储器,用于存储程序;处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器 用于执行第一方面或第二方面的任意一种实现方式中的方法。
示例性地,该装置可以为服务器,例如,云端服务器。或者,该装置也可以为车端的电子设备,例如,车载装置或车载芯片等。或者,该装置也可以为车辆。
第八方面,提供一种光照参数模型的训练装置,该训练装置包括:存储器,用于存储程序;处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器用于执行第三方面中的任意一种实现方式中的方法。该训练装置可以为电脑、主机或服务器等各类具备运算能力的设备。该训练装置还可以为芯片。
第九方面,提供一种计算机可读介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行第一方面、第二方面或第三方面中的任意一种实现方式中的方法。
第十方面,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面、第二方面或第三方面中的任意一种实现方式中的方法。
第十一方面,提供一种系统,该系统包括第四方面、第五方面或第六方面中的任意一种实现方式中的装置和图像传感器,其中,图像传感器用于采集第一环境的图像帧。
示例性地,该系统可以为车辆。
或者,该系统可以为车载系统。车载系统包括服务器和车端的电子设备。该车端的电子设备可以为车载芯片或车载装置(例如车机、车载电脑)等中的任一项。
第十二方面,提供一种电子设备,该电子设备包括第四方面、第五方面或第六方面中的任意一种实现方式中的装置。
示例性地,电子设备具体可以为车辆、车载芯片或车载装置(例如车机、车载电脑)等中的任一项。
根据本申请实施例的方案,基于光照参数获取虚拟形象,能够得到与图像帧所拍摄的场景的光场接近或一致的光效渲染效果,使虚拟形象与周围环境更协调,在视觉效果上更真实,提高用户体验。此外,基于人脸图像确定光照参数,有利于提升虚拟形象的光效与人脸处的光场之间的协调性,得到更好的渲染效果。此外,利用训练得到的光照模型确定光照参数,无需对图像帧进行复杂的预处理,降低了整体运算过程的复杂度,对计算资源的需求较低,计算速度快,能够应用于对实时性要求较高的场景中。此外,当第一环境中的光照强度的变化频率较小时,可以以较小的频率采集图像,当图像的采集频率减少时,相应地,光照参数的获取频率以及光效渲染的频率也会随之降低,从总体上减少计算开销,降低资源消耗。第一环境中的光照强度的变化频率较大,可以以较高的频率采集图像,在该情况下,能够以更高频率获取光照参数,并进行光效渲染,有利于保证光效渲染的实时性,这样,即使在光场情况变化剧烈的情况下,仍能保证虚拟形象与周围环境的协调性。当投影区域发生变化时,图像帧中的目标区域可以根据投影区域进行调整,有利于得到与投影区域的光照情况最接近的光照参数,提高光效渲染的效果,使虚拟形象与周围环境更协调,在视觉效果上更真实。
附图说明
图1为本申请的一种系统架构的结构示意图;
图2为本申请的另一种系统架构的示意图;
图3为本申请的一种虚拟形象处理方法的示意性流程图;
图4为本申请的一种光照参数模型的训练方法的示意性流程图;
图5为本申请的一种光照参数模型的训练过程的示意图;
图6为本申请的光效渲染过程的示意图;
图7为本申请的另一种虚拟形象处理方法的示意性流程图;
图8为本申请的一种摄像头位置的示意图;
图9为本申请的一种场景示意图;
图10为本申请的一种光效渲染效果的示意图;
图11为本申请的一种虚拟形象处理装置的示意图;
图12为本申请的另一种虚拟形象处理装置的示意图;
图13为本申请的一种硬件装置的示意图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。
本申请实施例的方案可以应用在车辆的座舱内人机交互场景中,提升车辆的智能化服务体验。
在一种实现中,在座舱内可以通过增强现实设备投射出可交互的虚拟形象,提高座舱内人机交互的服务体验。虚拟形象可以为用户提供语音助手或行车伴侣等多种类型的服务。示例性地,虚拟形象可以实现语音助手的功能。或者说,该虚拟形象可以作为语音助手的可视化形象。
示例性地,虚拟形象可以为用户提供车机控制服务、娱乐交流服务等。例如,用户可以通过与虚拟形象的交互实现对车窗、车灯、车门、天窗以及空调等硬件的控制。再如,用户可以通过与虚拟形象交互实现播放音乐、播放电影、收发邮件等操作。
示例性地,虚拟形象可以作为行车伴侣,为用户提供陪伴服务。例如,在行车过程中,可以将虚拟伴侣等虚拟形象投射在副驾驶座上,为驾驶员提供聊天、唱歌、跳舞等陪伴服务,使驾驶员在行车过程中拥有更多乐趣,提高用户体验。或者,为驾驶员提供安全警示作用,降低疲劳驾驶带来的危险。
但车辆处于行驶状态,环境处于变化中,可能会导致虚拟形象与周围环境不够协调,从而影响用户体验。
本申请实施例的方案通过对车辆座舱的环境或车辆所处环境的图像帧的获取,来实时获取座舱或车辆环境的光照参数,利用该光照参数获取虚拟形象,例如对待投射的虚拟形象进行光效渲染,有利于提升虚拟形象的光效与周围环境的光场情况的一致性,提高虚拟形象与周围环境的协调性,使虚拟形象更真实,进而提高用户体验。
如图1所示,本申请实施例提供了一种系统100。该系统可以包括车辆110以及服务器120。服务器120可以以云端的形式为车辆110提供服务,服务器120与车辆110之间的通信链路是双向的,即服务器120可以向车辆110传输信息,车辆110也可以向服务器120传输信息。服务器120和车辆110之间的通信可以通过无线通信和/或有线通信的方式实现。例如,车辆110通过基站接入无线网络,服务器120可以通过基站向车辆110传输 信息,或者可以通过路侧设备向车辆110传输信息。服务器120和基站之间可以通过无线连接或者可以通过有线连接;路侧设备和服务器120之间可以通过无线连接或者可以通过有线连接;此外,路侧设备与服务器120之间可以通过基站通信。上述无线网络包括但不限于:2G蜂窝通信,例如全球移动通信系统(global system for mobile communication,GSM)、通用分组无线业务(general packet radio service,GPRS);3G蜂窝通信,例如宽带码分多址(wideband code division multiple access,WCDMA)、时分同步码分多址接入(time division-synchronous code division multiple access,TS-SCDMA)、码分多址接入(code division multiple access,CDMA);4G蜂窝通信,例如长期演进(long term evolution,LTE);5G蜂窝通信,或者其他演进的蜂窝通信技术。
在一种实现方式中,车辆110可以将摄像头采集的图像帧上传至服务器120,服务器120可以对该图像帧进行处理,获得光照参数,并向车辆110发送光照参数。例如,服务器120上可以部署光照参数模型,该光照参数模型是一种训练获得的神经网络模型。服务器120通过从车辆110获取待处理的图像帧,并利用光照参数模型对该图像帧进行处理,得到光照参数,并将光照参数发送给车辆110。车辆110在接收到光照参数后,可以基于光照参数获取虚拟形象,投影该虚拟形象。
在另一种实现方式中,车辆110可以对摄像头采集的图像帧进行处理,得到光照参数。例如,车辆110部署有光照参数模型,将从摄像头获取到的图像帧输入该光照参数模型,得到光照参数。车辆110在得到光照参数后,可以基于光照参数获取虚拟形象,投影虚拟形象。该光照参数模型是一种训练获得的神经网络模型,其训练可以在服务器120所在的云端实现,且车辆110可以从服务器120获取到光照参数模型的相关参数,以更新该光照参数模型,此外车辆110可以将采集到的图像帧作为训练样本传输至服务器120,如此服务器120可以从车辆110获取到更加丰富和符合实际环境的样本数据,从而提升光照参数模型的准确性,且不断完善和更新光照参数模型。
如图2所示,本申请实施例提供了一种用于模型训练的系统架构200,本申请实施例的光照参数模型可以通过该系统训练获得。在图2中,数据采集设备260用于采集训练数据。在本申请实施例中训练数据可以为样本图像帧。
在采集到训练数据之后,这些训练数据被存入数据库230,存入数据库230的数据可以是从数据采集设备260获取到的原始数据,或者是对原始数据进行处理后的数据。训练设备220基于数据库230中维护的训练数据训练得到目标模型/规则101。
示例性地,对原始数据进行处理可以包括目标图像获取,图像尺寸调整以及数据筛选等。
For example, the raw images may be processed through the following procedure (a minimal sketch of such a pipeline is given after the list):
(1) Perform face detection on the raw image to obtain a face image (an example of the target image).
(2) Adjust the size of the face image to a preset size.
(3) Filter based on the pixels of the resized face image to obtain face images that meet the pixel requirements.
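Exemplarily, a minimal sketch of such a preprocessing pipeline is shown below, assuming an off-the-shelf OpenCV face detector; the detector choice, output size and pixel-filtering threshold are illustrative and not limiting.

```python
import cv2

def preprocess_frame(frame_bgr, out_size=(224, 224), min_mean_intensity=10):
    """Detect faces, crop and resize them, and filter out crops that are
    too dark to be useful for lighting estimation. The detector, output
    size and intensity threshold are illustrative choices only."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    crops = []
    for (x, y, w, h) in faces:                                    # step (1)
        crop = cv2.resize(frame_bgr[y:y + h, x:x + w], out_size)  # step (2)
        if crop.mean() >= min_mean_intensity:                     # step (3)
            crops.append(crop)
    return crops
```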
下面对训练设备220基于训练数据得到目标模型/规则101进行描述,训练设备220对输入的原始数据进行处理,将输出值与目标值进行对比,直到训练设备220输出的值与目标值的差值小于一定的阈值,从而完成目标模型/规则101的训练。
本申请实施例中的目标模型/规则101,即光照参数模型,具体可以为神经网络模型。 需要说明的是,在实际的应用中,所述数据库230中维护的训练数据不一定都来自于数据采集设备260的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备220也不一定完全基于数据库230维护的训练数据进行目标模型/规则101的训练,也有可能从其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。例如,数据库中存储的数据可以是以众包的方式从多个车辆处获得的数据,例如车辆采集的图像帧数据。后续将结合图5,描述一种光照参数模型的训练过程。
根据训练设备220训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图2所示的执行设备210,所述执行设备210可以是终端,如车载终端等,还可以是服务器等。在图2中,执行设备210可以配置输入/输出(input/output,I/O)接口212,用于与外部设备进行数据交互,数据采集设备240可以向执行设备210传输数据。示例性地,数据采集设备240可以为终端,执行设备210可以为服务器。示例性地,数据采集设备240可以为图像传感器,执行设备210可以为终端。
在执行设备210对输入数据进行预处理,或者在执行设备210的计算模块211执行计算等相关的处理过程中,执行设备210可以调用数据存储系统250中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统250中。
示例性地,数据采集设备240可以为终端,执行设备210可以为服务器,服务器可以将处理结果返回给终端。
训练设备220可以针对不同的目标或不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
In the case shown in FIG. 2, in one possible implementation, the data collection device 240 may automatically send input data to the execution device 210 for processing during model training.
图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图2中,数据存储系统250相对执行设备210是外部存储器,在其它情况下,也可以将数据存储系统250置于执行设备210中。
如图2所示,根据训练设备220训练得到目标模型/规则101,该目标模型/规则101在本申请实施例中可以是光照参数模型。
图3示出了本申请实施例提供的一种虚拟形象处理方法300。图3所示的方法可以由车端的电子设备来执行,电子设备具体可以包括车辆、车载芯片或车载装置(例如车机、车载电脑)等装置中的一个或多个。或者,图3所示的方法也可以由云服务设备执行。或者,图3所示的方法还可以是由云服务设备和车端的电子设备构成的系统执行。例如,一部分步骤由车端的电子设备执行,一部分步骤由云端的云服务设备来执行。本申请实施例中,车端的电子设备也可以称为车端设备,或车端;云服务设备也可以称为云端服务器、云端设备或云端。示例性地,图3所示的方法可以由图1中的车辆110或云端服务器120执行。或者,图3所示的方法也可以由图1的系统100执行。示例性地,S310至S330由云端执行,S340由车端执行。其中,云端设备可以将处理后得到的数据或根据上述数据确定的控制指令发送至车端的电子设备,由车端的电子设备控制投影。
以上方法300至少包括步骤S310至步骤S330。下面对步骤S310至步骤S330进行详细介绍。
S310,获取第一环境的图像帧,图像帧用于确定第一环境的光照参数。
在一种实现中,该方法应用于车辆,第一环境为车辆的座舱或车辆所处的环境,该图像帧可以是由图像传感器采集到的图像帧,即车端(即车辆)可以从图像传感器获取图像帧。
该图像传感器可以为座舱内的图像传感器,也可以为座舱外的图像传感器。示例性地,从功能上,座舱内的图像传感器可以包括座舱监控系统(cabin monitoring system,CMS)摄像头或驾驶员监控系统(driver monitoring system,DMS)摄像头。从属性上,座舱内的图像传感器可以包括彩色摄像头、深度摄像头或红外摄像头等摄像头中的一个或多个。应理解,此处仅为示例,还可以采用其他类型的图像传感器采集图像帧,本申请实施例对此不做限定。
示例性地,图像传感器可以设置于座舱内后视镜位置。这样,图像传感器的视角可以覆盖主驾驶座、副驾驶座以及后排座位等区域。
S320,获取第一环境的光照参数。
第一环境的光照参数可以理解为图像帧所拍摄的场景中的光照参数。或者说,该光照参数能够反映图像帧所拍摄的场景中的光照情况。光照情况也可以称为光场情况。光照参数用于虚拟形象的处理。
在一种实现中,第一环境为车辆座舱内环境或车辆所处环境。
光照参数指的是光照模型的参数。
在一种实现中,以点光源光照模型为例,光照参数可以为n light*6的矩阵,n light表示环境中点光源的数量,每个点光源由6个有理数表示;其中,前三个有理数表示光源的位置(light positions),后三个数表示光源红蓝绿三种颜色的强度(light intensities)。
在另一种实现中,以方向光源光照模型为例,光照参数可以为n light*6的矩阵,n light表示环境中方向光源的数量,每个方向光源由6个有理数表示;其中,前三个有理数表示光源的方向(light directions),后三个有理数表示光源红蓝绿三种颜色的强度(light intensities)。
在又一种实现中,光照参数为可微分光照模型的参数,示例性地,光照参数可以为球谐光照参数。
在本申请实施例对可微分光照模型的函数不做限制,为了方便理解,仅以一种球谐光照函数为例进行描述,其并非用于限定本申请。
Exemplarily, the expression of the spherical harmonic lighting function may satisfy the following formula:

$$L(\theta,\varphi)=\sum_{l=0}^{n}\sum_{m=-l}^{l} c_{l}^{m}\,Y_{l}^{m}(\theta,\varphi)$$

where $l$ and $m$ are the eigenvalue indices of the Laplace equation in spherical coordinates, $n$ indicates the order of the spherical harmonic basis functions, that is, the spherical harmonic order, and $m$ is the degree of the basis function. $Y_{l}^{m}(\theta,\varphi)$ is the spherical harmonic basis function, and $c_{l}^{m}$ is the coefficient of the corresponding spherical harmonic basis direction; the spherical harmonic lighting parameters are the coefficients of the respective basis directions. In one example, the spherical harmonic order is 3, that is, $n$ is 2, and $l$ may take the values 0, 1, 2. When $l$ is 0, $m$ may take the value 0; when $l$ is 1, $m$ may take the values -1, 0, 1; when $l$ is 2, $m$ may take the values -2, -1, 0, 1, 2. In this case, the expression of the spherical harmonic lighting function contains 9 spherical harmonic basis functions and 9 corresponding spherical harmonic lighting parameters (or coefficients). Taking the determination of the spherical harmonic lighting parameters of a red-green-blue (RGB) image as an example, an RGB image includes three channels and each channel has 9 spherical harmonic lighting parameters; therefore, for an RGB image there are 27 spherical harmonic lighting parameters, that is, the number of spherical harmonic lighting parameters to be predicted is 27.
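For illustration only (not part of the claimed method), the following sketch shows one way the 27 spherical harmonic lighting parameters described above (9 per RGB channel) could be evaluated for a given surface normal. The normalization constants are the standard real spherical-harmonic factors up to l = 2, and the coefficient values used here are placeholders for the model output.

```python
import numpy as np

def sh_basis_order3(d):
    """Evaluate the 9 real spherical-harmonic basis functions (l <= 2)
    for a unit direction d = (x, y, z)."""
    x, y, z = d
    return np.array([
        0.282095,                        # Y_0^0
        0.488603 * y,                    # Y_1^-1
        0.488603 * z,                    # Y_1^0
        0.488603 * x,                    # Y_1^1
        1.092548 * x * y,                # Y_2^-2
        1.092548 * y * z,                # Y_2^-1
        0.315392 * (3.0 * z * z - 1.0),  # Y_2^0
        1.092548 * x * z,                # Y_2^1
        0.546274 * (x * x - y * y),      # Y_2^2
    ])

def shade(normal, sh_coeffs_rgb):
    """sh_coeffs_rgb: (3, 9) array, i.e. 27 lighting parameters, 9 per RGB
    channel. Returns the RGB radiance reconstructed from the SH expansion."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return sh_coeffs_rgb @ sh_basis_order3(n)   # shape (3,)

# Example: shade a surface normal with placeholder coefficients.
coeffs = np.random.rand(3, 9)
rgb = shade([0.0, 0.0, 1.0], coeffs)
```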
在一种实现方式中,车端可以从本地获取光照参数。例如,车端可以自行确定该光照参数。此时,车端设置有光照参数模型,将获取的图像帧输入至该光照参数模型,即可获得光照参数。该光照参数模型可以预先设置于车端或者由服务器发送给车端。此外,该光照参数模型可以是在服务器端经过训练获得的,进一步的,服务器端可以经过训练进一步更新该光照参数模型,并将更新后的模型或模型参数发送给车端。
在另一种实现方式中,车端也可以从其他设备获取光照参数,即接收其他设备发送的光照参数。例如,车端将获取的图像帧发送给服务器,服务器从车端获取该图像帧,基于图像帧确定光照参数后,将光照参数发送至车端,即车端可以从服务器获取光照参数。此时,服务器端设置有光照参数模型,将从车端获取的图像帧输入至该光照参数模型,即可获得光照参数。该光照参数模型可以是在服务器端经过训练获得的,进一步的,服务器端可以经过训练进一步更新该光照参数模型,以提高光照参数的准确性,进而进一步提升虚拟形象的光效与周围环境的光场情况的一致性。
可选地,图像帧包括人脸图像。人脸图像用于确定第一环境的光照参数。
示例性地,虚拟形象为人物形象。
在一个图像帧中,不同区域的光照参数之间可能存在较大的差异。本申请实施例的方案利用人脸图像信息获得的光照参数对虚拟形象(例如,虚拟人物形象)进行光效渲染,有利于提升虚拟形象的光效与目标区域所展示的场景中的光场之间的协调性,得到更好的渲染效果。
可选地,图像帧包括目标区域,该图像帧的目标区域用于确定光照参数。在以上示例中,目标区域可以为人脸区域。
在该情况下,获得的光照参数可以反映图像帧所拍摄的人脸处的光照情况。
进一步地,在图像帧中包括多个人脸区域的情况下,可以选择该多个人脸区域中的至少一个作为目标区域。例如,目标区域可以为该多个人脸区域中与虚拟形象的投影区域最近的人脸区域,再如目标区域可以为图像帧中占比最多的人脸区域,再如,目标区域可以为图像帧中主驾区域中的人脸区域。
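A minimal sketch of such a target-region selection rule is given below; the projection point (the projection region expressed in image coordinates) is an assumed input, and the fallback to the largest face corresponds to one of the options mentioned above.

```python
def pick_target_face(face_boxes, projection_point=None):
    """Pick one detected face region as the target region.
    face_boxes: list of (x, y, w, h) rectangles. If a projection point is
    given, pick the face closest to it; otherwise pick the largest face."""
    if not face_boxes:
        return None
    if projection_point is not None:
        px, py = projection_point
        return min(face_boxes,
                   key=lambda b: (b[0] + b[2] / 2 - px) ** 2
                                 + (b[1] + b[3] / 2 - py) ** 2)
    return max(face_boxes, key=lambda b: b[2] * b[3])
```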
应理解,以上仅为示例,目标区域可以根据实际需要设置。
例如,该目标区域可以包括用户的头部区域。或者,目标区域也可以包括用户的上半身区域。或者,目标区域可以包括用户的手部区域。或者,目标区域可以包括用户的手臂区域。或者,目标区域还可以包括用户所在的区域。
再如,该图像帧可以为车辆所处环境的图像帧,该图像帧中的车外景物所在的区域也可以作为目标区域。更进一步地,可以将该图像帧中的与车辆之间的距离较近的车外景物所在的区域作为目标区域。再如,该图像帧可以为座舱内的图像帧,该图像帧中的中控台上的摆件所在的区域也可以作为目标区域。
图像帧中的目标区域可以包括图像帧中的部分区域,也可以包括图像帧中的全部区域。
还应理解,第一环境的光照参数也可以根据其他方式定义,例如,第一环境的光照参数可以为图像帧中的多个区域的光照参数的平均值。再如,第一环境的光照参数可以为图 像帧中的目标区域的光照参数。以上方式不对本申请实施例的方案构成限定。
在一种实现方式中,基于图像帧确定光照参数,可以包括,将该图像帧输入至光照参数模型中,以得到光照参数。该光照参数模型可以是基于样本图像帧训练得到的。
在训练完成后,该光照参数模型可以基于数据的更新或算法的更新,进行更新。
确定光照参数的处理过程可以由车端执行。车端可以获取该光照参数模型,例如,接收服务器发送的光照参数模型的参数。或者,车端可以存储有该光照参数模型。车端通过该光照参数模型对图像帧进行处理,得到光照参数。
或者,上述处理过程也可以由其他设备执行,例如,服务器通过该光照参数模型对图像帧进行处理,得到光照参数。其中,该光照参数模型可以部署于服务器,也可以部署于车端,还可以部署于其他服务器或终端上。在处理时获取该光照参数模型。
在第一环境的光照参数是根据图像帧中的目标区域确定的情况下,将图像帧输入至光照参数模型中,可以包括,将图像帧中的目标区域的图像信息输入至光照参数模型中。
以目标区域为人脸区域为例,可以通过人脸检测方法对图像帧中的人脸进行检测和抓取分割,得到人脸图像信息,进而将该人脸图像信息输入至光照参数模型中进行处理。
示例性地,光照参数模型可以为神经网络模型。光照参数模型可以是经过训练得到的基于神经网络搭建的模型,这里的神经网络可以是卷积神经网络(convolutional neuron network,CNN)、循环神经网络(recurrent neural network,RNN)、时间递归神经网络(long-short term memory,LSTM)、双向时间递归神经网络(bidirectional long-short term memory,BLSTM)、或深度卷积神经网络(deep convolutional neural networks,DCNN)等等。
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be, for example, as shown in formula (1):

$$h_{W,b}(x)=f\Big(\sum_{s=1}^{n} W_{s}x_{s}+b\Big)\qquad(1)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, which may also be called a parameter or coefficient of the neural network, x_s is an input of the neural network, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to perform a non-linear transformation on the features in the neural network, so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting a plurality of such single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of several neural units.

A deep neural network (DNN), also called a multi-layer neural network, may be understood as a neural network with multiple hidden layers. When the DNN is divided according to the positions of different layers, its internal layers fall into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer. Optionally, in some implementations, some connections between neurons may be removed.

Although a DNN looks complicated, the work of each layer is actually not complicated; simply put, each layer computes the following linear relationship expression:

$$\vec{y}=\alpha(W\vec{x}+\vec{b})$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector. Because a DNN has many layers, the numbers of coefficients W and offset vectors $\vec{b}$ are also relatively large. These parameters are defined in the DNN as follows. Taking the coefficient W as an example: suppose that, in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W_{24}^{3}$, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as $W_{jk}^{L}$. It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
卷积神经网络是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
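For illustration, a toy convolutional regressor of this kind may be sketched in PyTorch as follows; the layer sizes, input resolution and the choice of 27 outputs (third-order spherical harmonics for an RGB image) are assumptions made for the sketch, not the actual architecture of the lighting parameter model.

```python
import torch
import torch.nn as nn

class LightingParamNet(nn.Module):
    """Toy CNN regressor from a face crop to 27 spherical-harmonic
    lighting coefficients (9 per RGB channel)."""
    def __init__(self, num_sh=27):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, num_sh)

    def forward(self, x):              # x: (B, 3, H, W) face crops
        f = self.features(x).flatten(1)
        return self.head(f)            # (B, 27) SH coefficients

model = LightingParamNet()
sh = model(torch.randn(1, 3, 224, 224))   # tensor of shape (1, 27)
```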
该光照参数模型的训练方法将在后文中进行介绍。
通过光照参数模型确定光照参数,无需对图像帧进行复杂的预处理,降低了整体运算过程的复杂度,对计算资源的需求较低,计算速度快,能够应用于对实时性要求较高的场景中。
应理解,以上仅为光照参数的确定方式的一个示例。还可以采用其他方案确定光照参数。例如,可以通过解优化的方式确定光照参数。
在一种实现方式中,光照参数也可以理解为座舱内的光照参数。本申请实施例中还可以通过其他方式预测座舱内的光照参数。
示例性地,可以根据车辆所处的环境预测座舱内的光照参数。
例如,车辆前方即将驶入隧道,在该情况下,可以根据隧道内的光照参数预测座舱内的光照参数。隧道内的光照参数可以为预先保存的光照参数。比如,该预先保存的光照参数可以是通过光学设备在隧道中预先测量得到的,或者,也可以为其它车辆之前通过隧道时利用本申请的方案确定的光照参数。本申请实施例对此不做限定。
车辆所处的环境可以通过车辆上的传感器确定。例如,可以通过图像传感器采集的车外图像帧确定车辆前方即将驶入隧道。再如,可以通过雷达采集的数据确定车辆前方即将驶入隧道。应理解,此处仅为示例,车辆所处的环境还可以采用其他方式确定,本申请实施例对此不做限定。
示例性地,可以根据车辆所处的位置以及天气信息预测座舱内的光照参数。
换言之,根据车辆所述位置的天气情况预测座舱内的光照参数。例如,根据天气预报确定车辆行驶路径上将要经过的区域的天气情况,根据该天气情况预测座舱内的光照参数。
这样,可以提前预测座舱内的光照参数,有利于进行实时光效渲染。例如,车辆行驶至位置A时,获取座舱内的图像帧,在尚未计算出当前的光照参数的情况下,可以基于之前预测得到位置A处的座舱内的光照参数进行光效渲染,使虚拟形象与周围环境更协调,提高用户体验。
S330,基于该光照参数获取虚拟形象,即对原始虚拟形象进行光效渲染得到渲染后的虚拟形象。
基于该光照参数对原始虚拟形象进行光效渲染,指的是在渲染时采用了该光照参数。步骤S330也可以理解为基于光照参数和原始虚拟形象的参数进行渲染。
示例性地,该虚拟形象可以包括虚拟伴侣、虚拟道具、虚拟卡通人物或虚拟宠物等各种虚拟形象。
虚拟形象可以采用现有方案得到。例如,对于增强现实应用的虚拟形象,可以获取该虚拟形象的三维(three Dimension,3D)网格(mesh)信息和纹理信息,对3D mesh信息和纹理信息进行3D渲染即可得到3D的虚拟形象。在渲染时采用步骤S320得到的光照参数,即可得到利用该光照参数进行光效渲染后的虚拟形象。
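A highly simplified sketch of this light-effect rendering step is shown below. It assumes the `shade` helper and the NumPy import from the earlier spherical-harmonic sketch, and it shades per vertex, whereas a real engine would typically evaluate the lighting per pixel in a shader.

```python
import numpy as np

def shade_mesh(normals, albedo, sh_coeffs_rgb):
    """Modulate each vertex's texture colour (albedo, shape (V, 3)) by the
    irradiance reconstructed from the SH lighting parameters (shape (3, 9)).
    `shade` is the helper defined in the earlier sketch."""
    colours = np.empty_like(albedo)
    for i, n in enumerate(normals):
        colours[i] = np.clip(albedo[i] * shade(n, sh_coeffs_rgb), 0.0, 1.0)
    return colours
```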
根据本申请实施例的方案,基于环境的光照参数获取虚拟形象,例如,利用环境的光照参数对原始虚拟形象进行光效渲染,能够得到与图像帧所拍摄的场景(即该环境)的光场接近或一致的光效渲染效果,使虚拟形象与周围环境更协调,在视觉效果上更真实,提高用户体验。
S340,控制在投影区域投影该虚拟形象。
具体地,可以控制增强现实设备在投影区域投影该虚拟形象。
可选地,投影区域包括车辆的座舱内的区域。
图像帧中的目标区域可以与投影区域无关,也可以与投影区域相关。
在图像帧中的目标区域与投影区域无关的情况下,即图像帧中的目标区域与投影区域互不影响,对于不同的投影区域,目标区域均可以是相同的。
在图像帧中的目标区域与投影区域是相关的情况下,即将投影区域视为图像帧中的目标区域的一个影响因素。对于不同的投影区域,目标区域可以是不同的。或者可以理解为,图像帧中的目标区域可以是根据投影区域确定的。
可选地,在投影区域为车辆座舱内的第一区域时,目标区域为图像帧中的第一目标区域;在投影区域为车辆座舱内的第二区域时,目标区域为图像帧中的第二目标区域。
即在投影区域为车辆座舱内的第一区域时,光照参数是基于图像帧中的第一目标区域确定的;在投影区域为车辆座舱内的第二区域时,光照参数是基于图像帧中的第二目标区域确定的。
座舱内的第一区域和座舱内的第二区域为不同的区域。目标图像帧中的第一目标区域和目标图像帧中的第二目标区域是不同的区域。
例如,图像帧为车内图像帧,座舱内的第一区域可以为副驾驶座位,图像帧中的驾驶员的人脸区域(第一目标区域的一例)可以作为目标区域;座舱内的第二区域可以为中控 台,图像帧中的中控台上的摆件所在的区域(第二目标区域的一例)可以为第二目标区域。图像帧中的驾驶员的人脸区域作为目标区域时得到的光照参数即为驾驶员的人脸周围的光照参数,也就是说,当需要在副驾驶座位上投影虚拟形象时,可以利用驾驶员的人脸周围的光照参数对原始虚拟形象进行光效渲染,进而在副驾驶座位上投影该虚拟形象。图像帧中的中控台上的摆件所在的区域作为目标区域时得到的光照参数即为摆件周围的光照参数,也就是说,当需要在中控台上投影虚拟形象时,可以利用摆件周围的光照参数对原始虚拟形象进行光效渲染,进而在中控台上投影虚拟形象。
这样,当投影区域发生变化时,利用图像帧中的不同的目标区域确定光照参数,即图像帧中的目标区域可以根据投影区域进行调整,例如,使图像帧中的目标区域所展示的区域与投影区域尽量接近,有利于得到与投影区域的光照情况最接近的光照参数,提高光效渲染的效果,使虚拟形象与周围环境更协调,在视觉效果上更真实。
应理解,图像帧中的目标区域与投影区域相关,并不意味着,在投影区域不同时,目标区域一定不同。多个投影区域也可以对应同一个目标区域。例如,两个投影区域之间的光照强度之间的差异较小时,该两个投影区域对应的目标区域也可以是相同的。再如,当两个投影区域之间的距离较小时,两个投影区域的光照强度之间的差异通常较小,这样,当两个投影区域之间的距离较小时,该两个投影区域对应的目标区域也可以是相同的。
进一步地,在第一区域和第二区域之间的距离大于或等于第一阈值的情况下,在投影区域为车辆座舱内的第一区域时,目标区域为图像帧中的第一目标区域;在投影区域为车辆座舱内的第二区域时,目标区域为图像帧中的第二目标区域。
当两个投影区域之间的距离较大时,两个投影区域的光照强度之间的差异可能较大,相同的目标区域的光照参数难以与两个投影区域的光照情况保持一致,进而影响光效渲染效果。在该情况下,采用图像帧中的不同区域作为目标区域,例如,使图像帧中的目标区域所展示的区域与投影区域尽量接近,有利于得到与投影区域的光照情况最接近的光照参数,提高光效渲染的效果,使虚拟形象与周围环境更协调,在视觉效果上更真实。
图像帧所拍摄的区域可以与投影区域无关,也可以与投影区域相关。图像帧所拍摄的区域指的是采集该图像帧的图像传感器所拍摄的区域。图像帧所拍摄的区域也可以称为图像帧所展示的区域。
在图像帧所拍摄的区域与投影区域无关的情况下,即图像帧所拍摄的区域与投影区域互不影响,对于不同的投影区域,图像帧所拍摄的区域均可以是相同的。
在图像帧所拍摄的区域与投影区域是相关的情况下,即将投影区域视为图像帧所拍摄的区域的一个影响因素。对于不同的投影区域,图像帧所拍摄的区域可以是不同的。或者可以理解为,图像帧所拍摄的区域可以是根据投影区域确定的。
示例性地,在投影区域为车辆座舱内的第三区域时,控制图像传感器以第一角度采集图像帧;在投影区域为车辆座舱内的第四区域时,控制图像传感器以第二角度采集图像帧。
座舱内的第三区域和座舱内的第四区域为不同的区域。第一角度与第二角度不同,相应地,图像传感器以第一角度采集的图像帧所拍摄的区域与图像传感器以第二角度采集的图像帧所拍摄的区域不同。也就是说,控制图像传感器拍摄不同区域。
可替换地,在投影区域为车辆座舱内的第三区域时,控制第一图像传感器采集图像帧;在投影区域为车辆座舱内的第四区域时,控制第二图像传感器采集图像帧。
第一图像传感器和第二图像传感器的位置不同,相应地,第一图像传感器采集的图像帧所拍摄的区域与第二图像传感器采集的图像帧所拍摄的区域不同。也就是说,车辆中可以安装多个图像传感器,比如,在车辆的各个座位的前方安装图像传感器。
例如,图像帧为车内图像帧,座舱内的第三区域可以为副驾驶座位,图像帧所拍摄的区域可以为驾驶员所在的区域,比如,控制图像传感器以面向驾驶员的角度(第一角度的一例)采集图像帧,或者,控制驾驶员前方的图像传感器(第一图像传感器的一例)采集图像帧;座舱内的第四区域可以为后排座位,图像帧所拍摄的区域可以为后排座位的乘客所在的区域,比如,控制图像传感器以面向后排座位的乘客的角度(第二角度的一例)采集图像帧,或者,控制后排乘客前方的图像传感器(第二图像传感器的一例)采集图像帧。这样,当需要在副驾驶座位上投影虚拟形象时,可以利用驾驶员周围的光照参数对原始虚拟形象进行光效渲染,进而在副驾驶座位上投影该虚拟形象。当需要在后排座位上投影虚拟形象时,可以利用后排乘客周围的光照参数对原始虚拟形象进行光效渲染,进而在后排座位上投影该虚拟形象。
这样,当投影区域发生变化时,图像帧所拍摄的区域可以根据投影区域进行调整,例如,使图像帧所拍摄的区域与投影区域尽量接近,有利于得到与投影区域的光照情况最接近的光照参数,提高光效渲染的效果,使虚拟形象与周围环境更协调,在视觉效果上更真实。
应理解,图像帧所拍摄的区域与投影区域相关,并不意味着,在投影区域不同时,图像帧所拍摄的区域一定不同。多个投影区域对应的图像帧所拍摄的区域也可以相同。例如,两个投影区域之间的光照强度之间的差异较小时,该两个投影区域对应的图像帧所拍摄的区域也可以是相同的。再如,当两个投影区域之间的距离较小时,两个投影区域的光照强度之间的差异通常较小,这样,当两个投影区域之间的距离较小时,该两个投影区域对应的图像帧所拍摄的区域也可以是相同的。
进一步地,在第三区域和第四区域之间的距离大于或等于第二阈值的情况下,在投影区域为车辆座舱内的第三区域时,控制图像传感器以第一角度采集图像帧;在投影区域为车辆座舱内的第四区域时,控制图像传感器以第二角度采集图像帧。
或者,在第三区域和第四区域之间的距离大于或等于第二阈值的情况下,在投影区域为车辆座舱内的第三区域时,控制第一图像传感器采集图像帧;在投影区域为车辆座舱内的第四区域时,控制第二图像传感器采集图像帧。
第一阈值和第二阈值可以相同,也可以不同。例如,第一阈值可以为1米,第二阈值可以为1.3m。
当两个投影区域之间的距离较大时,两个投影区域的光照强度之间的差异可能较大,相同的拍摄区域的光照参数难以与两个投影区域的光照情况保持一致,进而影响光效渲染效果。在该情况下,控制图像传感器拍摄不同区域,例如,使图像传感器拍摄的区域与投影区域尽量接近,有利于得到与投影区域的光照情况最接近的光照参数,提高光效渲染的效果,使虚拟形象与周围环境更协调,在视觉效果上更真实。
应理解,以上仅为示例,还可以通过其他方式调整图像帧所拍摄的区域,本申请实施例对此不做限定。
可选地,方法300还包括:响应于用户操作,调整投影区域(图中未示出)。
例如,在用户操作之前,投影区域为座舱内的区域A,在用户操作后,投影区域为座舱内的区域B。区域A和区域B为不同的区域。
示例性地,用户操作可以为用户发出语音指令,该语音指令可以用于指示调整投影区域。响应于该语音指令,调整投影区域。
例如,当前的虚拟形象投影在中控台上,该语音指令可以为“在副驾驶座位投影虚拟形象”。响应于该语音指令,控制在副驾驶座位上投影虚拟形象。
示例性地,用户操作可以为用户发出语音指令。在检测到发出语音指令的用户的位置后,根据该用户的位置调整投影区域。
换言之,投影区域与发出语音指令的用户的位置有关。在该情况下,该语音指令可以为与调整投影区域无关的指令。
例如,当前的虚拟形象投影在副驾驶座位上,当检测到当前发出语音指令的用户位于后排座位上时,响应于该语音指令,控制投影区域由副驾驶座位转移至面向后排座位用户的区域。
示例性地,用户操作还可以为其他影响投影区域的动作。
例如,当前的虚拟形象投影在副驾驶座位上,当检测到副驾驶座位侧的车门被打开时,响应于该操作,控制投影区域由副驾驶座位转移至中控台。
再如,当前的虚拟形象投影在中控台上,当检测到副驾驶座位上的用户离开时,响应于该操作,控制投影区域由中控台转移至副驾驶座位。
再如,当前的虚拟形象投影在副驾驶座位上,当检测到后排座位上出现用户时,响应于该操作,控制投影区域由中控台转移至后排座位。
进一步地,不同的投影区域可以投影不同的虚拟形象。
也就是说,当投影区域发生变化时,虚拟形象也可以改变。示例性地,当投影区域发生变化时,虚拟形象的尺寸可以改变。例如,在座位上投影的虚拟形象的尺寸大于在中控台投影的虚拟形象。或者,当投影区域发生变化时,虚拟形象的类型可以改变。例如,在座位上投影的虚拟形象的类型为虚拟伴侣,在中控台上投影的虚拟形象为虚拟卡通人物、虚拟道具、或虚拟宠物等。此外,还可以为用户提供虚拟形象选择界面,且根据用户的选择,设定投影区域的虚拟形象。如此,可以再投影区域投射出符合用户需求的投影形象,提升用户体验。
需要说明的是,本申请实施例中投影区域显示的虚拟形象均为经过光效渲染的虚拟形象。
可选地,当第一环境(例如,车辆座舱内或车辆所处环境)处于第一光照条件时,控制图像传感器以第一频率采集图像;当第一环境处于第二光照条件时,控制图像传感器以第二频率采集图像,第一光照条件下的光照强度的变化频率大于第二光照条件下的光照强度的变化频率,第一频率大于第二频率。
光场情况相对稳定时的图像采集频率小于光场情况变化剧烈时的图像采集频率。
当第一环境中的光照强度的变化频率较大时,可以以更高的频率采集图像,例如,每2秒采集10次图像,当第一环境中的光照强度的变化频率较小时,可以以较小的频率采集图像,例如,每2秒采集1次图像。应理解,此处的采集频率仅为示例,还可以设置其他采集频率,只要第一频率大于第二频率即可。
光照强度的变化也可以理解为光场信息的变化。光场信息指的是空间中每个单点位置的出射光照强度信息。
图像采集频率可以根据需要设置。例如,可以设置图像采集频率的最大值是30次每秒,图像采集频率的最小值为15秒一次。随着光照强度的变化,图像采集频率可以在最小值和最大值之间调整。
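A minimal sketch of such an adaptive capture schedule, using the example bounds above (at most 30 captures per second, at least one capture every 15 seconds), might look as follows; the mapping from change frequency to interval and the gain value are illustrative assumptions.

```python
def choose_capture_interval(change_rate_hz,
                            min_interval_s=1.0 / 30.0,  # at most 30 captures per second
                            max_interval_s=15.0,        # at least one capture every 15 s
                            gain=0.5):
    """Map the estimated frequency of illumination change (Hz) to a capture
    interval: fast-changing light -> short interval, stable light -> long
    interval, clamped to the configured bounds."""
    if change_rate_hz <= 0:
        return max_interval_s
    interval = gain / change_rate_hz
    return max(min_interval_s, min(max_interval_s, interval))
```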
示例性地,车辆在无遮挡的道路上时,外部光源较为稳定,车辆座舱内的光照强度相对稳定,也就是光照强度的变化较小,在该情况下,可以以较低的频率采集图像。当车辆驶入或驶出隧道或者在树荫变化较为频繁的道路行驶时,外部光源变化剧烈,车辆座舱内的光照强度剧烈变化,在该情况下,可以以较高的频率采集图像。
当第一环境中的光照强度的变化频率较小时,可以以较小的频率采集图像,当图像的采集频率减少时,相应地,光照参数的获取频率以及光效渲染的频率也会随之降低,从总体上减少计算开销,降低资源消耗。第一环境中的光照强度的变化频率较大,可以以较高的频率采集图像,在该情况下,能够以更高频率获取光照参数,并进行光效渲染,有利于保证光效渲染的实时性,这样,即使在光场情况变化剧烈的情况下,仍能保证虚拟形象与周围环境的协调性。
示例性地,当第一环境处于第一光照条件时,以第三频率处理采集到的图像帧;当第一环境处于第二光照条件时,以第四频率处理采集到的图像帧,第一光照条件下的光照强度的变化频率大于第二光照条件下的光照强度的变化频率,第三频率大于第四频率。
当第一环境中的光照强度的变化频率较大时,可以以更高的频率处理图像,例如,每2秒处理10次图像,当第一环境中的光照强度的变化频率较小时,可以以较小的频率处理图像,例如,每2秒处理1次图像。应理解,此处的处理频率仅为示例,还可以设置其他处理频率,只要第三频率大于第四频率即可。也就是说,在光照强度的变化频率不同的情况下,即使图像的采集频率相同,图像处理的次数也可以不同。例如,每2秒采集20次图像,当前每2秒处理10次图像,或者可以理解为,每2秒处理该20帧图像中的10帧图像,得到光照参数,当光照强度的变化频率减少时,每2秒处理1次图像,即图像的采集频率可以保持不变,但减少图像处理的次数,或者说,减少被处理的图像的数量;当光照强度的变化频率增大时,每2秒处理5次图像,即图像的采集频率可以保持不变,增加图像处理的次数,或者说,增大被处理的图像的数量。
当第一环境中的光照强度的变化频率较小时,光照参数的获取频率以及光效渲染的频率也会随之降低,从总体上减少计算开销,降低资源消耗。当第一环境中的光照强度的变化频率较大时,以更高频率获取光照参数,并进行光效渲染,有利于保证光效渲染的实时性,这样,即使在光源变化剧烈的情况下,仍能保证虚拟形象与周围环境的协调性。
示例性地,当第一环境处于第一光照条件时,以第五频率对虚拟形象进行光效渲染;当第一环境处于第二光照条件时,以第六频率对虚拟形象进行光效渲染,第一光照条件下的光照强度的变化频率大于第二光照条件下的光照强度的变化频率,第五频率大于第六频率。
当第一环境中的光照强度的变化频率较大时,可以以更高的频率对虚拟形象进行光效渲染,例如,每2秒利用不同的光照参数渲染10次,当第一环境中的光照强度的变化频率较小时,可以以较小的频率对虚拟形象进行光效渲染,例如,每2秒渲染1次。应理解, 此处的渲染频率仅为示例,还可以设置其他渲染频率,只要第五频率大于第六频率即可。也就是说,在光照强度的变化频率不同的情况下,即使图像的采集频率、处理频率相同,光效渲染的次数也可以不同。例如,当前每2秒采集10次图像,并对该10帧图像进行处理,得到该10帧图像的光照参数,当光照强度的变化频率减少时,每2秒对虚拟形象进行一次渲染,或者说,利用该10帧图像得到的多组光照参数的1组光照参数对虚拟形象进行光效渲染,即图像的采集频率、处理频率可以保持不变,但减少光效渲染的次数量;当光照强度的变化频率增大时,每2秒对虚拟形象进行5次渲染,即图像的采集频率、处理频率可以保持不变,增加光效渲染的此处。
当第一环境中的光照强度的变化频率较小时,光效渲染的频率也会随之降低,能够减少渲染所需的计算开销,降低资源消耗。当第一环境中的光照强度的变化频率较大时,以更高频率进行光效渲染,有利于保证光效渲染的实时性,这样,即使在光源变化剧烈的情况下,仍能保证虚拟形象与周围环境的协调性。
光照强度的变化频率可以由车端确定,也可以由服务器确定。车端可以从本地获取光照强度的变化频率。或者,车端也可以从服务器获取光照强度的变化频率。
示例性地,光照强度的变化频率可以根据连续两次光效渲染所使用的光照参数的对比确定。
例如,获取当前的光照参数,将该光照参数与之前一次获取的光照参数进行对比,以确定当前光照强度的变化频率。
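For example, one simple way to quantify this comparison is the relative change between the two consecutive sets of spherical harmonic coefficients, as in the sketch below; the decision threshold is an assumed tuning value, not taken from the application.

```python
import numpy as np

def lighting_change_score(prev_sh, curr_sh):
    """Relative change between two consecutive sets of SH lighting
    parameters; a large score suggests the light field is changing quickly."""
    prev_sh, curr_sh = np.asarray(prev_sh), np.asarray(curr_sh)
    return np.linalg.norm(curr_sh - prev_sh) / (np.linalg.norm(prev_sh) + 1e-6)

def is_fast_changing(prev_sh, curr_sh, threshold=0.2):
    # Threshold is an illustrative value to be tuned in practice.
    return lighting_change_score(prev_sh, curr_sh) > threshold
```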
示例性地,光照强度的变化频率可以通过光传感器确定。即通过光传感器输出的电信号的强度的变化情况确定光照强度的变化频率。
示例性地,光照强度的变化频率可以通过单帧图像帧的不同区域的光照强度确定。
当单帧图像帧中的不同区域的光照强度之间的差异较大时,光照强度的变化频率较大。当单帧图像帧中的不同区域的光照强度之间的差异较小时,光照强度的变化频率较小。
例如,当车辆驶入隧道时,采集到的单帧图像帧中的部分区域为座舱内未驶入隧道部分的成像,该部分区域的光照强度较大,部分区域为座舱内驶入隧道部分的成像,该部分区域的光照强度较小,这两部分区域的光照强度之间的差异较大,能够反映当前的光照强度的变化频率较大。
示例性地,光照强度的变化频率可以通过多帧图像帧的光照强度确定。
该多帧图像帧可以是连续采集的多帧图像帧,或者,该多帧图像帧也可以是不连续的多帧图像帧。
当多帧图像帧的光照强度之间的差异较大时,若该多帧图像帧的采集时刻之间的间隔较小,则该多帧图像帧的采集时段内的光照强度的变化频率较大。当多帧图像帧的光照强度之间的差异较小时,若该多帧图像帧的采集时刻之间的间隔较小,则该多帧图像帧的采集时段内的光照强度的变化频率较小。
例如,当车辆驶入隧道时,连续采集多帧图像帧,该多帧图像的采集时刻之间的间隔较小,该多帧图像帧中的部分图像帧为车辆未驶入隧道时的成像,该部分图像帧的光照强度较大,部分图像帧为车辆驶入隧道部分的成像,该部分图像帧的光照强度较小,也就是说多帧图像帧的光照强度之间的差异较大,能够反映当前的光照强度的变化频率较大。
应理解,以上仅为示例,光照强度的变化频率还可以通过其他方式确定,本申请实施 例对此不做限定。
以上以第一环境为车辆的座舱或车辆所处的环境为例,描述了虚拟形象处理方法。该方法也可以用于其它虚拟现实应用场景中,例如在影视直播或电影特效中的虚拟形象,例如3D道具、虚拟人物形象等方面,得到应用。以影视直播为例,在视频流中添加3D道具特效具有实时地、低耗能地获取视频流中人物所处光场信息的需求,此场景下同样可以使用以上方法,此时第一环境为视频流中人物所处的环境。例如,在影视直播的视频流中,按照一定频率,如每秒5次抓取人脸图像,将抓取获得的人脸图像输入光照参数模型预测出该实时场景下的光照参数,再根据预测得到的光照参数对直播中所需的3D道具进行光效渲染,完成真实化直播视频流中的3D道具特效任务。
图4示出了本申请实施例提供的一种光照参数模型的训练方法的示意性流程图。该光照参数模型可以用于在方法300中确定光照参数。
图4所示的方法可以由服务器来执行,例如由云端服务器来执行。在其它实现中,该方法可以由车端的电子设备来执行,电子设备具体可以包括车载芯片或车载装置(例如车机、车载电脑)等装置中的一个或多个。或者,图4所示的方法400也可以由云服务设备执行。或者,图4所示的方法400还可以是由云服务设备和车端的电子设备构成的系统执行。
示例性地,方法400可以由图1所示的服务器120或图2所示的训练设备220执行。
方法400包括步骤S410至步骤S420。下面对步骤S410至步骤S420进行说明。
S410,获取样本图像帧。
示例性地,方法400可以由图2所示的训练设备220执行,样本图像帧可以是数据库230中存储的数据。数据库230中的数据来源可以是数据采集设备260采集到的,或者可以采用众包的方式,由多个车辆将行驶过程中采集到的图像帧发送至服务器,并由服务器存储至数据库中。
S420,基于样本图像帧训练得到光照参数模型。
光照参数模型可以为神经网络模型。
示例性地,步骤S420可以包括:基于样本图像帧和样本图像帧的光照参数训练得到光照参数模型。
即以样本图像帧作为初始模型的输入数据,以样本图像帧的光照参数作为初始模型的目标输出对初始模型进行训练,以得到训练好的模型,即光照参数模型。
具体地,将样本图像帧输入初始模型中进行处理,得到初始模型输出的该样本图像帧的预测光照参数,以减少样本图像帧的预测光照参数和样本图像帧的光照参数之间的差异为目标调整初始模型的参数,以得到训练好的模型,即光照参数模型。
样本图像帧的光照参数可以是通过标注得到的光照参数,在训练过程中作为样本图像帧的真实的光照参数,或者说,在训练过程中作为模型的目标输出。样本图像帧的预测光照参数即为模型输出的光照参数。
示例性地,光照参数可以为球谐光照参数。具体描述可以参照前文中的描述,此处不再赘述。
示例性地,该样本图像帧的光照参数可以是采用测光设备对该样本图像帧的拍摄环境进行光场测量得到的。
示例性地,该样本图像帧的光照参数也可以是通过解优化等计算方法计算得到的。
将图像帧输入该光照参数模型中进行处理,可以得到光照参数。
然而,上述训练方法通过样本图像帧的光照参数完成训练。采用测光设备得到的光照参数较为准确,有利于保证模型的训练效果。解优化等方法的计算量较大,对处理器的要求较高,计算耗时长,但可以节约测光设备的成本。
可选地,步骤S420可以包括:以样本图像帧作为初始模型的输入图像,以减少重建图像和样本图像帧之间的差异为目标对初始模型进行训练,以得到训练好的模型,即光照参数模型。其中,重建图像是基于初始模型输出的样本图像帧中的预测光照参数和样本图像帧中的目标对象的三维模型参数进行渲染得到的。
样本图像帧中的预测光照参数即为模型输出的光照参数。
基于初始模型输出的样本图像帧中的预测光照参数和样本图像帧中的目标对象的三维模型参数进行渲染,也可以理解为,根据初始模型输出的样本图像帧中的预测光照参数对目标对象的三维模型进行光效渲染。
也就是说,将样本图像帧作为初始模型的输入数据,初始模型输出样本图像帧中的预测光照参数和样本图像帧中的目标对象的三维模型参数,基于样本图像帧中的预测光照参数和样本图像帧中的目标对象的三维模型参数进行渲染,得到重建图像。以减少重建图像和输入图像之间的差异为目标调整初始模型的参数,直至训练完成,得到训练好的模型,即光照参数模型。
以减少重建图像和输入图像之间的差异为目标调整初始模型的参数,即利用损失函数对初始模型的参数进行优化,使重建图像与输入图像不断接近。损失函数用于指示重建图像和输入图像之间的差异。或者说,损失函数是基于重建图像和输入图像之间的差异建立的。
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置权重),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
示例性地,该损失函数可以采用L1范数损失函数。L1范数损失函数也可以称为最小绝对值偏差。
例如,基于L1范数损失函数衡量重建图像的像素值和输入图像的像素值之间的差异。
或者,该损失函数也可以采用其他类型的损失函数,例如,L2范数损失函数。本申请实施例对此不做限定。
可选地,样本图像帧包括人脸图像,目标对象包括头部。在本申请实施例中,人脸区域也可以称为人脸图像。
通过上述方式训练得到的光照参数模型能够用于确定光照参数。例如,该光照参数模 型能够用于确定图像帧中的人脸区域的光照参数。
例如,如图5所示,基于人脸图像的数据集训练光照参数模型,即样本图像帧为人脸图像。光照参数模型可以为神经网络模型。将人脸图像输入神经网络模型中进行处理,输出两组参数,即一组球谐光照参数和一组3D人头模型参数。基于这两组参数进行3D可微渲染,得到3D重建人头图像,即重建图像。将重建图像与模型的输入图像(即人脸图像)进行对比,利用损失函数对模型的参数进行优化,使重建图像和输入图像之间的差异不断减小,或者说,使两张图像不断接近。
将图像帧输入训练好的光照参数模型进行处理,可以输出图像帧中的人脸区域的光照参数。进而可以基于该光照参数虚拟形象进行渲染。例如,如图6所示,光照参数模型输出光照参数,利用光照参数对人头模型进行光效渲染,得到渲染后的人头模型。从图6可以看出,图像帧中的人脸的右侧光照较强,左侧光照较弱,渲染后的人头的右侧光照较强,左侧光照较弱。应理解,图6中仅以人头模型为例对光效渲染效果进行说明,不对本申请实施例构成限定。
根据本申请实施例的方案,在训练过程中无需样本图像帧的真实光照参数,给定任意数据集,例如,人脸图像的数据集,即可训练得到光照参数模型。具体地,通过渲染后得到的重建图像和输入图像之间的差异建立损失函数,即可训练得到光照参数模型,对训练样本的要求较低。
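As a hedged illustration of this self-supervised setup, a single training step might be sketched as follows. Here `model` is assumed to output both the predicted lighting parameters and the 3D head model parameters, and `renderer` stands in for a differentiable renderer; both are placeholders rather than components specified by the application.

```python
import torch
import torch.nn.functional as F

def train_step(model, renderer, face_batch, optimizer):
    """One self-supervised training step: no ground-truth lighting labels
    are needed. The loss is the L1 difference between the reconstructed
    image and the input face image."""
    sh_params, head_params = model(face_batch)
    recon = renderer(head_params, sh_params)   # reconstructed face images
    loss = F.l1_loss(recon, face_batch)        # photometric reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```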
图7示出了本申请实施例提供的一种虚拟形象处理方法的示意性流程图。图7所示的方法700可以视为图3所示的方法300的一种具体实现方式。具体描述可以参照前文中的方法300的描述,为了避免重复,在描述方法700时适当省略部分描述。
图7所示的方法可以由车端来执行,例如由车辆、车载芯片或车载装置(例如车机、车载电脑)等装置中的一个或多个。或者,图7所示的方法也可以由云服务设备执行。或者,图7所示的方法还可以是由云服务设备和车端的电子设备构成的系统执行。本申请实施例中,车端的电子设备也可以称为车端设备,或车端;云服务设备也可以称为云端服务器、云端设备或云端。
示例性地,图7所示的方法可以应用于图1的系统100,由其中的车辆110或车辆110上的车载芯片或车载装置执行。
在方法700中以图像帧中的人脸区域作为目标区域。
S710,获取座舱内的人脸图像。
示例性地,该人脸图像可以为RGB图像。
该人脸图像可以是由图像传感器采集得到的。
可选地,当第一环境处于第一光照条件时,控制图像传感器以第一频率采集图像;当第一环境处于第二光照条件时,控制图像传感器以第二频率采集图像,第一光照条件下的光照强度的变化频率大于第二光照条件下的光照强度的变化频率,第一频率大于第二频率。
示例性地,该图像传感器可以为CMS摄像头,可以实时采集座舱内的图像。
例如,如图8所示,CMS摄像头可以设置于按座舱内后视镜位置,这样,其视角可以覆盖主驾驶座、副驾驶座以及后排座位。
在光场信息相对稳定时使用较低频率采集图像,例如,CMS摄像头每2秒抓取1次 图像。在光场信息剧烈变化时使用较高频率采集图像,例如,CMS摄像头每2秒抓取10次图像。
这样,即使CMS摄像头处于打开的状态,但由于座舱内的光场信息在车辆行进过程中大多数时间内处于稳定状态,可以以较低频率采集图像,能够大大减少资源消耗。
图像传感器采集到的图像中人脸区域的范围可能较小,例如,在CMS摄像头的视角覆盖主驾驶座、副驾驶座以及后排座位时,人脸区域的范围可能较小。在该情况下,可以对图像传感器采集到的图像进行处理,以得到人脸图像。这样,能够更准确地得到人脸区域的光照参数。
例如,如图9所示,该CMS摄像头采集到的图像中,驾驶员的人脸区域的范围较小,可以通过人脸检测方法将CMS摄像头采集到的图像中的人脸进行检测和抓取分割,得到驾驶员的人脸图像,即图9的矩形框中的区域。
S720,将人脸图像输入至光照参数模型中预测光照参数。
该光照参数模型可以为训练好的神经网络模型。
具体的训练方法可以参考方法400,此处不再赘述。
示例性地,该光照参数为球谐光照参数。例如,球谐阶数设为3,对于RGB图像而言,光照参数模型输出的光照参数为27个。
S730,利用光照参数对原始虚拟形象进行光效渲染。
该原始虚拟形象可以是通过其3D mesh信息和纹理信息进行3D渲染得到的。
示例性地,方法700可以是实时执行地,即实时对原始虚拟形象进行光效渲染。渲染后的虚拟形象投影到投影区域,例如,投影到副驾驶座位上。
这样,能够实时得到与座舱内光照效果接近或一致的光效渲染效果。
在其他实施例中,以上步骤S720可以在服务器执行,由车辆将获得的人脸图像发送至服务器,服务器将人脸图像输入至光照参数模型中预测光照参数,并将光照参数发送给车辆。
图10示出了多组光效渲染后的模型的示意图。示例性地,图10中的人脸图像可以是来自于图9的矩形框中的区域的图像。光照参数可以用于对不同的对象进行光效渲染,例如,如图10所示,基于人脸图像的光照参数分别对球、茶壶和兔子模型进行光效渲染,以使渲染后的模型的光照效果与人脸所在光场环境接近或一致。例如,图10的(a)中的人脸的右侧和左侧光照较强,基于该人脸图像得到的光照参数对各模型进行光效渲染,光效渲染后的模型的右侧和左侧的光照较强。再如,图10的(b)和(c)中的人脸的右上方光照较强,基于人脸图像得到的光照参数对各模型进行光效渲染,光效渲染后的模型的右上方的光照较强。图10的(b)中的人脸的右上方的光照强于图10的(c)中的人脸的右上方的光照,相应地,图10的(b)中的渲染后的模型的右上方的光照强于图10的(c)中的渲染后的模型的右上方的光照。应理解,图10仅为光效渲染的示例,不对本申请实施例的光效渲染的对象构成限定。
下面结合图11至图13对本申请实施例的装置进行说明。应理解,下面描述的装置能够执行前述本申请实施例的方法,为了避免不必要的重复,下面在介绍本申请实施例的装置时适当省略重复的描述。
图11是本申请实施例的虚拟形象处理装置的示意性框图。图11所示的虚拟形象处理 装置3000包括第一获取单元3010,第二获取单元3020和处理单元3030。
装置3000可以用于执行本申请实施例的虚拟形象处理方法,具体地,可以用于执行方法300或方法700。例如,第一获取单元3010可以执行上述步骤S310,第二获取单元3020可以执行上述步骤S320,处理单元3030可以执行上述步骤S330。又例如,第一获取单元3010可以执行上述步骤S710,第二获取单元3020可以执行上述步骤S720,处理单元3030可以执行上述步骤S730。
示例性地,第一获取单元3010用于获取第一环境的图像帧,第一环境包括车辆的座舱或车辆所处的环境,图像帧用于确定第一环境的光照参数。第二获取单元3020可以用于获取所述光照参数。
处理单元3030用于基于光照参数获取虚拟形象。
示例性地,该光照参数可以为可微分光照模型的参数。例如,该光照参数可以为球谐光照参数。
可选地,图像帧包括人脸图像,人脸图像用于确定第一环境的光照参数。
示例性地,该图像帧可以包括车辆的座舱内用户的头部的图像,用户的头部的图像用于确定光照参数。
可选地,图像帧从图像传感器获得,装置3000还包括控制单元3040,用于:当第一环境处于第一光照条件时,控制图像传感器以第一频率采集图像;当第一环境处于第二光照条件时,控制图像传感器以第二频率采集图像,第一光照条件下的光照强度的变化频率大于第二光照条件下的光照强度的变化频率,第一频率大于第二频率。
可选地,控制单元3040还用于:控制在投影区域投影该虚拟形象。
可选地,投影区域包括车辆的座舱内的区域。
可选地,在投影区域为车辆座舱内的第一区域时,光照参数是基于图像帧中的第一目标区域确定的;在投影区域为车辆座舱内的第二区域时,光照参数是基于图像帧中的第二目标区域确定的。
可选地,装置3000还包括发送单元,用于将图像帧发送给服务器;第二获取单元3020,具体用于从服务器获取光照参数。
可选地,第二获取单元3020具体用于将图像帧输入至光照参数模型,获得光照参数。
可选地,装置3000包括调整单元,用于响应于用户操作,调整投影区域。
示例性地,用户操作可以为用户发出语音指令,该语音指令可以用于指示调整投影区域。响应于该语音指令,调整投影区域。
例如,当前的虚拟形象投影在中控台上,该语音指令可以为“在副驾驶座位投影虚拟形象”。响应于该语音指令,控制在副驾驶座位上投影虚拟形象。
示例性地,用户操作可以为用户发出语音指令。在检测到发出语音指令的用户的位置后,根据该用户的位置调整投影区域。
换言之,投影区域与发出语音指令的用户的位置有关。在该情况下,该语音指令可以为与调整投影区域无关的指令。
例如,当前的虚拟形象投影在副驾驶座位上,当检测到当前发出语音指令的用户位于后排座位上时,响应于该语音指令,控制投影区域由副驾驶座位转移至面向后排座位用户的区域。
示例性地,用户操作还可以为其他影响投影区域的动作。
例如,当前的虚拟形象投影在副驾驶座位上,当检测到副驾驶座位侧的车门被打开时,响应于该操作,控制投影区域由副驾驶座位转移至中控台。
图12是本申请实施例的虚拟形象处理装置的示意性框图。图12所示的虚拟形象处理装置4000包括获取单元4010、处理单元4020和发送单元4030。
获取单元4010和处理单元3020可以用于执行本申请实施例的虚拟形象处理方法,具体地,可以用于执行方法300或方法700。
具体地,获取单元4010,用于获取第一环境的图像帧,第一环境包括车辆的座舱或车辆所处的环境,图像帧用于确定第一环境的光照参数;
处理单元4020,用于基于图像帧确定光照参数;
发送单元4030,向车辆发送光照参数,光照参数用于虚拟形象的处理。
可选地,处理单元4020具体用于:将图像帧输入至光照参数模型中,以得到光照参数,光照参数模型是基于样本图像帧训练得到的。
可选地,光照参数模型是以样本图像帧作为输入数据,以减少重建图像和样本图像帧之间的差异为目标对初始模型进行训练得到的,重建图像是通过初始模型输出的样本图像帧中的光照参数和样本图像帧中的目标对象的三维模型参数进行渲染得到的。
可选地,图像帧包括人脸图像,人脸图像用于确定第一环境的光照参数。
需要说明的是,上述处理装置3000和装置4000以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。
例如,“单元”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。
因此,在本申请的实施例中描述的各示例的单元,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
图13是本申请实施例提供的装置的硬件结构示意图。图13所示的装置5000(该装置5000具体可以是一种计算机设备)包括存储器5001、处理器5002、通信接口5003以及总线5004。其中,存储器5001、处理器5002、通信接口5003通过总线5004实现彼此之间的通信连接。
存储器5001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器5001可以存储程序,当存储器5001中存储的程序被处理器5002执行时,处理器5002用于执行本申请实施例的虚拟形象处理方法的各个步骤。例如,处理器5002可以执行上文中图3所示的方法300或图7所示的方法700。或者,处理器5002用于执行本申请实施例的光照参数模型的训练方法的各个步骤。例如,处理器5002可以执行图4所示的方法400。
处理器5002可以采用通用的中央处理器(central processing unit,CPU),微处理器,ASIC,图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行 相关程序,以实现本申请方法实施例的虚拟形象处理方法或光照参数模型的训练方法。
处理器5002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的虚拟形象处理方法的各个步骤可以通过处理器5002中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器5002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、ASIC、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器5001,处理器5002读取存储器5001中的信息,结合其硬件完成图11或图12所示的处理装置中包括的单元所需执行的功能,或者,执行本申请方法实施例的虚拟形象处理方法或光照参数模型的训练方法。
通信接口5003使用例如但不限于收发器一类的收发装置,来实现装置5000与其他设备或通信网络之间的通信。例如,可以通过通信接口5003获取人脸图像。
总线5004可包括在装置5000各个部件(例如,存储器5001、处理器5002、通信接口5003)之间传送信息的通路。
本申请实施例还提供一种计算机可读介质,该计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行本申请实施例中的虚拟形象处理方法或光照参数模型的训练方法。
本申请实施例还提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行本申请实施例中的虚拟形象处理方法或光照参数模型的训练方法。
本申请实施例还提供一种芯片,该芯片包括处理器与数据接口,该处理器通过该数据接口读取存储器上存储的指令,执行本申请实施例中的虚拟形象处理方法或光照参数模型的训练方法。
可选地,作为一种实现方式,该芯片还可以包括存储器,该存储器中存储有指令,该处理器用于执行该存储器上存储的指令,当该指令被执行时,该处理器用于执行本申请实施例中的虚拟形象处理方法或光照参数模型的训练方法。
本申请实施例还提供一种系统,该系统包括本申请实施例中的虚拟形象处理装置和图像传感器,其中,图像传感器用于采集第一环境的图像帧。
示例性地,该系统可以为车辆。
或者,该系统可以为车载系统。车载系统包括服务器和车端的电子设备。该车端的电子设备可以为车载芯片或车载装置(例如车机、车载电脑)等中的任一项。
应理解,本申请实施例中的处理器可以为中央处理单元(central processing unit,CPU),该处理器还可以是其他通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是ROM、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random access memory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
上述实施例,可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时,上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质。半导体介质可以是固态硬盘。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系,但也可能表示的是一种“和/或”的关系,具体可参考前后文进行理解。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本 申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (28)

  1. 一种虚拟形象处理方法,其特征在于,包括:
    获取第一环境的图像帧,所述第一环境包括车辆的座舱或车辆所处的环境,所述图像帧用于确定所述第一环境的光照参数;
    获取所述光照参数;
    基于所述光照参数获取虚拟形象。
  2. 根据权利要求1所述的方法,其特征在于,所述图像帧包括人脸图像,所述人脸图像用于确定所述第一环境的光照参数。
  3. 根据权利要求1或2所述的方法,其特征在于,所述图像帧从图像传感器获得,所述方法还包括:
    当所述第一环境处于第一光照条件时,控制所述图像传感器以第一频率采集图像;
    当所述第一环境处于第二光照条件时,控制所述图像传感器以第二频率采集图像,所述第一光照条件下的光照强度的变化频率大于所述第二光照条件下的光照强度的变化频率,所述第一频率大于所述第二频率。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述方法还包括:
    控制在投影区域投影所述虚拟形象。
  5. 根据权利要求4所述的方法,其特征在于,所述投影区域包括所述车辆的座舱内的区域。
  6. 根据权利要求4或5所述的方法,其特征在于,在所述投影区域为所述车辆座舱内的第一区域时,所述光照参数是基于所述图像帧中的第一目标区域确定的;在所述投影区域为所述车辆座舱内的第二区域时,所述光照参数是基于所述图像帧中的第二目标区域确定的。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,还包括:
    将所述图像帧发送给服务器;
    所述获取所述光照参数,包括:
    从所述服务器获取所述光照参数。
  8. 根据权利要求1-6任一项所述的方法,其特征在于,所述获取所述光照参数,包括:
    将所述图像帧输入至光照参数模型,获得所述光照参数。
  9. 一种虚拟形象处理方法,其特征在于,包括:
    获取第一环境的图像帧,所述第一环境包括车辆的座舱或车辆所处的环境,所述图像帧用于确定所述第一环境的光照参数;
    基于所述图像帧确定所述光照参数;
    向所述车辆发送所述光照参数,所述光照参数用于虚拟形象的处理。
  10. 根据权利要求9所述的方法,其特征在于,所述基于所述图像帧确定所述光照参数,包括:
    将所述图像帧输入至光照参数模型,以得到所述光照参数,所述光照参数模型是基于样本图像帧训练得到的。
  11. 根据权利要求9或10所述的方法,其特征在于,所述光照参数模型是以所述样本图像帧作为输入数据,以减少重建图像和所述样本图像帧之间的差异为目标对初始模型进行训练得到的,所述重建图像是通过所述初始模型输出的所述样本图像帧中的预测光照参数和所述样本图像帧中的目标对象的三维模型参数进行渲染得到的。
  12. 根据权利要求9-11任一项所述的方法,其特征在于,所述图像帧包括人脸图像,所述人脸图像用于确定所述第一环境的光照参数。
  13. 一种虚拟形象处理装置,其特征在于,包括:
    第一获取单元,用于获取第一环境的图像帧,所述第一环境包括车辆的座舱或车辆所处的环境,所述图像帧用于确定所述第一环境的光照参数;
    第二获取单元,用于获取所述光照参数;
    处理单元,用于基于所述光照参数获取虚拟形象。
  14. 根据权利要求13所述的装置,其特征在于,所述图像帧包括人脸图像,所述人脸图像用于确定所述第一环境的光照参数。
  15. 根据权利要求13或14所述的装置,其特征在于,所述图像帧从图像传感器获得,所述装置还包括控制单元,用于:
    当所述第一环境处于第一光照条件时,控制所述图像传感器以第一频率采集图像;
    当所述第一环境处于第二光照条件时,控制所述图像传感器以第二频率采集图像,所述第一光照条件下的光照强度的变化频率大于所述第二光照条件下的光照强度的变化频率,所述第一频率大于所述第二频率。
  16. 根据权利要求13-15中任一项所述的装置,其特征在于,所述控制单元还用于:
    控制在投影区域投影所述虚拟形象。
  17. 根据权利要求16所述的装置,其特征在于,所述投影区域包括所述车辆的座舱内的区域。
  18. 根据权利要求16或17所述的装置,其特征在于,在所述投影区域为所述车辆座舱内的第一区域时,所述光照参数是基于所述图像帧中的第一目标区域确定的;在所述投影区域为所述车辆座舱内的第二区域时,所述光照参数是基于所述图像帧中的第二目标区域确定的。
  19. 根据权利要求13-18中任一项所述的装置,其特征在于,所述装置还包括发送单元,用于:
    将所述图像帧发送给服务器;
    所述第二获取单元,具体用于:
    从所述服务器获取所述光照参数。
  20. 根据权利要求13-18中任一项所述的装置,其特征在于,所述第二获取单元具体用于:
    将所述图像帧输入至光照参数模型,获得所述光照参数。
  21. 一种虚拟形象处理装置,其特征在于,包括:
    获取单元,用于获取第一环境的图像帧,所述第一环境包括车辆的座舱或车辆所处的环境,所述图像帧用于确定所述第一环境的光照参数;
    处理单元,用于基于所述图像帧确定所述光照参数;
    发送单元,用于向所述车辆发送所述光照参数,所述光照参数用于虚拟形象的处理。
  22. 根据权利要求21所述的装置,其特征在于,所述处理单元具体用于:
    将所述图像帧输入至光照参数模型中,以得到所述光照参数,所述光照参数模型是基于样本图像帧训练得到的。
  23. 根据权利要求21或22所述的装置,所述光照参数模型是以所述样本图像帧作为输入数据,以减少重建图像和所述样本图像帧之间的差异为目标对初始模型进行训练得到的,所述重建图像是通过所述初始模型输出的所述样本图像帧中的预测光照参数和所述样本图像帧中的目标对象的三维模型参数进行渲染得到的。
  24. 根据权利要求21-23中任一项所述的装置,其特征在于,所述图像帧包括人脸图像,所述人脸图像用于确定所述第一环境的光照参数。
  25. 一种虚拟形象处理装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行如权利要求1至8或9至12中任一项所述的方法。
  26. 一种计算机可读存储介质,其特征在于,所述计算机可读介质存储用于设备执行的程序代码,该程序代码包括用于执行如权利要求1至8或9至12中任一项所述的方法。
  27. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1至8或9至12中任一项所述的方法。
  28. 一种系统,其特征在于,所述系统包括如权利要求13至20或权利要求21至24中任一项所述的装置和图像传感器,其中,图像传感器用于采集第一环境的图像帧。
PCT/CN2022/074974 2022-01-29 2022-01-29 虚拟形象处理方法及装置 WO2023142035A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/074974 WO2023142035A1 (zh) 2022-01-29 2022-01-29 虚拟形象处理方法及装置
CN202280003423.1A CN117813630A (zh) 2022-01-29 2022-01-29 虚拟形象处理方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/074974 WO2023142035A1 (zh) 2022-01-29 2022-01-29 虚拟形象处理方法及装置

Publications (1)

Publication Number Publication Date
WO2023142035A1 true WO2023142035A1 (zh) 2023-08-03

Family

ID=87470121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074974 WO2023142035A1 (zh) 2022-01-29 2022-01-29 虚拟形象处理方法及装置

Country Status (2)

Country Link
CN (1) CN117813630A (zh)
WO (1) WO2023142035A1 (zh)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106133796A (zh) * 2014-03-25 2016-11-16 Metaio有限公司 用于在真实环境的视图中表示虚拟对象的方法和系统
CN108305328A (zh) * 2018-02-08 2018-07-20 网易(杭州)网络有限公司 虚拟物体渲染方法、系统、介质和计算设备
CN108986199A (zh) * 2018-06-14 2018-12-11 北京小米移动软件有限公司 虚拟模型处理方法、装置、电子设备及存储介质
CN110871684A (zh) * 2018-09-04 2020-03-10 比亚迪股份有限公司 车内投影方法、装置、设备和存储介质
CN109495863A (zh) * 2018-09-21 2019-03-19 北京车和家信息技术有限公司 交互方法及相关设备
US20200312004A1 (en) * 2019-03-27 2020-10-01 Hyundai Motor Company In-vehicle avatar processing apparatus and method of controlling the same
CN110310224A (zh) * 2019-07-04 2019-10-08 北京字节跳动网络技术有限公司 光效渲染方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116688494A (zh) * 2023-08-04 2023-09-05 荣耀终端有限公司 生成游戏预测帧的方法和电子设备
CN116688494B (zh) * 2023-08-04 2023-10-20 荣耀终端有限公司 生成游戏预测帧的方法和电子设备

Also Published As

Publication number Publication date
CN117813630A (zh) 2024-04-02

Similar Documents

Publication Publication Date Title
WO2020253416A1 (zh) 物体检测方法、装置和计算机存储介质
US10877485B1 (en) Handling intersection navigation without traffic lights using computer vision
JP2022515895A (ja) 物体認識方法及び装置
US10850693B1 (en) Determining comfort settings in vehicles using computer vision
CN108229366B (zh) 基于雷达和图像数据融合的深度学习车载障碍物检测方法
US11715180B1 (en) Emirror adaptable stitching
US11488398B2 (en) Detecting illegal use of phone to prevent the driver from getting a fine
WO2022042049A1 (zh) 图像融合方法、图像融合模型的训练方法和装置
WO2022165809A1 (zh) 一种训练深度学习模型的方法和装置
US10744936B1 (en) Using camera data to automatically change the tint of transparent materials
DE102019115809A1 (de) Verfahren und system zum durchgehenden lernen von steuerbefehlen für autonome fahrzeuge
US11205068B2 (en) Surveillance camera system looking at passing cars
DE102020113779A1 (de) Umgebungs-kamerasystem mit nahtlosem zusammenfügen für die auswahl beliebiger blickwinkel
US20220335583A1 (en) Image processing method, apparatus, and system
US11891002B1 (en) Determining comfort settings in vehicles using computer vision
US20210127204A1 (en) Optimize the audio capture during conference call in cars
CN111292366B (zh) 一种基于深度学习和边缘计算的视觉行车测距算法
US11721100B2 (en) Automatic air recirculation systems for vehicles
CN114514535A (zh) 基于语义分割的实例分割系统和方法
DE102021133638A1 (de) Bildzusammensetzung in multi-view-automobilsystemen und robotersystemen
US11308641B1 (en) Oncoming car detection using lateral emirror cameras
WO2023142035A1 (zh) 虚拟形象处理方法及装置
CN111368972A (zh) 一种卷积层量化方法及其装置
WO2022100419A1 (zh) 一种图像处理方法及相关设备
CN115253300A (zh) 一种图形渲染方法以及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22922866

Country of ref document: EP

Kind code of ref document: A1