WO2022099613A1 - Training method for image generation model, new-perspective image generation method and apparatus - Google Patents

Training method for image generation model, new-perspective image generation method and apparatus

Info

Publication number
WO2022099613A1
WO2022099613A1 PCT/CN2020/128680 CN2020128680W WO2022099613A1 WO 2022099613 A1 WO2022099613 A1 WO 2022099613A1 CN 2020128680 W CN2020128680 W CN 2020128680W WO 2022099613 A1 WO2022099613 A1 WO 2022099613A1
Authority
WO
WIPO (PCT)
Prior art keywords
color
image
observation point
generation model
image generation
Application number
PCT/CN2020/128680
Other languages
English (en)
French (fr)
Inventor
韩磊
李琳
郑凯
仲大伟
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to PCT/CN2020/128680 (WO2022099613A1)
Priority to CN202080104956.XA (CN116250021A)
Publication of WO2022099613A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence, and in particular, to a training method for an image generation model, a method and apparatus for generating a new perspective image.
  • Artificial intelligence (AI) technology can be used for image generation, enabling intelligent machines to generate images from new perspectives based on existing images captured from different perspectives.
  • a new perspective view can be obtained by volume rendering: points on the observed light rays are sampled, and the geometric information and texture information of the image are stored in a neural network.
  • the embodiments of the present application provide a training method for an image generation model and a method and device for generating a new perspective image, in which the image generation model learns the residual color of the spatial positions that light from any observation point passes through and uses it to generate a new perspective image.
  • residual colors are low-frequency information that is easy to represent and memorize, so they can improve the clarity of images from new perspectives.
  • a first aspect of the implementation of the present application provides a training method for an image generation model, including:
  • the training device can receive the manually input position and view direction of the target observation point, and the target observation point is any observation point that observes the observed object.
  • Each observation point has its own position and viewing direction.
  • three-dimensional coordinates (x, y, z) are used to indicate the position of an observation point, and a set of rotation angles is used to indicate the viewing direction of an observation point.
  • the training device can determine at least one reference image from the N images input in advance according to the position and the viewing angle direction.
  • N is an integer greater than or equal to 2.
  • the training device can predict the reference color of the spatial location where the light from the target observation point passes.
  • the training device can obtain the ground-truth color of each pixel in the perspective image corresponding to the target observation point. After that, the training device can determine the residual color of the spatial positions where the light from the target observation point passes based on the real color and the reference color. Finally, the training device can use the residual color to train the image generation model.
  • the image generation model is obtained by training based on the residual color.
  • the residual color belongs to low-frequency information and is easy to represent and memorize. Therefore, the clarity of the new perspective image obtained based on the image generation model can be improved.
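  • for illustration only, the following sketch shows the core relationship between the three colors in Python; the numeric values are made up, and nothing in it is part of the claimed method:

```python
import numpy as np

# Reference color predicted for a spatial point from the reference images,
# and the ground-truth color of the corresponding pixel in the view that was
# actually captured at the target observation point (illustrative values).
reference_color = np.array([0.60, 0.40, 0.20])   # RGB in [0, 1]
true_color      = np.array([0.63, 0.38, 0.22])

# The residual color is the (typically small, low-frequency) difference that
# the image generation model is trained to predict.
residual_color = true_color - reference_color
print(residual_color)            # -> approximately [0.03, -0.02, 0.02]

# At rendering time the new-view color is recovered as reference + residual.
predicted_color = reference_color + residual_color
```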
  • images observed at different observation points are not exactly the same due to interference factors such as occlusion and illumination.
  • the reference color for the spatial location may also be different. Therefore, the training device may select the mode of the color at the spatial position where the light from the target observation point passes in the at least one reference image as the first reference color for subsequent determination of the residual color.
  • selecting the mode of the colors of the at least one reference image at the spatial position as the first reference color can reduce, to a certain extent, the influence of interference factors on the accuracy of the image generation model and improve the accuracy of the technical solution.
  • the image generation model can be trained using only the loss function of the residual color.
  • the training device can jointly train the image generation model based on the joint network, using the loss function of residual color and the loss function of direct prediction.
  • multiple loss functions are used to jointly train the image generation model, which improves the robustness of the algorithm, also enables the trained image generation model to be applicable to various situations, and improves the flexibility of the solution.
  • the training device may, based on the training result of the previous training cycle, continuously optimize the image generation model to bring it closer to the real situation, which can be done in the following ways.
  • the training device can obtain a new perspective image corresponding to the target observation point. Then, the new perspective image is compared with each reference image in the at least one reference image used during training so as to determine the second reference color, and the second reference color is used as the first reference color for training the image generation model in the next iteration cycle.
  • the training device may determine the second reference color in the following manner.
  • the training device can select any pixel point as the reference point in the new perspective image obtained by the target observation point in the previous iteration cycle, and then select the image block of the same pixel size centered on the reference point as the basis for comparison.
  • the second reference color is determined by comparing the similarity of this image block in the new view image and the reference image. The following situations may occur:
  • if the reference point is not occluded in any of the at least one reference image, the first reference color used in the previous training cycle can continue to be used in subsequent iteration cycles.
  • if some reference images are occluded at the reference point, the training device can determine the second reference color as the mode of the colors, at the reference point, of the reference images that satisfy the preset condition.
  • if no reference image satisfies the preset condition, the color value of the second reference color may be determined to be 0.
  • the training device compares the perspective image of the target observation point predicted in the previous iteration cycle with the at least one reference image used in training, thereby removing inappropriate parameter values used when training the image generation model in the previous iteration cycle, which reduces the impact of occlusion on the new perspective image and improves the robustness of the algorithm and the accuracy of the technical solution.
  • the training device may determine at least one reference observation point according to the position and viewing angle direction of the target observation point, and then determine the image corresponding to the at least one reference observation point as a reference image.
  • the distance between each reference observation point and the target observation point needs to satisfy a preset condition. The closer the positions of two points and the more similar their viewing directions, the higher the similarity of the images observed at the two points. Therefore, the distance here is determined jointly by the positions and the viewing directions of the two points: the positions of the two points must satisfy the preset condition, and the viewing directions of the two points must also satisfy the preset condition. Satisfying the preset condition may mean being less than or equal to a preset threshold.
  • the reference observation point is determined by the position and the viewing angle direction, and when the preset conditions are met, the error of the new perspective image obtained according to the image generation model is within the allowable range, which improves the accuracy of the solution.
  • the loss function of the residual color may be:
  • σ_i represents the opacity of a spatial point at the spatial position
  • δ_i represents the distance between adjacent spatial points on a ray
  • C(r) represents the true color
  • the function of the loss function of the residual color is to make the new perspective image predicted according to the first reference color and the residual color as close to the real image as possible.
  • the loss function for direct prediction may be:
  • c_i represents the true color of each spatial point
  • σ_i represents the opacity of a spatial point at the spatial position
  • δ_i represents the distance between adjacent spatial points on a ray.
  • the role of the direct prediction loss function is to make the predicted new view image as close to the real image as possible while only learning the real color.
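  • the loss-function expressions themselves are rendered as images in the original publication and are not reproduced in this text. Purely as an illustrative reconstruction from the symbol definitions above (σ_i opacity, δ_i inter-sample distance, C(r) true pixel color of a ray r) and the standard volume-rendering formulation, the two losses would be consistent with a form such as the following, where c_i^ref is the first reference color and Δc_i the residual color of sample i; the exact expressions used in the application may differ:

```latex
% Illustrative only: standard volume-rendering weights and plausible loss forms.
w_i = T_i\left(1 - e^{-\sigma_i \delta_i}\right), \qquad
T_i = \exp\Big(-\sum_{j<i}\sigma_j\,\delta_j\Big)

\mathrm{Loss}_{\mathrm{resi}} \;=\; \sum_{r}\Big\|\,C(r) - \sum_{i} w_i\,\big(c^{\mathrm{ref}}_i + \Delta c_i\big)\Big\|^2
\qquad
\mathrm{Loss}_{\mathrm{whole}} \;=\; \sum_{r}\Big\|\,C(r) - \sum_{i} w_i\,c_i\Big\|^2
```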
  • a second aspect of the embodiments of the present application provides a method for generating a new perspective image, including:
  • a virtual view point is an observation point that has not actually observed the observed object, and can be randomly selected manually. After manually selecting the virtual observation point, the execution device may receive the manually input position and view direction of the virtual observation point.
  • the execution device can input the position and viewing angle direction into the image generation model to obtain the residual color of the spatial position where the light from the virtual observation point passes. Then combined with the obtained reference color, a new perspective image corresponding to the virtual observation point is generated.
  • the reference color is determined according to at least one reference image.
  • the image generation model is obtained by training based on the residual color.
  • the residual color belongs to low-frequency information and is easy to represent and memorize. Therefore, the clarity of the new perspective image obtained based on the image generation model can be improved.
  • the image generation model may be obtained by training according to the loss function of the residual color. It can also be trained based on the loss function of the residual color and the loss function of direct prediction. Using the image generation model trained by the loss function based on the residual color and the directly predicted loss function, the effect of the predicted new perspective image will be more accurate.
  • the image generation model used by the execution device may be an image generation model jointly trained by using multiple loss functions, which improves the clarity of the generated image.
  • the reference color includes a first reference color
  • the first reference color refers to the mode of the colors of the spatial positions traversed by the light from the virtual observation point.
  • the manner in which the execution device acquires the first reference color may be to receive the first reference color sent by the training device.
  • alternatively, the execution device does not acquire the first reference color from the training device, but determines the first reference color according to the position and viewing direction of the virtual observation point.
  • the determination process can be described as follows:
  • the execution device may determine at least one reference observation point according to the position and viewing angle direction of the virtual observation point, and then determine a reference image corresponding to each reference observation point in the at least one reference observation point.
  • the distance between the reference observation point and the virtual observation point needs to satisfy a preset condition. Two observation points are different as long as their positions or viewing directions differ. The closer the positions of two points and the more similar their viewing directions, the higher the similarity of the images observed at the two points; therefore, the distance here is determined jointly by the positions and the viewing directions of the two points.
  • specifically, the positions of the two points satisfy the preset condition and the viewing directions of the two points satisfy the preset condition; satisfying the preset condition may mean being less than or equal to a preset threshold.
  • the reference observation point is determined by the position and the viewing angle direction, and when the preset conditions are met, the error of the new perspective image obtained according to the image generation model is within the allowable range, which improves the accuracy of the solution.
  • the loss function of the residual color may be:
  • σ_i represents the opacity of a spatial point at the spatial position
  • δ_i represents the distance between adjacent spatial points on a ray
  • C(r) is used to represent the real color.
  • the function of the loss function of the residual color is to make the new perspective image predicted according to the first reference color and the residual color as close to the real image as possible.
  • the loss function for direct prediction may be:
  • c_i represents the true color of each spatial point
  • σ_i represents the opacity of a spatial point at the spatial position
  • δ_i represents the distance between adjacent spatial points on a ray.
  • the role of the direct prediction loss function is to make the predicted new view image as close to the real image as possible while only learning the real color.
  • a third aspect of the embodiments of the present application provides an apparatus for training an image generation model, including:
  • the determining unit is used to determine the position and viewing angle direction of the target observation point, and then determine at least one reference image from the N input images according to the position and viewing angle direction of the target observation point, where N is an integer greater than or equal to 2. Then, according to at least one reference image, the reference color of the spatial position is determined.
  • the spatial position is the position where the light from the target observation point passes.
  • the acquiring unit is used to acquire the real color of the pixels in the viewing angle image corresponding to the target observation point.
  • the determination unit is also used to determine the residual color of the spatial position according to the reference color and the real color.
  • a processing unit, configured to train the image generation model based on the residual color.
  • the reference color includes: a first reference color, where the first reference color is the mode of the color of the position where the light from the target observation point passes.
  • the processing unit is configured to:
  • the image generation model is trained according to the loss function of the residual color.
  • the loss function of direct prediction is obtained, and the image generation model is trained according to the loss function of residual color and the loss function of direct prediction.
  • the acquiring unit is further configured to acquire a new perspective image corresponding to the target observation point, wherein the new perspective image is predicted by the execution device according to the image generation model.
  • the determining unit is further configured to determine the second reference color according to the new perspective image and each reference image in the at least one reference image. After that, use the second reference color as the first reference color.
  • the determining unit is specifically used for:
  • the second reference color is determined as the first reference color.
  • the second reference color is determined as the mode of the reference colors, at the spatial position, of the reference images that satisfy the preset condition.
  • the color value of the second reference color is determined to be 0.
  • the determining unit is specifically used for:
  • At least one reference observation point is determined according to the position of the target observation point and the viewing angle direction, wherein the distance between each reference observation point in the at least one reference observation point and the target observation point satisfies a preset condition.
  • At least one reference image is acquired according to the at least one reference observation point, wherein each reference observation point in the at least one reference observation point corresponds to each reference image in the at least one reference image.
  • the loss function of the residual color may be:
  • C(r) is used to represent the true color.
  • the function of the loss function of the residual color is to make the new perspective image predicted according to the first reference color and the residual color as close to the real image as possible.
  • the loss function for direct prediction may be:
  • c_i represents the true color of each spatial point
  • σ_i represents the opacity of a spatial point at the spatial position
  • δ_i represents the distance between adjacent spatial points on a ray.
  • the role of the direct prediction loss function is to make the predicted new view image as close to the real image as possible while only learning the real color.
  • a fourth aspect of the embodiments of the present application provides an apparatus for generating a new perspective image, including:
  • the determining unit is used to determine the position and viewing angle direction of the virtual observation point.
  • the acquiring unit is used for inputting the position and viewing angle direction of the virtual observation point into the image generation model, and acquiring the residual color of the spatial position where the light rays from the virtual observation point pass through.
  • the obtaining unit is further configured to obtain a reference color, where the reference color is a color at a spatial position determined according to at least one reference image.
  • the processing unit is configured to generate a new perspective image corresponding to the virtual observation point according to the residual color and the reference color of the spatial position.
  • the image generation model includes: an image generation model obtained by training according to the loss function of the residual color; or, an image generation model obtained by training based on both the loss function of the residual color and the loss function of direct prediction.
  • the reference color includes: a first reference color, where the first reference color is the mode of the colors of the spatial positions traversed by the light from the virtual observation point.
  • the acquiring unit is specifically configured to receive the first reference color sent by the training device.
  • the reference color includes: a first reference color.
  • At least one reference observation point is determined according to the position and viewing angle direction of the virtual observation point, wherein the distance between each reference observation point in the at least one reference observation point and the virtual observation point satisfies a preset condition.
  • At least one reference picture is determined from the N reference pictures, wherein each reference observation point in the at least one reference observation point corresponds to each reference image in the at least one reference image, and N is an integer greater than or equal to 2.
  • a first reference color is determined from at least one reference image.
  • a fifth aspect of the embodiments of the present application provides an image processing system, including: a training device and an execution device.
  • the training device includes a first processor and a first memory, the first processor is configured to execute the method of the foregoing first aspect, the first memory is configured to store a training picture set, and the training picture set includes at least two images.
  • the execution device includes a second processor and a second memory, the second processor is configured to execute the method of the aforementioned second aspect, and the second memory is configured to store the new perspective image.
  • a sixth aspect of the embodiments of the present application provides a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and when the computer executes the program, the method of the foregoing first aspect or the second aspect is performed.
  • a seventh aspect of the embodiments of the present application provides a computer program product.
  • the computer program product When the computer program product is executed on a computer, the computer executes the method of the foregoing first aspect or the second aspect.
  • An eighth aspect of the embodiments of the present application provides a computer device, including:
  • a processor, a memory, input and output devices, and a bus.
  • the processor, the memory, and the input and output devices are connected to the bus.
  • Computer instructions are stored in the processor, and the processor is used to execute the computer instructions, so that the computer device performs the following steps:
  • At least one reference image is determined from the N input images according to the position and the viewing direction, where N is an integer greater than or equal to 2.
  • the reference color of the spatial position is determined, where the spatial position is the position where the light from the target observation point passes.
  • the computer device is adapted to perform the method of the aforementioned first aspect.
  • a ninth aspect of an embodiment of the present application provides a computer device, including:
  • a processor, a memory, input and output devices, and a bus.
  • the processor, the memory, and the input and output devices are connected to the bus.
  • Computer instructions are stored in the processor, and the processor is used to execute the computer instructions, so that the computer device performs the following steps:
  • the position and view direction are input into the image generation model, and the residual color of the spatial position where the light from the virtual observation point passes through is obtained.
  • the reference color is a color at a spatial position determined according to at least one reference image.
  • a new perspective image corresponding to the virtual observation point is generated.
  • the computer device is adapted to perform the method of the aforementioned second aspect.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence main frame according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an application scenario of an image processing system according to an embodiment of the present application.
  • FIG. 3 is a system architecture diagram of an image processing system according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a training method for an image generation model according to an embodiment of the present application
  • FIG. 5 is another schematic flowchart of the training method of the image generation model according to the embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a method for generating a new perspective image according to an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of an apparatus for training an image generation model according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an apparatus for generating a new perspective image according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an image processing system implemented in this application.
  • the embodiments of the present application provide a training method for an image generation model, a method and device for generating a new perspective image, and the image generation model learns the residual color of the spatial point that light passes through, generates a new perspective image, and improves the clarity of the new perspective image.
  • the loss function is used to measure the difference between the predicted value and the true value, and the size of the difference can well reflect the difference between the model and the actual data.
  • the role of training the model is to make the results predicted by the model as close as possible to the real results, so the trained model can be evaluated and continuously optimized by setting a loss function. The larger the output value (loss) of the loss function, the greater the difference between the predicted results and the real results; the process of training the model is therefore a process of reducing the loss as much as possible.
  • the target observation point, the reference observation point and the virtual observation point are all a perspective of observing the object, and an observation point can be simply understood as the attitude of a camera. Observing the same object at different observation points may result in different images because each observation point has its own position and viewing direction.
  • the position of the observation point is expressed as a three-dimensional coordinate point (x, y, z), and the viewing angle direction of the observation point includes the angle of rotation around each axis of the observation point, which can include three angles.
  • since the observation is not sensitive to rotation about one of the axes, the viewing angle direction of the observation point can also be expressed in two-dimensional form, for example as two angles, which is not specifically limited here.
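  • as a small illustrative data structure only (not part of the claims), an observation point can be held as a 3-D position plus a viewing direction, here stored as two angles:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObservationPoint:
    """Pose of an observation point: 3-D position plus viewing direction."""
    position: Tuple[float, float, float]   # (x, y, z)
    direction: Tuple[float, float]         # two angles, e.g. (theta, phi), in radians

# A hypothetical target observation point.
target = ObservationPoint(position=(1.0, 0.5, 2.0), direction=(0.3, 1.2))
```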
  • the target observation point refers to a viewpoint that is arbitrarily selected manually when the image generation model is trained.
  • the reference observation point refers to the observation point where the object to be observed has been observed and a corresponding perspective image has been generated.
  • the virtual observation point means that the object to be observed has not been observed at the observation point before, and there is no new perspective image corresponding to the virtual observation point in the existing picture set.
  • the reference color refers to the color of the spatial position where the light from the target observation point passes, and needs to be determined according to the reference image.
  • the process of determining the reference color is actually a prediction process that is performed once the reference images have been obtained.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence main frame according to an embodiment of the present application.
  • the main frame describes the overall workflow of the artificial intelligence system and is applicable to general demand in the field of artificial intelligence.
  • Figure 1 includes two dimensions: "Intelligent Information Chain” (horizontal axis) and “IT Value Chain” (vertical axis).
  • Intelligent information chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, data has gone through the process of "data-information-knowledge-wisdom".
  • the "IT value chain” is the industrial ecological process from the underlying infrastructure of human intelligence, information (providing and processing technology) to the system, reflecting the value brought by artificial intelligence to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through the basic platform.
  • the infrastructure communicates with the outside world through sensors; the computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes distributed computing frameworks, networks and other related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with external parties to obtain data, and these data are provided to the intelligent chips in the distributed computing system provided by the basic platform for calculation.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, and the productization of intelligent information decision-making and implementation of applications. Its application areas mainly include: intelligent manufacturing, intelligent transportation, Smart home, smart medical care, smart security, autonomous driving, safe city, smart terminals, etc.
  • FIG. 2 is a schematic diagram of an application scenario of the image processing system according to an embodiment of the present application.
  • a communication connection is established between the camera 201, the processor 202 and the smart phone 203. The processor 202 can receive the photos or videos sent by the camera 201, and each frame in these photos and videos can be regarded as an image in the training picture set.
  • the processor 202 determines a reference image corresponding to the reference observation point from the received image according to the position and the viewing angle direction of the virtual observation point, and generates a new view image by using the image generation model that has been trained. After that, the processor 202 can integrate the new perspective images and send them to the smart phone 203 , or can only send the unintegrated new perspective images to the smart phone 203 .
  • the integrated image may be a 360° panorama photo or a 720° panorama photo, which is selected according to actual application needs, which is not specifically limited here.
  • the smartphone 203 displays the received image.
  • the embodiment shown in FIG. 2 is only an application scenario of the image processing system of the embodiment of the present application.
  • the camera 201 can also be replaced by other devices, such as a notebook computer or a tablet computer.
  • the processor 202 does not necessarily exist outside the smart phone 203 , and may be a processor in the smart phone 203 .
  • the smartphone 203 can also be replaced by other devices, such as a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device; any device that can display the new perspective image may be used, which is not limited here.
  • FIG. 3 is a system architecture diagram of the image processing system provided by the embodiment of the present application.
  • the image processing system includes an execution device 310 , a training device 320 , a database 330 , a client device 340 and a data storage system 350 , wherein the execution device 310 includes a computing module 311 .
  • the database 330 stores a training picture set, so that the training device 320 predicts, according to at least one reference image in the training picture set, the reference color of the positions where the light from the target observation point passes.
  • the training device 320 is configured to generate the image generation model 301 , and iteratively train the image generation model 301 by using at least one picture in the database 330 , so as to obtain the optimal image generation model 301 .
  • after the execution device 310 generates a new perspective image according to the image generation model 301, the new perspective image can be sent to different devices, for example to the client device 340 or to the data storage system 350, which is not specifically limited here.
  • the image generation model 301 can be applied to different devices, such as mobile phones, tablets, notebook computers, VR devices, AR devices, monitoring systems, etc., which are not specifically limited here.
  • the way that the training device 320 configures the image generation model 301 in the execution device 310 can be sent through wireless communication, or through wired communication, and the image generation model 301 can also be configured on the execution device 310 through a removable storage device.
  • the actual configuration method is selected according to the needs of the actual application, which is not specifically limited here.
  • when the training device 320 trains the image generation model 301, it determines at least one picture from the input training picture set as a reference image according to the input position and view direction of the target observation point.
  • the multiple images in the training picture set have various representations, which can be photos obtained by using a shooting device, or at least one image in a video frame, which is not limited here.
  • the multiple images in the training picture set can be acquired in various ways, which may be acquired from the data acquisition device 360 or sent by the client device 340, wherein the data acquisition device 360 may be a laptop computer or a camera, as long as It is a device that has a camera function and can take photos or videos, which is not specifically limited here.
  • the client device 340 and the execution device 310 may be independent devices, respectively, or may be a whole, which is not specifically limited here.
  • the execution device 310 is configured with an I/O interface 312 for data interaction with the client device 340.
  • the user can input the spatial position and view direction of the virtual observation point to the I/O interface 312 through the client device 340.
  • the I/O interface 312 sends the generated new perspective image to the client device 340 for provision to the user.
  • FIG. 3 is only a schematic diagram of the architecture of the image processing system provided by the embodiment of the present application, and the positional relationship between the devices and components shown in the figure does not constitute any limitation.
  • the execution device 310 may also be a graphics processing unit (GPU) or a neural-network processing unit (NPU) in the mobile phone, which is not specifically limited here.
  • FIG. 4 is a schematic flowchart of the image generation model training method in the embodiment of the present application, including:
  • the training device determines the position and the viewing angle direction of the target observation point.
  • the relevant information of the target observation point includes the position of the target observation point and the viewing angle direction of the target observation point.
  • the position of the target observation point is a three-dimensional coordinate point (x, y, z), and the viewing angle direction of the observation point is two-dimensional.
  • the coordinate points and viewing direction of the target observation point can be manually selected and input into the training equipment.
  • the training device determines at least one reference image.
  • the photographing device may be a smartphone, a camera or a Polaroid, as long as it is a device having a photographing function, which is not specifically limited here.
  • There are various representations of the multiple images which may be photos obtained by the photographing device using the photographing function, or each frame of images in the video obtained by the photographing device using the video recording function, which is not specifically limited here.
  • the multiple images obtained by using the photographing device may be referred to as a training picture set, and these images may be input into the training device, so that the training device selects at least one reference image therefrom.
  • the process of selecting a reference image by the training device will be described below.
  • the training device can determine at least one reference observation point according to the input position of the target observation point and the viewing angle direction of the target observation point, and then determine the reference image corresponding to each reference observation point from the training picture set.
  • the distance between each of the at least one reference observation point and the target observation point needs to satisfy a preset condition.
  • the distance mentioned here is determined by the position and the viewing angle direction of the observation points, because the closer the positions of two observation points and the more similar their viewing angle directions, the larger the overlapping area of their corresponding perspective images and the higher the similarity between the two perspective images.
  • the distance between the reference observation point and the target observation point needs to satisfy the preset conditions, including that the positions of the two points satisfy the preset conditions, and the viewing angle directions of the two points satisfy the preset conditions. Satisfying the preset condition may be less than or equal to a certain preset threshold.
  • each observation point can be represented in its own coordinate system, but when observation points are compared with one another, they should all be expressed in the same coordinate system so that the comparison results are meaningful.
  • the training device can select a reference image from the training picture set within the allowable error range, which improves the practicability of the solution.
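  • a minimal sketch of this selection in Python, assuming a Euclidean distance between positions and an angular difference between (unit-vector) viewing directions, each compared against its own threshold; the distance measures and threshold values are illustrative assumptions, not specified by the application:

```python
import numpy as np

def select_reference_points(target_pos, target_dir, ref_poses,
                            pos_thresh=0.5, dir_thresh=np.deg2rad(15.0)):
    """Return indices of reference observation points 'close' to the target.

    ref_poses: list of (position, direction) pairs, all expressed in the same
    world coordinate system. A point qualifies only if both its position
    distance and its viewing-direction difference satisfy the preset thresholds.
    """
    target_pos = np.asarray(target_pos, dtype=float)
    target_dir = np.asarray(target_dir, dtype=float)
    target_dir /= np.linalg.norm(target_dir)

    selected = []
    for i, (pos, direction) in enumerate(ref_poses):
        pos = np.asarray(pos, dtype=float)
        direction = np.asarray(direction, dtype=float)
        direction = direction / np.linalg.norm(direction)

        pos_dist = np.linalg.norm(pos - target_pos)
        angle = np.arccos(np.clip(np.dot(direction, target_dir), -1.0, 1.0))
        if pos_dist <= pos_thresh and angle <= dir_thresh:
            selected.append(i)
    return selected
```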
  • the training device determines the first reference color of the spatial position according to the at least one reference image.
  • the training device can back-project a spatial point in the spatial position where the light from the target observation point passes to the corresponding pixel positions in the reference images, so as to obtain the reference color of that spatial point in each reference image. Due to the influence of factors such as occlusion and illumination, the reference colors obtained by projection from different observation points for the same spatial position may differ.
  • the training device can therefore use the mode of these reference colors as the parameter used in the subsequent training process, that is, the first reference color. To a certain extent, this reduces the influence of interference factors on the prediction result and improves the accuracy of the technical solution of the present application.
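  • a self-contained sketch of taking the mode of the back-projected colors for one spatial point; quantizing the colors into coarse bins so that a mode is well defined is an illustrative choice, not something stated by the application:

```python
import numpy as np
from collections import Counter

def first_reference_color(projected_colors, quant=16):
    """Mode of the colors observed for one spatial point across reference images.

    projected_colors: (K, 3) RGB colors (0-255) obtained by back-projecting the
    spatial point into K reference images. Colors are bucketed into bins of
    width `quant`, and the mean color of the winning bin is returned.
    """
    colors = np.asarray(projected_colors, dtype=float)
    keys = [tuple((c // quant).astype(int)) for c in colors]
    winning_key, _ = Counter(keys).most_common(1)[0]
    members = colors[[k == winning_key for k in keys]]
    return members.mean(axis=0)

# Example: three reference images see red, one sees an occluder's yellow.
samples = [[250, 10, 10], [248, 12, 9], [251, 9, 12], [250, 240, 20]]
print(first_reference_color(samples))   # close to red
```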
  • the training device acquires the real color of the pixel in the perspective image corresponding to the target observation point.
  • the observed object is actually observed at the target observation point, and there is a perspective image corresponding to the target observation point. Therefore, the real color of the pixels in the viewing angle image corresponding to the target observation point can be manually input to the training device in advance for comparing with the first reference color to determine the residual color.
  • step 403 and step 404 do not have a necessary sequence, and step 403 or step 404 may be performed first, and selection is made according to actual application needs, which is not specifically limited here.
  • the training device determines the residual color of the spatial position.
  • the residual color can be understood as a function of the position and viewing direction of the target observation point and the parameters of the neural network.
  • the first reference color and the real color are the relevant parameters of the neural network.
  • through the neural network, the training device can obtain the residual color and the opacity of a certain spatial point in the spatial position where the light from the target observation point passes.
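  • purely for illustration, a toy NeRF-style multilayer perceptron mapping a sampled spatial point and a viewing direction to a residual color and an opacity could look like the following sketch; the layer sizes, activations and the use of PyTorch are assumptions, not the architecture claimed by the application:

```python
import torch
import torch.nn as nn

class ResidualColorField(nn.Module):
    """Toy MLP: (x, y, z) position + 2-angle view direction -> (residual RGB, opacity)."""

    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),      # 3 position dims + 2 direction angles
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.residual_head = nn.Linear(hidden, 3)                       # residual color (can be +/-)
        self.opacity_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, position, view_dir):
        h = self.backbone(torch.cat([position, view_dir], dim=-1))
        return self.residual_head(h), self.opacity_head(h)

# Eight sampled points along one ray from the target observation point.
model = ResidualColorField()
residual, opacity = model(torch.rand(8, 3), torch.rand(8, 2))
```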
  • the training device determines whether the real color is transparent, and if so, executes step 408, and if not, executes step 407.
  • the training device can determine the type of loss function used for training the image generation model according to the transparency of the real color.
  • step 405 and step 406 do not have a certain sequence, and step 405 or step 406 may be performed first, and selection is made according to actual application needs, which is not specifically limited here.
  • the training device trains the image generation model according to the loss function of the residual color.
  • the training device determines that the real color is transparent, it can train the image generation model according to the preset loss function of the residual color.
  • the loss function for residual color can be:
  • C(r) is used to represent the true color.
  • the training device determines that the true color is not transparent, it can obtain the loss function of direct prediction.
  • the loss function for direct prediction can be:
  • c_i represents the true color of each spatial point
  • σ_i represents the opacity of a spatial point at the spatial position
  • δ_i represents the distance between adjacent spatial points on a ray.
  • the role of the direct prediction loss function is to make the predicted new view image as close to the real image as possible while only learning the real color.
  • the training device can directly predict the true color of the perspective image corresponding to the target observation point in different ways, for example using the MPI technology or the NeRF technology. This process is not the focus of the technical solution of this application, so it is not described in detail here.
  • the training device trains the image generation model according to the loss function of the residual color and the directly predicted loss function.
  • the training device can jointly train the image generation model according to the loss function of residual color and the loss function of direct prediction.
  • the loss function for joint training can be expressed as:
  • Loss = Loss_whole + Loss_resi
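  • a minimal sketch of this joint objective in Python, where Loss_whole is the direct-prediction term and Loss_resi the residual-color term; the mean-squared-error form and the equal weighting of the two terms are assumptions for illustration:

```python
import torch

def joint_loss(pred_direct, pred_from_residual, true_pixels):
    """Loss = Loss_whole + Loss_resi.

    pred_direct:        pixel colors rendered from the directly predicted colors
    pred_from_residual: pixel colors rendered from (reference color + residual color)
    true_pixels:        ground-truth pixel colors of the target view
    """
    loss_whole = torch.mean((pred_direct - true_pixels) ** 2)
    loss_resi = torch.mean((pred_from_residual - true_pixels) ** 2)
    return loss_whole + loss_resi
```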
  • the image generation model is obtained by training based on the residual color.
  • the residual color belongs to low-frequency information and is easy to represent and memorize. Therefore, the clarity of the new perspective image obtained based on the image generation model can be improved.
  • the training device can train the image generation model in conjunction with the directly predicted loss function, which avoids the overfitting that may occur when the image generation model is trained using only the loss function of the residual color while the real color is transparent. This reduces the probability of errors in the new-view image and improves the robustness of the algorithm and the reliability of the technical solution of the present application.
  • some reference observation points may observe the color of an occluder, which affects the values of the first reference color and the residual color, and also affects the accuracy of the image generation model.
  • FIG. 5 is an embodiment of the training method of the image generation model in the embodiment of the present application.
  • the training device determines the position and the viewing angle direction of the target observation point.
  • the training device determines at least one reference image.
  • the training device determines the first reference color of the spatial position according to the at least one reference image.
  • the training device acquires the real color of the pixel in the perspective image corresponding to the target observation point.
  • the training device determines the residual color of the spatial position.
  • the training device determines whether the real color is transparent, and if so, executes step 508, and if not, executes step 507.
  • the training device trains the image generation model according to the loss function of the residual color.
  • the training device trains the image generation model according to the loss function of the residual color and the directly predicted loss function.
  • Steps 501 to 509 are similar to steps 401 to 409 in the embodiment shown in FIG. 4 , and are not repeated here.
  • the training device determines whether the new perspective image and each reference image satisfy the preset condition, if yes, go to step 511, if not, go to step 512.
  • after training the image generation model for an iteration cycle, the training device needs to check the accuracy of the image generation model and correct existing problems, so as to continuously optimize the image generation model and make the new perspective image obtained from it as close to the real image as possible.
  • the optimization process of the image generation model is introduced below.
  • the execution device can generate the model according to the image after the previous iteration cycle, obtain the residual color of the spatial position corresponding to the virtual observation point, and then combine the reference color of this spatial position to predict the new perspective image corresponding to the virtual observation point. Then, the new perspective image is input into the training device, and the training device determines whether the reference image used in the training process is accurate by judging the similarity between the new perspective image and the reference image.
  • the projection of the spatial point on the image will correspond to a certain pixel position. Therefore, the way of judgment can be to select a certain pixel point in the new perspective image as the reference point, and compare the image blocks of the same pixel size centered on the reference point. Whether the similarity between the new view image and each reference image satisfies the preset conditions. If the preset conditions are met, it means that there is no occlusion in this reference image, and it can continue to be used in the training process of the image generation model in the next iteration cycle.
  • the size of the image block may be 3px × 3px or 5px × 5px, where px is the abbreviation of pixel; the size is selected according to the needs of the actual application and is not specifically limited here.
  • the similarity of the two image blocks satisfies a preset condition, which may be that the color similarity of the two image blocks is less than or equal to a preset threshold.
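  • a self-contained sketch of this patch comparison; using a mean absolute color difference and a fixed threshold as the preset condition is an illustrative assumption:

```python
import numpy as np

def extract_patch(image, row, col, half=1):
    """Image block of (2*half + 1) x (2*half + 1) pixels centred on a reference point."""
    return image[row - half:row + half + 1, col - half:col + half + 1].astype(float)

def patches_similar(patch_a, patch_b, threshold=10.0):
    """Preset condition on color similarity: mean absolute difference <= threshold.

    True  -> the reference image agrees with the new perspective image here
             (no occlusion assumed) and can be kept for the next iteration cycle.
    False -> the reference image is treated as occluded and is removed.
    """
    return float(np.mean(np.abs(patch_a - patch_b))) <= threshold
```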
  • the training device determines that the second reference color is the first reference color.
  • each reference image selected by the training device satisfies the preset conditions, it means that the first reference color used in the previous iteration cycle is correct and can be used in the subsequent training process.
  • the training device determines that the mode of the reference colors of the reference image that meets the condition is the second reference color.
  • the training device needs to remove the reference image that does not meet the condition, and re-determine the reference color used in the training process.
  • At least one reference image contains a reference image that does not meet the conditions. There may be the following two situations:
  • one situation is that, out of Y reference images, X do not satisfy the condition. The training device can then use the remaining (Y-X) reference images as the basis for determining the second reference color: the second reference color is the mode of the reference colors of the (Y-X) reference images at the point to be observed.
  • the second reference color may be the same as the first reference color used in the previous iteration cycle, or may be different, which is related to the situation that the observed object is occluded, which is not specifically limited here.
  • Y is an integer greater than or equal to 1
  • X is an integer greater than or equal to 1 and less than Y.
  • the other situation is that none of the reference images satisfies the condition, in which case the training device can determine that the color value of the second reference color is 0.
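  • putting the three cases together, a sketch of how the second reference color could be chosen; all names, the rounding used to compute a mode, and the boolean 'passed' flags are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def second_reference_color(first_ref_color, ref_colors, passed):
    """Choose the reference color to use in the next iteration cycle.

    first_ref_color: color used in the previous cycle.
    ref_colors:      (Y, 3) colors of the Y reference images at the reference point.
    passed:          length-Y booleans, True if that reference image satisfied
                     the patch-similarity condition.
    """
    ref_colors = np.asarray(ref_colors, dtype=float)
    passed = np.asarray(passed, dtype=bool)

    if passed.all():                    # every reference image is unoccluded
        return np.asarray(first_ref_color, dtype=float)
    if not passed.any():                # every reference image is occluded
        return np.zeros(3)              # color value set to 0
    # Otherwise: mode of the colors of the (Y - X) images that passed.
    keys = [tuple(c) for c in ref_colors[passed].round(0)]
    winner, _ = Counter(keys).most_common(1)[0]
    return np.asarray(winner, dtype=float)
```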
  • for example, suppose there are 18 reference images: in 9 of them the reference color is the color of an occluder (yellow), and in the other 9 it is the real color (red); in this case there are two modes of the reference colors.
  • another possible case is that there are 6 reference images in which the reference color is the color of occluder 1 (yellow), 6 in which it is the color of occluder 2 (green), and 6 in which it is the real color (red); in this case there are three modes of the reference colors.
  • step 512 is to eliminate the adverse effect of the wrong reference image on the image generation model, so as to improve the robustness of the algorithm.
  • the training device optimizes the image generation model by using the second reference color as the first reference color.
  • the training device may input the second reference color as the first reference color into the image generation model, so as to adjust the parameters of the image generation model and optimize the image generation model.
  • the image generation model is obtained by training based on the residual color.
  • the residual color belongs to low-frequency information and is easy to represent and memorize. Therefore, the clarity of the new perspective image obtained based on the image generation model can be improved.
  • the training device removes inappropriate parameter values used to train the image generation model in the previous iteration cycle by comparing the new perspective image with the at least one reference image used during training, thereby reducing the impact of occlusion of the point to be observed on the new perspective image and improving the robustness of the algorithm and the accuracy of the technical solution.
  • step 506, step 508 and step 509 may not be executed, and step 507 may be executed directly after step 505.
  • the training device directly optimizes the image generation model according to the loss function of the residual color, and continuously optimizes the image generation model according to the training results of the previous iteration cycle.
  • the wrong reference images in the training image set are removed, so that the new perspective images obtained by the trained image generation model are more accurate.
  • the operation steps can be saved and the operation process can be simplified, thereby reducing the consumption of computing resources.
  • since the quality of the image generation model is related to the accuracy of dense matching, and dense matching of images is based on texture similarity, it is difficult to provide matching information in untextured areas of an image, whereas texture-rich areas can provide accurate matching information.
  • the area with rich texture mentioned here refers to the area where the color changes, for example, from red to yellow, the intersection of the two colors can be regarded as the edge of the texture.
  • because human senses are also more sensitive to the perception of texture-rich areas, the image generation model provided by the embodiments of the present application performs more training on texture-rich areas during training, so that the image generation model finally obtained is more practical.
  • the embodiment of the present application also provides a method for generating an image from a new perspective, which can generate an image from a new perspective by using the above-mentioned image generation model. Please refer to FIG. 6 .
  • FIG. 6 is an embodiment of a method for generating a new perspective image according to an embodiment of the present application.
  • the executing device determines the position and the viewing angle direction of the virtual observation point.
  • the virtual observation point is the observation point that has not actually observed the observed object, and can be randomly selected manually. After manually selecting the virtual observation point, the execution device may receive the manually input position and view direction of the virtual observation point.
  • the execution device obtains the residual color according to the image generation model.
  • the execution device can input the position and view direction of the virtual observation point into the image generation model to obtain the residual color of the spatial position corresponding to the virtual observation point.
  • the spatial position corresponding to the virtual observation point refers to the position where the light from the virtual observation point passes.
  • the image generation model used by the execution device includes the image generation model in the embodiments shown in FIGS. 3 to 5, which may be an image generation model that has not been fully trained or a fully trained image generation model; the selection is made according to the needs of the actual application, which is not specifically limited here.
  • if the image generation model has not been fully trained, the new perspective image obtained based on the residual color can be used to remove occluded reference images, which is meaningful for optimizing the image generation model.
  • the residual color obtained by using the trained image generation model is the residual color in the ideal state of the embodiment of the present application, and the new perspective image generated based on the residual color is also relatively accurate.
  • the executing device acquires the first reference color.
  • the executing device may determine at least one reference image from the training picture set, thereby acquiring the first reference color.
  • the execution subject that determines at least one reference image from the training picture set may also be a training device.
  • The process of selecting a reference image by the training device is similar to step 402 in the embodiment shown in FIG. 4, the difference being that the reference observation points are determined based on the position and viewing direction of the virtual observation point rather than the position and viewing direction of the target observation point, which will not be repeated here.
  • the first reference color in this embodiment includes the first reference color in the embodiments shown in FIG. 4 and FIG. 5 .
  • When the execution device obtains the first reference color, it can also obtain the opacity of the spatial position; because the opacity of the spatial position affects the final imaging effect, the execution device also needs to obtain the opacity.
  • the execution device generates a new perspective image according to the residual color and the first reference color.
  • In an image, the color of each pixel position is obtained by integrating the colors of multiple spatial points along a ray.
  • After the execution device obtains the first reference color, the residual color and the opacity of each spatial point, it can integrate them to obtain the new perspective image corresponding to the virtual observation point. The integration process may involve the following situations.
  • One of the cases is to integrate the residual color and the first reference color of each spatial point separately, and then add the results of the integration to obtain a new perspective image.
  • Another case is to first add the first reference color and the residual color of each spatial point, and then integrate them together to obtain a new perspective image.
  • Both cases can use the same function with physical meaning for the integration, in which σ_i represents the opacity of a spatial point in the spatial position and δ_i represents the distance between adjacent spatial points on a ray.
  • In the embodiments of the present application, the image generation model is obtained by training based on the residual color.
  • The residual color is low-frequency information that is easy to represent and memorize; therefore, the new perspective image generated by the execution device using the image generation model has high definition.
  • FIG. 7 is an embodiment of the training device 700 for the image generation model provided by the embodiment of the present application, including:
  • The determining unit 701 is configured to determine the position and viewing direction of the target observation point, and then determine at least one reference image from N input images according to the position and viewing direction of the target observation point, where N is an integer greater than or equal to 2. Then, according to the at least one reference image, the reference color of the spatial position is determined.
  • the spatial position is the position where the light from the target observation point passes.
  • the obtaining unit 702 is configured to obtain the real color of the pixel in the viewing angle image corresponding to the target observation point.
  • the determining unit 701 is further configured to determine the residual color of the spatial position according to the reference color and the real color.
  • the processing unit 703 is configured to train an image generation model according to the residual color.
  • the reference color includes: a first reference color, where the first reference color is the mode of the colors of the positions where the light rays from the target observation point pass.
  • In some optional embodiments, the processing unit 703 is configured to:
  • if the real color is not a transparent color, train the image generation model according to the loss function of the residual color;
  • if the real color is a transparent color, obtain the loss function of direct prediction, and train the image generation model according to both the loss function of the residual color and the loss function of direct prediction.
  • the acquiring unit 702 is further configured to acquire a new perspective image corresponding to the target observation point, where the new perspective image is predicted by the execution device according to the image generation model.
  • the determining unit 701 is further configured to determine the second reference color according to the new perspective image and each reference image in the at least one reference image. After that, use the second reference color as the first reference color.
  • In some optional embodiments, the determining unit 701 is specifically configured to:
  • determine any pixel in the new perspective image as a reference point;
  • if an image block of the same pixel size centered on the reference point has a similarity, in the new perspective image and in every reference image, that satisfies the preset condition, determine the second reference color to be the first reference color;
  • if the similarity does not satisfy the preset condition for some of the reference images, determine the second reference color to be the mode of the reference colors, at the spatial position, of the reference images that do satisfy the preset condition;
  • if the similarity does not satisfy the preset condition for any of the reference images, determine the color value of the second reference color to be 0 (a simplified sketch of this check is given below).
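  • The following is a minimal sketch of the occlusion check described above, assuming a small fixed-size patch and a mean color-difference threshold as the similarity criterion; the patch size, threshold, and all names are illustrative and are not specified by the embodiments in this form.

```python
# Hedged sketch of the second-reference-color update; the similarity metric
# (mean absolute color difference of a small patch vs. a threshold) is an assumption.
import numpy as np

def patch(img, y, x, half=1):
    # 3x3 image block of the same pixel size centered on the reference point
    return img[y - half:y + half + 1, x - half:x + half + 1]

def second_reference_color(new_view, ref_views, ref_colors, first_ref_color, y, x, thresh=0.05):
    """new_view: HxWx3 predicted image; ref_views: list of HxWx3 reference images aligned
    so that pixel (y, x) corresponds across views; ref_colors: per-view reference color of
    the spatial point; returns the second reference color."""
    base = patch(new_view, y, x)
    ok = [np.abs(patch(r, y, x) - base).mean() <= thresh for r in ref_views]

    if all(ok):                          # no reference image is occluded at this point
        return first_ref_color
    if any(ok):                          # drop occluded views, take the mode over the rest
        kept = [tuple(np.round(ref_colors[i], 3)) for i, v in enumerate(ok) if v]
        return np.array(max(set(kept), key=kept.count))
    return np.zeros(3)                   # occluded in every reference image: color value 0
```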
  • the determining unit 701 is specifically configured to:
  • At least one reference observation point is determined according to the position of the target observation point and the viewing angle direction, wherein the distance between each reference observation point in the at least one reference observation point and the target observation point satisfies a preset condition.
  • At least one reference image is acquired according to the at least one reference observation point, wherein each reference observation point in the at least one reference observation point corresponds to each reference image in the at least one reference image.
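  • A minimal sketch of this reference-view selection is shown below; the concrete position and viewing-direction thresholds standing in for the preset condition, and all names, are assumptions.

```python
# Illustrative selection of reference observation points whose distance to the target
# observation point (in both position and viewing direction) satisfies a preset condition.
import numpy as np

def select_reference_views(target_pos, target_dir, views, max_dist=0.5, max_angle_deg=20.0):
    """views: list of dicts like {"pos": (3,) array, "dir": unit (3,) array, "image": ...}.
    Returns the views whose position AND viewing direction are close enough to the target."""
    refs = []
    for v in views:
        close = np.linalg.norm(v["pos"] - target_pos) <= max_dist
        cos = np.clip(np.dot(v["dir"], target_dir), -1.0, 1.0)
        similar = np.degrees(np.arccos(cos)) <= max_angle_deg
        if close and similar:
            refs.append(v)
    return refs
```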
  • The loss function of the residual color measures the difference between the new perspective image predicted from the first reference color and the residual color, and the real image, where C(r) is used to represent the true color.
  • The loss function for direct prediction measures the difference between the directly predicted view image corresponding to the target observation point and the real image, where c_i represents the true color of each spatial point, σ_i represents the opacity of a spatial point in the spatial position, and δ_i represents the distance between adjacent spatial points on a ray.
  • the role of the direct prediction loss function is to make the predicted new view image as close to the real image as possible while only learning the real color.
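  • The description combines the two objectives as Loss = Loss_whole + Loss_resi. The sketch below assumes a squared-error form for each term and a generic volume-rendering helper; these details are illustrative rather than the exact formulas of the embodiments.

```python
# Assumed joint objective: residual-color branch plus direct-prediction branch.
import numpy as np

def render(weights, colors):
    # weights: (R, S) per-sample contributions derived from sigma_i and delta_i
    # colors:  (R, S, 3) per-sample colors; returns (R, 3) rendered pixel colors
    return (weights[..., None] * colors).sum(axis=1)

def joint_loss(weights, ref_color, resid_color, direct_color, true_pixels):
    # Residual branch: first reference color plus predicted residual color per sample.
    loss_resi = ((render(weights, ref_color + resid_color) - true_pixels) ** 2).sum()
    # Direct-prediction branch: the network also predicts the true color c_i of each sample.
    loss_whole = ((render(weights, direct_color) - true_pixels) ** 2).sum()
    return loss_whole + loss_resi  # Loss = Loss_whole + Loss_resi
```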
  • the image generation model training apparatus 700 may perform the operations performed by the training device in the foregoing embodiments shown in FIG. 3 to FIG. 5 , and details are not repeated here.
  • FIG. 8 is an embodiment of the apparatus 800 for generating a new perspective image provided by the embodiment of the present application, including:
  • the determining unit 801 is used for determining the position and the viewing angle direction of the virtual observation point.
  • the obtaining unit 802 is configured to input the position and viewing angle direction of the virtual observation point into the image generation model, and obtain the residual color of the spatial position where the light rays from the virtual observation point pass through.
  • the obtaining unit 802 is further configured to obtain a reference color, where the reference color is a color at a spatial position determined according to at least one reference image.
  • the processing unit 803 is configured to generate a new perspective image corresponding to the virtual observation point according to the residual color and the reference color of the spatial position.
  • The image generation model includes: an image generation model obtained by training according to the loss function of the residual color; or an image generation model obtained by training according to the loss function of the residual color and the loss function of direct prediction.
  • the reference color includes: a first reference color, where the first reference color is the mode of the color of the spatial position where the light rays from the virtual observation point pass.
  • the obtaining unit 802 is specifically configured to receive the first reference color sent by the training device.
  • the reference color includes: a first reference color.
  • the obtaining unit 802 is specifically used for:
  • At least one reference observation point is determined according to the position and viewing angle direction of the virtual observation point, wherein the distance between each reference observation point in the at least one reference observation point and the virtual observation point satisfies a preset condition.
  • At least one reference picture is determined from the N reference pictures, wherein each reference observation point in the at least one reference observation point corresponds to each reference image in the at least one reference image, and N is an integer greater than or equal to 2.
  • a first reference color is determined from at least one reference image.
  • The apparatus 800 for generating a new perspective image may perform the operations performed by the processor in the embodiment shown in FIG. 2 or the operations performed by the execution device in the embodiment shown in FIG. 6, and details are not repeated here.
  • FIG. 9 is an embodiment of the image processing system 900 provided by the embodiment of the present application, including:
  • a training device 910 and an execution device 920.
  • the training device 910 includes: a first processor 911 and a first memory 912 .
  • the first memory 912 is used to store a training picture set, where the training picture set includes at least two images.
  • the first processor 911 is configured to perform the operations performed by the training device in the foregoing embodiments shown in FIG. 3 to FIG. 5 , or the operations performed by the image generation model training apparatus 700 in the foregoing embodiment shown in FIG. 7 , specifically here No longer.
  • The execution device 920 includes: a second processor 921 and a second memory 922.
  • the second memory 922 is used to store new perspective images.
  • The second processor 921 is configured to perform the operations performed by the processor in the foregoing embodiment shown in FIG. 2, the operations performed by the execution device in the foregoing embodiment shown in FIG. 6, or the operations performed by the apparatus 800 for generating a new perspective image shown in FIG. 8, and details are not repeated here.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • The technical solutions of the present application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

一种图像生成模型的训练方法,根据残差颜色训练图像生成模型,残差颜色属于低频信息,易于表征和记忆,因此可以提高图像生成模型所生成的新视角图像的清晰度。本申请实施例方法包括:根据目标观测点的位置和视角方向,确定至少一张参考图像。然后根据至少一张参考图像在来自目标观测点的光线经过的空间位置的参考颜色和目标观测点对应视角图像中像素的真实颜色,确定空间位置的残差颜色,最后根据残差颜色,训练图像生成模型。

Description

图像生成模型的训练方法、新视角图像生成方法及装置 技术领域
本申请实施例涉及人工智能领域,尤其涉及图像生成模型的训练方法、新视角图像生成方法及装置。
背景技术
人工智能(artificial intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术以及应用系统。简单来说,人工智能研究的是各种智能机器的设计原理和实现方法,使得机器具有感知、推理和决策的功能。在实践中,AI技术可以用于图像的生成,使得智能机器基于已有的不同视角的图像,生成新视角的图像。
在一种图像生成模型的训练方法中,通过对观测光线上的点进行采样,并将图像的几何信息和纹理信息存储在神经网络中,利用体渲染技术得到新视角视图。
在这种方法中,由于纹理信息属于高频信息,神经网络记忆并表征高频信息的难度较高,因此得到的新视角视图较为模糊,不够清晰。
发明内容
本申请实施例提供了图像生成模型的训练方法、新视角图像生成方法及装置,通过图像生成模型学习来自任一观测点的光线所经过的空间位置的残差颜色,生成新视角图像,由于残差颜色属于低频信息,易于表征和记忆,因此可以提升新视角图像的清晰度。
本申请实施第一方面提供了一种图像生成模型的训练方法,包括:
训练设备可以接收人工输入的目标观测点的位置和视角方向,目标观测点是对被观测物体进行观测的任意一个观测点。每一个观测点都有各自的位置和视角方向,通常情况下,使用三维坐标(x,y,z)表示一个观测点的位置,使用
Figure PCTCN2020128680-appb-000001
表示一个观测点的视角方向。
训练设备在获取到目标观测点的位置和视角方向之后,可以根据该位置和该视角方向,从预先输入的N张图像中,确定出至少一张参考图像。其中,N为大于或等于2的整数。之后,训练设备可以预测(predict)出来自目标观测点的光线所经过的空间位置的参考颜色(reference color)。
在对图像生成模型进行训练的时候,实际上是存在目标观测点对应的视角图像的,训练设备可以获取到该视角图像中每个像素的真实颜色(ground-truth)。之后,训练设备就可以根据真实颜色和参考颜色,确定出来自目标观测点的光线所经过的空间位置的残差颜色(residual color)。最后,训练设备就可以使用该残差颜色,对图像生成模型进行训练。
本申请实施例中,图像生成模型是根据残差颜色训练得到的,残差颜色属于低频信息,易于表征和记忆,因此,可以提升基于该图像生成模型得到的新视角图像的清晰度。
结合第一方面,本申请实施例第一方面的第一种实现方式中,由于遮挡、光照等干扰 因素使得在不同观测点预测得到的图像并不完全相同,在不同的参考图像中,同一个空间位置的参考颜色也可能不相同。因此,训练设备可以选择至少一张参考图像中,在来自于目标观测点的光线经过的空间位置的颜色的众数作为第一参考颜色,用于后续确定残差颜色。
本申请实施例中,选择至少一张参考图像在空间位置的颜色的众数作为第一参考颜色,在一定程度上可能会减少干扰因素对于图像生成模型的准确度的影响,提升技术方案的准确性。
结合第一方面或者第一方面的第一种实现方式,本申请实施例第一方面的第二种实现方式中,训练图像生成模型的损失函数可以是多个。
在真实颜色不是透明色的情况下,可以只使用残差颜色的损失函数对图像生成模型进行训练。
在真实颜色是透明色的情况下,只用残差颜色的损失函数训练图像生成模型,而不让图像生成模型学习真实颜色的话,容易出现过拟合的现象,使得空间中真实颜色是透明的点,在预测的图像中变得不透明,造成伪影(artifact)的情况,影响生成图像的清晰度。因此,在真实颜色是透明的情况下,训练设备可以基于联合网络,使用残差颜色的损失函数和直接预测的损失函数,对图像生成模型进行联合训练。
本申请实施例中,使用多个损失函数共同训练图像生成模型,提高了算法的鲁棒性,也让训练出来的图像生成模型能够适用多种情况,提升了方案的灵活性。
结合第一方面、第一方面的第一种至第二种实现方式中的任一种,本申请实施例第一方面的第三种实现方式中,训练设备可以根据上一训练周期的训练结果,不断优化图像生成模型,使得图像生成模型更贴近真实情况,可以使用下述方式进行优化。
训练设备在对图像生成模型进行过一个迭代周期的训练之后,可以得到目标观测点对应的新视角图像。然后对比新视角图像和训练时所使用的至少一张参考图像中的每一张参考图像,从而确定出第二参考颜色,将第二参考颜色作为在下一迭代周期对图像生成模型进行训练的第一参考颜色。
结合第一方面的第三种实现方式,本申请实施例第一方面的第四种实现方式中,训练设备可以使用如下的方式确定出第二参考颜色。
首先,训练设备可以将目标观测点在上一迭代周期得到的新视角图像中,选择任意一个像素点作为基准点,然后选择以该基准点为中心的相同像素大小的图像块作为对比的依据,通过比较这一图像块在新视角图像和参考图像中的相似度,确定出第二参考颜色。可能会出现以下几种情况:
如果以基准点为中心的相同像素大小的图像块,在新视角图像和每一张参考图像中的相似度满足都预设条件,则说明该基准点在至少一张参考图像中,均没有被遮挡,上一个训练周期使用的第一参考颜色可以继续在后续的迭代周期中使用。
如果在至少一张参考图像中,存在部分参考图像,使得该图像块在新视角图像和这部分参考图像中的相似度不满足预设条件,则意味着在这些参考图像中,该基准点被遮挡了。因此,训练设备可以确定第二参考颜色是满足预设条件的参考图像在该基准点的颜色的众 数。
如果该图像块在新视角图像和至少一张参考图像中的每一张参考图像中的相似度都不满足预设条件,则意味着该基准点在这些参考图像中都被遮挡了,训练设备可以将第二参考颜色的颜色值确定为0。
本申请实施例中,训练设备通过比较经过上一个迭代周期预测的目标观测点的视角图像和训练时所使用的至少一张参考图像,从而去除在上一迭代周期中对训练图像生成模型所用的不合适的参数值,从而降低遮挡对新视角图像造成的影响,提升了算法的鲁棒性,以及技术方案的准确性。
结合第一方面、第一方面的第一种至第四种实现方式中的任一种,本申请实施例第一方面的第五种实现方式中,训练设备可以根据目标观测点的位置和视角方向确定出至少一个参考观测点,然后确定至少一个参考观测点对应的图像为参考图像。
其中,每个参考观测点与目标观测点的距离均需要满足预设条件。由于两点的位置越接近,视角方向越相似,在两个点观测到的图像的相似程度才会越高,因此,此处的距离是由两个点的位置和视角方向共同决定的,既要使得两点的位置之间满足预设条件,又要使得两点的视角方向之间满足预设条件。满足预设条件可以是小于或者等于预设阈值。
本申请实施例中,通过位置和视角方向决定参考观测点,在满足预设条件的情况下使得根据图像生成模型得到的新视角图像的准确度在误差允许的范围之内,提升了方案的准确性。
结合第一方面、第一方面的第一种至第五种实现方式中的任一种,本申请实施例第一方面的第六种实现方式中,残差颜色的损失函数可以是:
Figure PCTCN2020128680-appb-000002
其中,
Figure PCTCN2020128680-appb-000003
用于表示根据残差颜色和第一参考颜色预测出的新视角图像,
Figure PCTCN2020128680-appb-000004
表示的是每个空间点的第一参考颜色,
Figure PCTCN2020128680-appb-000005
表示的是每个空间点的残差颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点之间的距离,C(r)用于表示真实颜色。
残差颜色的损失函数的作用是使得根据第一参考颜色和残差颜色预测出的新视角图像,尽可能地接近于真实图像。
结合第一方面、第一方面的第一种至第六种实现方式中的任一种,本申请实施例第一方面的第七种实现方式中,直接预测的损失函数可以是:
Figure PCTCN2020128680-appb-000006
其中,
Figure PCTCN2020128680-appb-000007
用于表示直接预测出的目标观测点对应的视角图像中像素的颜色,c i表示的是每个空间点的真实颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点之间的距离。直接预测的损失函数的作用是,在只学习真实颜色的情况下,使得预测出的新视角图像尽可能地接近于真实图像。
本申请实施例第二方面提供了一种新视角图像的生成方法,包括:
虚拟观测点(virtual view point)是实际上并没有对被观测物体进行过观测的观测点,可以由人工随机选择。在人工选定虚拟观测点之后,执行设备可以接收人工输入的虚拟观测点的位置和视角方向。
执行设备在获取到虚拟观测点的位置和视角方向,可以将该位置和视角方向输入到图像生成模型中,得到来自虚拟观测点的光线所经过的空间位置的残差颜色。然后结合获取到的参考颜色,生成虚拟观测点对应的新视角图像。其中,参考颜色是根据至少一张参考图像确定出来的。
本申请实施例中,图像生成模型是根据残差颜色训练得到的,残差颜色属于低频信息,易于表征和记忆,因此,可以提升基于该图像生成模型得到的新视角图像的清晰度。
结合第二方面,本申请实施例第二方面的第一种实现方式中,图像生成模型可以是根据残差颜色的损失函数训练得到的。也可以是根据残差颜色的损失函数,以及直接预测的损失函数训练得到的。使用根据残差颜色的损失函数,以及直接预测的损失函数训练得到的图像生成模型,预测得到的新视角图像的效果会更加准确。
本申请实施例中,执行设备使用的图像生成模型可以是使用多个损失函数共同训练得到的图像生成模型,提高了生成的图像的清晰度。
结合第二方面或者第二方面的第一种实现方式,本申请实施例第二方面的第二种实现方式中,参考颜色包括第一参考颜色,第一参考颜色是指来自虚拟观测点的光线经过的空间位置的颜色的众数。执行设备获取第一参考颜色的方式可以是接收训练设备发送的第一参考颜色。
结合第二方面或者第二方面的第一种实现方式,本申请实施例第二方面的第三种实现方式中,执行设备不从训练设备处获取第一参考颜色,而是可以根据虚拟观测点的位置和视角方向,来确定第一参考颜色。确定的过程可以如下所述:
执行设备可以虚拟观测点的位置和视角方向,确定出至少一个参考观测点,然后确定至少一个参考观测点中每个参考观测点对应的参考图像。其中,参考观测点与虚拟观测点的距离需要满足预设条件。只要两个观测点的位置或者视角方向不同,那么这两个点就是不同的观测点。由于两点的位置越接近,视角方向越相似,在两个点观测到的图像的相似程度越高,因此,此处的距离是由两个点的位置和视角方向共同决定的,既要使得两点的位置之间满足预设条件,又要使得两点的视角方向之间满足预设条件。满足预设条件可以是小于或者等于预设阈值。
本申请实施例中,通过位置和视角方向决定参考观测点,在满足预设条件的情况下使得根据图像生成模型得到的新视角图像的准确度在误差允许的范围之内,提升了方案的准确性。
结合第二方面的第一种至第三种实现方式中的任一种,本申请实施例第二方面的第四种实现方式中,残差颜色的损失函数可以是:
Figure PCTCN2020128680-appb-000008
其中,
Figure PCTCN2020128680-appb-000009
用于表示根据残差颜色和第一参考颜色预测出的新视角图像,
Figure PCTCN2020128680-appb-000010
表示的是每个空间点的第一参考颜色,
Figure PCTCN2020128680-appb-000011
表示的是每个空间点的残差颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离,C(r)用于表示真实颜色。
残差颜色的损失函数的作用是使得根据第一参考颜色和残差颜色预测出的新视角图像,尽可能地接近于真实图像。
结合第二方面的第一种至第四种实现方式中的任一种,本申请实施例第二方面的第五种实现方式中,直接预测的损失函数可以是:
Figure PCTCN2020128680-appb-000012
其中,
Figure PCTCN2020128680-appb-000013
用于表示直接预测出的目标观测点对应的视角图像中像素的颜色,c i表示的是每个空间点的真实颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。直接预测的损失函数的作用是,在只学习真实颜色的情况下,使得预测出的新视角图像尽可能地接近于真实图像。
本申请实施例第三方面提供了一种图像生成模型的训练装置,包括:
确定单元,用于确定目标观测点的位置和视角方向,然后根据目标观测点的位置和视角方向,从N张输入图像中确定至少一张参考图像,其中,N为大于或等于2的整数。再根据至少一张参考图像,确定空间位置的参考颜色。其中,空间位置为来自目标观测点的光线经过的位置。
获取单元,用于获取目标观测点对应的视角图像中像素的真实颜色。
确定单元,还用于根据参考颜色和真实颜色,确定空间位置的残差颜色。
处理单元,用于根据残差颜色,训练图像生成模型。
本方面所示的有益效果,与第一方面的有益效果相似,详见第一方面所示,此处不再赘述。
结合第三方面,本申请实施例第三方面的第一种实现方式中,参考颜色包括:第一参考颜色,第一参考颜色为来自目标观测点的光线经过的位置的颜色的众数。
结合第三方面或者第三方面的第一种实现方式,本申请实施例第三方面的第二种实现方式中,处理单元,用于:
若真实颜色不是透明色,则根据残差颜色的损失函数,训练图像生成模型。
若真实颜色是透明色,则获取直接预测的损失函数,并根据残差颜色的损失函数,以及直接预测的损失函数,训练图像生成模型。
结合第三方面、第三方面的第一种实现至第二种实现方式中的任一种,本申请实施例第三方面的第三种实现方式中,获取单元,还用于获取目标观测点对应的新视角图像,其中,新视角图像是执行设备根据图像生成模型预测的。
确定单元,还用于根据新视角图像,以及至少一张参考图像中的每一张参考图像,确定第二参考颜色。之后,将第二参考颜色作为第一参考颜色。
结合第三方面的第三种实现方式,本申请实施例第三方面的第四种实现方式中,确定单元,具体用于:
确定目标观测点对应的新视角图像中的任一像素点为基准点。
若以基准点为中心的相同像素大小的图像块,在新视角图像和每一张参考图像中的相似度满足预设条件,则确定第二参考颜色为第一参考颜色。
若以基准点为中心的相同像素大小的图像块,在新视角图像和至少一张参考图像中的部分参考图像中的相似度不满足预设条件,则确定第二参考颜色为满足预设条件的参考图像在空间位置的参考颜色的众数。
若以基准点为中心的相同像素大小的图像块,在新视角图像和每一张参考图像中的相似度关系均不满足预设条件,则确定第二参考颜色的颜色值为0。
结合第三方面、第三方面的第一种至第四种实现方式中的任一种,本申请实施例第三方面的第五种实现方式中,确定单元,具体用于:
根据目标观测点的位置和视角方向,确定至少一个参考观测点,其中,至少一个参考观测点中的每个参考观测点与目标观测点的距离满足预设条件。
根据至少一个参考观测点,获取至少一张参考图像,其中,至少一个参考观测点中的每个参考观测点对应于至少一张参考图像中的每张参考图像。
结合第三方面的第一种至第五种实现方式中的任一种,本申请实施例第三方面的第六种实现方式中,残差颜色的损失函数可以是:
Figure PCTCN2020128680-appb-000014
其中,
Figure PCTCN2020128680-appb-000015
用于表示根据残差颜色和第一参考颜色预测出的新视角图像,
Figure PCTCN2020128680-appb-000016
表示的是每个空间点的第一参考颜色,
Figure PCTCN2020128680-appb-000017
表示的是每个空间点的残差颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。C(r)用于表示真实颜色。
残差颜色的损失函数的作用是使得根据第一参考颜色和残差颜色预测出的新视角图像,尽可能地接近于真实图像。
结合第三方面的第一种至第六种实现方式中的任一种,本申请实施例第三方面的第七种实现方式中,直接预测的损失函数可以是:
Figure PCTCN2020128680-appb-000018
其中,
Figure PCTCN2020128680-appb-000019
用于表示直接预测出的目标观测点对应的视角图像中像素的颜色,c i表示的是每个空间点的真实颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。直接预测的损失函数的作用是,在只学习真实颜色的情况下,使得预测出的新视角图像尽可能地接近于真实图像。
本申请实施例第四方面提供了一种新视角图像的生成装置,包括:
确定单元,用于确定虚拟观测点的位置和视角方向。
获取单元,用于将虚拟观测点的位置和视角方向输入到图像生成模型中,获取来自虚拟观测点的光线经过的空间位置的残差颜色。
获取单元,还用于获取参考颜色,参考颜色是根据至少一张参考图像,确定出的空间位置的颜色。
处理单元,用于根据空间位置的残差颜色和参考颜色,生成虚拟观测点对应的新视角图像。
本方面所示的有益效果,与第一方面的有益效果相似,详见第一方面所示,此处不再赘述。
结合第四方面,本申请实施例第四方面的第一种实现方式中,图像生成模型包括:根据残差颜色的损失函数,训练得到的图像生成模型。或者,根据残差颜色的损失函数,以及直接预测的损失函数,训练得到的图像生成模型。
结合第四方面或者第四方面的第一种实现方式,本申请实施例第四方面的第二种实现方式中,参考颜色包括:第一参考颜色,第一参考颜色为来自虚拟观测点的光线经过的空间位置的颜色的众数。
获取单元,具体用于接收训练设备发送的第一参考颜色。
结合第四方面或者第四方面的第一种实现方式,本申请实施例第四方面的第二种实现方式中,参考颜色包括:第一参考颜色。
获取单元,具体用于:
根据虚拟观测点的位置和视角方向,确定至少一个参考观测点,其中,至少一个参考观测点中的每个参考观测点与虚拟观测点的距离满足预设条件。
从N张参考图片中确定至少一张参考图片,其中,至少一个参考观测点中的每个参考观测点对应于至少一张参考图像中的每张参考图像,N为大于或等于2的整数。
根据至少一张参考图像,确定第一参考颜色。
本申请实施例第五方面提供了一种图像处理系统,包括:训练设备、执行设备。
训练设备包括第一处理器和第一存储器,第一处理器用于执行前述第一方面的方法, 第一存储器用于存储训练图片集,训练图片集中包括至少两张图像。
执行设备包括第二处理器和第二存储器,第二处理器用于执行前述第二方面的方法,第二存储器用于存储新视角图像。
本申请实施例第六方面提供了一种计算机可读存储介质,该计算机可读存储介质中保存有程序,当所述计算机执行所述程序时,执行前述第一方面或第二方面的方法。
本申请实施例第七方面提供了一种计算机程序产品,当所述计算机程序产品在计算机上执行时,计算机执行前述第一方面或第二方面的方法。
本申请实施例第八方面提供了一种计算机设备,包括:
处理器、存储器、输入输出设备以及总线。其中,处理器、存储器、输入输出设备与总线相连。处理器中存储计算机指令,处理器用于执行计算机指令,使得计算机设备执行以下步骤:
确定目标观测点的位置和视角方向。
根据位置和视角方向,从N张输入图像中确定至少一张参考图像,其中,N为大于或等于2的整数。
根据至少一张参考图像,确定空间位置的参考颜色,空间位置为来自目标观测点的光线经过的位置。
获取目标观测点对应的视角图像中像素的真实颜色。
根据参考颜色和真实颜色,确定空间位置的残差颜色。
根据残差颜色,训练图像生成模型。
该计算机设备用于执行前述第一方面的方法。
本申请实施例第九方面提供了一种计算机设备,包括:
处理器、存储器、输入输出设备以及总线。其中,处理器、存储器、输入输出设备与总线相连。处理器中存储计算机指令,处理器用于执行计算机指令,使得计算机设备执行以下步骤:
确定虚拟观测点的位置和视角方向。
将位置和视角方向输入到图像生成模型中,获取来自虚拟观测点的光线经过的空间位置的残差颜色。
获取参考颜色,参考颜色是根据至少一张参考图像,确定出的空间位置的颜色。
根据空间位置的残差颜色和参考颜色,生成虚拟观测点对应的新视角图像。
该计算机设备用于执行前述第二方面的方法。
附图说明
图1为本申请实施例人工智能主体框架的一种结构示意图;
图2为本申请实施例图像处理系统的一个应用场景示意图;
图3为本申请实施例图像处理系统的一种系统架构图;
图4为本申请实施例图像生成模型的训练方法的一个流程示意图;
图5为本申请实施例图像生成模型的训练方法的另一个流程示意图;
图6为本申请实施例新视角图像生成方法的一个流程示意图;
图7为本申请实施例图像生成模型的训练装置的一个结构示意图;
图8为本申请实施例新视角图像的生成装置的一个结构示意图;
图9为本申请实施图像处理系统的一个结构示意图。
具体实施方式
本申请实施例提供了图像生成模型的训练方法、新视角图像生成方法及装置,通过图像生成模型学习光线经过的空间点的残差颜色,生成新视角图像,提升了新视角图像的清晰度。
首先对成像过程进行简单的介绍。光线在穿过物体时,会受到物体本身的不透明度等相关因素的影响,再经过光的折射和反射,最终呈现出人眼所看到的颜色。使用数学模型来模拟人眼观测物体的过程,可以理解为对某一观测点发出的每条光线在每一个空间点的颜色进行积分的过程。
接下来,对本申请实施例可能涉及的相关概念进行解释。
(1)损失函数(loss function)。
损失函数用于衡量预测值和真实值之间的差值,差值的大小能够很好地反映出模型和实际数据之间的差异。训练模型的作用在于使得根据模型预测的结果尽可能接近真实结果,所以可以通过设定损失函数来评价并不断对所训练的模型进行优化,损失函数的输出值(loss)越大,说明预测结果和真实结果的差异越大,训练模型的过程就是在尽可能缩小loss的过程。
(2)目标观测点、参考观测点、虚拟观测点。
客观地说,目标观测点、参考观测点和虚拟观测点一样,都是对物体进行观测的一个视角,可以把一个观测点简单地理解为一个相机的姿态。在不同的观测点对同一个物体进行观测,得到的图像可能是不一样的,原因在于每个观测点都有各自的位置和视角方向。通常情况下,观测点的位置表示为三维的坐标点(x,y,z),观测点的视角方向包括绕着观测点的各个轴向旋转的角度,可以包括三个角度,由于某些观测点对于其中的一个轴向的旋转方向并不敏感,因此,观测点的视角方向也可以表示为二维的形式,比如
Figure PCTCN2020128680-appb-000020
具体此处不做限定。
本申请实施例中,目标观测点,是指在对图像生成模型进行训练时,人工任意选定的一个视角。参考观测点,是指已经对待观测物体进行观测并生成了相应的视角图像的观测点。虚拟观测点,是指之前并没有在该观测点对待观测物体进行观测,在现有的图片集中并不存在虚拟观测点对应的新视角图像。
(3)参考颜色。
参考颜色是指来自目标观测点的光线所经过的空间位置的颜色,需要根据参考图像进行确定。确定参考颜色的过程实际上是一个预测的过程,在得到参考图像的情况下,
下面对人工智能系统的总体工作流程进行描述,请参阅图1,图1为本申请实施例人工智能主体框架的一种结构示意图,该主体框架描述了人工智能系统总体工作流程,适用 于通用的人工智能领域需求。
图1所示的实施例中包括“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度。
“智能信息链”反映的是从数据的获取到处理的一系列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。
而“IT价值链”则是从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施。
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。基础设施通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据。
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理。
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力。
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用。
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶,平安城市,智能终端等。
下面对本申请实施例的应用场景进行简单的说明。请参阅图2,图2为本申请实施例图像处理系统的一个应用场景示意图。
相机201、处理器202和智能手机203之间建立通信连接,处理器202可以接收到相机201发送的照片或者视频,这些照片和视频中的每一帧画面可以看作是训练图片集中的 图像,处理器202根据虚拟观测点的位置和视角方向,从收到的图像中确定参考观测点对应的参考图像,利用已经训练好的图像生成模型,生成新视角图像。之后,处理器202可以将新视角图像进行整合,并发送给智能手机203,也可以只向智能手机203发送未经过整合的新视角图像。其中,整合后的图像可以是360°全景照片或者720°全景照片,根据实际应用的需要进行选择,具体此处不做限定。智能手机203对收到的图像进行展示。
需要注意的是,图2所示实施例只是本申请实施例图像处理系统的一个应用场景,在实际应用中,相机201还可以被其他的设备替换,可以是笔记本电脑,也可以是平板电脑,只要是具有摄像功能,能够拍摄照片或者视频的设备即可,具体此处不做限定。处理器202并不一定存在于智能手机203之外,可以就是智能手机203中的处理器。智能手机203也可以被其他的设备替换,可以是虚拟现实(virtual reality,VR)设备或者增强现实(augmented raelity,AR)设备,还可以是混合现实(mixed reality,MR)设备,只要该设备对新视角图像进行展示即可,具体此处不做限定。
下面对本申请实施例提供的图像处理系统进行介绍,请参阅图3,图3为本申请实施例提供的图像处理系统的一种系统架构图。在图3所示的实施例中,图像处理系统包括执行设备310,训练设备320,数据库330,客户设备340和数据存储系统350,其中执行设备310包括计算模块311。
其中,数据库330中存有训练图片集,使得训练设备320根据训练图片集中的至少一张参考图像,对来自目标观测的光线经过的位置的参考颜色进行预测。训练设备320用于生成图像生成模型301,并利用数据库330中的至少一张图片对图像生成模型301进行迭代训练,从而得到最优的图像生成模型301。执行设备310根据图像生成模型301生成新视角图像之后,可以将新视角图像发送给不同的设备,可以发送给客户设备340,也可以发送给数据存储系统350,具体此处不做限定。
图像生成模型301可以应用于不同的设备中,例如手机、平板、笔记本电脑、VR设备、AR设备、监控系统等等,具体此处不做限定。训练设备320将图像生成模型301配置在执行设备310中的方式可以是通过无线通信方式发送,也可以是通过有线通信方式发送,还可以通过可移动存储设备将图像生成模型301配置在执行设备310中,实际的配置方式根据实际应用的需要进行选择,具体此处不做限定。
数据库330中的训练图片集中有多张图像,训练设备320在对图像生成模型301进行训练时,会根据输入的目标观测点的位置和视角方向,从输入的训练图片集中确定出至少一张图片作为参考图像。训练图片集中的多张图像,有多种表现形式,可以是使用拍摄设备得到的照片,也可以是视频帧中的至少一帧图像,具体此处不做限定。训练图片集中的多张图像有多种获取方式,可以是从数据采集设备360中获取的,也可以是客户设备340发送的,其中,数据采集设备360可以是笔记本电脑,也可以是照相机,只要是具有摄像功能,能够拍摄照片或者视频的设备即可,具体此处不做限定。
本申请实施例中,客户设备340和执行设备310可以分别为独立的设备,也可以为一个整体,具体此处不做限定。执行设备310配置有I/O接口312,用于与客户设备340进行数据交互,用户可以通过客户设备340向I/O接口312输入虚拟观测点的空间位置和视 角方向,执行设备310通过I/O接口312将生成的新视角图像发送给客户设备340,提供给用户。
需要注意的是,图3仅是本申请实施例提供的图像处理系统的架构示意图,图中所示的设备、器件之间的位置关系并不构成任何限制。示例的,若执行设备310配置在客户设备340中,当客户设备340为手机时,执行设备310也可以是手机中的图像处理器(graphics processing unite,GPU)或者神经网络处理器(neural-network processing units,NPU),具体此处不做限定。
下面对本申请实施例提供的图像生成模型的训练方法进行描述,请参阅图4,图4为本申请实施例中图像生成模型训练方法的一个流程示意图,包括:
401、训练设备确定目标观测点的位置和视角方向。
在对图像生成模型进行训练时,需要将训练的目标观测点的相关信息输入到训练设备。其中,目标观测点的相关信息包括目标观测的位置和目标观测点的视角方向。本申请实施例以目标观测点的位置是一个三维的坐标点(x,y,z),观测点的视角方向是二维的
Figure PCTCN2020128680-appb-000021
为例,进行介绍。目标观测点的坐标点和视角方向可以由人工选定后,输入到训练设备中。
402、训练设备确定至少一张参考图像。
在对图像生成模型进行训练之前,可以使用拍摄设备在多个不同的观测点对同一个物体进行观测,得到多张图像。其中,拍摄设备可以是智能手机、相机或者拍立得,只要是具有拍摄功能的设备即可,具体此处不做限定。多张图像的表现形式有多种,可以是拍摄设备使用拍照功能得到的照片,也可以是拍摄设备使用录像功能得到的视频中的每一帧图像,具体此处不做限定。
使用拍摄设备得到的多张的图像,可以称为训练图片集,可以将这些图像输入到训练设备中,使得训练设备从中选择出至少一张参考图像。下面对训练设备选择参考图像的过程进行说明。
训练设备根据输入的目标观测点的位置和目标观测点的视角方向,可以确定出至少一个参考观测点,然后从训练图片集中确定出每个参考观测点对应的参考图像。其中,至少一个参考观测点中的每一个参考观测点和目标观测点之间的距离需要满足预设条件。这里所说的距离是由观测点的位置和视角方向共同决定的,因为两个观测点之间的位置越接近,以及两个观测点的视角方向越类似的话,这两个观测点对应的视角图像的重叠区域才会越大,两个视角图像的相似度才越高。因此,参考观测点和目标观测点之间的距离需要满足预设条件包括两点的位置之间满足预设条件,两点的视角方向之间满足预设条件。满足预设条件可以是小于或者等于某个预设阈值。
需要注意的是,每个观测点是视角方向都可以使用观测点自身的坐标系表示,但是在于其他观测点进行对比的时候,各个观测点应该以同一个坐标系为基准,这样才能使得对比的结果具有参考意义。
本申请实施例中,训练设备可以在误差允许的范围内从训练图片集中选择参考图像,提升了方案的可实现性。
403、训练设备根据至少一张参考图像,确定空间位置的第一参考颜色。
训练设备确定出至少一张参考图像之后,训练设备可以将来自目标观测点的光线经过的空间位置中的空间点,反投影回参考图像中的不同像素位置上,得到该空间点在参考图像中的参考颜色。由于遮挡和光照等因素的影响,不同观测点在同一空间位置反投影出的参考颜色可能会不同,为了减少这些因素的干扰,训练设备可以将参考颜色的众数作为后续训练过程中使用的参数,也即第一参考颜色。在一定程度上可能会降低干扰因素对于预测结果的影响,提升本申请技术方案的准确性。
404、训练设备获取目标观测点对应的视角图像中像素的真实颜色。
在对图像生成模型进行训练时,实际上在目标观测点对被观测物体进行过观测,并且存在目标观测点对应的视角图像。因此,目标观测点对应的视角图像中像素的真实颜色可以由人工预先输入给训练设备,用于跟第一参考颜色进行对比,确定出残差颜色。
需要注意的是,步骤403和步骤404没有必然的先后顺序,可以先执行步骤403,也可以先执行步骤404,根据实际应用的需要进行选择,具体此处不做限定。
405、训练设备确定空间位置的残差颜色。
残差颜色可以理解为,是关于目标观测点的位置和视角方向,以及神经网络的参数的一个函数,第一参考颜色和真实颜色是该神经网络的相关参数。训练设备在已知目标观测点的位置和视角方向,以及第一参考颜色和真实颜色的情况下,可以通过神经网络得到来自目标观测点的光线经过的空间位置中某一个空间点的残差颜色和不透明度。
406、训练设备确定真实颜色是否为透明,若是,则执行步骤408,若否,则执行步骤407。
由于颜色的透明度不同,对于最终生成的图像的影响程度也不同,因此训练设备在获取到真实颜色之后,可以根据真实颜色的透明情况,确定对图像生成模型进行训练所使用的损失函数的类型。
需要注意的是,步骤405和步骤406没有必然的先后顺序,可以先执行步骤405,也可以先执行步骤406,根据实际应用的需要进行选择,具体此处不做限定。
407、训练设备根据残差颜色的损失函数,训练图像生成模型。
训练设备在确定出真实颜色是透明的情况下,可以根据预设的残差颜色的损失函数,训练图像生成模型。残差颜色的损失函数可以是:
Figure PCTCN2020128680-appb-000022
其中,
Figure PCTCN2020128680-appb-000023
用于表示根据残差颜色和第一参考颜色预测出的新视角图像,
Figure PCTCN2020128680-appb-000024
表示的是每个空间点的第一参考颜色,
Figure PCTCN2020128680-appb-000025
表示的是每个空间点的残差颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。C(r)用于表示真实颜色。
408、训练设备获取直接预测的损失函数。
训练设备在确定出真实颜色不是透明的情况下,可以获取直接预测的损失函数。直接预测的损失函数可以是:
Figure PCTCN2020128680-appb-000026
其中,
Figure PCTCN2020128680-appb-000027
用于表示直接预测出的目标观测点对应的视角图像中像素的颜色,c i表示的是每个空间点的真实颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。直接预测的损失函数的作用是,在只学习真实颜色的情况下,使得预测出的新视角图像尽可能地接近于真实图像。
训练设备可以通过不同的方式直接预测出目标观测点对应的视角图像的真实颜色,比如使用MPI技术,或者使用NeRF技术预测真实颜色,这一过程并不是本申请技术方案重点关注的地方,所以不做详细的描述。
409、训练设备根据残差颜色的损失函数和直接预测的损失函数,训练图像生成模型。
训练设备在确定出直接预测的损失函数之后,可以根据残差颜色的损失函数和直接预测的损失函数,对图像生成模型进行联合训练。联合训练的损失函数可以表示为
Loss=Loss whole+Loss resi
本申请实施例中,图像生成模型是根据残差颜色训练得到的,残差颜色属于低频信息,易于表征和记忆,因此,可以提升基于该图像生成模型得到的新视角图像的清晰度。
进一步的,训练设备可以联合直接预测的损失函数对图像生成模型进行训练,避免因为真实颜色为透明的情况下,仅仅使用残差颜色的损失函数训练图像生成模型出现过拟合的现象,减少了新视角图像出现错误的概率,提高了算法的鲁棒性和本申请技术方案的可靠性。
由于不同参考观测点观测到的图像不尽相同,一些参考观测点观测到的可能是遮挡物的颜色,从而影响第一参考颜色和残差颜色的取值,也会影响图像生成模型的准确度。
因此,训练设备需要对图像生成模型进行优化,请参阅图5,图5为本申请实施例中图像生成模型的训练方法的一个实施例。
501、训练设备确定目标观测点的位置和视角方向。
502、训练设备确定至少一张参考图像。
503、训练设备根据至少一张参考图像,确定空间位置的第一参考颜色。
504、训练设备获取目标观测点对应的视角图像中像素的真实颜色。
505、训练设备确定空间位置的残差颜色。
506、训练设备确定真实颜色是否为透明,若是,则执行步骤508,若否,执行步骤507。
507、训练设备根据残差颜色的损失函数,训练图像生成模型。
508、训练设备获取直接预测的损失函数。
509、训练设备根据残差颜色的损失函数和直接预测的损失函数,训练图像生成模型。
步骤501至步骤509与图4所示实施例中步骤401至步骤409类似,此处不再赘述。
510、训练设备确定新视角图像与每一张参考图像是否满足预设条件,若是,则执行步骤511,若否,则执行步骤512。
训练设备在对图像生成模型进行了一个迭代周期的训练之后,需要对图像生成模型的准确度进行检测,对于存在的问题进行修正,从而不断优化图像生成模型,使得依据该图像生成模型得到的新视角图像尽可能地接近真实图像。下面对图像生成模型的优化过程进行介绍。
执行设备可以根据上一个迭代周期后的图像生成模型,得到虚拟观测点对应的空间位置的残差颜色,然后结合这一空间位置的参考颜色,预测出虚拟观测点对应的新视角图像。然后将新视角图像输入到训练设备中,训练设备通过判断新视角图像和参考图像之间的相似度,判断训练过程中使用的参考图像是否准确。
空间点投影到图像上会对应到某一个像素位置,因此,判断的方式可以是,选择新视角图像中的某一个像素点作为基准点,比较以基准点为中心的相同像素大小的图像块,在新视角图像和每一张参考图像中的相似度,是否满足预设条件。若满足预设条件,则说明这一张参考图像并不存在遮挡的情况,可以继续用于下一迭代周期对图像生成模型的训练过程中。
其中,图像块的大小可以是3px×3px,也可以是5px×5px,px是pixel的缩写,表示的是像素根据实际应用的需要进行选择,具体此处不做限定。两个图像块的相似度满足预设条件,可以是两个图像块的颜色相似度小于或等于预设阈值。
511、训练设备确定第二参考颜色为第一参考颜色。
如果训练设备选取的每一张参考图像都满足预设条件,则说明上一迭代周期使用的第一参考颜色并没有出错,可以继续在后续的训练过程中使用。
512、训练设备确定满足条件的参考图像的参考颜色的众数为第二参考颜色。
如果训练设备选取的至少一张参考图像中存在不满足条件的参考图像,那么训练设备需要将不满足条件的参考图像去掉,重新确定训练过程使用的参考颜色。
至少一张参考图像中存在不满足条件的参考图像,可能有以下两种情况:
一种情况是至少一张参考图像中的部分参考图像不满足条件,假设一共有Y张参考图像,其中的X张图像不满足预设条件,那么训练设备可以确定剩下的(Y-X)张图像才是用来确定第二参考颜色的依据。在这种情况下,第二参考颜色是这(Y-X)张参考图像在待观测点的参考颜色的众数。第二参考颜色可能与上一迭代周期使用的第一参考颜色相同,也有可能不同,与被观测物体被遮挡的情况有关,具体此处不做限定。其中,Y为大于或等于1的整数,X为大于或等于1,且小于Y的整数。
另一种情况是N张参考图像中的每一张参考图像都不满足条件,在这种情况下,训练设备可以确定第二参考颜色的颜色值为0。
可选的,在实际应用中可能存在多个第一参考颜色,出现这种情况的原因可能有多种,下面分别举例说明。假设有18张参考图像,待观测点的真实颜色是红色。
一种可能的情况是,有9张参考图像中的参考颜色是遮挡物的颜色(黄色),有9张参考图像中的颜色是真实颜色(红色),此时参考颜色的众数有两个。
一种可能的情况是,有6张参考图像中的参考颜色是遮挡物1的颜色(黄色),有6张参考图像中的参考颜色是遮挡物2的颜色(绿色),有6张参考图像中的参考颜色是真实颜色(红色),此时参考颜色的众数有三个。
在这些情况中,由于遮挡的存在使得第一参考颜色出现不符合实际需要的情形,仅仅使用图4所示的实施例的方法,难以去除错误的第一参考颜色带来的影响。步骤512的意义就在于剔除错误的参考图像对于图像生成模型的不利影响,提升算法的鲁棒性。
513、训练设备将第二参考颜色作为第一参考颜色,对图像生成模型进行优化。
训练设备在确定出第二参考颜色之后,可以将第二参考颜色作为第一参考颜色输入到图像生成模型中,从而调整图像生成模型的参数,对图像生成模型进行优化。
本申请实施例中,图像生成模型是根据残差颜色训练得到的,残差颜色属于低频信息,易于表征和记忆,因此,可以提升基于该图像生成模型得到的新视角图像的清晰度。
进一步的,训练设备通过比较新视角图像和训练时所使用的至少一张参考图像,从而去除在上一迭代周期中对训练图像生成模型所用的不合适的参数值,从而降低待观测点被遮挡对新视角图像造成的影响,提升了算法的鲁棒性,以及技术方案的准确性。
需要注意的是,在本申请的一种实施方式中,在图5所示实施例中,步骤506、步骤508和步骤509可以不执行,在步骤505之后直接执行步骤507。
在这种实现方式中,训练设备直接依据残差颜色的损失函数对图像生成模型进行优化,并根据上一迭代周期的训练结果不断对图像生成模型进行优化,相较于图4所示的实施例,去除了训练图片集中错误的参考图像,使得根据训练好的图像生成模型得到的新视角图像的准确度更高。同时,相较于图5所示的实施例,可以节约操作步骤,简化操作过程,从而减少了运算资源的消耗。
由于图像生成模型的质量与密集匹配的准确性有关,而图像的密集匹配又基于纹理的相似性,图像中无纹理的区域难以提供匹配信息,纹理丰富的区域可以提供准确的匹配信息。此处所说的纹理丰富的区域,是指颜色变化的区域,比如由红色变为黄色,两个颜色的交接处,可以看作是纹理的边缘。此外,由于人的感官对于纹理丰富的区域的感知也较为敏感,所以,本申请实施例提供的图像生成模拟在训练时会对纹理丰富的区域进行更多的训练,使得最终得到的图像生成模型更加实用。
本申请实施例还提供了一种新视角图像的生成方法,可以使用上述的图像生成模型,生成新视角的图像。请参阅图6,图6为本申请实施例中新视角图像生成方法的一个实施例。
601、执行设备确定虚拟观测点的位置和视角方向。
虚拟观测点是实际上并没有对被观测物体进行过观测的观测点,可以由人工随机选择。在人工选定虚拟观测点之后,执行设备可以接收人工输入的虚拟观测点的位置和视角方向。
602、执行设备根据图像生成模型,获取残差颜色。
执行设备在获取虚拟观测点的位置和视角方向之后,可以将虚拟观测点的位置和视角方向输入到图像生成模型中,得到虚拟观测点对应的空间位置的残差颜色。其中,虚拟观测点对应的空间位置是指来自虚拟观测点的光线所经过的位置。
本实施例中,执行设备使用的图像生成模型包括图3至图5所示实施例中的图像生成模型,可以是未完全训练好的图像生成模型,也可以是训练好的图像生成模型,根据实际应用的需要进行选择,具体此处不做限定。
虽然使用没训练好的图像生成模型得到的残差颜色存在较大的误差,但是基于该残差颜色得到的新视角图像可以用来去除遮挡的参考图像,从而对图像生成模型进行优化,有存在的意义。
使用训练好的图像生成模型得到的残差颜色是本申请实施例理想状态下的残差颜色,基于该残差颜色生成的新视角图像也较为准确。
603、执行设备获取第一参考颜色。
执行设备在接收到虚拟观测点的位置和视角方向之后,可以从训练图片集中确定出至少一张参考图像,从而获取到第一参考颜色。
需要注意的是,从训练图片集中确定出至少一张参考图像的执行主体也可以是训练设备,训练设备选择参考图像的过程与图4所示实施例步骤402类似,不同之处在于确定参考观测点的依据是虚拟观测点的位置和视角方向,而不是目标观测点是位置和视角方向,具体此处不再赘述。
需要注意的是,本实施例中的第一参考颜色包括图4和图5所示实施例中的第一参考颜色。
执行设备在获取到第一参考颜色时,还能够获取到空间位置的不透明度,因为空间位置的不透明度会影响最终的成像效果,因此执行设备还需要获取到不透明度。
604、执行设备根据残差颜色和第一参考颜色,生成新视角图像。
在图像中,每个像素位置的颜色是由一条光线上多个空间点的颜色积分得到的,执行设备在获取到每个空间点的第一参考颜色、残差颜色和不透明度之后,可以积分得到虚拟观测点对应的新视角图像,积分的过程可能有以下几种情况。
其中一种情况是分别对每个空间点的残差颜色和第一参考颜色进行积分,然后将积分的结果相加,得到新视角图像。
另一种情况是先把每个空间点的第一参考颜色和残差颜色相加,然后再一起积分,得到新视角图像。上述两种积分方式,使用的具有物理意义的函数都可以是
Figure PCTCN2020128680-appb-000028
其中,
Figure PCTCN2020128680-appb-000029
表示的是预测出来的新视角图像,
Figure PCTCN2020128680-appb-000030
表示的是每个空间点的第一参考颜色,
Figure PCTCN2020128680-appb-000031
表示的是每个空间点的残差颜色,σ i表示的是所述空间位置中某一个空间点的不透明度,δ i用于表示的是一条光线上各个空间点的距离。
本申请实施例中,图像生成模型是根据残差颜色训练得到的,残差颜色属于低频信息,易于表征和记忆,因此,执行设备使用该图像生成模型生成的新视角图像的清晰度较高。
下面对本申请实施例提供的图像生成模型的训练装置进行说明,请参阅图7,图7为本申请实施例提供的图像生成模型的训练装置700的一个实施例,包括:
确定单元701,用于确定目标观测点的位置和视角方向,然后根据目标观测点的位置 和视角方向,从N张输入图像中确定至少一张参考图像,其中,N为大于或等于2的整数。再根据至少一张参考图像,确定空间位置的参考颜色。其中,空间位置为来自目标观测点的光线经过的位置。
获取单元702,用于获取目标观测点对应的视角图像中像素的真实颜色。
确定单元701,还用于根据参考颜色和真实颜色,确定空间位置的残差颜色。
处理单元703,用于根据残差颜色,训练图像生成模型。
在本申请的一些可选实施例中,参考颜色包括:第一参考颜色,第一参考颜色为来自目标观测点的光线经过的位置的颜色的众数。
在本申请的一些可选实施例中,处理单元703,用于:
若真实颜色不是透明色,则根据残差颜色的损失函数,训练图像生成模型。
若真实颜色是透明色,则获取直接预测的损失函数,并根据残差颜色的损失函数,以及直接预测的损失函数,训练图像生成模型。
在本申请的一些可选实施例中,获取单元702,还用于获取目标观测点对应的新视角图像,其中,新视角图像是执行设备根据图像生成模型预测的。
确定单元701,还用于根据新视角图像,以及至少一张参考图像中的每一张参考图像,确定第二参考颜色。之后,将第二参考颜色作为第一参考颜色。
在本申请的一些可选实施例中,确定单元701,具体用于:
确定新视角图像中的任一像素点为基准点。
若以基准点为中心的相同像素大小的图像块,在新视角图像和每一张参考图像中的相似度满足预设条件,则确定第二参考颜色为第一参考颜色。
若以基准点为中心的相同像素大小的图像块,在新视角图像和至少一张参考图像中的部分参考图像中的相似度不满足预设条件,则确定第二参考颜色为满足预设条件的参考图像在空间位置的参考颜色的众数。
若以基准点为中心的相同像素大小的图像块,在新视角图像和每一张参考图像中的相似度关系均不满足预设条件,则确定第二参考颜色的颜色值为0。
在本申请的一些可选实施例中,确定单元701,具体用于:
根据目标观测点的位置和视角方向,确定至少一个参考观测点,其中,至少一个参考观测点中的每个参考观测点与目标观测点的距离满足预设条件。
根据至少一个参考观测点,获取至少一张参考图像,其中,至少一个参考观测点中的每个参考观测点对应于至少一张参考图像中的每张参考图像。
在本申请的一些可选实施例中,残差颜色的损失函数可以是:
Figure PCTCN2020128680-appb-000032
其中,
Figure PCTCN2020128680-appb-000033
用于表示根据残差颜色和第一参考颜色预测出的新视角图像,
Figure PCTCN2020128680-appb-000034
表示的是每个空间点的第一参考颜色,
Figure PCTCN2020128680-appb-000035
表示的是每个空间点的残差颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。C(r)用于表示真实颜色。
在本申请的一些可选实施例中,直接预测的损失函数可以是:
Figure PCTCN2020128680-appb-000036
其中,
Figure PCTCN2020128680-appb-000037
用于表示直接预测出的目标观测点对应的视角图像中像素的颜色,c i表示的是每个空间点的真实颜色,σ i表示的是空间位置中某一个空间点的不透明度,δ i表示的是一条光线上各个空间点的距离。直接预测的损失函数的作用是,在只学习真实颜色的情况下,使得预测出的新视角图像尽可能地接近于真实图像。
本实施例中,图像生成模型的训练装置700可以执行前述图3至图5所示实施例中训练设备执行的操作,具体此处不再赘述。
下面对本申请实施例提供的新视角图像的生成装置进行介绍,请参阅图8,图8为本申请实施例提供的新视角图像的生成装置800的一个实施例,包括:
确定单元801,用于确定虚拟观测点的位置和视角方向。
获取单元802,用于将虚拟观测点的位置和视角方向输入到图像生成模型中,获取来自虚拟观测点的光线经过的空间位置的残差颜色。
获取单元802,还用于获取参考颜色,参考颜色是根据至少一张参考图像,确定出的空间位置的颜色。
处理单元803,用于根据空间位置的残差颜色和参考颜色,生成虚拟观测点对应的新视角图像。
在本申请的一些可选实施例中,图像生成模型包括:根据残差颜色的损失函数,训练得到的图像生成模型。或者,根据残差颜色的损失函数,以及直接预测的损失函数,训练得到的图像生成模型。
在本申请的一些可选实施例中,参考颜色包括:第一参考颜色,第一参考颜色为来自虚拟观测点的光线经过的空间位置的颜色的众数。
获取单元802,具体用于接收训练设备发送的第一参考颜色。
在本申请的一些可选实施例中,参考颜色包括:第一参考颜色。
获取单元802,具体用于:
根据虚拟观测点的位置和视角方向,确定至少一个参考观测点,其中,至少一个参考观测点中的每个参考观测点与虚拟观测点的距离满足预设条件。
从N张参考图片中确定至少一张参考图片,其中,至少一个参考观测点中的每个参考观测点对应于至少一张参考图像中的每张参考图像,N为大于或等于2的整数。
根据至少一张参考图像,确定第一参考颜色。
本实施例中,新视角图像的生成装置800可以执行前述图2所示实施例中处理器所执 行的操作,或者图6所示实施例中执行设备所执行的操作,具体此处不再赘述。
本申请实施例还提供了一种图像处理系统,请参阅图9,图9为本申请实施例提供的图像处理系统900的一个实施例,包括:
训练设备910和执行设备920。
训练设备910包括:第一处理器911和第一存储器912。
第一存储器912,用于存储训练图片集,训练图片集中包括至少两张图像。
第一处理器911,用于执行前述图3至图5所示实施例中训练设备执行的操作,或者前述图7所示实施例中图像生成模型的训练装置700所执行的操作,具体此处不再赘述。
训练设备920包括:第二处理器921和第二存储器922。
第二存储器922,用于存储新视角图像。
第二处理器921,用于前述图2所示实施例中处理器所执行的操作,前述图6所示实施例中执行设备所执行的操作,或者前述图8新视角图像的生成装置800所执行的操作,具体此处不再赘述。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (23)

  1. 一种图像生成模型的训练方法,其特征在于,包括:
    确定目标观测点的位置和视角方向;
    根据所述位置和所述视角方向,从N张输入图像中确定至少一张参考图像,其中,N为大于或等于2的整数;
    根据所述至少一张参考图像,确定空间位置的参考颜色,所述空间位置为来自所述目标观测点的光线经过的位置;
    获取所述目标观测点对应的视角图像中像素的真实颜色;
    根据所述参考颜色和所述真实颜色,确定所述空间位置的残差颜色;
    根据所述残差颜色,训练图像生成模型。
  2. 根据权利要求1所述的方法,其特征在于,所述参考颜色包括:第一参考颜色,所述第一参考颜色为所述空间位置的颜色的众数。
  3. 根据权利要求1所述的方法,其特征在于,所述根据所述残差颜色,训练图像生成模型,包括:
    若所述真实颜色不是透明色,则根据所述残差颜色的损失函数,训练所述图像生成模型;
    若所述真实颜色是透明色,则获取直接预测的损失函数;
    根据所述残差颜色的损失函数,以及所述直接预测的损失函数,训练所述图像生成模型。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,在所述根据所述残差颜色,训练图像生成模型之后,所述方法还包括:
    获取目标观测点对应的新视角图像,其中,所述新视角图像是执行设备根据所述图像生成模型预测的;
    根据所述新视角图像,以及所述至少一张参考图像中的每一张参考图像,确定第二参考颜色;
    将所述第二参考颜色作为所述第一参考颜色。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述新视角图像,以及所述至少一张参考图像中的每一张参考图像,确定第二参考颜色,包括:
    确定所述新视角图像中的任一像素点为基准点;
    若以所述基准点为中心的相同像素大小的图像块,在所述新视角图像和所述每一张参考图像中的相似度满足预设条件,则确定所述第二参考颜色为所述第一参考颜色;
    若以所述基准点为中心的相同像素大小的图像块,在所述新视角图像和所述至少一张参考图像中的部分参考图像中的相似度不满足预设条件,则确定所述第二参考颜色为满足预设条件的参考图像在所述空间位置的参考颜色的众数;
    若以所述基准点为中心的相同像素大小的图像块,在所述新视角图像和所述每一张参考图像中的相似度关系均不满足预设条件,则确定所述第二参考颜色的颜色值为0。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述根据所述位置和所述 视角方向,从N张输入图像中确定至少一张参考图像,包括:
    根据所述位置和所述视角方向,确定至少一个参考观测点,其中,所述至少一个参考观测点中的每个参考观测点与所述目标观测点的距离满足预设条件;
    根据所述至少一个参考观测点,获取所述至少一张参考图像,其中,所述至少一个参考观测点中的每个参考观测点对应于所述至少一张参考图像中的每张参考图像。
  7. 一种新视角图像的生成方法,其特征在于,包括:
    确定虚拟观测点的位置和视角方向;
    将所述位置和所述视角方向输入到图像生成模型中,获取来自所述虚拟观测点的光线经过的空间位置的残差颜色;
    获取参考颜色,所述参考颜色是根据至少一张参考图像,确定出的所述空间位置的颜色;
    根据所述空间位置的残差颜色和所述参考颜色,生成所述虚拟观测点对应的新视角图像。
  8. 根据权利要求7所述的方法,其特征在于,所述图像生成模型包括:
    根据残差颜色的损失函数,训练得到的图像生成模型;或,
    根据所述残差颜色的损失函数,以及直接预测的损失函数,训练得到的图像生成模型。
  9. 根据权利要求7或8所述的方法,其特征在于,所述参考颜色包括:第一参考颜色,所述第一参考颜色为所述空间位置的颜色的众数;
    所述获取参考颜色包括:
    接收训练设备发送的第一参考颜色。
  10. 根据权利要求7或8所述的方法,其特征在于,所述参考颜色包括:所述第一参考颜色;
    所述获取参考颜色,包括:
    根据所述位置和视角方向,确定至少一个参考观测点,其中,所述至少一个参考观测点中的每个参考观测点与所述虚拟观测点的距离满足预设条件;
    从N张参考图片中确定至少一张参考图片,其中所述至少一个参考观测点中的每个参考观测点对应于所述至少一张参考图像中的每张参考图像;
    根据所述至少一张参考图像,确定所述第一参考颜色。
  11. 一种图像生成模型的训练装置,其特征在于,包括:
    确定单元,用于:
    确定目标观测点的位置和视角方向;
    根据所述位置和所述视角方向,从N张输入图像中确定至少一张参考图像,其中,N为大于或等于2的整数;
    根据所述至少一张参考图像,确定空间位置的参考颜色,所述空间位置为来自所述目标观测点的光线经过的位置;
    获取单元,用于获取所述目标观测点对应的视角图像中像素的真实颜色;
    所述确定单元,还用于根据所述参考颜色和所述真实颜色,确定所述空间位置的残差 颜色;
    处理单元,用于根据所述残差颜色,训练图像生成模型。
  12. 根据权利要求11所述的装置,其特征在于,所述参考颜色包括:第一参考颜色,所述第一参考颜色为所述空间位置的颜色的众数。
  13. 根据权利要求11所述的装置,其特征在于,所述处理单元,具体用于:
    若所述真实颜色不是透明色,则根据所述残差颜色的损失函数,训练所述图像生成模型;
    若所述真实颜色是透明色,则获取直接预测的损失函数;
    根据所述残差颜色的损失函数,以及所述直接预测的损失函数,训练所述图像生成模型。
  14. 根据权利要求11至13中任一项所述的装置,其特征在于,所述获取单元,还用于获取目标观测点对应的新视角图像,其中,所述新视角图像是执行设备根据所述图像生成模型预测的;
    所述确定单元,还用于根据所述新视角图像,以及所述至少一张参考图像中的每一张参考图像,确定第二参考颜色;
    所述确定单元,还用于将所述第二参考颜色作为所述第一参考颜色。
  15. 根据权利要求14所述的装置,其特征在于,所述确定单元,具体用于:
    确定所述新视角图像中的任一像素点为基准点;
    若以所述基准点为中心的相同像素大小的图像块,在所述新视角图像和所述每一张参考图像中的相似度满足预设条件,则确定所述第二参考颜色为所述第一参考颜色;
    若以所述基准点为中心的相同像素大小的图像块,在所述新视角图像和所述至少一张参考图像中的部分参考图像中的相似度不满足预设条件,则确定所述第二参考颜色为满足预设条件的参考图像在所述空间位置的参考颜色的众数;
    若以所述基准点为中心的相同像素大小的图像块,在所述新视角图像和所述每一张参考图像中的相似度关系均不满足预设条件,则确定所述第二参考颜色的颜色值为0。
  16. 根据权利要求11至15所述的装置,其特征在于,所述确定单元,具体用于:
    根据所述位置和所述视角方向,确定至少一个参考观测点,其中,所述至少一个参考观测点中的每个参考观测点与所述目标观测点的距离满足预设条件;
    根据所述至少一个参考观测点,获取所述至少一张参考图像,其中,所述至少一个参考观测点中的每个参考观测点对应于所述至少一张参考图像中的每张参考图像。
  17. 一种新视角图像的生成装置,其特征在于,包括:
    确定单元,用于确定虚拟观测点的位置和视角方向;
    获取单元,用于将所述位置和所述视角方向输入到图像生成模型中,获取来自所述虚拟观测点的光线经过的空间位置的残差颜色;
    所述获取单元,还用于获取参考颜色,所述参考颜色是根据至少一张参考图像,确定出的所述空间位置的颜色;
    处理单元,用于根据所述空间位置的残差颜色和所述参考颜色,生成所述虚拟观测点 对应的新视角图像。
  18. 根据权利要求17所述的装置,其特征在于,所述图像生成模型包括:
    根据残差颜色的损失函数,训练得到的图像生成模型;或,
    根据所述残差颜色的损失函数,以及直接预测的损失函数,训练得到的图像生成模型。
  19. 根据权利要求17或18所述的装置,其特征在于,所述参考颜色包括:第一参考颜色,所述第一参考颜色为所述空间位置的颜色的众数;
    所述获取单元,具体用于接收训练设备发送的第一参考颜色。
  20. 根据权利要求17或18所述的装置,其特征在于,所述参考颜色包括:所述第一参考颜色;
    所述获取单元,具体用于:
    根据所述位置和视角方向,确定至少一个参考观测点,其中,所述至少一个参考观测点中的每个参考观测点与所述虚拟观测点的距离满足预设条件;
    从N张参考图片中确定至少一张参考图片,其中,所述至少一个参考观测点中的每个参考观测点对应于所述至少一张参考图像中的每张参考图像,N为大于或等于2的整数;
    根据所述至少一张参考图片,确定所述第一参考颜色。
  21. 一种图像处理系统,其特征在于,包括:
    训练设备、执行设备;
    所述训练设备包括第一处理器和第一存储器,所述第一处理器用于执行权利要求1至6中任一项所述的方法,所述第一存储器用于存储训练图片集,所述训练图片集中包括至少两张图像;
    所述执行设备包括第二处理器和第二存储器,所述第二处理器用于执行权利要求7至10中任一项所述的方法,所述第二存储器用于存储新视角图像。
  22. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中保存有程序,当所述计算机执行所述程序时,执行权利要求1至10中任一项所述的方法。
  23. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上执行时,所述计算机执行权利要求1至10中任一项所述的方法。
PCT/CN2020/128680 2020-11-13 2020-11-13 图像生成模型的训练方法、新视角图像生成方法及装置 WO2022099613A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/128680 WO2022099613A1 (zh) 2020-11-13 2020-11-13 图像生成模型的训练方法、新视角图像生成方法及装置
CN202080104956.XA CN116250021A (zh) 2020-11-13 2020-11-13 图像生成模型的训练方法、新视角图像生成方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/128680 WO2022099613A1 (zh) 2020-11-13 2020-11-13 图像生成模型的训练方法、新视角图像生成方法及装置

Publications (1)

Publication Number Publication Date
WO2022099613A1 true WO2022099613A1 (zh) 2022-05-19

Family

ID=81602050

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128680 WO2022099613A1 (zh) 2020-11-13 2020-11-13 图像生成模型的训练方法、新视角图像生成方法及装置

Country Status (2)

Country Link
CN (1) CN116250021A (zh)
WO (1) WO2022099613A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953544A (zh) * 2023-03-16 2023-04-11 浪潮电子信息产业股份有限公司 一种三维重建方法、装置、电子设备及可读存储介质
CN115965736A (zh) * 2023-03-16 2023-04-14 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN116434146A (zh) * 2023-04-21 2023-07-14 河北信服科技有限公司 一种三维可视化综合管理平台
CN116681818A (zh) * 2022-10-28 2023-09-01 荣耀终端有限公司 新视角重建方法、新视角重建网络的训练方法及装置
CN117746192A (zh) * 2024-02-20 2024-03-22 荣耀终端有限公司 电子设备及其数据处理方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107945282A (zh) * 2017-12-05 2018-04-20 洛阳中科信息产业研究院(中科院计算技术研究所洛阳分所) 基于对抗网络的快速多视角三维合成和展示方法及装置
US20190295223A1 (en) * 2018-03-22 2019-09-26 Adobe Inc. Aesthetics-guided image enhancement
CN110322002A (zh) * 2019-04-30 2019-10-11 深圳市商汤科技有限公司 图像生成网络的训练及图像处理方法和装置、电子设备
CN110321849A (zh) * 2019-07-05 2019-10-11 腾讯科技(深圳)有限公司 图像数据处理方法、装置以及计算机可读存储介质
CN110634170A (zh) * 2019-08-30 2019-12-31 福建帝视信息科技有限公司 一种基于语义内容和快速图像检索的照片级图像生成方法
CN111652798A (zh) * 2020-05-26 2020-09-11 浙江大华技术股份有限公司 人脸姿态迁移方法和计算机存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107945282A (zh) * 2017-12-05 2018-04-20 洛阳中科信息产业研究院(中科院计算技术研究所洛阳分所) 基于对抗网络的快速多视角三维合成和展示方法及装置
US20190295223A1 (en) * 2018-03-22 2019-09-26 Adobe Inc. Aesthetics-guided image enhancement
CN110322002A (zh) * 2019-04-30 2019-10-11 深圳市商汤科技有限公司 图像生成网络的训练及图像处理方法和装置、电子设备
CN110321849A (zh) * 2019-07-05 2019-10-11 腾讯科技(深圳)有限公司 图像数据处理方法、装置以及计算机可读存储介质
CN110634170A (zh) * 2019-08-30 2019-12-31 福建帝视信息科技有限公司 一种基于语义内容和快速图像检索的照片级图像生成方法
CN111652798A (zh) * 2020-05-26 2020-09-11 浙江大华技术股份有限公司 人脸姿态迁移方法和计算机存储介质

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681818A (zh) * 2022-10-28 2023-09-01 荣耀终端有限公司 新视角重建方法、新视角重建网络的训练方法及装置
CN116681818B (zh) * 2022-10-28 2024-04-09 荣耀终端有限公司 新视角重建方法、新视角重建网络的训练方法及装置
CN115953544A (zh) * 2023-03-16 2023-04-11 浪潮电子信息产业股份有限公司 一种三维重建方法、装置、电子设备及可读存储介质
CN115965736A (zh) * 2023-03-16 2023-04-14 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN115953544B (zh) * 2023-03-16 2023-05-09 浪潮电子信息产业股份有限公司 一种三维重建方法、装置、电子设备及可读存储介质
CN116434146A (zh) * 2023-04-21 2023-07-14 河北信服科技有限公司 一种三维可视化综合管理平台
CN116434146B (zh) * 2023-04-21 2023-11-03 河北信服科技有限公司 一种三维可视化综合管理平台
CN117746192A (zh) * 2024-02-20 2024-03-22 荣耀终端有限公司 电子设备及其数据处理方法
CN117746192B (zh) * 2024-02-20 2024-06-28 荣耀终端有限公司 电子设备及其数据处理方法

Also Published As

Publication number Publication date
CN116250021A (zh) 2023-06-09

Similar Documents

Publication Publication Date Title
WO2022099613A1 (zh) 图像生成模型的训练方法、新视角图像生成方法及装置
US11373332B2 (en) Point-based object localization from images
US11232286B2 (en) Method and apparatus for generating face rotation image
WO2022178952A1 (zh) 一种基于注意力机制和霍夫投票的目标位姿估计方法及系统
CN112509115B (zh) 序列图像动态场景三维时变无约束重建方法及系统
WO2023050992A1 (zh) 用于人脸重建的网络训练方法、装置、设备及存储介质
EP3992908A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
CN111753698A (zh) 一种多模态三维点云分割系统和方法
US11328476B2 (en) Layout estimation using planes
US20220415030A1 (en) AR-Assisted Synthetic Data Generation for Training Machine Learning Models
US11961266B2 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
WO2021167586A1 (en) Systems and methods for object detection including pose and size estimation
CN112802081A (zh) 一种深度检测方法、装置、电子设备及存储介质
EP4290459A1 (en) Augmented reality method and related device thereof
WO2022187073A1 (en) Modeling objects from monocular camera outputs
US20230093827A1 (en) Image processing framework for performing object depth estimation
WO2023184278A1 (en) Method for semantic map building, server, terminal device and storage medium
CN114049678B (zh) 一种基于深度学习的面部动作捕捉方法及系统
CN115841546A (zh) 一种场景结构关联的地铁站多视矢量仿真渲染方法及系统
US20220157016A1 (en) System and method for automatically reconstructing 3d model of an object using machine learning model
US20220198707A1 (en) Method and apparatus with object pose estimation
RU2757563C1 (ru) Способ визуализации 3d портрета человека с измененным освещением и вычислительное устройство для него
US20230290057A1 (en) Action-conditional implicit dynamics of deformable objects
US20230144458A1 (en) Estimating facial expressions using facial landmarks
US20240233146A1 (en) Image processing using neural networks, with image registration

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961169

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20961169

Country of ref document: EP

Kind code of ref document: A1