CN113724309A - Image generation method, device, equipment and storage medium

Info

Publication number
CN113724309A
Authority
CN
China
Prior art keywords
virtual
light energy
determining
visual
photons
Prior art date
Legal status
Pending
Application number
CN202110996416.3A
Other languages
Chinese (zh)
Inventor
林耀冬
张欣
陈杰
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110996416.3A
Publication of CN113724309A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/514: Depth or shape recovery from specularities
    • G06T7/55: Depth or shape recovery from multiple images
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The embodiments of the application disclose an image generation method, an image generation device, image generation equipment, and a storage medium, belonging to the field of computer graphics. The method comprises the following steps: determining the coordinates of a plurality of visual points of a three-dimensional virtual scene; emitting virtual photons into the three-dimensional virtual scene with each pixel point in a target projection image as a starting point, where the target projection image is the projection image corresponding to the scene currently to be simulated; determining the brightness of each of the plurality of visual points based on the light energy of the virtual photons around each visual point; and generating an image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points. Because the image corresponding to the three-dimensional virtual scene is generated automatically by this image generation method, without requiring a physical scene or physical equipment, the image generation cost is reduced and the image generation efficiency is improved.

Description

Image generation method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the field of computer graphics, in particular to an image generation method, an image generation device, image generation equipment and a storage medium.
Background
In recent years, image generation technology has developed rapidly and is applied ever more widely, for example in scientific research, industrial production, healthcare and many other fields, so users' requirements for image generation technology are also becoming higher and higher.
At present, images are mainly generated by photographing real scenes with professional imaging equipment. However, professional imaging equipment is expensive, and in some cases a dedicated shooting scene must be set up manually, which is costly and inefficient. An economical and efficient image generation method is therefore urgently needed.
Disclosure of Invention
The embodiment of the application provides an image generation method, an image generation device, image generation equipment and a storage medium, which can solve the problem of image generation in the related art. The technical scheme is as follows:
in one aspect, an image generation method is provided, and the method includes:
determining coordinates of a plurality of visual points of a three-dimensional virtual scene, wherein the plurality of visual points are corresponding points of a plurality of pixel points on an image plane of a virtual camera in the three-dimensional virtual scene;
emitting virtual photons to the three-dimensional virtual scene by taking each pixel point in a target projection image as a starting point, wherein the target projection image is a projection image corresponding to a scene needing simulation at present;
determining a brightness of each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
and generating an image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points.
Optionally, the determining coordinates of a plurality of visual points of the three-dimensional virtual scene includes:
the coordinates of the plurality of viewpoints are determined in a reverse ray tracing manner.
Optionally, the determining the coordinates of the plurality of visual points in a reverse ray tracing manner includes:
determining a plurality of rays, wherein the plurality of rays are rays cast from each pixel point on the image plane of the virtual camera and passing through the optical center of the virtual camera;
and taking a plurality of points of intersection of the plurality of rays and the surface of the object in the three-dimensional virtual scene as the plurality of visual points, and determining the coordinates of the plurality of visual points.
Optionally, the determining the brightness of each of the plurality of visual points based on the light energy of the virtual photons around each of the plurality of visual points includes:
determining a first light energy for each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
determining the brightness of a corresponding visual point in the plurality of visual points based on the first light energy of each visual point in the plurality of visual points, wherein the brightness is the light energy carried by the light reflected by the corresponding visual point when it reaches the optical center of the virtual camera.
Optionally, the determining the first light energy of each of the plurality of visual points based on the light energy of the virtual photon around each of the plurality of visual points comprises:
determining the light energy of each of the plurality of visual points at a plurality of time instants based on the light energy of the virtual photons emitted to the surroundings of each of the plurality of visual points at the plurality of time instants, the plurality of time instants being a plurality of different time instants at which virtual photons are emitted to the three-dimensional virtual scene;
and determining the average value of the light energy of each of the plurality of visual points at the plurality of time instants as the first light energy of the corresponding visual point in the plurality of visual points.
Optionally, the determining, based on the light energy of the virtual photons emitted to the surroundings of each of the plurality of visual points at a plurality of time instants, the light energy of each of the plurality of visual points at the plurality of time instants comprises:
selecting one time from the plurality of times as a target time, and determining the light energy of each of the plurality of visual points at the target time according to the following operations until the light energy of each of the plurality of visual points at each time is determined:
determining coordinates of the virtual photons emitted at the target moment and residing on the surface of an object in the three-dimensional virtual scene to obtain coordinates of a plurality of virtual photons;
determining an optical energy of each virtual photon of the plurality of virtual photons;
determining the light energy of each of the plurality of visual points at the target time based on the coordinates of the plurality of visual points, the coordinates of the plurality of virtual photons, and the light energy of the virtual photons of the plurality of virtual photons that are located around each of the plurality of visual points.
Optionally, the determining the optical energy of each virtual photon in the plurality of virtual photons comprises:
determining a pixel value of a pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image according to a forward ray tracing mode;
and determining the light energy of each virtual photon in the plurality of virtual photons based on the pixel value of the pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image and the light energy of the virtual light source used for emitting the plurality of virtual photons.
Optionally, the determining, based on the coordinates of the plurality of visual points, the coordinates of the plurality of virtual photons, and the light energy of the virtual photons of the plurality of virtual photons located around each of the plurality of visual points, the light energy of each of the plurality of visual points at the target time comprises:
selecting one visual point from the plurality of visual points, and determining the light energy of the selected visual point at the target moment according to the following operations until the light energy of each visual point at the target moment is determined:
determining virtual photons located in a specified range based on the coordinates of the selected visual point and the coordinates of the virtual photons, wherein the specified range is a sphere range which takes the selected visual point as a sphere center and takes a specified numerical value as a radius;
and determining the sum of the light energy of the virtual photons in the specified range as the light energy of the selected visual point at the target moment.
Optionally, the image corresponding to the three-dimensional virtual scene includes a binocular image, and the coordinates of the multiple viewpoints are coordinates of the multiple viewpoints in a world coordinate system of the three-dimensional virtual scene;
after generating the image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points, the method further includes:
converting coordinates of each of the plurality of viewpoints in the world coordinate system into coordinates in a camera coordinate system of the virtual camera;
and correspondingly storing the coordinates of the binocular image and the plurality of visual points in the camera coordinate system.
Optionally, the method further includes:
taking a plurality of stored binocular images as input of a neural network model to be trained, taking vertical coordinates of a plurality of visual points corresponding to each binocular image in the plurality of binocular images in a camera coordinate system of the virtual camera as output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
Optionally, the image corresponding to the three-dimensional virtual scene includes a binocular image;
after generating the image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points, the method further includes:
generating a disparity map based on the binocular images;
and correspondingly storing the binocular image and the disparity map.
Optionally, the method further includes:
converting each stored disparity map in the plurality of disparity maps into a depth map to obtain a plurality of depth maps;
taking a plurality of stored binocular images as input of a neural network model to be trained, taking a depth map obtained by converting a disparity map corresponding to each binocular image in the binocular images as output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
In another aspect, there is provided an image generating apparatus, the apparatus comprising:
the system comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining the coordinates of a plurality of visual points of a three-dimensional virtual scene, and the plurality of visual points are corresponding points of a plurality of pixel points on an image plane of a virtual camera in the three-dimensional virtual scene;
the transmitting module is used for transmitting virtual photons to the three-dimensional virtual scene by taking each pixel point in a target projection image as a starting point, wherein the target projection image is a projection image corresponding to a scene needing simulation currently;
a second determining module for determining a brightness of each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
and the first generating module is used for generating an image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points.
Optionally, the first determining module includes:
a first determining submodule for determining coordinates of the plurality of viewpoints in a reverse ray tracing manner.
Optionally, the first determining sub-module includes:
a first determining unit, configured to determine a plurality of rays, where the plurality of rays are rays cast from each pixel point on the image plane of the virtual camera and passing through the optical center of the virtual camera;
and the second determining unit is used for determining the coordinates of the plurality of visual points by taking a plurality of points of intersection of the plurality of rays and the surface of the object in the three-dimensional virtual scene as the plurality of visual points.
Optionally, the second determining module includes:
a second determining sub-module for determining a first light energy of each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
and the third determining sub-module is used for determining the brightness of a corresponding visual point in the plurality of visual points based on the first light energy of each visual point in the plurality of visual points, wherein the brightness is the light energy carried by the light reflected by the corresponding visual point when reaching the optical center of the virtual camera.
Optionally, the second determining sub-module includes:
a third determining unit, configured to determine the light energy of each of the plurality of visual points at a plurality of time instants based on the light energy of the virtual photons emitted to the surroundings of each of the plurality of visual points at the plurality of time instants, the plurality of time instants being a plurality of different time instants at which virtual photons are emitted to the three-dimensional virtual scene;
a fourth determining unit, configured to determine an average value of light energy of each of the plurality of visual points at the plurality of time instants as a first light energy of a corresponding visual point in the plurality of visual points.
Optionally, the third determining unit is specifically configured to:
selecting one time from the plurality of times as a target time, and determining the light energy of each of the plurality of visual points at the target time according to the following operations until the light energy of each of the plurality of visual points at each time is determined:
determining coordinates of the virtual photons emitted at the target moment and residing on the surface of an object in the three-dimensional virtual scene to obtain coordinates of a plurality of virtual photons;
determining an optical energy of each virtual photon of the plurality of virtual photons;
determining the light energy of each of the plurality of visual points at the target time based on the coordinates of the plurality of visual points, the coordinates of the plurality of virtual photons, and the light energy of the virtual photons of the plurality of virtual photons that are located around each of the plurality of visual points.
Optionally, the third determining unit is specifically configured to:
determining a pixel value of a pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image according to a forward ray tracing mode;
and determining the light energy of each virtual photon in the plurality of virtual photons based on the pixel value of the pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image and the light energy of the virtual light source used for emitting the plurality of virtual photons.
Optionally, the third determining unit is specifically configured to:
selecting one visual point from the plurality of visual points, and determining the light energy of the selected visual point at the target moment according to the following operations until the light energy of each visual point at the target moment is determined:
determining virtual photons located in a specified range based on the coordinates of the selected visual point and the coordinates of the virtual photons, wherein the specified range is a sphere range which takes the selected visual point as a sphere center and takes a specified numerical value as a radius;
and determining the sum of the light energy of the virtual photons in the specified range as the light energy of the selected visual point at the target moment.
Optionally, the image corresponding to the three-dimensional virtual scene includes a binocular image, and the coordinates of the multiple viewpoints are coordinates of the multiple viewpoints in a world coordinate system of the three-dimensional virtual scene;
the device further comprises:
a first conversion module for converting coordinates of each of the plurality of viewpoints in the world coordinate system into coordinates in a camera coordinate system of the virtual camera;
and the first storage module is used for correspondingly storing the coordinates of the binocular image and the plurality of the visual points in the camera coordinate system.
Optionally, the apparatus further comprises:
the first training module is used for taking a plurality of stored binocular images as the input of a neural network model to be trained, taking the vertical coordinates of a plurality of visual points, corresponding to each binocular image, in the camera coordinate system of the virtual camera as the output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
Optionally, the image corresponding to the three-dimensional virtual scene includes a binocular image; the device further comprises:
the second generation module is used for generating a disparity map based on the binocular image;
and the second storage module is used for correspondingly storing the binocular image and the disparity map.
Optionally, the apparatus further comprises:
the second conversion module is used for converting each disparity map in the stored multiple disparity maps into a depth map to obtain multiple depth maps;
the second training module is used for taking the stored binocular images as the input of the neural network model to be trained, taking a depth map obtained by converting a disparity map corresponding to each binocular image in the binocular images as the output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
In another aspect, a computer device is provided, which includes a memory for storing a computer program and a processor for executing the computer program stored in the memory to implement the steps of the image generation method described above.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the image generation method described above.
In another aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the steps of the image generation method described above.
The technical scheme provided by the embodiment of the application can at least bring the following beneficial effects:
according to the method and the device, the plurality of visual points of the three-dimensional virtual machine scene are determined, the brightness of the plurality of visual points is determined, the image corresponding to the three-dimensional virtual scene is generated based on the brightness of the plurality of visual points, the generated image effect is real, the image generation process is automatic, the image generation cost is reduced, and the image generation efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an image generation method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of determining a visual point of a three-dimensional virtual scene provided in an embodiment of the present application;
Fig. 3 is a schematic diagram of a three-dimensional virtual scene provided in an embodiment of the present application;
Fig. 4 is a schematic diagram of projecting virtual photons into a three-dimensional virtual scene provided in an embodiment of the present application;
Fig. 5 is a schematic diagram of a plurality of projection images provided in an embodiment of the present application;
Fig. 6 is a schematic diagram of determining a first light energy of a visual point provided in an embodiment of the present application;
Fig. 7 is a schematic diagram of a speckle pattern generated by an image generation method provided in an embodiment of the present application;
Fig. 8 is a schematic diagram of a real speckle pattern provided in an embodiment of the present application;
Fig. 9 is a schematic diagram of a binocular image provided in an embodiment of the present application;
Fig. 10 is a schematic diagram of a disparity map provided in an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an image generating apparatus provided in an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
An execution subject of the image generation method provided by the embodiment of the application can be computer equipment. The computer device may be a single computer device, or may be a computer cluster composed of a plurality of computer devices.
The computer device may be any electronic product that can perform human-computer interaction with a user through one or more modes such as a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device, for example, a PC (Personal Computer), a palmtop computer (PPC, Pocket PC), a tablet computer, and the like.
Those skilled in the art should appreciate that the foregoing computer devices are merely exemplary and that other computer devices, existing or later developed, that may be suitable for use with the embodiments of the present application are also encompassed within the scope of the embodiments of the present application and are hereby incorporated by reference.
Next, a detailed explanation will be given of an image generation method provided in an embodiment of the present application.
Fig. 1 is a flowchart of an image generation method provided in an embodiment of the present application, where the method is applied to a computer device. Referring to fig. 1, the method includes the following steps.
S101, determining coordinates of a plurality of visual points of the three-dimensional virtual scene, wherein the plurality of visual points are corresponding points of a plurality of pixel points on an image plane of the virtual camera in the three-dimensional virtual scene.
In some embodiments, the coordinates of the plurality of viewpoints may be determined in a reverse ray tracing manner.
The implementation process of determining the coordinates of the plurality of visual points by reverse ray tracing comprises the following steps: determining a plurality of rays, wherein the plurality of rays are rays cast from each pixel point on the image plane of the virtual camera and passing through the optical center of the virtual camera; and taking a plurality of points where the rays intersect the surfaces of objects in the three-dimensional virtual scene as the plurality of visual points, and determining the coordinates of the plurality of visual points.
According to the pinhole imaging principle, when a camera is used for shooting an object in a three-dimensional scene, a plurality of points on the surface of the object in the three-dimensional scene reflect light to the optical center of the camera, and the light reaches a corresponding pixel point on the image plane of the camera through the optical center of the camera to be imaged. Similarly, when the virtual camera is used to shoot an object in a three-dimensional virtual scene, a plurality of points on the surface of the object in the three-dimensional virtual scene reflect light to the optical center of the virtual camera, and reach corresponding pixel points on the image plane of the virtual camera through the optical center of the virtual camera to form an image, where the plurality of points on the surface of the object are a plurality of visual points of the three-dimensional virtual scene. Therefore, in order to determine the plurality of visual points of the three-dimensional virtual scene, an inverse ray tracing method may be used, i.e., rays passing through the optical center of the virtual camera are extracted from each pixel point on the image plane of the virtual camera, and a plurality of points where the rays intersect with the surface of the object in the three-dimensional virtual scene are determined as the plurality of visual points.
Because there may be a plurality of objects in the three-dimensional virtual scene, and the materials of the plurality of objects are different, that is, the materials of the surfaces of the objects intersecting the plurality of rays are different, the method for determining the visual point is also different. For example, the surface of the object intersected by the plurality of rays may be a rough surface or a mirror surface. These two cases will be described below.
In the first case, the surface of the object is a rough surface, and when any ray of the rays intersects with the rough surface in the three-dimensional virtual scene, the intersected point is determined as a visual point of the three-dimensional virtual scene.
In the second case, the surface of the object is a mirror surface, and when any ray of the rays intersects with the mirror surface in the three-dimensional virtual scene, the reflection line of the any ray at the intersection point of the mirror surface is determined. And when the reflection line is intersected with the rough surface in the three-dimensional virtual scene, determining a point at which the reflection line is intersected with the rough surface as a visual point of the three-dimensional virtual scene.
For example, as shown in fig. 2, the surface of the object in the three-dimensional virtual scene includes a rough surface and a mirror surface. Two rays are led out from a plurality of pixel points on the image plane of the virtual camera to the optical center of the virtual camera. The first ray intersects with the rough surface of the object in the three-dimensional virtual scene, and therefore the point where the first ray intersects with the rough surface of the object in the three-dimensional virtual scene is determined as a visual point a of the three-dimensional virtual scene. The second ray intersects the specular surface of the object in the three-dimensional virtual scene, and therefore, the determination of the reflection line of the second ray at the intersection point with the specular surface of the object in the three-dimensional virtual scene is continued. The reflection line intersects with the rough surface of the object in the three-dimensional virtual scene, so that the intersection point of the reflection line and the rough surface of the object in the three-dimensional virtual scene is determined as another visual point b of the three-dimensional virtual scene.
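To make the reverse ray tracing described above more concrete, the following is a minimal Python sketch (not part of the patent) that casts one ray per image-plane pixel through the optical center and follows mirror reflections until a rough surface is reached. The camera and scene interfaces (pixel_to_world, optical_center, first_intersection, and the hit record with point, normal and material fields) are hypothetical placeholders.

```python
import numpy as np

def find_visual_points(camera, scene, max_bounces=4):
    """Reverse ray tracing: cast one ray per image-plane pixel through the
    optical center and return, per pixel, the visual point where the ray
    first lands on a rough surface (following mirror reflections)."""
    visual_points = {}
    for pixel in camera.image_plane_pixels():             # e.g. (row, col) indices
        direction = camera.optical_center - camera.pixel_to_world(pixel)
        direction = direction / np.linalg.norm(direction)
        origin = camera.optical_center                     # trace outward from the optical center

        for _ in range(max_bounces):
            hit = scene.first_intersection(origin, direction)
            if hit is None:                                # the ray leaves the scene
                break
            if hit.material == "rough":                    # rough surface: this is the visual point
                visual_points[pixel] = hit.point
                break
            # mirror surface: follow the reflected ray and keep searching
            direction = direction - 2.0 * np.dot(direction, hit.normal) * hit.normal
            origin = hit.point + 1e-6 * direction          # small offset avoids self-intersection
    return visual_points
```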
After the plurality of visual points of the three-dimensional virtual scene are determined according to the method, a world coordinate system of the three-dimensional virtual scene can be established, and the coordinates of each of the plurality of visual points are determined based on the world coordinate system of the three-dimensional virtual scene. That is, the coordinates of the plurality of visual points are the coordinates of the plurality of visual points in the world coordinate system of the three-dimensional virtual scene. In this embodiment of the present application, the world coordinate system of the three-dimensional virtual scene may be established by taking any point in the three-dimensional virtual scene as the origin and taking three mutually perpendicular directions, for example the horizontal rightward direction, the vertical downward direction, and the direction perpendicular to both, as coordinate axes. Of course, the world coordinate system of the three-dimensional virtual scene may also be established in other manners, which is not limited in this embodiment of the present application.
Before the coordinates of the plurality of visual points of the three-dimensional virtual scene are determined, the three-dimensional virtual scene may be created. The method for manufacturing the three-dimensional virtual scene comprises the following steps: and acquiring three-dimensional model data based on the three-dimensional entity scene. The three-dimensional model data is input to a three-dimensional scene editor or physical simulation tool. And automatically building a corresponding three-dimensional virtual scene by using a three-dimensional scene editor or a physical simulation tool. That is, the three-dimensional virtual scene is a three-dimensional virtual scene that is automatically built by using three-dimensional model data of a three-dimensional entity scene as input of a three-dimensional scene editor or a physical simulation tool.
Illustratively, when three-dimensional virtual scenes corresponding to a plurality of packages stacked and placed in an express delivery site are manufactured, three-dimensional model data of the plurality of packages stacked and placed can be acquired, the three-dimensional model data is input into a three-dimensional scene editor or a physical simulation tool, and the three-dimensional virtual scenes corresponding to the plurality of packages stacked and placed are automatically built by using the three-dimensional scene editor or the physical simulation tool, so that the three-dimensional virtual scene shown in fig. 3 is obtained.
It should be noted that the virtual camera may be a monocular camera or a binocular camera, and the virtual camera may be deployed at any position in the three-dimensional virtual scene according to the needs of the user.
S102, emitting virtual photons into the three-dimensional virtual scene with each pixel point in the target projection image as a starting point, where the target projection image is the projection image corresponding to the scene currently to be simulated.
The target projection image comprises a plurality of pixel points, and when the target projection image is irradiated by using a virtual light source, a plurality of virtual photons emitted by the virtual light source are emitted to the three-dimensional virtual scene through each pixel point in the plurality of pixel points. In the process of emitting the virtual photons, the emitting direction of each virtual photon in the virtual photons is random, and the virtual photons can reside on the surface of an object in the three-dimensional virtual scene after being emitted to the three-dimensional virtual scene.
As shown in fig. 4, among the plurality of virtual photons emitted through the target projection image toward the three-dimensional virtual scene, a portion of the virtual photons reside at a visible point a of the three-dimensional virtual scene. Another portion of the virtual photons reside at non-visible points c of the three-dimensional virtual scene. And after a part of the virtual photons reach the visual point b of the three-dimensional virtual scene, the virtual photons leave the three-dimensional virtual scene through reflection.
It should be noted that the target projection image is a projection image corresponding to a scene that needs to be currently simulated, which is selected from the stored plurality of projection images. Each of the plurality of projected images has a different projection pattern to enable simulation of different lighting scenes, and the plurality of projected images can simulate any lighting scene.
For example, as shown in fig. 5, the plurality of projection images include a structured light fringe pattern, an infrared staggered dot pattern, and an infrared random dot pattern. The structured light stripe pattern is used for simulating a scene irradiated by structured light, the infrared staggered dot matrix speckle pattern is used for simulating a scene irradiated by infrared light, and the infrared random dot matrix speckle pattern is used for simulating a scene irradiated by infrared light at random. Of course, the plurality of projection images may also include simulated images of other illuminated scenes, which is not limited herein.
S103, determining the brightness of each of the plurality of visual points based on the light energy of the virtual photons around each of the plurality of visual points.
In some embodiments, the implementation process of determining the brightness of each of the plurality of visual points based on the light energy of the virtual photons around each of the plurality of visual points comprises the following steps (1) - (2):
(1) a first light energy for each of the plurality of viewpoints is determined based on light energy of virtual photons around each of the plurality of viewpoints.
In some embodiments, the light energy of each of the plurality of visual points at a plurality of time instants may be determined based on the light energy of the virtual photons emitted to the surroundings of each of the plurality of visual points at the plurality of time instants, the plurality of time instants being a plurality of different time instants at which virtual photons are emitted into the three-dimensional virtual scene. The average value of the light energy of each of the plurality of visual points at the plurality of time instants is then determined as the first light energy of the corresponding visual point.
Since the virtual light source may emit the virtual photons to the three-dimensional virtual scene through the target projection image without interruption, the plurality of moments may be a plurality of different moments at which the virtual light source emits the virtual photons to the three-dimensional virtual scene. In addition, since the directions of the virtual photons passing through each pixel point are random, the number of the virtual photons residing near each of the plurality of visual points is different, so that the light energy obtained by each of the plurality of visual points at the plurality of time instants is different, and therefore, in some embodiments, the average value of the light energy of each of the plurality of visual points at the plurality of time instants may be determined as the light energy of the corresponding visual point, i.e., the first light energy. The plurality of times may be a plurality of times with the same interval or a plurality of times with different intervals.
Since the implementation process of determining the light energy of each of the plurality of visual points at a plurality of time instances is the same, one time instance may be selected from the plurality of time instances as a target time instance, and the light energy of each of the plurality of visual points at the target time instance may be determined according to the following operations until the light energy of each of the plurality of visual points at each time instance is determined: and determining the coordinates of the object surface where the virtual photon emitted at the target moment resides in the three-dimensional virtual scene to obtain the coordinates of a plurality of virtual photons. The optical energy of each virtual photon of the plurality of virtual photons is determined. Determining the optical energy of each of the plurality of visual points at the target time based on the coordinates of the plurality of visual points, the coordinates of the plurality of virtual photons, and the optical energy of the virtual photons of the plurality of virtual photons located around each of the plurality of visual points.
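The per-time loop and the averaging step described above might be organized as in the following sketch; emit_photons and gather_light_energy are assumed callables corresponding to the photon-emission and photon-gathering sketches given further below, and the parameter names are illustrative rather than taken from the patent.

```python
import numpy as np

def first_light_energy(visual_points, emission_times, emit_photons, gather_light_energy, radius):
    """For each emission time, gather the light energy of every visual point from
    the photons residing in the scene at that time, then average over all the
    times to obtain the first light energy of each visual point.

    `emit_photons(t)` returns the photons residing on object surfaces after the
    emission at time t; `gather_light_energy(coord, photons, radius)` sums the
    photon energy within the specified range around a visual point."""
    energy_per_time = {p: [] for p in visual_points}
    for t in emission_times:
        photons = emit_photons(t)
        for p, coord in visual_points.items():
            energy_per_time[p].append(gather_light_energy(coord, photons, radius))
    # first light energy = mean of a visual point's light energy over all emission times
    return {p: float(np.mean(values)) for p, values in energy_per_time.items()}
```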
Based on the above, after each pixel point in the target projection image is taken as a starting point and virtual photons are emitted to the three-dimensional virtual scene, some virtual photons may reside on the surface of an object in the three-dimensional virtual scene, so in this embodiment of the application, when a plurality of virtual photons emitted at the target time reside on the surface of an object in the three-dimensional virtual scene, the coordinates of the virtual photons emitted at the target time may be determined based on the three-dimensional coordinate system of the three-dimensional virtual scene, so as to obtain the coordinates of the plurality of virtual photons. Exemplarily, according to a forward ray tracing manner, the coordinates of the plurality of virtual photons are determined, and the pixel value of a pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image is determined. And determining the light energy of each virtual photon in the plurality of virtual photons based on the pixel value of the pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image and the light energy of the virtual light source used for emitting the plurality of virtual photons.
That is, for each pixel point in the target projection image, a plurality of rays are drawn from the virtual light source and pass through the pixel point, and the rays point to different directions. For any ray in the rays, if the ray intersects with the surface of the object in the three-dimensional virtual scene, determining the coordinate of the intersection point as the coordinate of the virtual photon emitted from the pixel point, and if the ray does not intersect with the surface of the object in the three-dimensional virtual scene, discarding the ray. According to the mode, the coordinates of the virtual photons can be determined, and meanwhile, the corresponding pixel point of each virtual photon in the target projection image can be determined, namely the virtual photon is sent from which pixel point in the target projection image. Then, the light energy of each virtual photon can be determined based on the pixel value of the pixel point corresponding to each virtual photon in the target projection image and the light energy of the virtual light source.
For example, after determining a corresponding pixel point of each virtual photon in the target projection image, a corresponding relationship between the coordinate of the virtual photon and the pixel value of the pixel point may be stored.
Because the virtual photon is emitted by the virtual light source through the pixel point in the target projection image, the light energy of the virtual photon is related to the light energy of the virtual light source and the pixel value of the pixel point in the target projection image. Moreover, based on the above description, according to the forward ray tracing manner, not only the coordinates of each virtual photon can be determined, but also the correspondence between the coordinates of the virtual photon and the pixel values of the pixel points can be determined. Therefore, the pixel value of the pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image can be determined from the corresponding relation between the coordinate of the virtual photon and the pixel value of the pixel point according to the coordinates of the plurality of virtual photons. And then, determining the product of the pixel value of the corresponding pixel point and the light energy of the virtual light source as the light energy of the corresponding virtual photon.
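A sketch of the forward ray tracing step, under the assumptions that a projector object maps projection-image pixels to world positions and that emission directions are sampled uniformly at random; the projector and scene interfaces are hypothetical placeholders.

```python
import numpy as np

def emit_photons(projection_image, source_energy, projector, scene,
                 photons_per_pixel=16, rng=None):
    """Emit virtual photons into the scene through every pixel of the target
    projection image. Each photon's energy is the pixel value multiplied by the
    light energy of the virtual light source; its coordinates are the first
    point where the photon comes to rest on an object surface."""
    rng = rng or np.random.default_rng()
    photons = []  # list of (world coordinates, light energy)
    height, width = projection_image.shape
    for v in range(height):
        for u in range(width):
            pixel_value = float(projection_image[v, u])
            origin = projector.pixel_to_world(u, v)        # pixel point used as the starting point
            for _ in range(photons_per_pixel):
                direction = rng.normal(size=3)             # random emission direction
                direction = direction / np.linalg.norm(direction)
                hit = scene.first_intersection(origin, direction)
                if hit is None:
                    continue                               # photon misses every object: discard it
                photons.append((hit.point, pixel_value * source_energy))
    return photons
```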
Since the light energy of each visual point at the target time is determined in the same manner, one visual point may be selected from the plurality of visual points, and the light energy of the selected visual point at the target time may be determined according to the following operations until the light energy of each visual point at the target time is determined: and determining the virtual photon positioned in a specified range based on the coordinate of the selected visual point and the coordinates of the plurality of virtual photons, wherein the specified range is a sphere range which takes the selected visual point as a sphere center and takes a specified numerical value as a radius. And determining the sum of the light energy of the virtual photons in the specified range as the light energy of the selected visual point at the target moment.
That is, a designated range is determined based on the coordinates of the selected visual point, and the coordinates of the plurality of virtual photons are compared with the designated range, thereby determining a virtual photon located within the designated range from among the plurality of virtual photons. The sum of the optical energy of each virtual photon located within the specified range may then be determined as the optical energy of the selected viewpoint at the target instant.
In the approach described above, the light energy of all virtual photons residing on object surfaces in the three-dimensional virtual scene is determined first, the virtual photons within the specified range are then identified among them, and the light energy of the selected visual point at the target time is then determined. Optionally, in other embodiments, the virtual photons within the specified range may be determined first, the light energy of each of those virtual photons is then determined, and the sum of their light energy is taken as the light energy of the selected visual point at the target time. This reduces the amount of computation spent determining the light energy of virtual photons and improves image generation efficiency.
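The photon-gathering step can be sketched as follows: the light energy of the selected visual point at one emission time is the sum of the energy of every virtual photon inside the sphere of the specified radius around it. The KD-tree is only an implementation convenience and is not specified by the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def gather_light_energy(visual_point, photons, radius):
    """Light energy of a visual point at one emission time: the sum of the
    energy of every virtual photon inside the sphere of the given radius
    centered on the visual point. `photons` is a list of (coords, energy)."""
    if not photons:
        return 0.0
    coords = np.array([c for c, _ in photons])
    energies = np.array([e for _, e in photons])
    tree = cKDTree(coords)                                  # spatial index over photon positions
    inside = tree.query_ball_point(np.asarray(visual_point), r=radius)
    return float(energies[inside].sum())
```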
Because each pixel point on the image plane of the virtual camera has a certain photosensitive range and can receive light energy in a certain range from the surface of an object in the three-dimensional virtual scene, for each visual point in the plurality of visual points, the sum of the light energy of virtual photons in the specified range of the visual point can be determined, and the sum of the light energy of each virtual photon in the specified range is determined as the light energy of the corresponding visual point at the target moment.
As shown in fig. 6, one of the pixels on the image plane of the virtual camera can receive the light energy from the designated range of the visual point a (the solid line and the dotted line in fig. 6 represent the light energy emitted by the virtual photon in the designated range of the visual point to the corresponding pixel), so that the sum of the light energy of the virtual photon in the designated range of the visual point a is determined as the light energy of the visual point a at the target time. Another pixel point on the image plane of the virtual camera can receive the reflected light energy from the mirror surface, the reflected light energy is obtained by the reflection of the mirror surface from the light energy in the specified range of the visual point b, so the sum of the light energy of the virtual photons in the specified range of the visual point b is determined as the light energy of the visual point b at the target moment.
(2) Determining the brightness of the corresponding visual point in the plurality of visual points based on the first light energy of each visual point in the plurality of visual points, wherein the brightness is the light energy carried by the light ray reflected by the corresponding visual point when the light ray reaches the optical center of the virtual camera.
According to the pinhole imaging principle, in the imaging process, light rays reflected by a plurality of points on the surface of an object in a three-dimensional scene reach corresponding pixel points on an image plane of a camera through an optical center of the camera, and the light energy carried by the light rays reflected by each point in the plurality of points on the surface of the object determines the brightness of the point. In the same way, in the imaging process, the light reflected by each of the plurality of visual points of the three-dimensional virtual scene reaches the corresponding pixel point on the image plane of the virtual camera through the optical center of the virtual camera, and the light energy carried by the light emitted by each of the plurality of visual points determines the brightness of the visual point. Accordingly, the brightness of the respective one of the plurality of viewable points may be determined based on the first light energy of each of the plurality of viewable points.
Based on the foregoing description, the first light energy of each of the plurality of viewpoints is determined by virtual photons emitted by a virtual light source, the illumination of the viewpoints by the virtual light source being referred to as indirect illumination. However, in some cases, the plurality of visual points may also have self-luminescence and/or direct illumination, for example, when an object in which a visual point is located is a luminescent object, the visual point has self-luminescence. When a light source directly illuminates a certain visual point, the visual point has direct illumination. Thus, it is also necessary to determine a second light energy and a third light energy for each of the plurality of visual points, the second light energy being self-luminous light energy of the visual points, and the third light energy being direct illumination light energy of the visual points. Then, based on the first light energy, the second light energy, and the third light energy of each of the plurality of viewable points, the brightness of the corresponding viewable point of the plurality of viewable points is determined.
Based on whether each of the plurality of visual points is self-luminous and directly illuminated, determining the brightness of a visual point falls into the following four cases, each described below by taking any one of the visual points as an example. For convenience of description, this visual point is referred to as the target visual point.
In the first case, the target visual point has self-luminous light and direct illumination, and at this time, the brightness of the target visual point may be determined based on the first light energy, the second light energy, and the third light energy of the target visual point.
As an example, the brightness of the target visual point may be determined by the following rendering equation (1) based on the first light energy, the second light energy, and the third light energy of the target visual point.
$L(p,\omega_o) = L_e(p,\omega_o) + \int_{s^2} f(p,\omega_o,\omega_i)\,\left[L_d(p,\omega_i) + L_i(p,\omega_i)\right]\cos\theta_i \, d\omega_i \qquad (1)$
In rendering equation (1), $p$ is the target visual point; $\omega_o$ is the outgoing ray direction at point $p$, i.e., the direction from point $p$ to the optical center of the virtual camera; $\omega_i$ is the direction of the incident ray at point $p$; and $\theta_i$ is the angle between the incident ray and the normal at point $p$. $L(p,\omega_o)$ is the brightness of point $p$. $L_e(p,\omega_o)$ is the second light energy in the direction from point $p$ to the optical center of the virtual camera, i.e., the self-luminous light energy. $s^2$ is the sphere with point $p$ as its center and the specified numerical value as its radius. $f(p,\omega_o,\omega_i)$ is the bidirectional reflection function, which determines the proportion of the incident light energy that exits along the outgoing direction; objects of different materials have different bidirectional reflection functions.
$L_d(p,\omega_i)$ is the third light energy of point $p$ in the incident ray direction, i.e., the light energy of direct illumination. $L_i(p,\omega_i)$ is the first light energy of point $p$ in the incident ray direction, i.e., the light energy of indirect illumination. $\cos\theta_i$ is the cosine of the angle between the incident ray and the surface normal at point $p$. $d\omega_i$ is the differential of the incident ray direction at point $p$.
Optionally, since the integral is a surface integral with no analytical solution, it may be solved by multiple iterations of the Monte Carlo method to obtain the brightness of the target visual point.
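A minimal sketch of the Monte Carlo estimation of the integral in rendering equation (1); the BRDF f, the direct- and indirect-illumination terms L_d and L_i, and the uniform hemisphere sampling are stand-ins supplied for illustration rather than details taken from the patent.

```python
import numpy as np

def monte_carlo_brightness(L_e, f, L_d, L_i, normal, omega_o, n_samples=200, rng=None):
    """Estimate L(p, w_o) = L_e + integral of f(w_o, w_i) * [L_d(w_i) + L_i(w_i)] * cos(theta_i) dw_i
    by averaging over incident directions sampled uniformly on the upper hemisphere."""
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n_samples):
        w_i = rng.normal(size=3)
        w_i = w_i / np.linalg.norm(w_i)
        if np.dot(w_i, normal) < 0:                        # keep directions on the upper hemisphere
            w_i = -w_i
        cos_theta = np.dot(w_i, normal)
        total += f(omega_o, w_i) * (L_d(w_i) + L_i(w_i)) * cos_theta
    # uniform hemisphere sampling has pdf 1 / (2*pi), so divide by the pdf and average
    return L_e + (2.0 * np.pi) * total / n_samples
```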
In the second case, the target visual point has self-luminescence without direct illumination, and at this time, the brightness of the target visual point may be determined based on the first and second light energies of the target visual point.
As an example, the brightness of the target visual point may be determined by the following rendering equation (2) based on the first light energy and the second light energy of the target visual point.
$L(p,\omega_o) = L_e(p,\omega_o) + \int_{s^2} f(p,\omega_o,\omega_i)\,L_i(p,\omega_i)\cos\theta_i \, d\omega_i \qquad (2)$
The meaning of each parameter and the integral calculation method in the above rendering equation (2) have been described in the first case, and are not described herein again.
In the third case, the target visual point has no self-luminescence and has direct illumination, and in this case, the brightness of the target visual point may be determined based on the first and third light energies of the target visual point.
As an example, the brightness of the target visual point may be determined by the following rendering equation (3) based on the first light energy and the third light energy of the target visual point.
$L(p,\omega_o) = \int_{s^2} f(p,\omega_o,\omega_i)\,\left[L_d(p,\omega_i) + L_i(p,\omega_i)\right]\cos\theta_i \, d\omega_i \qquad (3)$
The meaning of each parameter and the integral calculation method in the above rendering equation (3) have been described in the first case, and are not described herein again.
In the fourth case, the target visual point is not self-luminous nor directly illuminated, and at this time, the brightness of the target visual point may be determined based on the first light energy of the target visual point.
As an example, the brightness of the target visual point may be determined by the following rendering equation (4) based on the first light energy of the target visual point.
$L(p,\omega_o) = \int_{s^2} f(p,\omega_o,\omega_i)\,L_i(p,\omega_i)\cos\theta_i \, d\omega_i \qquad (4)$
The meaning of each parameter and the integral calculation method in the above rendering equation (4) have been described in the first case, and are not described herein again.
S104, generating an image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points.
Based on the foregoing, the plurality of projection images including the target projection image are all gray-scale images, and the pixel values of the pixel points in the gray-scale images have a certain relationship with the brightness, so after the brightness of each of the plurality of visual points is determined, the pixel value of the pixel point corresponding to each visual point on the image plane of the virtual camera can be determined based on the brightness of each visual point in the plurality of visual points, and then the image corresponding to the three-dimensional virtual scene is generated.
For example, the integral corresponding to each of the plurality of visual points is iterated more than 200 times using the Monte Carlo method, and the brightness of each visual point is calculated. Then, based on the brightness of each visual point, a speckle pattern as shown in Fig. 7 can be obtained, while Fig. 8 is a speckle pattern of a real scene. By comparing Fig. 7 with Fig. 8, it can be seen that the speckle pattern generated using the image generation method of the embodiment of the present application is realistic and shows no noticeable difference from the real speckle pattern.
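As an illustration of the final image-forming step, one simple way to turn the per-pixel brightness values into an 8-bit grayscale image is a linear normalization; this mapping is an assumption, since the text only states that pixel values have a certain relationship with brightness.

```python
import numpy as np

def brightness_to_image(brightness_map, height, width):
    """Turn the per-pixel brightness of the visual points into a grayscale image.
    `brightness_map` maps (row, col) pixel indices to brightness values; pixels
    whose ray reached no visual point stay black. The linear scaling to 0-255 is
    an assumed convention, not mandated by the patent."""
    image = np.zeros((height, width), dtype=np.float64)
    for (row, col), value in brightness_map.items():
        image[row, col] = value
    peak = image.max()
    if peak > 0:
        image = image / peak * 255.0                       # map the brightest point to 255
    return image.astype(np.uint8)
```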
Based on the foregoing description, the brightness of each of the plurality of visual points is related to the light energy of the virtual photon passing through each pixel point of the target projection image, and the light energy of each virtual photon is related to a pixel point of the target projection image, so that different target projection images can be used to generate images corresponding to different three-dimensional virtual scenes.
For example, when an infrared staggered dot matrix speckle pattern or an infrared random dot matrix speckle pattern is used as a target projection image, a speckle pattern corresponding to a three-dimensional virtual scene can be generated. When the structured light stripe pattern is used as a target projection image, a structured light pattern corresponding to a three-dimensional virtual scene may be generated.
It should be noted that the image corresponding to the three-dimensional virtual scene may include a monocular image or a binocular image. For example, as shown in fig. 9, the image corresponding to the three-dimensional virtual scene includes a binocular image, that is, a left eye image and a right eye image.
When the image corresponding to the three-dimensional virtual scene includes a binocular image, the following two processing methods may be adopted for the binocular image.
In a first processing mode, coordinates of each of the plurality of visual points in the world coordinate system are converted into coordinates in a camera coordinate system of the virtual camera, and the coordinates of the binocular image and the plurality of visual points in the camera coordinate system are correspondingly stored.
As an example, the external parameter matrix of the virtual camera may be multiplied by the coordinates of each of the plurality of visual points in the world coordinate system to obtain the coordinates of each of the plurality of visual points in the camera coordinate system of the virtual camera.
For example, for any one of the plurality of visual points, its coordinates in the camera coordinate system of the virtual camera may be determined according to the following formula (5), based on its coordinates in the world coordinate system of the three-dimensional virtual scene and the external parameter matrix of the virtual camera.
[x_c, y_c, z_c]^T = M_0 · [x, y, z, 1]^T    (5)
In the above formula (5), (x_c, y_c, z_c) are the coordinates of the visual point in the camera coordinate system of the virtual camera, and (x, y, z) are the coordinates of the visual point in the world coordinate system of the three-dimensional virtual scene. M_0 = [R_{3×3} T_{3×1}] is the external parameter matrix of the virtual camera, where R_{3×3} is a rotation matrix and T_{3×1} is a translation matrix.
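A minimal sketch of the conversion expressed by formula (5), assuming the external parameter matrix is supplied as a 3×3 rotation R and a 3×1 translation T (the function and variable names are illustrative):

```python
import numpy as np

def world_to_camera(points_world, R, T):
    """Convert visual-point coordinates from the world coordinate system of
    the three-dimensional virtual scene to the camera coordinate system.

    points_world: (N, 3) array of (x, y, z) world coordinates.
    R: (3, 3) rotation matrix; T: (3,) or (3, 1) translation vector,
    together forming the external parameter matrix M0 = [R | T].
    Returns an (N, 3) array of (x_c, y_c, z_c) camera coordinates.
    """
    R = np.asarray(R, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64).reshape(3)
    return points_world @ R.T + T   # equivalent to R @ p + T for each point p
```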
In a second processing mode, a disparity map is generated based on the binocular image, and the binocular image and the disparity map are stored in correspondence.
As an example, for any one of the plurality of visual points, the disparity value of the disparity point corresponding to the pixel point of that visual point may be determined according to the following formula (6), based on the coordinates of the visual point in the world coordinate system of the three-dimensional virtual scene, the distance between the left virtual camera and the right virtual camera, and the focal length in the x direction among the intrinsic parameters of either virtual camera. The disparity value of each disparity point is determined in this way to obtain the disparity map.
disp = f_x · T_x / Z_c    (6)
In formula (6), disp is the disparity value of the disparity point corresponding to the pixel point of the visual point in question. f_x is the focal length along the horizontal axis among the intrinsic parameters of either virtual camera, for example the left virtual camera. T_x refers to the distance between the left virtual camera and the right virtual camera. Z_c refers to the vertical coordinate of that visual point in the world coordinate system of the three-dimensional virtual scene.
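A minimal sketch of formula (6), assuming the Z_c value of the visual point behind each left-view pixel is already available as a per-pixel array (the array names and the handling of invalid depth are assumptions):

```python
import numpy as np

def disparity_map(depth_zc, fx, baseline_tx):
    """Compute the per-pixel disparity disp = fx * Tx / Zc.

    depth_zc:    (H, W) array of Z_c values for the visual point behind
                 each left-view pixel.
    fx:          focal length of the (left) virtual camera along the
                 horizontal axis, in pixels.
    baseline_tx: distance T_x between the left and right virtual cameras.
    Pixels with no valid depth (Z_c <= 0) get disparity 0 here, which is an
    illustrative convention rather than something the text specifies.
    """
    depth = np.asarray(depth_zc, dtype=np.float64)
    disp = np.zeros_like(depth)
    valid = depth > 0
    disp[valid] = fx * baseline_tx / depth[valid]
    return disp
```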
For example, fig. 10 is a disparity map generated from a left-eye image and a right-eye image. The closer an object in the three-dimensional virtual scene is to the lens of the virtual camera, the greater its disparity and the whiter the corresponding region of the disparity map.
The binocular images and the coordinates of the plurality of visual points in the camera coordinate system stored in the first processing mode, and the binocular images and the disparity map stored in the second processing mode can be widely applied to cameras, games and other fields related to image processing.
As an example, the binocular images may be used to train a neural network model for the camera field, where the model to be trained is used to calculate the distance from a target object to the camera. By providing the model with a large number of inputs and outputs, deep learning enables it to learn an accurate mapping between input and output, thereby improving its calculation accuracy. The neural network model to be trained can be trained in the following two implementation modes.
In a first implementation, the plurality of stored binocular images are used as the input of the neural network model to be trained, the vertical coordinates, in the camera coordinate system of the virtual camera, of the plurality of visual points corresponding to each binocular image are used as the output of the neural network model to be trained, and the neural network model is trained accordingly. The plurality of binocular images are generated based on the three-dimensional virtual scene.
That is, for any one of the plurality of binocular images, that binocular image is used as the input of the neural network model to be trained, and the vertical coordinates of the plurality of visual points corresponding to it in the camera coordinate system of the virtual camera are used as the output of the neural network model to be trained. After the model has been trained in this way on the plurality of binocular images and the visual points corresponding to them, the training process of the neural network model is complete.
In a second implementation, each of the stored disparity maps is converted into a depth map to obtain a plurality of depth maps. The plurality of stored binocular images are used as the input of the neural network model to be trained, the depth map converted from the disparity map corresponding to each binocular image is used as the output of the neural network model to be trained, and the neural network model is trained accordingly. The plurality of binocular images are generated based on the three-dimensional virtual scene.
That is, for any one of the plurality of binocular images, that binocular image is used as the input of the neural network model to be trained, and the depth map obtained by converting the disparity map corresponding to it is used as the output of the neural network model to be trained. After the model has been trained in this way on the plurality of binocular images and the depth maps converted from their disparity maps, the training process of the neural network model is complete.
The disparity map may be converted into a corresponding depth map based on the relevant camera parameters; the specific conversion method may follow related techniques and is not limited here.
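Although the embodiment leaves the conversion method to related techniques, a commonly used relation simply inverts formula (6), i.e. Z_c = f_x · T_x / disp. A minimal sketch under that assumption:

```python
import numpy as np

def disparity_to_depth(disp, fx, baseline_tx):
    """Convert a disparity map to a depth map via Zc = fx * Tx / disp.

    This inversion of formula (6) is a common convention, not something
    mandated by the embodiment; zero-disparity pixels are left at depth 0.
    """
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.zeros_like(disp)
    valid = disp > 0
    depth[valid] = fx * baseline_tx / disp[valid]
    return depth
```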
When the neural network model is trained, the true distance values must be obtained to serve as the model output, which cannot be achieved through manual labeling. It becomes feasible, however, once the binocular images and depth values have been generated by the image generation method provided in the embodiment of the present application, which facilitates training of the model. Moreover, with a large number of generated images, the processing precision of the trained neural network model can be improved.
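The embodiment does not tie the training to any particular framework or network architecture. Purely as an illustration of the second implementation, a supervised loop that pairs generated binocular images (input) with the corresponding depth maps (output) might look as follows; the use of PyTorch and the placeholder convolutional model are assumptions:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_depth_model(left_right_images, depth_maps, epochs=10, lr=1e-4):
    """Train a placeholder network on (binocular image, depth map) pairs.

    left_right_images: float tensor (N, 2, H, W) stacking left/right views.
    depth_maps:        float tensor (N, 1, H, W) of ground-truth depth
                       produced by the image generation method.
    The architecture below is a stand-in; any depth-regression network
    could take its place.
    """
    model = nn.Sequential(                      # placeholder architecture
        nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    loader = DataLoader(TensorDataset(left_right_images, depth_maps),
                        batch_size=4, shuffle=True)
    for _ in range(epochs):
        for images, target in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), target)
            loss.backward()
            optimizer.step()
    return model
```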
Fig. 11 is a schematic structural diagram of an image generation apparatus provided in an embodiment of the present application, where the image generation apparatus may be implemented as part or all of a computer device by software, hardware, or a combination of the two. Referring to fig. 11, the apparatus includes: a first determining module 1101, a transmitting module 1102, a second determining module 1103 and a first generating module 1104.
A first determining module 1101, configured to determine coordinates of multiple visual points of a three-dimensional virtual scene, where the multiple visual points are corresponding points of multiple pixel points on an image plane of a virtual camera in the three-dimensional virtual scene;
the emitting module 1102 is configured to emit virtual photons to the three-dimensional virtual scene by using each pixel point in the target projection image as a starting point, where the target projection image is a projection image corresponding to a current scene to be simulated;
a second determining module 1103 for determining the brightness of each of the plurality of viewpoints based on the light energy of the virtual photons around each of the plurality of viewpoints;
a first generating module 1104, configured to generate an image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points.
Optionally, the first determining module 1101 includes:
a first determining submodule for determining coordinates of the plurality of viewpoints in a reverse ray tracing manner.
Optionally, the first determining sub-module includes:
the first determining unit is used for determining a plurality of rays, wherein the rays are rays which are led out from each pixel point on the image plane of the virtual camera and pass through the optical center of the virtual camera;
and the second determining unit is used for determining the coordinates of the plurality of visual points by taking a plurality of points of intersection of the plurality of rays and the surface of the object in the three-dimensional virtual scene as the plurality of visual points.
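A minimal sketch of this reverse ray tracing step, assuming a pinhole camera model and delegating the ray/surface intersection to the scene representation; the `scene.intersect` interface, `K_inv`, and the extrinsics R, T are placeholder assumptions:

```python
import numpy as np

def visual_point_coordinates(scene, K_inv, R, T, height, width):
    """Cast one ray per image-plane pixel through the optical center and
    return the first intersection with an object surface as the visual
    point for that pixel (None where the ray hits nothing).

    K_inv is the inverse intrinsic matrix of the virtual camera; R, T are
    its extrinsics; scene.intersect(origin, direction) is assumed to
    return the nearest hit point in world coordinates, or None.
    """
    cam_center = -R.T @ T                      # optical center in world coords
    points = [[None] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            pixel = np.array([u + 0.5, v + 0.5, 1.0])
            direction = R.T @ (K_inv @ pixel)  # ray direction in world coords
            direction /= np.linalg.norm(direction)
            points[v][u] = scene.intersect(cam_center, direction)
    return points
```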
Optionally, the second determining module 1103 includes:
a second determining sub-module for determining a first light energy of each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
and the third determining submodule is used for determining the brightness of the corresponding visual point in the plurality of visual points based on the first light energy of each visual point in the plurality of visual points, wherein the brightness is the light energy carried when the light ray reflected by the corresponding visual point reaches the optical center of the virtual camera.
Optionally, the third determining sub-module includes:
a third determining unit configured to determine optical energy of each of the plurality of viewpoints at a plurality of time instants at which virtual photons are emitted to the three-dimensional virtual scene, based on optical energy of virtual photons emitted to surroundings of each of the plurality of viewpoints at the plurality of time instants;
a fourth determining unit, configured to determine an average value of light energy of each of the plurality of visual points at the plurality of time instants as the first light energy of the corresponding visual point in the plurality of visual points.
Optionally, the third determining unit is specifically configured to:
selecting one time from the plurality of times as a target time, and determining the light energy of each of the plurality of visual points at the target time according to the following operations until the light energy of each of the plurality of visual points at each time is determined:
determining the coordinates of the virtual photons emitted at the target moment and residing on the surface of an object in the three-dimensional virtual scene to obtain the coordinates of the virtual photons;
determining an optical energy of each virtual photon of the plurality of virtual photons;
determining the optical energy of each of the plurality of visual points at the target time based on the coordinates of the plurality of visual points, the coordinates of the plurality of virtual photons, and the optical energy of the virtual photons of the plurality of virtual photons located around each of the plurality of visual points.
Optionally, the third determining unit is specifically configured to:
determining a pixel value of a pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image according to a forward ray tracing mode;
and determining the light energy of each virtual photon in the plurality of virtual photons based on the pixel value of the pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image and the light energy of the virtual light source used for emitting the plurality of virtual photons.
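As a rough illustration, the light energy of each virtual photon could be derived by weighting the virtual light source energy with the gray value of the projection-image pixel the photon passes through; the proportional split used here is an assumption, since the exact formula belongs to the method embodiment described earlier:

```python
import numpy as np

def photon_energies(pixel_values, light_source_energy):
    """Assign an energy to each virtual photon from the gray value of its
    corresponding pixel in the target projection image.

    pixel_values: (M,) array of gray values, one per emitted photon,
                  found by forward ray tracing.
    light_source_energy: total light energy of the virtual light source.
    The energy is split in proportion to the gray values, which is an
    illustrative assumption about the exact relation.
    """
    weights = np.asarray(pixel_values, dtype=np.float64)
    total = weights.sum()
    if total == 0:
        return np.zeros_like(weights)
    return light_source_energy * weights / total
```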
Optionally, the third determining unit is specifically configured to:
selecting one visual point from the plurality of visual points, and determining the optical energy of the selected visual point at the target moment according to the following operations until the optical energy of each visual point at the target moment is determined:
determining the virtual photons located within a specified range based on the coordinates of the selected visual point and the coordinates of the plurality of virtual photons, wherein the specified range is a sphere whose center is the selected visual point and whose radius is a specified value;
and determining the sum of the optical energy of the virtual photons within the specified range as the optical energy of the selected visual point at the target moment.
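A minimal sketch of this gathering step for one selected visual point, assuming photon coordinates and energies are held in NumPy arrays and using a brute-force distance test (a KD-tree would serve equally well for large photon counts):

```python
import numpy as np

def light_energy_at_point(visual_point, photon_coords, photon_energies, radius):
    """Sum the energy of all virtual photons lying inside the sphere of the
    specified radius centred on the selected visual point.

    visual_point:    (3,) coordinates of the selected visual point.
    photon_coords:   (M, 3) coordinates of photons residing on object surfaces.
    photon_energies: (M,) energy of each photon.
    """
    d2 = np.sum((photon_coords - visual_point) ** 2, axis=1)
    inside = d2 <= radius ** 2
    return photon_energies[inside].sum()
```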
Optionally, the image corresponding to the three-dimensional virtual scene includes a binocular image, and the coordinates of the multiple viewpoints are coordinates of the multiple viewpoints in a world coordinate system of the three-dimensional virtual scene;
the device also includes:
a first conversion module, configured to convert coordinates of each of the plurality of viewpoints in the world coordinate system into coordinates in a camera coordinate system of the virtual camera;
the first storage module is used for correspondingly storing the coordinates of the binocular image and the plurality of visual points in the camera coordinate system.
Optionally, the apparatus further comprises:
the first training module is used for taking the stored binocular images as the input of the neural network model to be trained, taking the vertical coordinates of the visual points, corresponding to each binocular image, in the camera coordinate system of the virtual camera as the output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
Optionally, the image corresponding to the three-dimensional virtual scene includes a binocular image; the device also includes:
the second generation module is used for generating a disparity map based on the binocular image;
and the second storage module is used for correspondingly storing the binocular image and the disparity map.
Optionally, the apparatus further comprises:
the second conversion module is used for converting each disparity map in the stored multiple disparity maps into a depth map to obtain multiple depth maps;
the second training module is used for taking the stored binocular images as the input of the neural network model to be trained, taking a depth map obtained by converting a disparity map corresponding to each binocular image in the binocular images as the output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
The image generation method provided by the embodiment of the present application can simulate any projection image, that is, it can simulate any illumination scene and obtain an image under the corresponding illumination scene. The generated result is realistic and indistinguishable from a real image, so the generated images can be used to train deep learning algorithms in many fields and improve their precision. The image generation process is automatic and uses a virtual three-dimensional scene and a virtual camera, with no need for a physical scene or physical equipment, so the image generation cost is reduced and the image generation efficiency is improved.
It should be noted that: in the image generating apparatus provided in the above embodiment, when generating an image, only the division of the above functional modules is exemplified, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions. In addition, the image generation apparatus and the image generation method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
Fig. 12 is a block diagram of a terminal 1200 according to an embodiment of the present application. The terminal 1200 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. The terminal 1200 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 1200 includes: a processor 1201 and a memory 1202.
The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing content that the display screen needs to display. In some embodiments, the processor 1201 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one instruction for execution by processor 1201 to implement the image generation methods provided by method embodiments herein.
In some embodiments, the terminal 1200 may optionally further include: a peripheral interface 1203 and at least one peripheral. The processor 1201, the memory 1202, and the peripheral interface 1203 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 1203 via a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of: a radio frequency circuit 1204, a touch display screen 1205, a camera assembly 1206, an audio circuit 1207, a positioning component 1208, and a power supply 1209.
The peripheral interface 1203 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, memory 1202, and peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices by electromagnetic signals. The radio frequency circuit 1204 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1204 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1204 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 also has the ability to acquire touch signals on or over the surface of the display screen 1205. The touch signal may be input to the processor 1201 as a control signal for processing. At this point, the display 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1205 may be one, providing the front panel of the terminal 1200; in other embodiments, the display 1205 can be at least two, respectively disposed on different surfaces of the terminal 1200 or in a folded design; in still other embodiments, the display 1205 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 1200. Even further, the display screen 1205 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display panel 1205 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
Camera assembly 1206 is used to capture images or video. Optionally, camera assembly 1206 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1206 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1207 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals into the processor 1201 for processing or inputting the electric signals into the radio frequency circuit 1204 to achieve voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided at different locations of terminal 1200. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1207 may also include a headphone jack.
The positioning component 1208 is configured to determine the current geographic location of the terminal 1200 to implement navigation or LBS (Location Based Service). The positioning component 1208 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
The power supply 1209 is used to provide power to various components within the terminal 1200. The power source 1209 may be alternating current, direct current, disposable or rechargeable. When the power source 1209 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in fig. 12 is not intended to be limiting of terminal 1200 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In some embodiments, a computer-readable storage medium is also provided, in which a computer program is stored, which when executed by a processor implements the steps of the image generation method in the above embodiments. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium referred to in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the image generation method described above.
It is to be understood that reference herein to "at least one" means one or more and "a plurality" means two or more. In the description of the embodiments of the present application, "/" means "or" unless otherwise specified, for example, a/B may mean a or B; "and/or" herein is merely an association describing an associated object, and means that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance.
The above-mentioned embodiments are not intended to limit the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. An image generation method, characterized in that the method comprises:
determining coordinates of a plurality of visual points of a three-dimensional virtual scene, wherein the plurality of visual points are corresponding points of a plurality of pixel points on an image plane of a virtual camera in the three-dimensional virtual scene;
emitting virtual photons to the three-dimensional virtual scene by taking each pixel point in a target projection image as a starting point, wherein the target projection image is a projection image corresponding to a scene needing simulation at present;
determining a brightness of each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
and generating an image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points.
2. The method of claim 1, wherein determining coordinates of a plurality of viewpoints of the three-dimensional virtual scene comprises:
the coordinates of the plurality of viewpoints are determined in a reverse ray tracing manner.
3. The method of claim 2, wherein said determining coordinates of said plurality of viewpoints in a reverse ray tracing manner comprises:
determining a plurality of rays, wherein the rays are rays which are led out from each pixel point on the image plane of the virtual camera and pass through the optical center of the virtual camera;
and taking a plurality of points of intersection of the plurality of rays and the surface of the object in the three-dimensional virtual scene as the plurality of visual points, and determining the coordinates of the plurality of visual points.
4. The method of claim 1, wherein determining the brightness of each of the plurality of viewpoints based on the light energy of the virtual photons surrounding each of the plurality of viewpoints comprises:
determining a first light energy for each of the plurality of viewpoints based on light energy of virtual photons around each of the plurality of viewpoints;
determining brightness of a corresponding visual point in the plurality of visual points based on the first light energy of each visual point in the plurality of visual points, wherein the brightness is the light energy carried by the light ray reflected from the corresponding visual point when it reaches the optical center of the virtual camera.
5. The method of claim 4, wherein determining the first light energy for each of the plurality of viewpoints based on the light energy of the virtual photons around each of the plurality of viewpoints comprises:
determining light energy for each of the plurality of viewpoints at a plurality of time instants based on light energy emitted to virtual photons around each of the plurality of viewpoints at the plurality of time instants, the plurality of time instants being a plurality of different time instants at which virtual photons are emitted to the three-dimensional virtual scene;
and determining the average value of the light energy of each of the plurality of visual points at the plurality of time instants as the first light energy of the corresponding visual point in the plurality of visual points.
6. The method of claim 5, wherein determining the light energy for each of the plurality of viewpoints at the plurality of time instants based on the light energy emitted at the plurality of time instants to virtual photons surrounding each of the plurality of viewpoints comprises:
selecting one time from the plurality of times as a target time, and determining the light energy of each of the plurality of visual points at the target time according to the following operations until the light energy of each of the plurality of visual points at each time is determined:
determining coordinates of the virtual photons emitted at the target moment and residing on the surface of an object in the three-dimensional virtual scene to obtain coordinates of a plurality of virtual photons;
determining an optical energy of each virtual photon of the plurality of virtual photons;
determining the light energy of each of the plurality of visual points at the target time based on the coordinates of the plurality of visual points, the coordinates of the plurality of virtual photons, and the light energy of the virtual photons of the plurality of virtual photons that are located around each of the plurality of visual points.
7. The method of claim 6, wherein said determining the optical energy of each of said plurality of virtual photons comprises:
determining a pixel value of a pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image according to a forward ray tracing mode;
and determining the light energy of each virtual photon in the plurality of virtual photons based on the pixel value of the pixel point corresponding to each virtual photon in the plurality of virtual photons in the target projection image and the light energy of the virtual light source used for emitting the plurality of virtual photons.
8. The method of claim 6, wherein determining the light energy for each of the plurality of viewpoints at the target time based on the coordinates of the plurality of viewpoints, the coordinates of the plurality of virtual photons, and the light energy of the virtual photons of the plurality of virtual photons located around each of the plurality of viewpoints, comprises:
selecting one visual point from the plurality of visual points, and determining the light energy of the selected visual point at the target moment according to the following operations until the light energy of each visual point at the target moment is determined:
determining virtual photons located in a specified range based on the coordinates of the selected visual point and the coordinates of the virtual photons, wherein the specified range is a sphere range which takes the selected visual point as a sphere center and takes a specified numerical value as a radius;
and determining the sum of the light energy of the virtual photons in the specified range as the light energy of the selected visual point at the target moment.
9. The method of any one of claims 1-8, wherein the corresponding image of the three-dimensional virtual scene comprises a binocular image, and the coordinates of the plurality of visual points are the coordinates of the plurality of visual points in a world coordinate system of the three-dimensional virtual scene;
after generating the image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points, the method further includes:
converting coordinates of each of the plurality of viewpoints in the world coordinate system into coordinates in a camera coordinate system of the virtual camera;
and correspondingly storing the coordinates of the binocular image and the plurality of visual points in the camera coordinate system.
10. The method of claim 9, wherein the method further comprises:
taking a plurality of stored binocular images as input of a neural network model to be trained, taking vertical coordinates of a plurality of visual points corresponding to each binocular image in the plurality of binocular images in a camera coordinate system of the virtual camera as output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
11. The method of any of claims 1-8, wherein the corresponding images of the three-dimensional virtual scene comprise binocular images;
after generating the image corresponding to the three-dimensional virtual scene based on the brightness of each of the plurality of visual points, the method further includes:
generating a disparity map based on the binocular images;
and correspondingly storing the binocular image and the disparity map.
12. The method of claim 11, wherein the method further comprises:
converting each stored disparity map in the plurality of disparity maps into a depth map to obtain a plurality of depth maps;
taking a plurality of stored binocular images as input of a neural network model to be trained, taking a depth map obtained by converting a disparity map corresponding to each binocular image in the binocular images as output of the neural network model to be trained, and training the neural network model to be trained;
the binocular images are generated based on the three-dimensional virtual scene.
13. A computer device, characterized in that the computer device comprises a memory for storing a computer program and a processor for executing the computer program stored in the memory to implement the steps of the method according to any of the preceding claims 1-12.
CN202110996416.3A 2021-08-27 2021-08-27 Image generation method, device, equipment and storage medium Pending CN113724309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110996416.3A CN113724309A (en) 2021-08-27 2021-08-27 Image generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110996416.3A CN113724309A (en) 2021-08-27 2021-08-27 Image generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113724309A true CN113724309A (en) 2021-11-30

Family

ID=78678592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110996416.3A Pending CN113724309A (en) 2021-08-27 2021-08-27 Image generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113724309A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107274474A (en) * 2017-07-03 2017-10-20 长春理工大学 Indirect light during three-dimensional scenic stereoscopic picture plane is drawn shines multiplexing method
CN107909647A (en) * 2017-11-22 2018-04-13 长春理工大学 The virtual 3D scenes light field projected image method for drafting of the sense of reality based on spatial reuse
US20180192032A1 (en) * 2016-04-08 2018-07-05 Maxx Media Group, LLC System, Method and Software for Producing Three-Dimensional Images that Appear to Project Forward of or Vertically Above a Display Medium Using a Virtual 3D Model Made from the Simultaneous Localization and Depth-Mapping of the Physical Features of Real Objects
WO2018187743A1 (en) * 2017-04-06 2018-10-11 Maxx Media Group, LLC Producing three-dimensional images using a virtual 3d model
CN108876840A (en) * 2018-07-25 2018-11-23 江阴嘉恒软件技术有限公司 A method of vertical or forward projection 3-D image is generated using virtual 3d model
CN111462208A (en) * 2020-04-05 2020-07-28 北京工业大学 Non-supervision depth prediction method based on binocular parallax and epipolar line constraint
CN111563878A (en) * 2020-03-27 2020-08-21 中国科学院西安光学精密机械研究所 Space target positioning method


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024021557A1 (en) * 2022-07-25 2024-02-01 网易(杭州)网络有限公司 Reflected illumination determination method and apparatus, global illumination determination method and apparatus, medium, and device
CN116723303A (en) * 2023-08-11 2023-09-08 腾讯科技(深圳)有限公司 Picture projection method, device, equipment and storage medium
CN116723303B (en) * 2023-08-11 2023-12-05 腾讯科技(深圳)有限公司 Picture projection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US11574437B2 (en) Shadow rendering method and apparatus, computer device, and storage medium
US11962930B2 (en) Method and apparatus for controlling a plurality of virtual characters, device, and storage medium
CN110827391B (en) Image rendering method, device and equipment and storage medium
CN109840949A (en) Augmented reality image processing method and device based on optical alignment
CN112870707A (en) Virtual object display method in virtual scene, computer device and storage medium
CN113724309A (en) Image generation method, device, equipment and storage medium
CN111311757B (en) Scene synthesis method and device, storage medium and mobile terminal
CN103959340A (en) Graphics rendering technique for autostereoscopic three dimensional display
CN108701372B (en) Image processing method and device
CN108668108A (en) A kind of method, apparatus and electronic equipment of video monitoring
KR102633468B1 (en) Method and device for displaying hotspot maps, and computer devices and readable storage media
CN112150560B (en) Method, device and computer storage medium for determining vanishing point
CN111680758B (en) Image training sample generation method and device
KR20210052570A (en) Determination of separable distortion mismatch
CN112308103B (en) Method and device for generating training samples
CN113487662A (en) Picture display method and device, electronic equipment and storage medium
CN109949396A (en) A kind of rendering method, device, equipment and medium
CN113240784B (en) Image processing method, device, terminal and storage medium
CN113592997B (en) Object drawing method, device, equipment and storage medium based on virtual scene
CN114093020A (en) Motion capture method, motion capture device, electronic device and storage medium
CN112950535A (en) Video processing method and device, electronic equipment and storage medium
CN113689484B (en) Method and device for determining depth information, terminal and storage medium
CN113971714A (en) Target object image rendering method and device, electronic equipment and storage medium
CN117671164A (en) High-precision map base map construction method
CN116863063A (en) Error determination method and device for three-dimensional model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination