CN113870165A - Image synthesis method and device, electronic equipment and storage medium - Google Patents

Image synthesis method and device, electronic equipment and storage medium

Info

Publication number
CN113870165A
Authority
CN
China
Prior art keywords
image
projected
camera
visual angle
pixel point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111194061.2A
Other languages
Chinese (zh)
Inventor
刘思阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202111194061.2A priority Critical patent/CN113870165A/en
Publication of CN113870165A publication Critical patent/CN113870165A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Studio Devices (AREA)

Abstract

The embodiment of the application provides an image synthesis method and device, an electronic device and a storage medium. The scheme is as follows: receiving a view angle switching instruction comprising a first view angle; acquiring an image to be projected, wherein the image to be projected is an image acquired by at least one first camera within a first preset range of the first view angle; projecting, based on camera parameters of the first camera, the image to be projected into an image coordinate system corresponding to the first view angle to obtain a projected image comprising a hole region; filling the hole in the projected image according to image information of pixel points within a second preset range of the hole region in the projected image to obtain a composite image of the first view angle; and displaying the composite image of the first view angle. The technical scheme provided by the embodiment of the application reduces the deployment cost required in the view angle switching process.

Description

Image synthesis method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image synthesis method and apparatus, an electronic device, and a storage medium.
Background
At present, in the process of large-scale network/television live broadcast or movie shooting, in order to present image pictures corresponding to different viewing angles to viewers, a large number of cameras are required to be used for shooting in the shooting process.
For example, for a large-scale gala, 100 high-definition cameras may be arranged in the shooting scene to shoot the stage over 360 degrees; during the live broadcast of the gala, the director switches between viewing angles so that the audience can watch the video pictures shot by the camera corresponding to the switched viewing angle.
Since the completion of the view switching needs to depend on a large number of cameras, the view switching process needs to consume a large amount of deployment cost.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image synthesis method, an image synthesis apparatus, an electronic device, and a storage medium, so as to reduce deployment cost required in a view switching process. The specific technical scheme is as follows:
in a first aspect of this embodiment, there is provided an image synthesis method, including:
receiving a visual angle switching instruction comprising a first visual angle;
acquiring a to-be-projected image, wherein the to-be-projected image is an image acquired by at least one first camera within a first preset range of the first visual angle;
based on the camera parameters of the first camera, projecting the image to be projected into an image coordinate system corresponding to the first visual angle to obtain a projected image comprising a cavity area;
filling holes in the projected image according to image information of pixel points in a second preset range of the hole area in the projected image to obtain a composite image of the first visual angle;
displaying the composite image of the first view angle.
Optionally, the step of filling up the hole of the projection image according to the image information of the pixel point in the second preset range of the hole area in the projection image to obtain the composite image of the first view angle includes:
expanding the cavity area in the projected image outwards by a second preset range to obtain a first expanded area;
acquiring an image to be repaired of the projected image, wherein the image to be repaired comprises the cavity area and the first expansion area;
filling holes in the hole area by using the image information of each pixel point of the first expansion area in the image to be repaired to obtain a repaired image;
and carrying out image fusion on the projected image and the repaired image to obtain a composite image of the first visual angle.
Optionally, the step of expanding the cavity region in the projection image outward by a second preset range to obtain a first expanded region includes:
determining each edge pixel point of the hollow area in the projected image;
and expanding each edge pixel point of the hollow area in the projected image outwards by a second preset range to obtain a first expanded area.
Optionally, the step of acquiring an image to be restored of the projection image includes:
acquiring an image to be restored of the projected image by using the following formula:

I_in = I_prj * M_exp

wherein I_in is the image to be restored, I_prj is the projected image, and M_exp is the mask image of the first expansion region corresponding to the hole region in the projected image.
Optionally, the step of filling up the hole in the hole area by using the image information of each pixel point of the first extension area in the image to be restored to obtain the restored image includes:
inputting the image to be restored into a pre-trained image restoration model to obtain a restored image; the image restoration model is obtained by utilizing a preset training set for training; the preset training set comprises sample images acquired by a plurality of second cameras at different viewing angles and camera parameters of each second camera.
Optionally, the image restoration model is obtained by training through the following steps:
acquiring the preset training set;
based on the camera parameters of each second camera, projecting the sample images in the preset training set other than the sample image of a second view angle into an image coordinate system corresponding to the second view angle, to obtain a sample projection image including a hole region;
expanding the cavity area in the sample projection image outwards by a second preset range to obtain a second expanded area;
acquiring a sample image to be repaired of the sample projection image, wherein the sample image to be repaired is an image comprising the cavity area and the second expansion area;
inputting the sample image to be restored into a preset depth network model to obtain a sample restored image;
performing image fusion on the sample projection image and the sample restoration image to obtain a sample composite image of the second visual angle;
calculating a loss value of the preset depth network model according to the sample synthetic image and a sample image of a second visual angle in the preset training set;
when the loss value is larger than a preset loss value threshold value, adjusting parameters of the preset depth network model, and returning to execute the step of inputting the sample image to be repaired into the preset depth network model to obtain a sample repaired image;
and when the loss value is not greater than the preset loss value threshold value, determining the current preset depth network model as a trained image restoration model.
Optionally, the step of calculating a loss value of the preset depth network model according to the sample synthetic image and the sample image of the second view angle in the preset training set includes:
calculating a loss value of the preset depth network model by using the following formula:

loss = || I_prd' * M_in' - I_true' * M_in' ||

wherein loss is the loss value, || || denotes a norm operation, I_prd' is the sample composite image of the second view angle, M_in' is the mask image corresponding to the hole region and the expansion region in the sample image to be repaired, and I_true' is the sample image of the second view angle.
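For illustration, a minimal sketch of this masked loss follows; the tensor layout and the use of the L1 norm are assumptions made for the example, since the formula only specifies a norm operation.

```python
import torch

def masked_loss(sample_composite, sample_true, mask_in):
    # loss = || I_prd' * M_in' - I_true' * M_in' ||
    # mask_in: mask of the hole region plus the expansion region in the
    # sample image to be repaired (1 inside that region, 0 elsewhere).
    diff = sample_composite * mask_in - sample_true * mask_in
    return diff.abs().sum()  # L1 norm assumed; the formula only states a norm operation
```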
Optionally, the step of performing image fusion on the projection image and the repaired image to obtain a composite image of the first view angle includes:
and performing image fusion on the projected image and the repaired image by using the following formula to obtain the composite image of the first view angle:

I_prd = I_out * M_hole + [(1 - α) * I_out * M_exp + α * I_in * M_exp] + (I_in - I_in * M_in)

wherein I_prd is the composite image of the first view angle, I_out is the repaired image, M_hole is the hole mask corresponding to the hole region in the projected image, α is a preset weight, M_exp is the mask image of the first expansion region corresponding to the hole region in the projected image, I_in is the image to be repaired, and M_in is the mask image corresponding to the hole region and the expansion region in the image to be repaired.
Optionally, the camera parameters include external parameters of the camera and internal parameters of the camera;
the step of projecting the image to be projected into an image coordinate system corresponding to the first view angle based on the camera parameter of the first camera to obtain a projected image including a cavity region includes:
aiming at each pixel point in the image to be projected, calculating a second coordinate value of the pixel point in a camera coordinate system according to a first coordinate value of the pixel point and camera internal parameters of the first camera;
aiming at each pixel point in the image to be projected, calculating a third coordinate value of the pixel point in a world coordinate system according to a second coordinate value of the pixel point and camera external parameters of the first camera;
aiming at each pixel point in the image to be projected, calculating a fourth coordinate value of the pixel point in a camera coordinate system corresponding to the first visual angle according to a third coordinate value of the pixel point and camera external parameters of the virtual camera at the first visual angle;
aiming at each pixel point in the image to be projected, calculating a fifth coordinate value of the pixel point in an image coordinate system corresponding to the first visual angle according to a fourth coordinate value of the pixel point and camera internal parameters of the virtual camera at the first visual angle;
and projecting the image to be projected according to a fifth coordinate value of each pixel point in the image coordinate system corresponding to the first visual angle to obtain a projected image comprising a cavity area.
In a second aspect of the present application, there is also provided an image synthesizing apparatus, comprising:
the receiving module is used for receiving a visual angle switching instruction comprising a first visual angle;
the first acquisition module is used for acquiring an image to be projected, wherein the image to be projected is an image acquired by at least one first camera within a first preset range of the first visual angle;
the first projection module is used for projecting the image to be projected into an image coordinate system corresponding to the first visual angle based on the camera parameters of the first camera to obtain a projected image comprising a cavity area;
the synthesis module is used for filling the holes in the projected image according to the image information of pixel points in a second preset range of the hole area in the projected image to obtain a synthesized image of the first visual angle;
and the display module is used for displaying the composite image of the first visual angle.
In a third aspect of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing any of the above-described image synthesis method steps when executing a program stored in the memory.
In a fourth aspect of the present application, there is also provided a computer-readable storage medium having a computer program stored therein, where the computer program is executed by a processor to implement any of the image synthesis method steps described above.
In a fifth aspect of the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image synthesis methods described above.
According to the technical scheme, after the visual angle switching instruction is received, the obtained to-be-projected image is projected to the image coordinate system corresponding to the first visual angle to obtain the projected image comprising the void region, so that the void region is filled with the image information of each pixel point in the second preset range of the void region, a composite image of the first visual angle is obtained, and the composite image is displayed.
Because the synthesized image of the first visual angle is synthesized based on the image to be projected, which is collected by at least one camera in the first preset range of the first visual angle, the image corresponding to the first visual angle can be synthesized by using the image collected by the camera in the adjacent visual angle of the first visual angle in the visual angle switching process, so that the cameras are prevented from being deployed at the position corresponding to the first visual angle, the number of the cameras required to be deployed is reduced, and the deployment cost required in the visual angle switching process is greatly reduced.
Moreover, the visual angle switching instruction comprising the first visual angle is received, so that the synthetic image corresponding to the first visual angle is displayed, the deployment cost required in the visual angle switching process is reduced, more visual angles capable of being switched are provided, the requirement of a user on visual angle switching is met, and the display effect is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic diagram of a live television scene;
fig. 2 is a first flowchart of an image synthesis method according to an embodiment of the present application;
FIG. 3-a is an image of cube A acquired by the camera at view angle A;
FIG. 3-b is an image of cube A acquired by the camera at view angle B;
FIG. 4-a is a schematic view of a projected image provided by an embodiment of the present application;
FIG. 4-b is a mask image of a void region of the projected image shown in FIG. 4-a;
fig. 5 is a schematic flowchart of an image projection method according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a second image synthesis method according to an embodiment of the present application;
FIG. 7-a is a projected image provided by an embodiment of the present application;
FIG. 7-b is a mask image corresponding to the projected image shown in FIG. 7-a;
FIG. 7-c is a to-be-repaired image corresponding to the projection image shown in FIG. 7-a;
FIG. 7-d is a mask image corresponding to the image to be repaired shown in FIG. 7-c;
FIG. 8 is a schematic flowchart of an image model training method according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an image synthesis apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
For the sake of understanding, the process of switching the viewing angle during a certain tv live broadcast in the related art is described with reference to fig. 1. Fig. 1 is a schematic diagram of a live tv scene.
In stage deployment, 7 cameras are arranged in front of the stage, namely camera 0 shown in fig. 1, camera L1-camera L3 on the left side of camera 0, and camera R1-camera R3 on the right side of camera 0. Since the distance between each of the 7 cameras shown in fig. 1 and the stage and the shooting angle are different, the viewing angle corresponding to each camera shown in fig. 1 is different.
The view angle corresponding to a camera refers to the shooting view angle of that camera, that is, the view angle from which the audience views the stage picture through that camera. The view angle corresponding to a camera is affected by factors such as the shooting angle, shooting distance and shooting height of the camera. Here, the view angle corresponding to each camera in fig. 1 is not particularly limited.
At a certain moment of the live television broadcast, the current live picture is the video picture acquired by the camera 0. If the director now wants to show the audience the live picture from the view angle corresponding to the camera L3, the currently played video picture captured by the camera 0 can be switched to the video picture captured by the camera L3.
Therefore, in the related art, in order to enable a user to freely switch a video picture of any view angle, corresponding cameras need to be arranged at different view angles, which results in a large amount of deployment cost for the view angle switching process.
In order to solve the problems in the related art, embodiments of the present application provide an image synthesis method. The method may be applied to any electronic device. As shown in fig. 2, fig. 2 is a first flowchart of an image synthesis method according to an embodiment of the present application. The method comprises the following steps.
In step S201, a viewing angle switching instruction including a first viewing angle is received.
Step S202, a to-be-projected image is obtained, wherein the to-be-projected image is an image collected by at least one first camera in a first preset range of a first visual angle.
Step S203, based on the camera parameters of the first camera, projecting the image to be projected to an image coordinate system corresponding to the first visual angle to obtain a projected image including the cavity area.
And step S204, filling the holes in the projected image according to the image information of the pixel points in the second preset range of the hole area in the projected image to obtain a composite image of the first visual angle.
Step S205, a composite image of the first view angle is displayed.
In this embodiment, the electronic device may be a server that pushes a video image, or may be a user device that is used by a user to watch the video image. Here, the electronic device is not particularly limited.
By the method shown in fig. 2, after the view switching instruction is received, the obtained image to be projected is projected into the image coordinate system corresponding to the first view angle to obtain the projected image including the hole region, so that the hole region is filled with the image information of each pixel point within the second preset range of the hole region to obtain the composite image of the first view angle, and the composite image is displayed.
Because the synthesized image of the first visual angle is synthesized based on the image to be projected, which is collected by at least one camera in the first preset range of the first visual angle, the image corresponding to the first visual angle can be synthesized by using the image collected by the camera in the adjacent visual angle of the first visual angle in the visual angle switching process, so that the cameras are prevented from being deployed at the position corresponding to the first visual angle, the number of the cameras required to be deployed is reduced, and the deployment cost required in the visual angle switching process is greatly reduced.
Moreover, the visual angle switching instruction comprising the first visual angle is received, so that the synthetic image corresponding to the first visual angle is displayed, the deployment cost required in the visual angle switching process is reduced, more visual angles capable of being switched are provided, the requirement of a user on visual angle switching is met, and the display effect is improved.
The following examples are given to illustrate the examples of the present application.
In step S201, a viewing angle switching instruction including a first viewing angle is received.
In this step, when the user needs to perform view angle switching on the image displayed at the current view angle, the view angle switching operation of the electronic device on the image displayed at the current view angle may be triggered. At this time, the electronic device receives a view switching instruction, where the view switching instruction includes a switched view, that is, a first view.
In this embodiment of the application, the first angle of view may be a virtual angle of view, that is, the first angle of view is an angle of view in which a camera is not deployed in a shooting scene. Here, the first viewing angle is not particularly limited.
For the step S202, an image to be projected is obtained, where the image to be projected is an image collected by at least one first camera within a first preset range of a first viewing angle.
In this step, after receiving the view switching instruction, the electronic device may acquire, according to a first view included in the view switching instruction, images acquired by all cameras within a first preset range of the first view as images to be projected.
In an alternative embodiment, the first predetermined range may be expressed in the form of a distance.
For ease of understanding, the description will be given by taking the above-described fig. 1 as an example. Now, assume that the first view angle is the view angle corresponding to the camera 0 in fig. 1 (in this case, the camera 0 is a virtual camera, that is, no camera is actually deployed at that position), and the first preset range is 2 meters.
After receiving the view switching instruction, the electronic device determines all cameras in a circular area with the center point of the position of the camera 0 as the center of a circle and the radius of 2 meters, so as to acquire images acquired by all the cameras in the circular area as images to be projected.
In this embodiment of the application, the first preset range may be set according to a user requirement. The first preset range of the first visual angle at least comprises one camera. Since the number of cameras included in the first preset range of the first view angle is affected by the size of the first preset range, the number of images to be projected acquired by the electronic device will be greater as more cameras are included in the first preset range of the first view angle. Here, the number of the acquired images to be projected is not particularly limited. For convenience of description, in the embodiments of the present application, only one image to be projected is taken as an example for illustration, and is not used for any limiting purpose.
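As a concrete illustration of selecting the first cameras within the first preset range, a small sketch follows; representing each camera by the position of its optical center and using the Euclidean distance are assumptions made for this example.

```python
import numpy as np

def select_source_cameras(virtual_cam_position, cameras, first_preset_range=2.0):
    """Return the cameras whose positions lie within the first preset range
    (expressed here as a distance in meters) of the first view angle."""
    selected = []
    for cam in cameras:  # each camera is assumed to carry a 'position' entry, a (3,) array
        offset = np.asarray(cam["position"]) - np.asarray(virtual_cam_position)
        if np.linalg.norm(offset) <= first_preset_range:
            selected.append(cam)
    return selected
```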
In step S203, the image to be projected is projected into the image coordinate system corresponding to the first viewing angle based on the camera parameters of the first camera, so as to obtain a projected image including the void region.
In an optional embodiment, the electronic device may project each pixel point in the image to be projected into an image coordinate system corresponding to the first viewing angle by using a preset projection technology, such as a three-dimensional (3-Dimension, 3D) projection technology, according to a camera parameter of the first camera, so as to obtain the projected image. Here, the preset projection technique is not particularly limited.
In the embodiment of the present application, the view angle corresponding to the image to be projected differs from the view angle corresponding to the projected image (i.e., the first view angle). When the image to be projected is projected into the image coordinate system corresponding to the first view angle, occlusion means that some pixel points of the image to be projected are blocked, or that some blocked pixel points cannot be projected into the image coordinate system corresponding to the first view angle; as a result, a hole region appears in the projected image.
For ease of understanding, the projection process is described by taking one pixel point in the image to be projected, such as pixel point A, as an example. Assume that, using the preset projection technique, the pixel point A is projected onto the pixel point B in the image coordinate system corresponding to the first view angle. At this time, the image information of the pixel point A is filled into the pixel point B. The manner of determining the pixel point B corresponding to the pixel point A is described below and is not detailed here.
However, because of occlusion, multiple pixel points of the image to be projected may be projected onto the same pixel point in the image coordinate system corresponding to the first view angle, and the occluded portion of the image to be projected cannot be projected into that image coordinate system at all. As a result, pixel points without image information appear in the projected image, and the area formed by one or more such pixel points is recorded as a hole region. That is, a hole region is a region of the projected image formed by pixel points for which no image information exists.
The image information includes, but is not limited to, a Red Green Blue (RGB) value and a depth value corresponding to the pixel point.
For ease of understanding, reference is made to fig. 3-a and 3-b. Fig. 3-a is an image of cube A captured by the camera at view angle A. Fig. 3-b is an image of cube A captured by the camera at view angle B.
Now, assume that the electronic device projects the image corresponding to view angle A into the image coordinate system corresponding to view angle B.
Only one face of cube A, namely face 301, is visible in the image shown in fig. 3-a; that is, face 302 and face 303, which are visible in the image shown in fig. 3-b, are occluded by face 301. Therefore, after the image shown in fig. 3-a is projected into the image coordinate system corresponding to view angle B, the projected image still contains only one face of cube A, namely face 301. That is, in the projected image, the pixel points in the areas where face 302 and face 303 shown in fig. 3-b are located have no image information; the areas where face 302 and face 303 are located form a hole region.
The number of the hollow regions included in the projection image is at least one. When a plurality of hollow regions are included in the projection image, the shapes of the hollow regions may be the same or different. The number of pixel points included in each hollow area in the projection image may be one or more. In the embodiment of the present application, the number of the void regions included in the projection image and the size of each void region are different under the influence of the viewing angle difference between the viewing angle of the projection image to be projected and the first viewing angle, the number of the projection images to be projected, and other factors. Here, the number of the hole regions included in the projection image, the size of each hole region, and the number of the pixel points included in each hole region are not particularly limited. For convenience of description, in the embodiments of the present application, only one hollow area is taken as an example for illustration, and is not meant to be limiting.
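As a minimal sketch of how a hole mask such as the one in fig. 4-b can be derived, assume that the projection step records, for each pixel of the projected image, how many pixel points of the image to be projected landed on it (this bookkeeping is an assumption of the example, not prescribed by the patent):

```python
import numpy as np

def compute_hole_mask(hit_count):
    """hit_count: (H, W) integer array, the number of source pixel points
    projected onto each pixel of the projected image (0 means that no image
    information exists there).  Returns M_hole, 1 inside the hole region."""
    return (hit_count == 0).astype(np.uint8)
```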
In an optional embodiment, when the number of the images to be projected is multiple, when the electronic device projects the images to be projected to the image coordinate system corresponding to the first viewing angle, all the images to be projected may be projected to the same image.
For the above-mentioned method of acquiring the projection image, reference is made to the following description, which is not specifically described herein.
In the embodiment of the present application, the camera parameters of the first camera include camera internal parameters and camera external parameters.
The camera internal parameters comprise the focal length of the camera, the origin X-axis offset and the origin Y-axis offset. The camera internal parameters can be expressed as an internal parameter matrix intrinsics_cam, namely:

intrinsics_cam = [[f_cam, 0, cx_cam], [0, f_cam, cy_cam], [0, 0, 1]]

wherein cam denotes the camera, f_cam is the focal length of the camera cam, cx_cam is the origin X-axis offset of the camera cam, and cy_cam is the origin Y-axis offset of the camera cam.
The camera external parameters comprise the position and orientation of the camera in the world coordinate system and can be expressed as a camera rotation matrix and a camera displacement matrix.

The camera rotation matrix R_cam is a 3×3 matrix describing the orientation of the camera, and the camera displacement matrix t_cam = [t_X, t_Y, t_Z]^T is a 3×1 vector describing the position of the camera.

The camera external parameters can be expressed as an external parameter matrix extrinsics_cam, namely:

extrinsics_cam = [[R_cam, t_cam], [0 0 0, 1]]

wherein cam denotes the camera, R describes the orientation in the world coordinate system, t denotes the position in the world coordinate system, and X, Y and Z denote the X-axis, Y-axis and Z-axis directions of the world coordinate system.
The camera internal parameters and the camera external parameters of a camera can be obtained by corresponding preset calibration methods. Taking the camera internal parameters as an example, they can be calibrated by using Zhang's calibration method. Here, the method of acquiring the camera internal parameters and the camera external parameters is not particularly limited.
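For illustration, the matrices above can be assembled directly; the sketch below uses NumPy and treats the external parameter matrix as the 4×4 world-to-camera transform used in the formulas of steps S501 to S505 below (the helper names and this interpretation are assumptions of the example).

```python
import numpy as np

def make_intrinsics(f_cam, cx_cam, cy_cam):
    # intrinsics_cam = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
    return np.array([[f_cam, 0.0, cx_cam],
                     [0.0, f_cam, cy_cam],
                     [0.0, 0.0, 1.0]])

def make_extrinsics(R_cam, t_cam):
    # extrinsics_cam = [[R, t], [0 0 0, 1]], a 4x4 transform from world
    # coordinates to the camera coordinate system.
    E = np.eye(4)
    E[:3, :3] = np.asarray(R_cam, dtype=float)
    E[:3, 3] = np.asarray(t_cam, dtype=float).ravel()
    return E
```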
In an alternative embodiment, when a plurality of cameras are included in the first preset range of the first viewing angle, in order to reduce the difference between the images acquired by each camera in terms of resolution, color, and the like, and ensure the image quality of the images with the switched viewing angles, the camera parameters of each camera may be the same or similar.
In the embodiment of the present application, when the plurality of cameras are included in the first preset range of the first angle of view, the camera parameters of each camera are different because the angle of view corresponding to each camera is different.
In step S204, according to the image information of the pixel point in the second preset range of the void region in the projection image, the void filling is performed on the projection image to obtain a composite image of the first view angle.
In this embodiment of the application, since the image information of each pixel point in the image has a certain correlation with the image information of the pixel points around the pixel point, for the void area in the projected image, the electronic device may use the image information of the pixel points around the void area to fill up the void area, that is, determine the image information of each pixel point in the void area according to the image information of the pixel points in the second preset range of the void area in the projected image, so as to obtain the synthesized image at the first viewing angle.
The second preset range may be set according to a user experience value or a user requirement, and the second preset range is not particularly limited.
For ease of understanding, the description will be made with reference to fig. 4-a and 4-b as an example. Fig. 4-a is a schematic diagram of a projected image provided by an embodiment of the present application. FIG. 4-b is a mask image of the void region of the projected image shown in FIG. 4-a.
In the mask image shown in fig. 4-b, the white regions are hole regions, such as region 401 and region 402, and the black regions are non-hole regions (i.e., effective pixel regions). From the mask image shown in fig. 4-b, it can be determined that the regions 401 and 402 in fig. 4-a are hole regions. A non-hole region (or effective pixel region) is a region formed by pixel points that have image information.
When the electronic device fills the holes in the region 401 and the region 402 in fig. 4-a, the image information of the region 401 and the region 402 may be determined according to the image information of the right region of the region 401 and the region 402, and the holes may be filled in the region 401 and the region 402 according to the determined image information.
The image information of the pixel point includes, but is not limited to, an RGB value and a depth value of the pixel point. Here, the image information is not particularly limited.
The method for filling the holes can be referred to the following description, and is not specifically described here.
In step S205, a composite image of the first view angle is shown.
In this step, after the electronic device synthesizes the synthesized image of the first viewing angle, that is, after the virtual synthesized image of the first viewing angle is obtained, the electronic device may display the synthesized image of the first viewing angle. For example, the composite image is displayed on a display interface.
In the embodiment of the present application, only one frame of video image in the video data is taken as an example, and a method for synthesizing an image switched to a virtual view (i.e., the first view) is described. Each frame of video image in the switched virtual view can be synthesized by referring to the method for synthesizing the image in the first view, which is not specifically described herein.
In an alternative embodiment, as shown in fig. 5, fig. 5 is a schematic flowchart of an image projection method provided in the embodiment of the present application. In step S203, namely, based on the camera parameters of the first camera, the image to be projected is projected into the image coordinate system corresponding to the first viewing angle, so as to obtain the projected image including the cavity region, which may be specifically subdivided into the following steps, step S501 to step S505.
Step S501, aiming at each pixel point in the image to be projected, calculating a second coordinate value of the pixel point in a camera coordinate system according to the first coordinate value of the pixel point and camera internal parameters of the first camera.
For ease of understanding, a pixel point P in the image to be projected is taken as an example. The coordinate value of the pixel point P in the image to be projected is (u_src, v_src), where src denotes the camera that acquired the image to be projected, u_src is the abscissa of the pixel point P in the image to be projected, and v_src is the ordinate of the pixel point P in the image to be projected.

The coordinate value (u_src, v_src) of the pixel point P is its coordinate in the image coordinate system; converting it into a homogeneous coordinate gives [u_src, v_src, 1]^T.
In an optional embodiment, for each pixel point in the image to be projected, the electronic device may calculate the second coordinate value of the pixel point in the camera coordinate system by using the following formula:

[x_src, y_src, z_src]^T = d * (intrinsics_src)^-1 * [u_src, v_src, 1]^T

wherein [x_src, y_src, z_src]^T is the second coordinate value of the pixel point P in the camera coordinate system, x_src, y_src and z_src are the coordinate values of the pixel point P in the X-axis, Y-axis and Z-axis directions of the camera coordinate system, intrinsics_src is the internal parameter matrix of the camera src, ( )^-1 denotes matrix inversion, and d is the depth value corresponding to the pixel point P in the image to be projected.
Step S502, aiming at each pixel point in the image to be projected, calculating a third coordinate value of the pixel point in a world coordinate system according to the second coordinate value of the pixel point and camera external parameters of the first camera.
For the sake of understanding, the above-mentioned pixel point P is still used as an example for description.
In an optional embodiment, for each pixel point in the image to be projected, the electronic device may calculate the third coordinate value of the pixel point in the world coordinate system by using the following formula:

[X, Y, Z, 1]^T = (extrinsics_src)^-1 * [x_src, y_src, z_src, 1]^T

wherein [X, Y, Z]^T is the third coordinate value of the pixel point P in the world coordinate system, X, Y and Z are the coordinate values of the pixel point P in the X-axis, Y-axis and Z-axis directions of the world coordinate system, extrinsics_src is the external parameter matrix of the camera src, and ( )^-1 denotes matrix inversion.
Step S503, aiming at each pixel point in the image to be projected, according to the third coordinate value of the pixel point and the camera external parameter of the virtual camera at the first visual angle, calculating the fourth coordinate value of the pixel point in the camera coordinate system corresponding to the first visual angle.
For the sake of understanding, the above-mentioned pixel point P is still used as an example for description.
In an optional embodiment, for each pixel point in the image to be projected, the electronic device may calculate the fourth coordinate value of the pixel point in the camera coordinate system corresponding to the first view angle by using the following formula:

[x_tar, y_tar, z_tar, 1]^T = extrinsics_tar * [X, Y, Z, 1]^T

wherein tar denotes the virtual camera corresponding to the first view angle, [x_tar, y_tar, z_tar]^T is the fourth coordinate value of the pixel point P projected into the camera coordinate system corresponding to the first view angle, x_tar, y_tar and z_tar are the coordinate values of the pixel point P in the X-axis, Y-axis and Z-axis directions of that camera coordinate system, and extrinsics_tar is the external parameter matrix of the virtual camera tar.
Step S504, for each pixel point in the image to be projected, according to the fourth coordinate value of the pixel point and the camera internal reference of the virtual camera at the first view angle, calculating a fifth coordinate value of the pixel point in the image coordinate system corresponding to the first view angle.
For the sake of understanding, the above-mentioned pixel point P is still used as an example for description.
In an optional embodiment, for each pixel point in the image to be projected, the electronic device may calculate the fifth coordinate value of the pixel point in the image coordinate system corresponding to the first view angle by using the following formula:

[u_tar, v_tar, 1]^T = intrinsics_tar * [x_tar, y_tar, z_tar]^T / z_tar

wherein [u_tar, v_tar]^T is the fifth coordinate value of the pixel point P projected into the image coordinate system corresponding to the first view angle, u_tar is the coordinate value in the X-axis direction of that image coordinate system, v_tar is the coordinate value in the Y-axis direction of that image coordinate system, and intrinsics_tar is the internal parameter matrix of the virtual camera tar.
In an alternative embodiment, in order to reduce the difference between the synthesized image from the first view and the image to be projected in terms of resolution, color, and the like, the internal reference matrix of the virtual camera tar may be the same as the internal reference matrix of the camera src.
The external reference matrix of the virtual camera tar may be calibrated according to the position of the first view, and the external reference matrix of the virtual camera tar is not specifically limited herein.
And step S505, projecting the image to be projected according to a fifth coordinate value of each pixel point in the image coordinate system corresponding to the first visual angle in the image to be projected, so as to obtain the projected image comprising the cavity area.
In this step, after determining the position coordinate value (i.e., the fifth coordinate value) of the pixel point corresponding to each pixel point in the image coordinate system corresponding to the first view angle in the image to be projected, the electronic device may perform projection on each pixel point in the image to be projected, so as to obtain a projected image including the void region.
In an alternative embodiment, the projection process may be specifically expressed as: and filling the image information of each pixel point in the image to be projected to a pixel point corresponding to a fifth coordinate value of the pixel point in the image coordinate system under the first visual angle. For example, the image information of the pixel point a is filled into the pixel point B.
In this embodiment of the application, due to the existence of the shielding phenomenon, when the image information of the pixel point in the image to be projected is filled to the pixel point corresponding to the fifth coordinate value corresponding to the pixel point, the image information of the plurality of pixel points in the image to be projected may be filled to the same pixel point in the projected image. At this time, the image information of the pixel point in the projected image can be determined in various ways.
For example, the image information of the pixel point in the projected image may be an average value, a mode, and the like of the image information corresponding to each pixel point projected to the pixel point in the to-be-projected image.
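As an illustration of this scatter step and of resolving collisions by averaging, a minimal sketch follows; rounding the target coordinates to the nearest integer pixel and the array layout are assumptions made for the example.

```python
import numpy as np

def scatter_with_average(colors, coords_tar, height, width):
    """colors: (N, 3) RGB values of the source pixel points; coords_tar: (N, 2)
    (u_tar, v_tar) coordinates in the image coordinate system of the first view
    angle.  Target pixels hit by several source pixels are averaged; target
    pixels hit by none keep no image information and form the hole region."""
    acc = np.zeros((height, width, 3), dtype=np.float64)
    hit_count = np.zeros((height, width), dtype=np.int64)
    for (u, v), c in zip(coords_tar, colors):
        col, row = int(round(u)), int(round(v))
        if 0 <= row < height and 0 <= col < width:
            acc[row, col] += c
            hit_count[row, col] += 1
    projected = np.zeros_like(acc)
    valid = hit_count > 0
    projected[valid] = acc[valid] / hit_count[valid][:, None]
    return projected, hit_count
```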
For convenience of understanding, the process of projecting the pixel point P into the image coordinate system corresponding to the first view angle is described by taking the coordinates of the pixel point P as (0, 0) as an example. The homogeneous coordinate corresponding to the pixel point P is [0, 0, 1]^T.

Assume example values for the internal and external parameter matrices of the camera src corresponding to the image to be projected and of the virtual camera tar corresponding to the projected image (the numeric matrices appear as formula images in the original filing and are not reproduced here). Substituting these values into the formulas of steps S501 to S504 in turn yields the second coordinate value of the pixel point P in the camera coordinate system, the third coordinate value in the world coordinate system, the fourth coordinate value in the camera coordinate system corresponding to the first view angle, and the fifth coordinate value in the image coordinate system corresponding to the first view angle, where [ ]^-1 denotes matrix inversion.

With those example values, the coordinates in the projected image corresponding to the pixel point P of the image to be projected are (51.84695977, 26.45017069). The electronic device may project the image information of the pixel point P in the image to be projected to the pixel point with coordinate value (51.84695977, 26.45017069) in the projected image.
Through the steps S501 to S505, the electronic device can accurately project each pixel point in the image to be projected into the image coordinate system corresponding to the first viewing angle, so as to obtain the projected image including the void region.
In the method shown in fig. 5, the electronic device directly projects each pixel point in the image to be projected according to the calculated fifth coordinate value. In addition, the electronic device may further perform coordinate conversion on the pixel points in the image to be projected according to the calculated second coordinate value and the third coordinate value, and then project each pixel point in the world coordinate system of the image to be projected into the camera coordinate system corresponding to the first view angle, so as to perform coordinate conversion on each pixel point in the camera coordinate system corresponding to the first view angle, thereby obtaining the projected image.
In an alternative embodiment, according to the method shown in fig. 2, the present application further provides an image synthesis method. As shown in fig. 6, fig. 6 is a schematic flowchart of a second image synthesis method according to an embodiment of the present application. Specifically, the above step S204 is subdivided into steps S2041 to S2044.
Step S2041, the cavity area in the projection image is expanded outward by a second preset range to obtain a first expanded area.
In this step, the electronic device may expand the hollow region in the projection image according to the second preset range, that is, the hollow region is expanded outward by the second preset range, and the expanded region is the first expanded region.
In an alternative embodiment, in step S2041, the hole region in the projection image is expanded outward by a second preset range to obtain a first expanded region, which may be specifically represented as the following steps, i.e., step one to step two.
Step one, determining each edge pixel point of a hollow area in a projected image.
And step two, extending each edge pixel point of the hollow area in the projected image to the outside within a second preset range to obtain a first extended area.
The edge pixel points are pixel points included in the area where the edge of the cavity area is located.
In an optional embodiment, expanding each edge pixel point of the hole region in the projected image outward by the second preset range to obtain the first expansion region may be expressed as: for each edge pixel point of the hole region in the projected image, determining a circular expansion area with that edge pixel point as the center and the second preset range as the radius; the area, other than the hole region, within the union of the circular expansion areas corresponding to all edge pixel points of the hole region is then determined as the first expansion region of the hole region.
For example, when the second preset range represents a preset number of pixel points, the electronic device may determine, for each edge pixel point of the hollow area in the projection image, a circular extension area formed by a preset number of pixel points around the edge pixel point, and determine a union area of the circular extension areas corresponding to each edge pixel point, where an area other than the hollow area in the union area is the first extension area.
The second preset range may be represented in a form of a distance, such as 2 mm, besides being represented in a form of a pixel. Here, the expression of the second preset range is not particularly limited, and the value of the second preset range is not particularly limited.
When the second preset range is represented by a preset distance, the determination method of the first extension area may refer to the determination method of the first extension area when the second preset range is represented by a preset number of pixels, and will not be described in detail herein.
In this embodiment of the application, when determining the first expansion region, in addition to expanding each edge pixel point of the hole region outward by the second preset range, the electronic device may expand every pixel point of the hole region outward by the second preset range, and determine the area, other than the hole region, within the union of the expanded areas of all the pixel points as the first expansion region of the hole region.
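One way to realize this outward expansion is morphological dilation of the hole mask; the sketch below uses OpenCV and treats the second preset range as a number of pixels, which is an assumption made for the example.

```python
import cv2
import numpy as np

def expand_hole_mask(m_hole, second_preset_range=5):
    """m_hole: uint8 mask, 1 inside the hole region.  Dilating the hole mask by
    the second preset range (in pixels) and removing the hole itself leaves the
    first expansion region M_exp, so that M_in = M_exp + M_hole."""
    k = 2 * second_preset_range + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    m_in = cv2.dilate(m_hole, kernel)   # hole region plus first expansion region
    m_exp = m_in - m_hole               # first expansion region only
    return m_exp, m_in
```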
For ease of understanding, the description will be made with reference to fig. 7-a to 7-d as an example. Fig. 7-a is a projection image provided by an embodiment of the present application. Fig. 7-b is a mask image corresponding to the projected image shown in fig. 7-a. Fig. 7-c is a to-be-repaired image corresponding to the projection image shown in fig. 7-a. Fig. 7-d is a mask image corresponding to the image to be repaired shown in fig. 7-c.
In fig. 7-a, the projected image 701 (denoted as I_prj) includes a hole region, namely region 702 (denoted as M_hole). The black area is the effective pixel area formed after each pixel point in the image to be projected is projected into the image coordinate system corresponding to the first view angle. The mask image 703 shown in fig. 7-b is obtained by acquiring the hole mask corresponding to the projected image 701. The region 704 corresponds to the region 702 in the projected image 701.
According to the mask image 703, the electronic device can accurately determine each edge pixel point of the region 702 in the projected image 701, i.e., the outermost pixel point in the circular region shown by the region 704 in fig. 7-b.
The electronic device may expand each edge pixel point in the region 704 outward by the second preset range to obtain the region 709 (denoted as M_in) shown in fig. 7-d. At this time, the electronic device may determine the annular region, other than the hole region 704, within the region 709 as the first expansion region (denoted as M_exp), i.e., the region 707 in fig. 7-c is determined as the first expansion region.
The relationship among M_in, M_exp and M_hole can be expressed as: M_in = M_exp + M_hole.
Through the second preset range, the electronic equipment can accurately determine the position of the first expansion region in the projected image, and the relevance between the image information of each pixel point in the cavity region determined in the later stage and the image information of each pixel point in the first expansion region is ensured, so that the accuracy of repairing the image is ensured.
Step S2042, an image to be repaired of the projection image is acquired, and the image to be repaired includes a cavity region and a first expansion region.
In this step, the electronic device may intercept, according to the hole region and the first extension region, an image corresponding to the hole region and the first extension region in the projection image, so as to obtain an image to be restored.
The above-described fig. 7-a, 7-c and 7-d are still used as examples for explanation. After the electronic device determines that the first extended area is the area 707 shown in fig. 7-c, the electronic device may use the first extended area (i.e., the area 707 in fig. 7-c) and the void area (i.e., the area 706 in fig. 7-c, i.e., the area 702 in fig. 7-a) as the area to be repaired (i.e., the area 709 in the mask image 708 shown in fig. 7-d), and obtain an image corresponding to the area to be repaired from the projection image 701, so as to obtain the image to be repaired 705.
In an alternative embodiment, the electronic device may obtain the image to be repaired of the projected image by using the following formula:

I_in = I_prj * M_exp

wherein I_in is the image to be repaired, I_prj is the projected image, and M_exp is the mask image of the first expansion region corresponding to the hole region in the projected image.
In the embodiment of the application, the image to be repaired is obtained by obtaining the projected image, so that the hole filling in the later period can be performed only aiming at the area to be repaired in the image to be repaired, the calculation amount of the hole filling process is effectively reduced, and the hole filling efficiency is improved.
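In array form, this interception amounts to an element-wise multiplication; a minimal sketch (with a single-channel mask broadcast over the color channels, which is an assumption of the example) is:

```python
def crop_to_repair_region(i_prj, m_exp):
    """I_in = I_prj * M_exp: keep only the first expansion region of the
    projected image; together with the (empty) hole region this forms the
    image to be repaired."""
    return i_prj * m_exp[..., None]  # m_exp broadcast over the RGB channels
```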
Step S2043, performing hole filling on the hole region by using the image information of each pixel point of the first extension region in the image to be repaired, so as to obtain a repaired image.
In an optional embodiment, in step S2043, the image information of each pixel point in the first extension area in the image to be repaired is used to perform hole filling on the hole area, so as to obtain the repaired image, which may be specifically expressed as:
inputting an image to be restored into a pre-trained image restoration model to obtain a restored image; the image restoration model is obtained by utilizing a preset training set for training; the preset training set comprises sample images acquired by a plurality of second cameras at different viewing angles and camera parameters of each second camera.
In the embodiment of the application, the image restoration model is used for carrying out hole filling on the hole area in the image to be restored, so that the accuracy of restoring the image and the hole filling efficiency of the hole area can be effectively improved.
For the training process of the image inpainting model, reference is made to the following description, which is not specifically described here.
Step S2044, performing image fusion on the projected image and the repaired image to obtain a composite image of the first visual angle.
In this step, after obtaining the restored image, the electronic device performs image fusion processing on the restored image and the projected image, and determines the fused image as a composite image at a first viewing angle.
In an alternative embodiment, the electronic device may fuse image information of the position of the hole region included in the repair image and the projected image into the hole region of the projected image to obtain a composite image at the first viewing angle.
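A minimal sketch of this paste-back variant (illustrative only, assuming numpy arrays and a binary hole mask defined on the same image grid):

```python
import numpy as np

def paste_hole(i_prj: np.ndarray, i_out: np.ndarray, m_hole: np.ndarray) -> np.ndarray:
    """Copy the repaired pixels into the projected image only where the hole mask is set."""
    return np.where(m_hole[..., None].astype(bool), i_out, i_prj)
```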
In another alternative embodiment, in step S2044, the projection image and the repaired image are subjected to image fusion to obtain a composite image with a first viewing angle, which may be specifically represented as:
carrying out image fusion on the projected image and the repaired image by using the following formula to obtain a composite image of the first visual angle:
I_prd = I_out * M_hole + [(1-α) * I_out * M_exp + α * I_in * M_exp] + (I_in - I_in * M_in)
where I_prd is the composite image of the first visual angle, I_out is the repaired image, M_hole is the hole mask corresponding to the hole region in the projected image, α is a preset weight, M_exp is the mask image of the first expansion region corresponding to the hole region in the projected image, I_in is the image to be repaired, and M_in is the mask image corresponding to the hole region and the expansion region in the image to be repaired.
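The formula can be transcribed directly into numpy as follows (a sketch with illustrative names, assuming float images of shape (H, W, C) and binary (H, W) masks over the same image grid):

```python
import numpy as np

def fuse_first_view(i_out, i_in, m_hole, m_exp, m_in, alpha=0.5):
    """I_prd = I_out*M_hole + [(1-α)*I_out*M_exp + α*I_in*M_exp] + (I_in - I_in*M_in)."""
    mh = m_hole[..., None].astype(i_out.dtype)
    me = m_exp[..., None].astype(i_out.dtype)
    mi = m_in[..., None].astype(i_out.dtype)
    ring = (1.0 - alpha) * i_out * me + alpha * i_in * me   # weighted blend over the expansion ring
    return i_out * mh + ring + (i_in - i_in * mi)           # repaired hole + blended ring + part of I_in outside M_in
```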
By the method shown in fig. 6, the electronic device only fills the holes in the hole region of the image to be restored to obtain the restored image, so that the projected image and the restored image are subjected to image fusion, and the finally obtained synthesized image of the first viewing angle does not include the hole region, thereby ensuring the integrity and accuracy of the synthesized image of the first viewing angle.
Based on the same inventive concept, according to the image synthesis method provided in the embodiment of the present application, an image inpainting model training method is also provided in the embodiment of the present application, as shown in fig. 8, fig. 8 is a schematic flow diagram of the image model training method provided in the embodiment of the present application. The method comprises the following steps.
Step S801, a preset training set is obtained.
The preset training set comprises sample images acquired by a plurality of second cameras at different viewing angles and camera parameters of each second camera.
For ease of understanding, the description is given below by taking fig. 1 as an example.
The preset training set may include: at a certain time, each camera in fig. 1 captures an image of the stage at a corresponding viewing angle, i.e., a sample image, and the camera parameters of each camera in fig. 1.
In this embodiment of the application, the sample images and camera parameters included in the preset training set may constitute one set of training data or multiple sets of training data. Sample images in the same set of training data correspond to the same shooting scene, camera viewpoints, and the like, whereas sample images in different sets of training data may correspond to the same or different shooting scenes, camera viewpoints, and the like. The preset training set is not specifically limited here. For ease of understanding, the embodiments of the present application are illustrated with a single set of training data as an example, which is not limiting.
Step S802, based on the camera parameters of each second camera, projecting the sample images except the second visual angle in the preset training set to an image coordinate system corresponding to the second visual angle to obtain sample projection images including the cavity area.
In this step, the electronic device may select a viewing angle corresponding to any sample image as the second viewing angle. For each sample image in the preset training set except for the sample image corresponding to the second view angle, the electronic device may project the sample image into an image coordinate system corresponding to the second view angle according to a camera parameter of a camera corresponding to the sample image, so as to obtain a sample projection image including a void region.
The projection mode of the sample image can refer to the projection mode of the image to be projected, and is not specifically described here.
Step S803, the cavity region in the sample projection image is expanded outward by a second preset range, so as to obtain a second expanded region.
The second extension area may be determined in accordance with the determination method of the first extension area, and will not be described in detail here.
Step S804, a to-be-repaired sample image of the sample projection image is obtained, where the to-be-repaired sample image is an image including a cavity region and a second expansion region.
The method for acquiring the sample image to be repaired may refer to the above method for acquiring the image to be repaired, and will not be specifically described here.
Step S805, inputting the sample image to be restored into a preset depth network model to obtain a sample restored image.
The input data of the preset depth network model has the same dimensions as its output data.
For example, the input sample image to be restored includes the RGB value and the depth value of each pixel, and the sample restored image output by the preset depth network model also includes the RGB value and the depth value of each pixel.
In this embodiment of the application, the preset depth network model may be a deep residual network (ResNet) model or a Visual Geometry Group (VGG) model. The preset depth network model is not specifically limited here.
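As one possible instantiation (a toy encoder-decoder, not the patent's specific network), assuming a 4-channel RGB-D input and output as in the example above:

```python
import torch
import torch.nn as nn

class InpaintNet(nn.Module):
    """Minimal network whose output dimensions match its input dimensions (RGB-D in, RGB-D out)."""

    def __init__(self, channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```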
Step S806, performing image fusion on the sample projection image and the sample restoration image to obtain a sample composite image of the second visual angle.
The image fusion method of the sample projection image and the sample restored image is referred to the image fusion method of the projection image and the restored image, and will not be specifically described here.
Step S807, calculating a loss value of the preset depth network model according to the sample synthetic image and the sample image of the second view angle in the preset training set.
In an alternative embodiment, the electronic device may calculate the loss value of the preset deep network model by using the following formula:
loss = ||I_prd' * M_in' - I_true' * M_in'||
where loss is the loss value, || · || is a norm operation, I_prd' is the sample composite image of the second visual angle, M_in' is the mask image corresponding to the hole region and the expansion region in the sample image to be repaired, and I_true' is the sample image of the second visual angle.
In this embodiment, the loss value of the preset deep network model may be calculated in other manners. For example, the electronic device may calculate a difference sum between the image information of each pixel point in the sample composite image and the image information of the pixel point at the corresponding position in the sample image of the second view in the preset training set, and determine the difference sum as a loss value of the preset depth network model. Here, the method of calculating the loss value of the preset deep network model is not particularly limited.
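A minimal sketch of the masked loss above in PyTorch (illustrative names; torch.norm gives the Frobenius/L2 norm, and the difference-sum variant just replaces it with a summed absolute difference):

```python
import torch

def inpaint_loss(i_prd: torch.Tensor, i_true: torch.Tensor, m_in: torch.Tensor) -> torch.Tensor:
    """loss = || I_prd' * M_in' - I_true' * M_in' ||, evaluated only inside the hole-plus-expansion mask."""
    return torch.norm(i_prd * m_in - i_true * m_in)
```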
In this embodiment, after calculating the loss value, the electronic device may compare the loss value with a preset loss value threshold. If the loss value is greater than the preset loss value threshold, step S808 is executed. If the loss value is not greater than the preset loss value threshold, step S809 is executed.
Step S808, when the loss value is greater than the preset loss value threshold, adjusting parameters of the preset deep network model, and returning to execute step S805.
When the loss value is greater than the preset loss value threshold, the electronic device may determine that the preset deep network model is not converged. At this time, the electronic device may adjust parameters of the preset depth network model, and return to perform step S805, that is, return to perform the step of inputting the sample image to be repaired into the preset depth network model to obtain the sample repaired image.
In the embodiment of the present application, the parameters of the preset depth network model include, but are not limited to, weights and offsets. The electronic device may adjust the parameters of the preset depth network model by using a gradient descent method, a back-propagation method, or the like.
Step S809, when the loss value is not greater than the preset loss value threshold, determining the current preset depth network model as the trained image restoration model.
In this step, when the loss value is less than or equal to the preset loss value threshold, the electronic device may determine that the preset depth network model has converged. At this time, the electronic device may determine the current preset depth network model as the trained image restoration model, i.e., the pre-trained image restoration model into which the image to be repaired is input.
By the method shown in fig. 8, the electronic device may complete training of the preset depth network model by using the sample image in the preset training set, so as to ensure accuracy of an image restoration model obtained by training, accuracy of a restoration image obtained by filling a hole in an image to be restored by using the image restoration model, and accuracy of a synthetic image obtained based on the restoration image.
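Putting steps S801-S809 together, the training loop can be sketched as follows (a hedged outline only: make_training_sample, which performs the projection, expansion and cropping of steps S802-S804, and all other names are hypothetical):

```python
import torch

def train_inpaint_model(model, train_set, make_training_sample,
                        loss_threshold=1e-3, lr=1e-4, alpha=0.5, max_iters=100_000):
    """Train the preset depth network until the masked loss drops below the threshold (steps S805-S809)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_iters):
        # Steps S802-S804: project the other views to the second visual angle, expand the hole, crop.
        i_to_repair, m_hole, m_exp, m_in, i_true = make_training_sample(train_set)
        i_out = model(i_to_repair)                                            # step S805
        i_prd = (i_out * m_hole                                               # step S806: fusion
                 + (1 - alpha) * i_out * m_exp + alpha * i_to_repair * m_exp
                 + (i_to_repair - i_to_repair * m_in))
        loss = torch.norm(i_prd * m_in - i_true * m_in)                       # step S807
        if loss.item() <= loss_threshold:                                     # step S809: converged
            break
        optimizer.zero_grad()                                                 # step S808: adjust parameters
        loss.backward()
        optimizer.step()
    return model
```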
Based on the same inventive concept, according to the image synthesis method provided by the embodiment of the present application, the embodiment of the present application further provides an image synthesis device. As shown in fig. 9, fig. 9 is a schematic structural diagram of an image synthesis apparatus according to an embodiment of the present application. The apparatus includes the following modules.
A receiving module 901, configured to receive a view switching instruction including a first view;
the first acquiring module 902 is configured to acquire an image to be projected, where the image to be projected is an image acquired by at least one first camera within a first preset range of a first viewing angle;
a first projection module 903, configured to project the image to be projected into an image coordinate system corresponding to the first view angle based on a camera parameter of the first camera, so as to obtain a projected image including a cavity region;
the synthesis module 904 is configured to perform hole filling on the projected image according to image information of pixel points in a second preset range of a hole region in the projected image, so as to obtain a synthesized image at a first viewing angle;
a displaying module 905, configured to display the composite image of the first view.
Optionally, the synthesizing module 904 includes:
the expansion submodule is used for expanding the cavity area in the projection image outwards by a second preset range to obtain a first expansion area;
the acquisition submodule is used for acquiring an image to be restored of the projected image, wherein the image to be restored comprises a cavity area and a first expansion area;
the restoration submodule is used for filling holes in the hole area by utilizing the image information of each pixel point of the first expansion area in the image to be restored to obtain a restored image;
and the fusion submodule is used for carrying out image fusion on the projection image and the repaired image to obtain a composite image of a first visual angle.
Optionally, the expansion submodule may be specifically configured to determine each edge pixel point of the hollow area in the projection image; and expanding each edge pixel point of the hollow area in the projected image to the outside within a second preset range to obtain a first expanded area.
Optionally, the obtaining sub-module may be specifically configured to obtain an image to be restored of the projection image by using the following formula:
I_in = I_prj * M_exp
where I_in is the image to be restored, I_prj is the projected image, and M_exp is the mask image of the first expansion region corresponding to the hole region in the projected image.
Optionally, the restoration submodule may be specifically configured to input the image to be restored into a pre-trained image restoration model to obtain a restored image; the image restoration model is obtained by utilizing a preset training set for training; the preset training set comprises sample images acquired by a plurality of second cameras at different viewing angles and camera parameters of each second camera.
Optionally, the image synthesizing apparatus may further include:
the second acquisition module is used for acquiring a preset training set;
the second projection module is used for projecting the sample images except the second visual angle in the preset training set to an image coordinate system corresponding to the second visual angle based on the camera parameters of each second camera to obtain a sample projection image comprising a cavity area;
the expansion module is used for expanding the cavity area in the sample projection image outwards by a second preset range to obtain a second expansion area;
the third acquisition module is used for acquiring a sample image to be repaired of the sample projection image, wherein the sample image to be repaired comprises a cavity area and a second expansion area;
the restoration module is used for inputting the sample image to be restored into a preset depth network model to obtain a sample restored image;
the fusion module is used for carrying out image fusion on the sample projection image and the sample restoration image to obtain a sample composite image at a second visual angle;
the calculation module is used for calculating a loss value of the preset depth network model according to the sample synthetic image and the sample image of the second visual angle in the preset training set;
the adjusting module is used for adjusting the parameters of the preset depth network model when the loss value is larger than the preset loss value threshold value, and returning to the step of calling the repairing module to input the sample image to be repaired into the preset depth network model to obtain a sample repaired image;
and the determining module is used for determining the current preset depth network model as the trained image restoration model when the loss value is not greater than the preset loss value threshold.
Optionally, the calculating module may be specifically configured to calculate a loss value of the preset deep network model by using the following formula:
loss = ||I_prd' * M_in' - I_true' * M_in'||
where loss is the loss value, || · || is a norm operation, I_prd' is the sample composite image of the second visual angle, M_in' is the mask image corresponding to the hole region and the expansion region in the sample image to be repaired, and I_true' is the sample image of the second visual angle.
Optionally, the fusion sub-module may be specifically configured to perform image fusion on the projection image and the repaired image by using the following formula to obtain a composite image at a first viewing angle:
I_prd = I_out * M_hole + [(1-α) * I_out * M_exp + α * I_in * M_exp] + (I_in - I_in * M_in)
where I_prd is the composite image of the first visual angle, I_out is the restored image, M_hole is the hole mask corresponding to the hole region in the projected image, α is a preset weight, M_exp is the mask image of the first expansion region corresponding to the hole region in the projected image, I_in is the image to be restored, and M_in is the mask image corresponding to the hole region and the expansion region in the image to be repaired.
Optionally, the camera parameters include external parameters of the camera and internal parameters of the camera;
the first projection module 903 may be specifically configured to calculate, for each pixel point in the image to be projected, a second coordinate value of the pixel point in the camera coordinate system according to the first coordinate value of the pixel point and the camera internal reference of the first camera;
calculating a third coordinate value of each pixel point in the world coordinate system according to the second coordinate value of the pixel point and the camera external parameter of the first camera for each pixel point in the image to be projected;
for each pixel point in the image to be projected, calculating a fourth coordinate value of the pixel point in a camera coordinate system corresponding to the first visual angle according to the third coordinate value of the pixel point and camera external parameters of the virtual camera at the first visual angle;
for each pixel point in the image to be projected, calculating a fifth coordinate value of the pixel point in an image coordinate system corresponding to the first visual angle according to the fourth coordinate value of the pixel point and camera internal parameters of the virtual camera at the first visual angle;
and projecting the image to be projected according to a fifth coordinate value of each pixel point in the image coordinate system corresponding to the first visual angle in the image to be projected to obtain the projected image comprising the cavity area.
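The four coordinate transforms listed above can be sketched per pixel as follows (illustrative only, assuming a pinhole model, a known depth for each source pixel, and world-to-camera extrinsics R, t; names are hypothetical):

```python
import numpy as np

def project_pixel(u, v, depth, K_src, R_src, t_src, K_dst, R_dst, t_dst):
    """Map one pixel of the image to be projected into the image coordinate system of the first visual angle."""
    # First -> second coordinate value: pixel to source-camera coordinates via the camera internal parameters.
    p_cam_src = depth * (np.linalg.inv(K_src) @ np.array([u, v, 1.0]))
    # Second -> third: source-camera coordinates to world coordinates via the camera external parameters.
    p_world = R_src.T @ (p_cam_src - t_src)
    # Third -> fourth: world coordinates to the virtual camera's coordinates at the first visual angle.
    p_cam_dst = R_dst @ p_world + t_dst
    # Fourth -> fifth: virtual-camera coordinates to image coordinates via its internal parameters.
    p_img = K_dst @ p_cam_dst
    return p_img[:2] / p_img[2]   # (u', v'); target pixels not hit by any source pixel remain holes
```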
By means of the device, after the visual angle switching instruction is received, the acquired image to be projected is projected into the image coordinate system corresponding to the first visual angle to obtain the projected image including the void area, so that the void area is filled by using the image information of each pixel point in the second preset range of the void area to obtain the composite image of the first visual angle, and the composite image is displayed.
Because the synthesized image of the first visual angle is synthesized based on the image to be projected, which is collected by at least one camera in the first preset range of the first visual angle, the image corresponding to the first visual angle can be synthesized by using the image collected by the camera in the adjacent visual angle of the first visual angle in the visual angle switching process, so that the cameras are prevented from being deployed at the position corresponding to the first visual angle, the number of the cameras required to be deployed is reduced, and the deployment cost required in the visual angle switching process is greatly reduced.
Moreover, the visual angle switching instruction comprising the first visual angle is received, so that the synthetic image corresponding to the first visual angle is displayed, the deployment cost required in the visual angle switching process is reduced, more visual angles capable of being switched are provided, the requirement of a user on visual angle switching is met, and the display effect is improved.
Based on the same inventive concept, according to the image synthesis method provided by the embodiment of the present application, the embodiment of the present application further provides an electronic device, as shown in fig. 10, comprising a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 complete mutual communication through the communication bus 1004,
a memory 1003 for storing a computer program;
the processor 1001 is configured to implement the following steps when executing the program stored in the memory 1003:
receiving a visual angle switching instruction comprising a first visual angle;
acquiring a to-be-projected image, wherein the to-be-projected image is an image acquired by at least one first camera in a first preset range of a first visual angle;
based on camera parameters of a first camera, projecting an image to be projected into an image coordinate system corresponding to a first visual angle to obtain a projected image comprising a cavity area;
filling the holes in the projected image according to image information of pixel points in a second preset range of the hole area in the projected image to obtain a composite image of a first visual angle;
a composite image of a first perspective is shown.
By means of the electronic device provided by the embodiment of the application, after the visual angle switching instruction is received, the acquired image to be projected is projected into the image coordinate system corresponding to the first visual angle to obtain the projected image including the void area, so that the void area is filled by using the image information of each pixel point in the second preset range of the void area to obtain the composite image of the first visual angle, and the composite image is displayed.
Because the synthesized image of the first visual angle is synthesized based on the image to be projected, which is collected by at least one camera in the first preset range of the first visual angle, the image corresponding to the first visual angle can be synthesized by using the image collected by the camera in the adjacent visual angle of the first visual angle in the visual angle switching process, so that the cameras are prevented from being deployed at the position corresponding to the first visual angle, the number of the cameras required to be deployed is reduced, and the deployment cost required in the visual angle switching process is greatly reduced.
Moreover, the visual angle switching instruction comprising the first visual angle is received, so that the synthetic image corresponding to the first visual angle is displayed, the deployment cost required in the visual angle switching process is reduced, more visual angles capable of being switched are provided, the requirement of a user on visual angle switching is met, and the display effect is improved.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
Based on the same inventive concept, according to the image synthesis method provided in the embodiments of the present application, the embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the image synthesis method described in any of the embodiments above.
Based on the same inventive concept, according to the image synthesis method provided in the embodiments of the present application, the embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the image synthesis method described in any of the embodiments above.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for embodiments such as the apparatus, the electronic device, the computer-readable storage medium, and the computer program product, since they are substantially similar to the method embodiments, the description is simple, and for relevant points, reference may be made to part of the description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (12)

1. An image synthesis method, characterized in that the method comprises:
receiving a visual angle switching instruction comprising a first visual angle;
acquiring a to-be-projected image, wherein the to-be-projected image is an image acquired by at least one first camera within a first preset range of the first visual angle;
based on the camera parameters of the first camera, projecting the image to be projected into an image coordinate system corresponding to the first visual angle to obtain a projected image comprising a cavity area;
filling holes in the projected image according to image information of pixel points in a second preset range of the hole area in the projected image to obtain a composite image of the first visual angle;
a composite image of the first perspective is shown.
2. The method according to claim 1, wherein the step of performing hole filling on the projection image according to image information of pixels in a second preset range of a hole region in the projection image to obtain a composite image of the first view angle includes:
expanding the cavity area in the projected image outwards by a second preset range to obtain a first expanded area;
acquiring an image to be repaired of the projected image, wherein the image to be repaired comprises the cavity area and the first expansion area;
filling holes in the hole area by using the image information of each pixel point of the first expansion area in the image to be repaired to obtain a repaired image;
and carrying out image fusion on the projected image and the repaired image to obtain a composite image of the first visual angle.
3. The method of claim 2, wherein the step of extending the hole region in the projection image outward by a second preset range to obtain a first extended region comprises:
determining each edge pixel point of the hollow area in the projected image;
and expanding each edge pixel point of the hollow area in the projected image outwards by a second preset range to obtain a first expanded area.
4. The method according to claim 2, wherein the step of acquiring the image to be restored of the projection image comprises:
acquiring an image to be restored of the projected image by using the following formula:
I_in = I_prj * M_exp
where I_in is the image to be restored, I_prj is the projection image, and M_exp is the mask image of the first expansion area corresponding to the hollow area in the projection image.
5. The method according to claim 2, wherein the step of filling the hole in the hole area by using the image information of each pixel point of the first extension area in the image to be restored to obtain the restored image comprises:
inputting the image to be restored into a pre-trained image restoration model to obtain a restored image; the image restoration model is obtained by utilizing a preset training set for training; the preset training set comprises sample images acquired by a plurality of second cameras at different viewing angles and camera parameters of each second camera.
6. The method of claim 5, wherein the image inpainting model is trained using the following steps:
acquiring the preset training set;
based on the camera parameters of each second camera, projecting the sample images except the second visual angle in the preset training set to an image coordinate system corresponding to the second visual angle to obtain sample projection images including the cavity area;
expanding the cavity area in the sample projection image outwards by a second preset range to obtain a second expanded area;
acquiring a sample image to be repaired of the sample projection image, wherein the sample image to be repaired is an image comprising the cavity area and the second expansion area;
inputting the sample image to be restored into a preset depth network model to obtain a sample restored image;
performing image fusion on the sample projection image and the sample restoration image to obtain a sample composite image of the second visual angle;
calculating a loss value of the preset depth network model according to the sample synthetic image and a sample image of a second visual angle in the preset training set;
when the loss value is larger than a preset loss value threshold value, adjusting parameters of the preset depth network model, and returning to execute the step of inputting the sample image to be repaired into the preset depth network model to obtain a sample repaired image;
and when the loss value is not greater than the preset loss value threshold value, determining the current preset depth network model as a trained image restoration model.
7. The method of claim 6, wherein the step of calculating the loss value of the preset depth network model according to the sample synthetic image and the sample image of the second view angle in the preset training set comprises:
calculating a loss value of the preset depth network model by using the following formula:
loss = ||I_prd' * M_in' - I_true' * M_in'||
where loss is the loss value, || · || is a norm operation, I_prd' is the sample composite image of the second visual angle, M_in' is the mask image corresponding to the hollow region and the extended region in the sample image to be repaired, and I_true' is the sample image of the second visual angle.
8. The method of claim 2, wherein the step of image fusing the projected image and the restored image to obtain the composite image at the first viewing angle comprises:
and carrying out image fusion on the projected image and the repaired image by using the following formula to obtain a composite image of the first visual angle:
I_prd = I_out * M_hole + [(1-α) * I_out * M_exp + α * I_in * M_exp] + (I_in - I_in * M_in)
where I_prd is the composite image of the first visual angle, I_out is the restored image, M_hole is the hole mask corresponding to the hollow region in the projection image, α is a preset weight, M_exp is the mask image of the first expansion region corresponding to the hollow region in the projection image, I_in is the image to be restored, and M_in is the mask image corresponding to the hollow area and the expansion area in the image to be repaired.
9. The method of claim 1, wherein the camera parameters include camera external parameters and camera internal parameters;
the step of projecting the image to be projected into an image coordinate system corresponding to the first view angle based on the camera parameter of the first camera to obtain a projected image including a cavity region includes:
aiming at each pixel point in the image to be projected, calculating a second coordinate value of the pixel point in a camera coordinate system according to a first coordinate value of the pixel point and camera internal parameters of the first camera;
aiming at each pixel point in the image to be projected, calculating a third coordinate value of the pixel point in a world coordinate system according to a second coordinate value of the pixel point and camera external parameters of the first camera;
aiming at each pixel point in the image to be projected, calculating a fourth coordinate value of the pixel point in a camera coordinate system corresponding to the first visual angle according to a third coordinate value of the pixel point and camera external parameters of the virtual camera at the first visual angle;
aiming at each pixel point in the image to be projected, calculating a fifth coordinate value of the pixel point in an image coordinate system corresponding to the first visual angle according to a fourth coordinate value of the pixel point and camera internal parameters of the virtual camera at the first visual angle;
and projecting the image to be projected according to a fifth coordinate value of each pixel point in the image coordinate system corresponding to the first visual angle to obtain a projected image comprising a cavity area.
10. An image synthesizing apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving a visual angle switching instruction comprising a first visual angle;
the first acquisition module is used for acquiring an image to be projected, wherein the image to be projected is an image acquired by at least one first camera within a first preset range of the first visual angle;
the first projection module is used for projecting the image to be projected into an image coordinate system corresponding to the first visual angle based on the camera parameters of the first camera to obtain a projected image comprising a cavity area;
the synthesis module is used for filling the holes in the projected image according to the image information of pixel points in a second preset range of the hole area in the projected image to obtain a synthesized image of the first visual angle;
and the display module is used for displaying the composite image of the first visual angle.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-9 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-9.
CN202111194061.2A 2021-10-13 2021-10-13 Image synthesis method and device, electronic equipment and storage medium Pending CN113870165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111194061.2A CN113870165A (en) 2021-10-13 2021-10-13 Image synthesis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113870165A true CN113870165A (en) 2021-12-31

Family

ID=78999267

Country Status (1)

Country Link
CN (1) CN113870165A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination