WO2023272531A1 - Image processing method and apparatus, device, and storage medium - Google Patents

Image processing method and apparatus, device, and storage medium

Info

Publication number
WO2023272531A1
WO2023272531A1 PCT/CN2021/103290 CN2021103290W WO2023272531A1 WO 2023272531 A1 WO2023272531 A1 WO 2023272531A1 CN 2021103290 W CN2021103290 W CN 2021103290W WO 2023272531 A1 WO2023272531 A1 WO 2023272531A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
pixel
depth
mth
color
Prior art date
Application number
PCT/CN2021/103290
Other languages
French (fr)
Chinese (zh)
Inventor
杨铀
蒋小广
刘琼
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to CN202180100038.4A (CN117730530A)
Priority to PCT/CN2021/103290 (WO2023272531A1)
Publication of WO2023272531A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/271 - Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 - Image signal generators
    • H04N13/282 - Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems

Definitions

  • the embodiments of the present application relate to image technologies, including but not limited to image processing methods, devices, equipment, and storage media.
  • the image processing method, device, equipment, and storage medium provided in the embodiments of the present application are implemented as follows:
  • the image processing method provided by the embodiments of the present application includes: performing region division on the depth map (Depth Map) under a first reference viewpoint according to the depth of the first pixel points in the depth map, to obtain at least one region;
  • inversely transforming the coordinates of the mth second pixel point of the view to be rendered under a target viewpoint into at least one target region of the at least one region, to obtain the position points of the mth second pixel point in the at least one target region, where m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered; and rendering the color of the mth second pixel point according to those position points.
  • the image processing device includes: a region division module, configured to perform region division on the depth map under the first reference viewpoint according to the depth of the first pixel points in the depth map, to obtain at least one region; a coordinate inverse transformation module, configured to inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target region of the at least one region, to obtain the position points of the mth second pixel point in the at least one target region, where m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered; and a rendering module, configured to render the color of the mth second pixel point according to the position points of the mth second pixel point in the at least one target region.
  • the electronic device provided by the embodiment of the present application includes a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the steps in the image processing method when executing the program.
  • the computer-readable storage medium provided by the embodiment of the present application stores a computer program thereon, and when the computer program is executed by a processor, the steps in the image processing method are implemented.
  • the depth map is divided into regions according to the depth of the first pixel points in the depth map under the first reference viewpoint, instead of dividing the view (Viewport) under the first reference viewpoint into planes based on a predetermined plane distribution law; in this way, because the region division takes into account the depth of each point in the actual scene, the color of the second pixel points in the final rendering is more accurate, so that the rendered view (that is, the synthetic view) retains more image detail.
  • Fig. 1 is a schematic diagram of the implementation flow of the image processing method of the embodiment of the present application.
  • FIG. 2 is a schematic flow diagram of another implementation of the image processing method of the embodiment of the present application.
  • FIG. 3 is a schematic flow diagram of another implementation of the image processing method of the embodiment of the present application.
  • Fig. 4 is a schematic diagram of a multiplane image (Multiplane Image, MPI) representation composed of 4 plane layers combined with basis functions, according to an embodiment of the present application;
  • Fig. 5 is the workflow of the scene novel-view synthesis model NeX according to an embodiment of the present application (the process of obtaining H_n(v) is omitted);
  • Fig. 6 is a schematic diagram of the standard inverse homography transformation, taking the plane number D=3 as an example, according to an embodiment of the present application;
  • Fig. 7 is a schematic diagram of an example of an MPI plane layer;
  • Fig. 8 is a schematic workflow diagram of the PMPI (Patch Multiplane Image) model of an embodiment of the present application;
  • Fig. 9 is a schematic flowchart of obtaining the PMPI shape according to an embodiment of the present application;
  • Fig. 10 is a schematic diagram of PMPI rendering with region number A=2 and depth number 4, according to an embodiment of the present application;
  • FIG. 11 is a schematic diagram of the calculation flow of the depth map according to an embodiment of the present application;
  • Figure 12 is a schematic diagram of the comparison of the synthesis effect in the fern scene;
  • Figure 13 is a schematic diagram of the comparison of the synthesis effect in the trex scene;
  • FIG. 14 is a schematic structural diagram of an image processing device according to an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application.
  • FIG. 16 is a schematic diagram of another hardware entity of the electronic device according to the embodiment of the present application.
  • the terms "first", "second" and "third" involved in the embodiments of this application are used to distinguish similar or different objects and do not represent a specific ordering of objects. Understandably, "first", "second" and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
  • An embodiment of the present application provides an image processing method, which is applied to an electronic device, and the electronic device may be any device capable of data processing, for example, the electronic device is a notebook computer, a mobile phone, a server, a TV, or a projector.
  • Fig. 1 is a schematic diagram of the implementation flow of the image processing method of the embodiment of the present application. As shown in Fig. 1, the method may include the following steps 101 to 103:
  • Step 101 according to the depth of the first pixel in the depth map under the first reference viewpoint, perform region division on the depth map to obtain at least one region.
  • there is no limit to the region division range of the depth map.
  • in some embodiments, a certain block or several blocks of the depth map can be subjected to the region division; in other embodiments, the entire depth map can be subjected to the region division.
  • the number of divided regions may or may not be specified in advance. If it is not specified, the number of divided regions depends on the actual scene, that is, on the depth distribution of the first pixel points in the depth map.
  • the method for obtaining the depth map can be based on binocular stereo vision: two images of the same scene are simultaneously acquired by two cameras carried by the electronic device at a certain distance from each other, the corresponding pixels in the two images are found through a stereo matching algorithm, and the disparity information is then calculated according to the triangulation principle; through conversion, the disparity information can be used to represent the depth information of objects in the scene.
  • the acquisition of the depth information of the scene is realized through the active ranging sensor carried by the electronic device; wherein, the active ranging sensor may be, for example, a time of flight (Time of flight, TOF) camera, a structured light device, or a laser radar.
  • the electronic device may also obtain the depth map under the first reference viewpoint through steps 201 to 203 of the following embodiment, that is, obtain the depth map under the first reference viewpoint based on at least one view under a second reference viewpoint, where the first reference viewpoint is different from the second reference viewpoint.
  • this method requires neither a binocular camera nor an active ranging sensor on the electronic device to obtain a depth map, so that the image processing method provided by the embodiments of the present application can be applied to more electronic devices and is more universal.
  • the depth map under the first reference viewpoint may also be obtained by receiving the code stream sent by the encoding end and decoding the code stream.
  • the depth map or a block or blocks in the depth map may be divided into regions according to the depth relationship between the first pixels. For example, divide pixels with equal depth into the same area. In another example, pixels with depth differences within a specific range are divided into the same area.
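  • As an illustrative sketch (not the patent's normative algorithm), the following Python snippet groups pixels whose quantized depths fall into the same interval into one region; the interval width depth_tolerance is a hypothetical parameter.

```python
import numpy as np

def divide_regions_by_depth(depth_map, depth_tolerance=0.5):
    """Group pixels whose depths fall into the same quantized interval into one region.

    Returns an integer label map of the same shape as depth_map and the region count.
    """
    # Quantize depths into bins of width depth_tolerance; pixels falling into the
    # same bin receive the same region label, so equal-depth pixels share a region.
    bins = np.floor(depth_map / depth_tolerance).astype(np.int64)
    unique_bins, labels = np.unique(bins, return_inverse=True)
    return labels.reshape(depth_map.shape), len(unique_bins)

# Example: a toy 2x3 depth map with two depth levels yields two regions.
depth = np.array([[1.0, 1.1, 4.9], [1.05, 4.8, 4.7]])
region_map, num_regions = divide_regions_by_depth(depth)
```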
  • Step 102: inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target region of the at least one region, and obtain the position points of the mth second pixel point in the at least one target region; where m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered.
  • the at least one target region may be all of the at least one region, or one or more of them; when it is all of them, there is no need to screen the at least one region, and each divided region may be used as a target region.
  • any region in the at least one region can be used as a target region; for example, a specific number of regions are randomly selected as target regions. In some other embodiments, a region satisfying a specific condition may also be selected from the at least one region as a target region; for example, a region whose number of pixels is larger than a specific number is used as a target region, or a region whose depth is smaller than a specific depth is used as a target region.
  • the homogeneous coordinates (u_t, v_t, 1) of the mth second pixel point of the view to be rendered under the target viewpoint can be inversely transformed into the target region by the following formula (1), the standard inverse homography, so as to obtain the position point of the pixel point in the target region:
  • (u_s, v_s, 1)^T ≃ k_s · (R - t·n^T / a) · k_t^(-1) · (u_t, v_t, 1)^T  (1)
  • where ≃ means equal up to a scale factor, and n = (0, 0, 1)^T is the normal vector of the fronto-parallel plane;
  • R and t are the rotation matrix and translation vector from the camera coordinate system of the first reference viewpoint to the camera coordinate system of the target viewpoint;
  • a is the negative value of the region depth of the target region; if the depths of the pixel points in the same target region are not equal, the mean or median depth of the pixel points in the region can be used as the region depth;
  • k_s and k_t are the camera intrinsic parameters corresponding to the first reference viewpoint and the target viewpoint, respectively.
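  • The following Python sketch shows one way to apply the inverse homography described above; the fronto-parallel plane normal n = (0, 0, 1)^T is an assumption, and the function and variable names are illustrative rather than taken from the patent.

```python
import numpy as np

def inverse_homography(pixel_t, K_s, K_t, R, t, region_depth):
    """Map a homogeneous target-view pixel (u_t, v_t, 1) back to the reference view.

    R, t: rotation matrix and translation vector from the reference-viewpoint camera
          coordinate system to the target-viewpoint camera coordinate system.
    region_depth: the depth assigned to the target region; a = -region_depth as in the text.
    """
    a = -region_depth
    n = np.array([[0.0, 0.0, 1.0]])   # 1x3 fronto-parallel plane normal (assumption)
    H = K_s @ (R - (t.reshape(3, 1) @ n) / a) @ np.linalg.inv(K_t)
    p_s = H @ np.asarray(pixel_t, dtype=float)
    return p_s / p_s[2]               # normalize so the last component is 1

# Toy usage with identity intrinsics and a small translation between the cameras.
K = np.eye(3)
point = inverse_homography([320.0, 240.0, 1.0], K, K, np.eye(3),
                           np.array([0.1, 0.0, 0.0]), region_depth=5.0)
```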
  • Step 103 Render the color of the mth second pixel according to the position of the mth second pixel in the at least one target area.
  • step 103 may be implemented through steps 210 and 211 of the following embodiments.
  • the depth map is divided into regions according to the depth of the first pixel points in the depth map under the first reference viewpoint, instead of dividing the view under the first reference viewpoint into planes based on a predetermined plane distribution law; in this way, since the region division takes into account the depth of each point in the actual scene, the color of the second pixel points in the final rendering is more accurate, and the rendered view (that is, the synthetic view) therefore contains more image detail.
  • FIG. 2 is a schematic diagram of the implementation process of the image processing method in the embodiment of the present application. As shown in FIG. 2, the method may include the following steps 201 to 211:
  • Step 201: perform three-dimensional reconstruction on the captured scene according to at least one view under the second reference viewpoint, and obtain point cloud data of the scene under the camera coordinate system of the first reference viewpoint.
  • the sparse views of the scene can be used as the input of the colmap tool to perform structure-from-motion (SFM) camera parameter estimation and multi-view stereo (MVS) reconstruction, thereby obtaining the point cloud data.
  • the coordinates of a point in the point cloud data are denoted as (x, y, d), where d represents the depth of the point relative to the camera of the first reference viewpoint.
  • the sparse views of the scene are views under different second reference viewpoints.
  • Step 202 determining the disparity map of the scene.
  • the electronic device may implement step 202 in the following way: obtain a transparency map of at least one plane of the scene according to at least one view under the second reference viewpoint; and synthesize the disparity map of the scene according to the transparency map of the at least one plane and the corresponding plane depth. Thus, compared with the method based on binocular stereo vision, an electronic device without a binocular camera can still implement the image processing method, so the method is more universal and saves hardware cost of the electronic device.
  • the MPI representation of the scene is synthesized through the NeX model, and based on this, the disparity map of the scene is synthesized according to formula (2).
  • in formula (2), d_i represents the depth of the i-th MPI plane (sorted from far to near), and α_i represents the transparency of the i-th MPI plane.
  • D represents the number of transparency maps.
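  • A minimal sketch of one plausible reading of formula (2): composite the per-plane disparity 1/d_i with the transparency maps using back-to-front over-compositing. The exact expression of formula (2) is not reproduced in this extract, so this interpretation is an assumption.

```python
import numpy as np

def synthesize_disparity(alphas, depths):
    """Composite a disparity map from MPI transparency maps.

    alphas: array of shape (D, H, W), transparency of each plane, ordered far to near.
    depths: array of shape (D,), depth d_i of each plane, ordered far to near.
    """
    D, H, W = alphas.shape
    disparity = np.zeros((H, W))
    transmittance = np.ones((H, W))
    # Walk from the nearest plane to the farthest so that transmittance accumulates
    # the product of (1 - alpha_j) over all planes in front of plane i.
    for i in range(D - 1, -1, -1):
        disparity += (1.0 / depths[i]) * alphas[i] * transmittance
        transmittance *= (1.0 - alphas[i])
    return disparity
```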
  • Step 203 obtain a depth map under the first reference viewpoint according to the disparity map and the point cloud data.
  • the electronic device can implement step 203 in this way: according to the disparity map and the point cloud data, obtain the inverse proportional coefficient between the disparity map and the depth map; and according to the inverse proportional coefficient and the disparity map, obtain the depth map under the first reference viewpoint.
  • the disparity is inversely proportional to the depth. Therefore, the inverse proportion coefficient can be determined first, and then the disparity map can be converted into a depth map based on the coefficient.
  • the inverse proportional coefficient is recorded as ⁇
  • P s is the point cloud data
  • (x, y, d) are the coordinates of the point in the camera coordinate system of the first reference viewpoint.
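  • A sketch of fitting the inverse proportional coefficient by least squares; the closed form below is an assumption (the exact loss of the patent's formula is not reproduced here), and the point coordinates are assumed to have already been projected into the pixel grid of the disparity map.

```python
import numpy as np

def fit_inverse_proportional_coefficient(disparity_map, sparse_points):
    """Fit beta such that depth ≈ beta / disparity for the sparse point cloud P_s.

    sparse_points: iterable of (x, y, d); x and y are assumed here to be pixel
    coordinates in the disparity map (in the text they are camera-space coordinates,
    which would first be projected with the camera intrinsics).
    """
    inv_disp, depth = [], []
    for x, y, d in sparse_points:
        disp = disparity_map[int(round(y)), int(round(x))]
        if disp > 0:
            inv_disp.append(1.0 / disp)
            depth.append(d)
    inv_disp, depth = np.asarray(inv_disp), np.asarray(depth)
    # Minimize sum_k (d_k - beta / disp_k)^2, which gives
    # beta = sum(d_k * s_k) / sum(s_k^2) with s_k = 1 / disp_k.
    beta = np.dot(depth, inv_disp) / np.dot(inv_disp, inv_disp)
    return beta

# The depth map then follows as depth_map = beta / disparity_map wherever disparity > 0.
```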
  • Step 204 according to the depth of the first pixel in the depth map, determine the depth relationship between the first pixels
  • Step 205 perform region division on the depth map to obtain at least one region.
  • the first pixel points with the same depth or a depth difference within a specific range are divided into the same area.
  • the region division can be realized by using the OTSU algorithm or the superpixel segmentation algorithm.
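  • For the two-region case, a minimal sketch of Otsu-based division using OpenCV (assumed available); the normalization to 8-bit is required by cv2.threshold with THRESH_OTSU and is an implementation choice, not something the patent specifies.

```python
import cv2
import numpy as np

def otsu_foreground_background(depth_map):
    """Split a depth map into a foreground mask and a background mask with Otsu's method."""
    d = depth_map.astype(np.float64)
    # Normalize to 8-bit, as cv2.threshold with THRESH_OTSU expects a uint8 image.
    d8 = np.uint8(255 * (d - d.min()) / max(d.max() - d.min(), 1e-9))
    _, fg_mask = cv2.threshold(d8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    bg_mask = cv2.bitwise_not(fg_mask)
    return fg_mask, bg_mask
```

  • A superpixel segmentation algorithm would be used instead when a larger number of regions is wanted, as described later for the PMPI shape.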
  • Step 206 determining the transformation relationship between the camera coordinate system where the first reference viewpoint is located and the camera coordinate system where the target viewpoint is located.
  • the transformation relationship includes a rotation matrix and a translation vector.
  • Step 207 acquiring the internal camera parameters corresponding to the first reference viewpoint and the internal camera parameters corresponding to the target viewpoint;
  • Step 208 determine the area depth of at least one target area in the at least one area.
  • if the depths of the pixel points in the same target region are not equal, the mean or median depth of the pixel points in the region can be used as the region depth of the region; if the depths of the pixel points in the same target region are equal, the depth of any pixel point in the region can be used as the region depth of the region.
  • Step 209: perform an inverse homography transformation on the homogeneous coordinates of the mth second pixel point according to the transformation relationship, the camera intrinsic parameters corresponding to the first reference viewpoint and the target viewpoint, and the region depth, to obtain the position points of the mth second pixel point in the at least one target region.
  • Step 210 from the existing position points of the mth second pixel point in the at least one target area, select the position points satisfying the condition as valid position points.
  • if a position point lies in its corresponding region, the position point is regarded as a valid location point; otherwise, if the position point does not lie in the corresponding region, it is regarded as an invalid location point and discarded.
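  • A small sketch of this screening step, assuming the region division produced an integer label map and that each candidate position point records which region it was warped into (both assumptions about data layout, not details given in the text).

```python
def select_valid_points(candidate_points, region_labels, region_ids):
    """Keep only the back-projected points that land inside their own target region.

    candidate_points: list of (u, v) pixel coordinates, one per target region tried.
    region_labels:    integer label map from the region division step (2D array).
    region_ids:       the region label each candidate was warped into, aligned with candidate_points.
    """
    H, W = region_labels.shape
    valid = []
    for (u, v), rid in zip(candidate_points, region_ids):
        ui, vi = int(round(u)), int(round(v))
        # Discard points that fall outside the image or outside their own region.
        if 0 <= vi < H and 0 <= ui < W and region_labels[vi, ui] == rid:
            valid.append((u, v, rid))
    return valid
```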
  • Step 211 Render the color of the mth second pixel according to the effective position point.
  • step 211 can be implemented as follows: determine the color coefficients, transparency, basic color value and basis functions of each effective position point, where the independent variable of the basis functions is the relative direction between the effective position point and the target viewpoint; obtain, according to the color coefficients, basic color value and basis functions of the effective position point, the color value of the effective position point observed from the relative direction; combine the transparency of each effective position point with the observed color value to obtain a composite color value; and use the composite color value to render the color of the mth second pixel point.
  • the transparency and color coefficients of the effective position point can be determined through step 304 of the following embodiment; the basic color value of the effective position point can be determined through step 305 of the following embodiment; and the basis functions of the effective position point can be determined through step 306 of the following embodiment.
  • the relative direction may be a unit direction vector of the target viewpoint relative to the effective location point, or may be a unit direction vector of the effective location point relative to the target viewpoint.
  • effective position points are first selected from the position points of the mth second pixel point in the at least one target region, and the color of the mth second pixel point is then rendered based on the effective position points rather than on every existing position point; in this way, the amount of calculation is reduced, thereby improving the rendering efficiency and, in turn, the synthesis efficiency of the synthesized view.
  • FIG. 3 is a schematic diagram of the implementation process of the image processing method in the embodiment of the present application. As shown in FIG. 3 , the method may include the following steps 301 to 309:
  • Step 301 divide the depth map into regions according to the depth of the first pixel in the depth map under the first reference viewpoint, and obtain at least one region;
  • Step 302: inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target region of the at least one region, and obtain the position points of the mth second pixel point in the at least one target region.
  • Step 303 from among the existing position points of the mth second pixel point in the at least one target area, select the position points satisfying the conditions as valid position points;
  • Step 304 Obtain the transparency and color coefficient of the effective location point according to the coordinates of the effective location point and the trained first multi-layer perceptron.
  • the coordinates of the effective position point can be mapped to a vector with a first dimension; the vector with the first dimension is input into the first multi-layer perceptron to obtain the transparency and color coefficients of the effective position point.
  • the size of the first dimension is not limited; it may be 56 dimensions or any other dimension.
  • the mapping of the spatial coordinates (x, y, d) of the effective position point can be realized by formula (4).
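  • A sketch of a Fourier-feature positional encoding of the kind used by NeRF-style models; the exact variant of formula (4) and the per-coordinate split of the 56 dimensions are assumptions.

```python
import numpy as np

def positional_encoding(value, num_freqs):
    """Map a scalar coordinate to [sin(2^0*pi*v), cos(2^0*pi*v), ..., sin(2^(L-1)*pi*v), cos(2^(L-1)*pi*v)]."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    return np.concatenate([np.sin(freqs * value), np.cos(freqs * value)])

def encode_position(x, y, d, freqs_xy=10, freqs_d=8):
    # With 10, 10 and 8 frequencies the concatenation has 2*10 + 2*10 + 2*8 = 56 dimensions,
    # matching the dimensionality mentioned in the text (the split itself is an assumption).
    return np.concatenate([
        positional_encoding(x, freqs_xy),
        positional_encoding(y, freqs_xy),
        positional_encoding(d, freqs_d),
    ])
```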
  • Step 305 according to the coordinates of the effective location point, obtain the basic color value of the effective location point
  • Step 306 according to the relative direction and the trained second multi-layer perceptron, obtain the basis function of the effective location point.
  • the relative direction is mapped to a vector with a second dimension; the vector with the second dimension is input into the second multi-layer perceptron to obtain the basis function of the effective position point .
  • the size of the second dimension can be arbitrary, and the value of h can also be set arbitrarily.
  • Step 307 Obtain the color value of the effective location point viewed from the relative direction according to the color coefficient, the basic color value and the basis function of the effective location point.
  • the color value C_P(v) of the effective position point P observed from the relative direction v can be obtained according to the following formula (5):
  • C_P(v) = k_0^P + Σ_{n=1}^{N} k_n^P · H_n(v)  (5)
  • v represents the unit direction vector of the point P relative to the target viewpoint;
  • k_0^P represents the basic color value of point P (such as an RGB value; the color format is not limited to RGB, and other color formats can also be used);
  • [k_1^P, ..., k_N^P] represents the color coefficients of point P;
  • [k_0^P, k_1^P, ..., k_N^P] is related only to the coordinates of point P and is independent of v.
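  • A one-line sketch of formula (5) in Python; the array shapes are illustrative assumptions.

```python
import numpy as np

def view_dependent_color(k0, k_coeffs, basis_values):
    """C_P(v) = k_0 + sum_n k_n * H_n(v).

    k0:           base color value of point P, shape (3,).
    k_coeffs:     color coefficients [k_1, ..., k_N], shape (N, 3).
    basis_values: H_n(v) evaluated for the relative direction v, shape (N,).
    """
    return k0 + np.tensordot(basis_values, k_coeffs, axes=1)
```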
  • Step 308 combining the transparency of each effective position point with the observed color value to obtain the synthesized color value of the mth second pixel point.
  • the composite color value C_t of the mth second pixel point can be calculated according to the following formula (6), where the effective position points are sorted from far to near:
  • C_t = Σ_{i=1}^{D} C_i · α_i · Π_{j=i+1}^{D} (1 - α_j)  (6)
  • i indexes the i-th effective position point;
  • D represents the total number of effective position points;
  • C_i represents the color value of the i-th effective position point observed from the direction v;
  • α_i represents the transparency of the i-th effective position point.
  • Step 309 using the composite color value to render the color of the mth second pixel.
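  • A sketch of the over-compositing of formula (6), assuming the effective position points are ordered from far to near.

```python
import numpy as np

def composite_color(colors, alphas):
    """C_t = sum_i C_i * alpha_i * prod_{j>i} (1 - alpha_j).

    colors: shape (D, 3), observed color C_i of each effective position point (far to near).
    alphas: shape (D,), transparency alpha_i of each effective position point.
    """
    out = np.zeros(3)
    transmittance = 1.0
    for i in range(len(alphas) - 1, -1, -1):   # start from the nearest point
        out += colors[i] * alphas[i] * transmittance
        transmittance *= (1.0 - alphas[i])
    return out
```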
  • the method further includes: obtaining a synthetic view under the target viewpoint after the color of each second pixel point of the view to be rendered is rendered; obtaining a real view under the target viewpoint; obtaining a composite loss according to the synthetic view and the real view; and updating the parameter values of the first multi-layer perceptron and the second multi-layer perceptron according to the composite loss. In this way, the results obtained by the first multi-layer perceptron and the second multi-layer perceptron become more accurate, so that the next time a new view of a similar scene is synthesized, a synthetic view with better image quality can be obtained.
  • the composite loss can be calculated according to the following formula (7):
  • L = L_rec + λ · TV(k_0)  (7)
  • where L_rec is the reconstruction loss, which can be calculated according to formula (8) and measures the difference between the synthetic view and the real view;
  • TV(k_0) is the total variation loss of the regularization term, and λ represents the coefficient of the regularization term.
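  • A sketch of the composite loss with an anisotropic total-variation regularizer; the TV form, the plain mean-squared reconstruction error and the coefficient value are assumptions, since formulas (7) and (8) are not reproduced in full here.

```python
import numpy as np

def total_variation(k0_map):
    """Total variation of the base-color map k0, shape (H, W, 3)."""
    dx = np.abs(np.diff(k0_map, axis=1)).sum()
    dy = np.abs(np.diff(k0_map, axis=0)).sum()
    return dx + dy

def composite_loss(synth, real, k0_map, lam=0.03):
    # L = L_rec + lam * TV(k0); L_rec here is a plain mean-squared error (the exact
    # reconstruction loss of formula (8) may differ), and lam = 0.03 is a hypothetical value.
    l_rec = np.mean((synth - real) ** 2)
    return l_rec + lam * total_variation(k0_map)
```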
  • the above image processing method can be applied to the online use stage, and can also be applied to the offline training stage.
  • the method further includes: using the updated first multi-layer perceptron and the updated second multi-layer perceptron to re-render the color of the mth second pixel point, until the obtained composite loss satisfies a condition or the number of updates satisfies a condition, thereby obtaining the first multi-layer perceptron and the second multi-layer perceptron that can be used in the online use stage.
  • the depth map under the first reference viewpoint can be obtained through various methods.
  • the depth map can be obtained by decoding the code stream sent by the encoding end.
  • at the encoding end, the encoding device can divide the depth map into regions according to the depth of the first pixel points in the depth map under the first reference viewpoint to obtain at least one region, and then encode the total number of regions obtained by the division together with the depth map to generate a code stream.
  • at the decoding end, the decoding device can obtain the total number of regions and the depth map by decoding the code stream and transmit this information to the image processing device; the image processing device divides the depth map into regions according to the total number of regions to obtain the at least one region, performs the remaining steps of the above image processing method to obtain the synthesized view, and transmits the synthesized view to the display device for image display or playback.
  • alternatively, the encoding device can encode the depth map to generate a code stream; at the decoding end, the decoding device obtains the depth map by decoding the code stream and transmits it to the image processing device; the image processing device performs region division on the depth map according to a specific region division algorithm to obtain the at least one region, performs the remaining steps of the above image processing method to obtain the synthetic view, and transmits the synthetic view to a display device for image display or playback.
  • the scene novel-view synthesis model NeX is based on MPI and basis functions (Basis function) and obtains better novel-view rendering results for the scene.
  • the NeX model modifies the color map of the MPI to add a view-dependent effect, that is, an effect that changes with the viewing angle, to the color map of the MPI.
  • the MPI representation combined with basis functions is shown in Figure 4, where the basis functions are combined with the RGB values of the color map, and the combination method is shown in the following formula (9):
  • C_P(v) = k_0^P + Σ_{n=1}^{N} k_n^P · H_n(v)  (9)
  • P represents the coordinate position of a point in the MPI under the space coordinate system with the first reference viewpoint as the origin (hereinafter referred to as point P);
  • v represents the unit direction vector of point P relative to the target viewpoint;
  • C_P(v) represents the color value (in RGB format) of point P observed from the direction v;
  • k_0^P represents the underlying RGB value of point P, which is equivalent to the color value in the original MPI;
  • [k_1^P, ..., k_N^P] represents the RGB coefficients of point P;
  • [k_0^P, k_1^P, ..., k_N^P] is related only to the coordinates of point P and is independent of v.
  • the NeX model takes as input a sparse view of the scene and can output new views around the input viewpoint.
  • the overall flow of the NeX model is shown in Figure 5:
  • the three vectors are sequentially spliced into a 56-dimensional vector, which is used as the actual input of the first multi-layer perceptron F_θ.
  • a new view under the target viewpoint is obtained by rendering the MPI, the camera parameters corresponding to the first reference viewpoint, and the camera parameters corresponding to the target viewpoint.
  • the rendering method adopts the standard inverse homography (Standard inverse homography), as shown in the following formula (11), which has the same form as formula (1) above:
  • (u_s, v_s, 1)^T ≃ k_s · (R - t·n^T / a) · k_t^(-1) · (u_t, v_t, 1)^T  (11)
  • where n = (0, 0, 1)^T is the normal vector of the fronto-parallel MPI plane and ≃ denotes equality up to a scale factor.
  • R and t are the rotation matrix and translation vector from the camera coordinate system of the first reference viewpoint to the camera coordinate system of the target viewpoint in the world coordinate system.
  • a is the negative of the plane depth value in MPI.
  • k s and k t are camera internal parameters corresponding to the first reference viewpoint and the target viewpoint respectively.
  • (u t ,v t ,1) are the homogeneous coordinates of the pixels in the image (that is, the view to be rendered) under the target viewpoint.
  • (u s ,v s ,1) are the homogeneous coordinates of point P in the corresponding plane under the first reference viewpoint.
  • for each pixel point (u_t, v_t, 1) in the view to be rendered, there is a corresponding point (u_si, v_si, 1) in the i-th plane (sorted from far to near) of the MPI.
  • each pixel point (u_t, v_t, 1) therefore has D corresponding pixel points in the MPI.
  • a series of points P (u_si, v_si, 1) corresponding to the pixel point (u_t, v_t, 1) in the view to be rendered are obtained from step e.
  • according to the RGB value C_i and the α value α_i at each point P, the RGB value C_t of the pixel point (u_t, v_t, 1) is calculated according to formula (12), which composites the points in the same way as formula (6) above.
  • the output image (that is, the synthetic view) is compared with the real view under the target viewpoint, and the difference between the two is measured by reconstruction loss.
  • the reconstruction error L_rec is calculated according to formula (13).
  • the scene representation adopted by the NeX model is the MPI representation combined with basis functions. Therefore, the NeX model incorporates the flaws of MPI.
  • most regions of space have no visible surfaces.
  • most areas of the color map and transparency map in MPI are invalid values, that is, they do not contain visible information.
  • the MPI plane layer example shown in Figure 7 shows the 40th plane layer (a) to the 45th plane layer (f).
  • the first row is a color map
  • the second row is a transparency map (black is the invalid area).
  • a scene perspective rendering model based on PMPI and basis functions (hereinafter referred to as PMPI model) is provided.
  • the patch multiplane image (PMPI) introduces the depth information of the scene on the basis of MPI, and thus obtains a shape (the region division and the depth range of each region) that changes adaptively with the depth of the scene.
  • the PMPI model uses PMPI and basis functions as the scene representation, and a complete novel-view rendering model is built around the characteristics of PMPI.
  • the model takes a sparse view of the scene as input (no need to input a depth map of the scene), and outputs a new view of a given viewpoint (i.e., a synthetic view under the target viewpoint).
  • d. determine the shape of the PMPI (the region boundaries and the depth range of each region) from the depth map, as shown for example in Figure 9, where 901 is the result of performing region division on the depth map with the Otsu algorithm when the region number A is less than 10, for example a final division into a foreground mask and a background mask;
  • 902 is the result obtained by performing region division on the depth map through the superpixel segmentation algorithm when the region number A is greater than or equal to 10.
  • the second multi-layer perceptron MLP2 takes as input the positional encoding vectors of v_x and v_y of the unit direction vector v of point P relative to the target viewpoint (observation point), and learns the basis functions H_n(v) of point P in the PMPI color map, with the number of basis functions N = 8;
  • a new view under the target viewpoint is obtained by rendering the PMPI, the camera parameters corresponding to the first reference viewpoint, and the camera parameters corresponding to the target viewpoint.
  • the rendering method is still the standard inverse homography shown in the above formula (11).
  • for each pixel point of the view to be rendered, the standard inverse homography shown in formula (11) above is used to obtain the coordinates of its corresponding point P at each depth.
  • the number of P points corresponding to the pixels of the view to be rendered is uncertain (less than or equal to 7), and the depth thereof will also change with the pixels of the view to be rendered.
  • the coordinates of the 7 points are screened to obtain an effective P point;
  • the reconstruction loss is calculated according to the synthetic view under the target viewpoint and the real view under the target viewpoint, and Equation 6 is used as the loss function in the training process.
  • the determination process of the depth map is shown in Figure 11, wherein: 1. obtain the MPI through a simplified version of the NeX model (with the basis functions removed); 2. synthesize the disparity map from the transparency maps (α maps) of the MPI; 3. compute the depth map from the disparity map and the sparse point cloud obtained by colmap.
  • a. for the sparse views of the scene, use the colmap tool to perform structure-from-motion (SfM) camera parameter estimation and multi-view stereo (MVS) reconstruction, to obtain a sparse point cloud.
  • the coordinates of the sparse point cloud are (x,y,d), where d is the depth relative to the reference camera;
  • the disparity map of the scene is synthesized from the transparency layers of the MPI.
  • the computation of the depth map from the disparity map and the sparse point cloud obtained by colmap is shown at 3 in Figure 11.
  • d_i is the depth of the i-th plane of the MPI (ordered from far to near).
  • disparity and depth are inversely proportional, and the inverse proportionality coefficient therefore needs to be determined.
  • the inverse proportional coefficient is denoted as ⁇ .
  • the optimal value β' of β is obtained by minimizing the L2 loss over the sparse point cloud, as shown in equation (16).
  • P s is the sparse point cloud
  • (x, y, d) are the coordinates of the point in the camera coordinate system of the first reference viewpoint.
  • the depth map can be calculated from the disparity map and the inverse proportional coefficient.
  • the PMPI model is compared with the NeX model on the real scene fern and trex.
  • the output image size is fixed at 1008 ⁇ 756.
  • the PMPI model is trained for 400 rounds when synthesizing the MPI and for 4000 rounds when synthesizing the PMPI, while NeX is trained for 4000 rounds; the rest of the settings are the same, and the training times of the two are almost the same.
  • the data comparison between the two on the test set is shown in Table 1.
  • the intuitive results of the synthesis are shown in Figures 12 and 13. Figure 12 is the comparison of the synthesis effects in the fern scene: the red boxes 121 and 122 are the synthesis results of the NeX model, and the red boxes 123 and 124 are the synthesis results of the PMPI model.
  • Figure 13 is a comparison of the synthesis effect in the trex scene, the red boxes 131 and 132 are the synthesis results of the NeX model, and the red boxes 133 and 134 are the synthesis results of the PMPI model.
  • the image processing device provided by the embodiments of the present application, including the modules it comprises and the units included in each module, can be implemented by various types of processors; of course, it can also be implemented by a specific logic circuit.
  • FIG. 14 is a schematic structural diagram of an image processing device according to an embodiment of the present application. As shown in FIG. 14 , the image processing device 14 includes:
  • a region division module 141 configured to perform region division on the depth map according to the depth of the first pixel in the depth map under the first reference viewpoint, to obtain at least one region;
  • a coordinate inverse transformation module 142, configured to inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target region of the at least one region, to obtain the position points of the mth second pixel point in the at least one target region; where m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered;
  • the rendering module 143 is configured to render the color of the mth second pixel according to the position of the mth second pixel in the at least one target area.
  • the area division module 141 is configured to: determine the depth relationship between the first pixel points according to the depth of the first pixel points in the depth map; and divide the depth map into regions according to the depth relationship, to obtain the at least one region.
  • the area division module 141 is configured to: divide the first pixel points with the same depth or a depth difference within a specific range into the same area.
  • the coordinate inverse transformation module 142 is configured to: determine the transformation relationship between the camera coordinate system where the first reference viewpoint is located and the camera coordinate system where the target viewpoint is located; obtain the camera intrinsic parameters corresponding to the first reference viewpoint and the camera intrinsic parameters corresponding to the target viewpoint; determine the region depth of the at least one target region; and perform an inverse homography transformation on the coordinates of the mth second pixel point according to the transformation relationship, the camera intrinsic parameters corresponding to the first reference viewpoint and the target viewpoint, and the region depth, to obtain the position points of the mth second pixel point in the at least one target region.
  • the rendering module 143 is configured to: select, from the position points of the mth second pixel point in the at least one target region, the position points satisfying the condition as valid position points; and render the color of the mth second pixel point according to the effective position points.
  • the rendering module 143 is configured to: determine the color coefficients, transparency, basic color value and basis functions of the effective position point, where the independent variable of the basis functions is the relative direction between the effective position point and the target viewpoint; obtain, according to the color coefficients, basic color value and basis functions of the effective position point, the color value of the effective position point observed from the relative direction; combine the transparency of each effective position point with the observed color value to obtain a composite color value; and use the composite color value to render the color of the mth second pixel point.
  • the rendering module 143 is configured to: obtain the transparency and color coefficients of the effective location point according to the coordinates of the effective location point and the trained first multi-layer perceptron; obtain the basic color value of the effective location point according to its coordinates; and obtain the basis functions of the effective location point according to the relative direction and the trained second multi-layer perceptron.
  • the rendering module 143 is configured to: map the coordinates of the effective location point into a vector with the first dimension; input the vector with the first dimension into the first multi-layer perceptron , to obtain the transparency and color coefficient of the effective location point.
  • the rendering module 143 is configured to: map the relative direction into a vector with a second dimension; and input the vector with the second dimension into the second multi-layer perceptron to obtain the basis functions of the effective position point.
  • the image processing device 14 further includes an update module, configured to: obtain a synthetic view under the target viewpoint after the color of each second pixel point of the view to be rendered is rendered; obtain the real view under the target viewpoint; obtain a composite loss according to the synthetic view and the real view; and update the parameter values of the first multi-layer perceptron and the second multi-layer perceptron according to the composite loss.
  • the rendering module 143 is further configured to: use the updated first multi-layer perceptron and the updated second multi-layer perceptron to re-render the color of the mth second pixel until The synthetic loss of satisfies the condition or the number of updates satisfies the condition.
  • the image processing device 14 further includes a depth map obtaining module, configured to: perform three-dimensional reconstruction on the captured scene according to at least one view under the second reference viewpoint, to obtain point cloud data of the scene in the camera coordinate system of the first reference viewpoint; determine a disparity map of the scene; and obtain the depth map under the first reference viewpoint according to the disparity map and the point cloud data.
  • the depth map obtaining module is configured to: obtain the transparency map of at least one plane of the scene according to at least one view under the second reference viewpoint; and synthesize the disparity map of the scene according to the transparency map of the at least one plane and the corresponding plane depth.
  • the depth map obtaining module is configured to: obtain an inverse proportional coefficient between the disparity map and the depth map according to the disparity map and the point cloud data; and obtain the depth map under the first reference viewpoint according to the inverse proportional coefficient and the disparity map.
  • if the above method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk or an optical disk.
  • embodiments of the present application are not limited to any specific combination of hardware and software.
  • FIG. 15 is a schematic diagram of hardware entities of the electronic device according to the embodiment of the present application.
  • the electronic device 15 includes a memory 151 and a processor 152.
  • the memory 151 stores a computer program that can run on the processor 152, and the processor 152 implements the steps in the methods provided in the above-mentioned embodiments when executing the program.
  • the memory 151 is configured to store instructions and applications executable by the processor 152, and may also cache data to be processed or already processed by each module in the processor 152 and the electronic device 15 (for example, image data, audio data, voice communication data and video communication data); it can be implemented by a flash memory (FLASH) or a random access memory (Random Access Memory, RAM).
  • the electronic device further includes a decoder 161 and a display device 162. The decoder 161 is configured to decode the code stream sent by the encoding end to obtain the depth map under the first reference viewpoint, and to transmit the depth map to the processor 152; the processor 152 is configured to execute the steps in the image processing method provided in the above embodiments according to the depth map, so as to finally obtain a synthetic view under the target viewpoint, and to transmit the synthetic view to the display device 162; the display device 162 displays or plays the received synthetic view. The processor 152 can divide the depth map into regions according to a specific region division algorithm, so as to obtain the at least one region.
  • the code stream may also carry the total number of regions, so that the processor 152 may perform region division on the depth map according to the total number of regions decoded by the decoder 161 .
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps in the method provided in the foregoing embodiments are implemented.
  • the disclosed devices and methods can be implemented in other ways.
  • the above-described device embodiments are only illustrative.
  • the division of the modules is only a logical function division.
  • the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or modules may be in electrical, mechanical or other forms.
  • modules described above as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules; they may be located in one place or distributed to multiple network units; Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application can be integrated into one processing unit, or each module can be used as a single unit, or two or more modules can be integrated into one unit; the above-mentioned integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • if the above-mentioned integrated units of the present application are realized in the form of software functional modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media capable of storing program codes such as removable storage devices, ROMs, magnetic disks or optical disks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

Provided are an image processing method and apparatus, a device, and a storage medium. The method comprises: according to the depth of a first pixel point in a depth map under a first reference viewpoint, performing region division on the depth map, and obtaining at least one region (101); inversely transforming the coordinates of an m-th second pixel point of a view to be rendered under a target viewpoint into at least one target region among the at least one region, and obtaining a presence position point of the m-th second pixel point in the at least one target region (102), m being greater than 0 and less than or equal to the total number of pixel points in the view to be rendered; and rendering the color of the m-th second pixel point according to the presence position point of the m-th second pixel point in the at least one target region (103).

Description

Image processing method and device, equipment, storage medium
Technical field
The embodiments of the present application relate to image technologies, including but not limited to image processing methods, devices, equipment, and storage media.
Background technique
In applications such as virtual reality, virtual simulation, and immersive remote video conferencing, it is often necessary to synthesize views at arbitrary viewpoints based on known images, that is, synthetic views. The quality of the synthetic views directly affects the user's experience of the application.
Contents of the invention
The image processing method, device, equipment, and storage medium provided in the embodiments of the present application are implemented as follows:
The image processing method provided by the embodiments of the present application includes: performing region division on the depth map (Depth Map) under a first reference viewpoint according to the depth of the first pixel points in the depth map, to obtain at least one region; inversely transforming the coordinates of the mth second pixel point of the view to be rendered under a target viewpoint into at least one target region of the at least one region, to obtain the position points of the mth second pixel point in the at least one target region, where m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered; and rendering the color of the mth second pixel point according to the position points of the mth second pixel point in the at least one target region.
The image processing device provided in the embodiments of the present application includes: a region division module, configured to perform region division on the depth map under the first reference viewpoint according to the depth of the first pixel points in the depth map, to obtain at least one region; a coordinate inverse transformation module, configured to inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target region of the at least one region, to obtain the position points of the mth second pixel point in the at least one target region, where m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered; and a rendering module, configured to render the color of the mth second pixel point according to the position points of the mth second pixel point in the at least one target region.
The electronic device provided by the embodiments of the present application includes a memory and a processor, the memory stores a computer program that can run on the processor, and the processor implements the steps in the image processing method when executing the program.
The computer-readable storage medium provided by the embodiments of the present application stores a computer program thereon, and when the computer program is executed by a processor, the steps in the image processing method are implemented.
In the embodiments of the present application, the depth map is divided into regions according to the depth of the first pixel points in the depth map under the first reference viewpoint, instead of dividing the view (Viewport) under the first reference viewpoint into planes based on a predetermined plane distribution law; in this way, because the region division takes into account the depth of each point in the actual scene, the color rendered for each second pixel point is more accurate, so that the rendered view (that is, the synthetic view) retains more image detail.
Description of drawings
The accompanying drawings here are incorporated into and constitute a part of the specification; they show embodiments consistent with the present application and, together with the description, serve to explain the technical solution of the present application.
Fig. 1 is a schematic diagram of the implementation flow of the image processing method of the embodiment of the present application;
Fig. 2 is a schematic flow diagram of another implementation of the image processing method of the embodiment of the present application;
Fig. 3 is a schematic flow diagram of yet another implementation of the image processing method of the embodiment of the present application;
Fig. 4 is a schematic diagram of a multiplane image (Multiplane Image, MPI) representation composed of 4 plane layers combined with basis functions, according to an embodiment of the present application;
Fig. 5 is the workflow of the scene novel-view synthesis model NeX according to an embodiment of the present application (the process of obtaining H_n(v) is omitted);
Fig. 6 is a schematic diagram of the standard inverse homography transformation, taking the plane number D=3 as an example, according to an embodiment of the present application;
Fig. 7 is a schematic diagram of an example of an MPI plane layer;
Fig. 8 is a schematic workflow diagram of the PMPI (Patch Multiplane Image) model of an embodiment of the present application;
Fig. 9 is a schematic flowchart of obtaining the PMPI shape according to an embodiment of the present application;
Fig. 10 is a schematic diagram of PMPI rendering with region number A=2 and depth number 4, according to an embodiment of the present application;
Fig. 11 is a schematic diagram of the calculation flow of the depth map according to an embodiment of the present application;
Fig. 12 is a schematic diagram of the comparison of the synthesis effect in the fern scene;
Fig. 13 is a schematic diagram of the comparison of the synthesis effect in the trex scene;
Fig. 14 is a schematic structural diagram of an image processing device according to an embodiment of the present application;
Fig. 15 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application;
Fig. 16 is a schematic diagram of another hardware entity of the electronic device according to the embodiment of the present application.
具体实施方式detailed description
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请的具体技术方案做进一步详细描述。以下实施例用于说明本申请,但不用来限制本申请的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the specific technical solutions of the present application will be further described in detail below in conjunction with the drawings in the embodiments of the present application. The following examples are used to illustrate the present application, but not to limit the scope of the present application.
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。In the following description, references to "some embodiments" describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or a different subset of all possible embodiments, and Can be combined with each other without conflict.
需要指出,本申请实施例所涉及的术语“第一\第二\第三”是为了区别类似或不同的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。It should be pointed out that the terms "first\second\third" in the embodiments of the present application are used to distinguish similar or different objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, the specific order or sequence of "first\second\third" may be interchanged, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein.
本申请实施例提供一种图像处理方法,该方法应用于电子设备,电子设备可以是任意具有数据处理能力的设备,例如该电子设备为笔记本电脑、手机、服务器、电视机或投影仪等。An embodiment of the present application provides an image processing method, which is applied to an electronic device, and the electronic device may be any device capable of data processing, for example, the electronic device is a notebook computer, a mobile phone, a server, a TV, or a projector.
图1为本申请实施例图像处理方法的实现流程示意图,如图1所示,所述方法可以包括以下步骤101至步骤103:Fig. 1 is a schematic diagram of the implementation flow of the image processing method of the embodiment of the present application. As shown in Fig. 1, the method may include the following steps 101 to 103:
步骤101,根据第一参考视点下的深度图中第一像素点的深度,对所述深度图进行区域划分,得到至少一个区域。 Step 101, according to the depth of the first pixel in the depth map under the first reference viewpoint, perform region division on the depth map to obtain at least one region.
需要说明的是,对于深度图的区域划分范围不做限制,在一些实施例中,可以将深度图的某一图块或者某几个图块进行所述区域划分;在另一些实施例中,可以将整个深度图进行所述区域划分。It should be noted that there is no limit to the area division range of the depth map. In some embodiments, a certain block or several blocks of the depth map can be divided into the area; in other embodiments, The entire depth map can be divided into regions.
对于划分得到的区域的数量可以是特定数量,也可以不是特定数量。不是特定数量时,划分得到的区域的数量与实际场景相关,即与深度图中第一像素点的深度分布有关。The number of divided regions may or may not be a specific number. If it is not a specific number, the number of divided regions is related to the actual scene, that is, related to the depth distribution of the first pixel in the depth map.
在本申请实施例中,对于深度图的获取方法不做限定。例如,可以基于双目立体视觉,即通过电子设备携带的两个相隔一定距离的摄像头同时获取同一场景的两幅图像,通过立体匹配算法找到两幅图像中对应的像素点,然后根据三角原理计算出视差信息,而视差信息通过转换可用于表征场景中物体的深度信息。又如,通过电子设备携带的主动测距传感器实现对场景的深度信息的采集;其中,主动测距传感器例如可以是飞行时间(Time of flight,TOF)相机、结构光设备或激光雷达等。再如,电子设备还可以通过如下实施例的步骤201至步骤203,得到第一参考视点下的深度图,即,基于至少一张第二参考视点下的视图,得到第一参考视点下的深度图;其中,第一参考视点与第二参考视点不同。相比于双目立体视觉和主动测距传感器的方法,该方法不需要电子设备具有双目摄像头,也不需要具有主动测距传感器即可得到深度图,从而使得本申请实施例提供的图像处理方法能够适用于更多的电子设备中,其普适性更强。In the embodiment of the present application, no limitation is imposed on the method for obtaining the depth map. For example, it can be based on binocular stereo vision, that is, two images of the same scene are simultaneously acquired by two cameras carried by the electronic device and separated by a certain distance, the corresponding pixels in the two images are found through a stereo matching algorithm, and the disparity information is then calculated according to the triangulation principle; after conversion, the disparity information can be used to represent the depth information of objects in the scene. As another example, the depth information of the scene can be acquired through an active ranging sensor carried by the electronic device, where the active ranging sensor may be, for example, a time-of-flight (TOF) camera, a structured light device, or a laser radar. As yet another example, the electronic device may also obtain the depth map under the first reference viewpoint through steps 201 to 203 of the following embodiment, that is, obtain the depth map under the first reference viewpoint based on at least one view under a second reference viewpoint, where the first reference viewpoint is different from the second reference viewpoint. Compared with the binocular stereo vision and active ranging sensor methods, this method does not require the electronic device to have a binocular camera or an active ranging sensor in order to obtain the depth map, so that the image processing method provided by the embodiments of the present application can be applied to more electronic devices and has stronger universality.
在一些实施例中,还可以通过接收编码端发送的码流,对码流进行解码,从而得到第一参考视点下的深度图。In some embodiments, the code stream may also be decoded by receiving the code stream sent by the encoding end, so as to obtain the depth map under the first reference viewpoint.
在本申请实施例中,对于区域划分方法也不做限定。可以根据第一像素点之间的深度关系,对深度图或者深度图中的某一图块或某几个图块进行区域划分。例如,将深度相等的像素点划分在同一区域。又如,将深度差在特定范围内的像素点划分在同一区域。In the embodiment of the present application, there is no limitation on the area division method. The depth map or a block or blocks in the depth map may be divided into regions according to the depth relationship between the first pixels. For example, divide pixels with equal depth into the same area. In another example, pixels with depth differences within a specific range are divided into the same area.
步骤102,将目标视点下的待渲染视图的第m个第二像素点的坐标反变换至所述至少一个区域中的至少一个目标区域中,得到所述第m个第二像素点在所述至少一个目标区域中的存在位置点;其中,m大于0且小于或等于待渲染视图的总像素点数。Step 102: Inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target area in the at least one area, and obtain the mth second pixel point in the Existing position points in at least one target area; wherein, m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered.
需要说明的是,所述至少一个目标区域可以是所述至少一个区域中的所有区域,也可以是一个或多个区域;是所有区域时,无需对所述至少一个区域进行筛选,直接将区域划分得到的所述至少一个区域作为所述目标区域即可。It should be noted that the at least one target area may be all areas in the at least one area, or one or more areas; when it is all areas, there is no need to screen the at least one area, and the area The at least one divided area may be used as the target area.
所述至少一个目标区域是其中的一个或多个区域时,在一些实施例中,可以将所述至少一个区域中的任意区域作为目标区域,例如,从中随机抽取特定数量的区域作为目标区域;在另一些实施例中,还可以将从所述至少一个区域中挑选出满足特定条件的区域作为目标区域。例如,将像素点数目大于特定数目的区域作为目标区域;又如,将区域深度小于特定深度的区域作为目标区域。When the at least one target area is one or more of them, in some embodiments, any area in the at least one area can be used as the target area, for example, a specific number of areas are randomly selected as the target area; In some other embodiments, an area satisfying a specific condition may also be selected from the at least one area as the target area. For example, an area whose number of pixels is larger than a specific number is used as a target area; another example is an area whose depth is smaller than a specific depth is used as a target area.
在一些实施例中,可以通过如下公式(1)将目标视点下的待渲染视图的第二像素点的齐次坐标(u_t, v_t, 1)反变换至目标区域中,从而得到该像素点在第一参考视点下的该目标区域中的存在位置点的齐次坐标(u_s, v_s, 1): In some embodiments, the homogeneous coordinates (u_t, v_t, 1) of the second pixel point of the view to be rendered under the target viewpoint can be inversely transformed into the target area by the following formula (1), so as to obtain the homogeneous coordinates (u_s, v_s, 1) of the position point of this pixel in the target area under the first reference viewpoint:
$$\begin{bmatrix} u_s \\ v_s \\ 1 \end{bmatrix} \sim k_s \left( R^{T} + \frac{R^{T}\,t\,n^{T}R^{T}}{a - n^{T}R^{T}t} \right) k_t^{-1} \begin{bmatrix} u_t \\ v_t \\ 1 \end{bmatrix} \qquad (1)$$
式中,~表示按某种比例相等的意思,R和t是第一参考视点的相机坐标系到目标视点的相机坐标系的旋转矩阵和平移向量。a是目标区域的区域深度的负值,如果同一目标区域的像素点的深度不相等,可以将该区域的像素点的深度均值或中值等作为该区域的深度。n=(0,0,1)是第一参考视点的相机坐标系下MPI平面的单位法向量。k_s和k_t是第一参考视点和目标视点分别对应的相机内参。 In the formula, ~ means equality up to a scale factor, R and t are the rotation matrix and translation vector from the camera coordinate system of the first reference viewpoint to the camera coordinate system of the target viewpoint, a is the negative value of the region depth of the target region (if the depths of the pixels in the same target region are not equal, the mean or median depth of the pixels in the region can be used as the depth of the region), n=(0,0,1) is the unit normal vector of the MPI plane in the camera coordinate system of the first reference viewpoint, and k_s and k_t are the camera intrinsics corresponding to the first reference viewpoint and the target viewpoint respectively.
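As an illustrative sketch only (not part of the claimed method), formula (1) can be evaluated per pixel with NumPy as follows; the function name, the argument layout, and the use of R, t as the source-to-target transform are assumptions taken from the description above.

```python
import numpy as np

def inverse_homography_warp(uv_t, K_s, K_t, R, t, region_depth):
    """Warp a target-view pixel back onto a source-view plane (formula (1)).

    R, t         : rotation and translation from the first-reference camera frame
                   to the target camera frame, as described above.
    region_depth : depth of the target region; a = -region_depth in formula (1).
    Returns (u_s, v_s) in the first reference viewpoint.
    """
    n = np.array([0.0, 0.0, 1.0])            # unit normal of the plane, n = (0, 0, 1)
    a = -float(region_depth)                 # negative of the region depth
    Rt = R.T
    # R^T + (R^T t n^T R^T) / (a - n^T R^T t), the matrix inside formula (1)
    H = Rt + np.outer(Rt @ t, n @ Rt) / (a - n @ Rt @ t)
    p = K_s @ H @ np.linalg.inv(K_t) @ np.array([uv_t[0], uv_t[1], 1.0])
    return p[:2] / p[2]                      # dehomogenize to (u_s, v_s)
```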
步骤103,根据所述第m个第二像素点在所述至少一个目标区域中的存在位置点,渲染所述第m个第二像素点的颜色。Step 103: Render the color of the mth second pixel according to the position of the mth second pixel in the at least one target area.
在一些实施例中,可以通过如下实施例的步骤210和步骤211实现步骤103。In some embodiments, step 103 may be implemented through steps 210 and 211 of the following embodiments.
在本申请实施例中,根据第一参考视点下的深度图中第一像素点的深度,对所述深度图进行区域划分,而不是基于预先给定的平面分布规律对第一参考视点下的视图进行平面划分;如此,由于区域划分结合了实际场景中各个点的深度,因此,使得最终渲染的第二像素点的颜色更加准确,进而使得待渲染视图被渲染后(即合成视图)的图像细节更多。In the embodiment of the present application, the depth map is divided into regions according to the depth of the first pixels in the depth map under the first reference viewpoint, instead of dividing the view under the first reference viewpoint into planes based on a predetermined plane distribution law; in this way, since the region division incorporates the depth of each point in the actual scene, the color finally rendered for the second pixels is more accurate, and the rendered view (that is, the synthesized view) therefore contains more image detail.
本申请实施例再提供一种图像处理方法,图2为本申请实施例图像处理方法的实现流程示意图,如图2所示,该方法可以包括以下步骤201至步骤211:The embodiment of the present application further provides an image processing method. FIG. 2 is a schematic diagram of the implementation process of the image processing method in the embodiment of the present application. As shown in FIG. 2, the method may include the following steps 201 to 211:
步骤201,根据至少一张第二参考视点下的视图对包括的场景进行三维重建,得到所述场景在第一参考视点的相机坐标系下的点云数据。 Step 201, perform three-dimensional reconstruction on the included scene according to at least one view under the second reference viewpoint, and obtain point cloud data of the scene under the camera coordinate system of the first reference viewpoint.
在一些实施例中,可以将场景的稀疏视图作为colmap工具的输入,进行由运动到结构(SFM)的相机参数估计和多维立体重建(MVS),从而得到该点云数据;其中,点云数据中点的坐标表示为(x,y,d),d表示该点相对于第一参考视点的相机的深度。In some embodiments, the sparse view of the scene can be used as the input of the colmap tool to perform camera parameter estimation and multi-dimensional stereo reconstruction (MVS) from motion to structure (SFM), thereby obtaining the point cloud data; wherein, the point cloud data The coordinates of the midpoint are denoted as (x,y,d), where d represents the depth of the point relative to the camera of the first reference viewpoint.
需要说明的是,场景的稀疏视图即为不同第二参考视点下的视图。It should be noted that the sparse views of the scene are views under different second reference viewpoints.
步骤202,确定所述场景的视差图。 Step 202, determining the disparity map of the scene.
在本申请实施例中,确定视差图的方法可以是多种多样的。例如,前文提到的基于双目立体视觉的方法。又如,在一些实施例中,电子设备可以这样实现步骤202:根据至少一张第二参考视点下的视图,得到所述场景的至少一个平面的透明度图;根据所述至少一个平面的透明度图和对应的平面深度,合成所述场景的视差图;如此,相比于基于双目立体视觉的方法,没有安装双目摄像头的电子设备依然可以实现所述图像处理方法,因此,其普适性更强,且节约了电子设备的硬件成本。In the embodiment of the present application, there may be various methods for determining the disparity map. For example, the method based on binocular stereo vision mentioned above. As another example, in some embodiments, the electronic device may implement step 202 in the following way: Obtain a transparency map of at least one plane of the scene according to at least one view under the second reference viewpoint; and the corresponding plane depth to synthesize the disparity map of the scene; thus, compared to the method based on binocular stereo vision, electronic devices without binocular cameras can still implement the image processing method, so its universality Stronger, and save the hardware cost of electronic equipment.
例如,以所述至少一张第二参考视点下的视图为输入,通过NeX模型合成场景的MPI表征,基于此,根据如下公式(2)合成所述视差图disparity(x, y):For example, using the at least one view under the second reference viewpoint as input, the MPI representation of the scene is synthesized through the NeX model, and based on this, the disparity map disparity(x, y) is synthesized according to the following formula (2):

$$\mathrm{disparity}(x,y) = \sum_{i=1}^{D} \frac{1}{d_i}\,\alpha_i(x,y) \prod_{j=i+1}^{D} \bigl(1-\alpha_j(x,y)\bigr) \qquad (2)$$
其中,d_i表示第i个MPI平面(由远及近排序)的深度,α_i表示第i个MPI平面的透明度。D表示透明度图的数目。Among them, d_i represents the depth of the i-th MPI plane (sorted from far to near), and α_i represents the transparency of the i-th MPI plane. D represents the number of transparency maps.
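A minimal NumPy sketch of formula (2), assuming the transparency maps are stacked with the farthest plane first; the function and variable names are illustrative only.

```python
import numpy as np

def composite_disparity(alphas, depths):
    """Formula (2): alpha-composite per-plane inverse depths into a disparity map.

    alphas : (D, H, W) transparency maps alpha_i, plane 0 being the farthest.
    depths : (D,) plane depths d_i, ordered far to near.
    """
    disparity = np.zeros(alphas.shape[1:], dtype=np.float64)
    transmittance = np.ones_like(disparity)
    # iterate from the nearest plane to the farthest so that `transmittance`
    # accumulates prod_{j>i} (1 - alpha_j) for each plane i
    for alpha, d in zip(alphas[::-1], depths[::-1]):
        disparity += (1.0 / d) * alpha * transmittance
        transmittance *= (1.0 - alpha)
    return disparity
```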
步骤203,根据所述视差图和所述点云数据,得到所述第一参考视点下的深度图。 Step 203, obtain a depth map under the first reference viewpoint according to the disparity map and the point cloud data.
在一些实施例中,电子设备可以这样实现步骤203:根据所述视差图和所述点云数据,得到所述视差图与所述深度图的反比例系数;根据所述反比例系数和所述视差图,得到所述第一参考视点下的深度图。In some embodiments, the electronic device can implement step 203 in this way: according to the disparity map and the point cloud data, obtain the inverse proportional coefficient between the disparity map and the depth map; according to the inverse proportional coefficient and the disparity map , to obtain the depth map under the first reference viewpoint.
可以理解地,视差与深度成反比例关系,因此,可以先确定反比例系数,然后基于该系数,将视差图转换为深度图。It can be understood that the disparity is inversely proportional to the depth. Therefore, the inverse proportion coefficient can be determined first, and then the disparity map can be converted into a depth map based on the coefficient.
对于反比例系数的确定方法,例如,可以根据如下公式(3)计算得到:For the determination method of the inverse proportional coefficient, for example, it can be calculated according to the following formula (3):
$$\sigma' = \arg\min_{\sigma} \sum_{(x,y,d)\in P_s} \Bigl(\mathrm{disparity}(x,y) - \frac{\sigma}{d}\Bigr)^{2} \qquad (3)$$
其中,反比例系数记为σ,P_s为点云数据,(x, y, d)是点在第一参考视点的相机坐标系中的坐标。 Among them, the inverse proportional coefficient is denoted as σ, P_s is the point cloud data, and (x, y, d) are the coordinates of a point in the camera coordinate system of the first reference viewpoint.
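The minimization in formula (3) has a simple closed form; the sketch below (names and pixel indexing are assumptions) fits σ against the sparse point cloud and then converts the disparity map into a depth map.

```python
import numpy as np

def fit_sigma_and_depth(disparity, points):
    """Fit sigma in formula (3) and convert disparity to depth (depth = sigma / disparity).

    points : iterable of (x, y, d) in the first-reference camera coordinate system,
             with (x, y) assumed to index a pixel of the disparity map.
    """
    num, den = 0.0, 0.0
    for x, y, d in points:
        obs = disparity[int(round(y)), int(round(x))]
        # d/d(sigma) of sum (obs - sigma/d)^2 = 0  =>  sigma = sum(obs/d) / sum(1/d^2)
        num += obs / d
        den += 1.0 / (d * d)
    sigma = num / den
    depth = sigma / np.maximum(disparity, 1e-6)
    return sigma, depth
```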
步骤204,根据所述深度图中第一像素点的深度,确定所述第一像素点之间的深度关系; Step 204, according to the depth of the first pixel in the depth map, determine the depth relationship between the first pixels;
步骤205,根据所述深度关系,对所述深度图进行区域划分,得到至少一个区域。 Step 205, according to the depth relationship, perform region division on the depth map to obtain at least one region.
在一些实施例中,将深度相同或深度差在特定范围内的第一像素点划分在同一区域。可以采用OTSU算法或者超像素分割算法实现区域划分。In some embodiments, the first pixel points with the same depth or a depth difference within a specific range are divided into the same area. The region division can be realized by using the OTSU algorithm or the superpixel segmentation algorithm.
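One possible realization of this step using scikit-image (version 0.19+ assumed); whether Otsu thresholding or superpixel segmentation (SLIC) is used follows the two options named above, and the function name and parameters are illustrative.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.segmentation import slic

def divide_depth_map(depth_map, num_regions):
    """Divide a depth map into regions by depth, as described for step 205.

    Returns an integer label map of shape (H, W); pixels with similar depth
    share a label.
    """
    if num_regions <= 2:
        # a single Otsu threshold splits the map into near/far (foreground/background)
        t = threshold_otsu(depth_map)
        return (depth_map > t).astype(np.int32)
    # for more regions, cluster the depth values into superpixels
    return slic(depth_map.astype(np.float64), n_segments=num_regions,
                compactness=0.1, channel_axis=None)
```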
步骤206,确定所述第一参考视点所在的相机坐标系与所述目标视点所在的相机坐标的变换关系。 Step 206, determining the transformation relationship between the camera coordinate system where the first reference viewpoint is located and the camera coordinate system where the target viewpoint is located.
在一些实施例中,变换关系包括旋转矩阵和平移向量。In some embodiments, the transformation relationship includes a rotation matrix and a translation vector.
步骤207,获取所述第一参考视点对应的相机内参和所述目标视点对应的相机内参; Step 207, acquiring the internal camera parameters corresponding to the first reference viewpoint and the internal camera parameters corresponding to the target viewpoint;
步骤208,确定所述至少一个区域中的至少一个目标区域的区域深度。 Step 208, determine the area depth of at least one target area in the at least one area.
在一些实施例中,对于同一目标区域的像素点的深度不同的情况,可以将该区域的像素点的深度均值或中值等作为该区域的区域深度;对于同一目标区域的像素点的深度相同的情况,可以将该区域的任一像素点的深度作为该区域的区域深度。In some embodiments, when the depths of pixels in the same target area are different, the mean or median depth of the pixels in the area can be used as the area depth of the area; for the same depth of pixels in the same target area In the case of , the depth of any pixel in the area can be used as the area depth of the area.
步骤209,根据所述变换关系、所述第一参考视点和所述目标视点分别对应的相机内参以及所述区域深度,对所述第m个第二像素点的齐次坐标进行反向单应变换,得到 所述第m个第二像素点在所述至少一个目标区域中的存在位置点。Step 209: Perform a reverse homography on the homogeneous coordinates of the mth second pixel point according to the transformation relationship, the camera intrinsic parameters corresponding to the first reference viewpoint and the target viewpoint, and the region depth transform to obtain the existing location point of the mth second pixel point in the at least one target area.
步骤210,从所述第m个第二像素点在所述至少一个目标区域中的存在位置点中,筛选出满足条件的位置点作为有效位置点。 Step 210, from the existing position points of the mth second pixel point in the at least one target area, select the position points satisfying the condition as valid position points.
在一些实施例中,如果所述存在位置点在对应区域中,则将该位置点作为有效位置点;否则,如果所述存在位置点不在对应区域中,则视为无效位置点,舍弃。In some embodiments, if the existing location point is in the corresponding area, the location point is regarded as a valid location point; otherwise, if the existing location point is not in the corresponding area, it is regarded as an invalid location point and discarded.
步骤211,根据所述有效位置点,渲染所述第m个第二像素点的颜色。Step 211: Render the color of the mth second pixel according to the effective position point.
在一些实施例中,可以这样实现步骤211:确定所述有效位置点的颜色系数、透明度、基础颜色值和基函数;其中,所述基函数的自变量为所述有效位置点与所述目标视点的相对方向;根据所述有效位置点的颜色系数、基础颜色值和基函数,得到所述有效位置点从所述相对方向被观察到的颜色值;将每一所述有效位置点的透明度和所述被观察到的颜色值进行合成,得到合成颜色值;利用所述合成颜色值,渲染所述第m个第二像素点的颜色。进一步地,在一些实施例中,可以通过如下实施例的步骤304,确定所述有效位置点的透明度和颜色系数;通过如下实施例的步骤305确定有效位置点的基础颜色值;通过如下实施例的步骤306确定所述有效位置点的基函数。In some embodiments, step 211 can be implemented as follows: determine the color coefficient, transparency, basic color value and basis function of the effective position point; wherein, the independent variable of the basis function is the effective position point and the target The relative direction of the viewpoint; according to the color coefficient, basic color value and basis function of the effective position point, the observed color value of the effective position point from the relative direction is obtained; the transparency of each effective position point is Combining with the observed color value to obtain a composite color value; using the composite color value to render the color of the mth second pixel. Further, in some embodiments, the transparency and color coefficient of the effective location point can be determined through step 304 of the following embodiment; the basic color value of the effective location point can be determined through step 305 of the following embodiment; through the following embodiment Step 306 of determining the basis functions of the effective location points.
在一些实施例中,所述相对方向可以是目标视点相对于有效位置点的单位方向向量,还可以是有效位置点相对于目标视点的单位方向向量。In some embodiments, the relative direction may be a unit direction vector of the target viewpoint relative to the effective location point, or may be a unit direction vector of the effective location point relative to the target viewpoint.
可以理解地,在本申请实施例中,先对从所述第m个第二像素点在所述至少一个目标区域中的存在位置点中筛选出有效位置点,然后基于有效位置点而不是每一存在位置点,渲染所述第m个第二像素点的颜色;如此,能够节约计算量,从而提高渲染效率,进而提升合成视图的合成效率。It can be understood that, in the embodiment of the present application, effective position points are first selected from the existing position points of the mth second pixel point in the at least one target area, and then based on the effective position points instead of each Once there is a position point, render the color of the mth second pixel point; in this way, the amount of calculation can be saved, thereby improving rendering efficiency, and further improving the synthesis efficiency of the synthesized view.
本申请实施例再提供一种图像处理方法,图3为本申请实施例图像处理方法的实现流程示意图,如图3所示,该方法可以包括以下步骤301至步骤309:The embodiment of the present application further provides an image processing method. FIG. 3 is a schematic diagram of the implementation process of the image processing method in the embodiment of the present application. As shown in FIG. 3 , the method may include the following steps 301 to 309:
步骤301,根据第一参考视点下的深度图中第一像素点的深度,对所述深度图进行区域划分,得到至少一个区域; Step 301, divide the depth map into regions according to the depth of the first pixel in the depth map under the first reference viewpoint, and obtain at least one region;
步骤302,将目标视点下的待渲染视图的第m个第二像素点的坐标反变换至所述至少一个区域中的至少一个目标区域中,得到所述第m个第二像素点在所述至少一个目标区域中的存在位置点;其中,m大于0且小于或等于待渲染视图的总像素点数;Step 302: Inversely transform the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into at least one target area in the at least one area, and obtain the mth second pixel point in the at least one target area. Existing position points in at least one target area; wherein, m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered;
步骤303,从所述第m个第二像素点在所述至少一个目标区域中的存在位置点中,筛选出满足条件的位置点作为有效位置点; Step 303, from among the existing position points of the mth second pixel point in the at least one target area, select the position points satisfying the conditions as valid position points;
步骤304,根据所述有效位置点的坐标和已训练得到的第一多层感知机,得到所述有效位置点的透明度和颜色系数。Step 304: Obtain the transparency and color coefficient of the effective location point according to the coordinates of the effective location point and the trained first multi-layer perceptron.
在一些实施例中,可以将所述有效位置点的坐标映射为具有第一维度的向量;将所述具有第一维度的向量输入至所述第一多层感知机中,得到所述有效位置点的透明度和颜色系数。In some embodiments, the coordinates of the effective position point can be mapped to a vector with the first dimension; the vector with the first dimension is input into the first multi-layer perceptron to obtain the effective position The transparency and color factor of the point.
在本申请实施例中,对于第一维度的大小不做限定,可以是56维度的,也可以是任意维度的。In the embodiment of the present application, there is no limitation on the size of the first dimension, which may be 56 dimensions or any dimension.
进一步地,可以通过如下公式(4),实现对有效位置点的空间坐标(x,y,d)的映射:Further, the mapping to the space coordinates (x, y, d) of the effective position point can be realized by the following formula (4):
$$\gamma(p) = \bigl(\sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \ldots,\ \sin(2^{h-1}\pi p),\ \cos(2^{h-1}\pi p)\bigr) \qquad (4)$$
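A sketch of the encoding in formula (4), assuming the sinusoidal form reconstructed above; with h = 10 for x and y and h = 8 for d, the concatenated vector has 2·10 + 2·10 + 2·8 = 56 dimensions, matching the 56-dimensional example mentioned above. Function names are illustrative.

```python
import numpy as np

def positional_encoding(p, h):
    """Map a scalar p (normalized to [-1, 1]) to a 2*h-dimensional vector, formula (4)."""
    freqs = (2.0 ** np.arange(h)) * np.pi          # pi, 2*pi, 4*pi, ...
    return np.concatenate([np.sin(freqs * p), np.cos(freqs * p)])

def encode_position(x, y, d):
    """Example 56-dimensional input vector for the first multi-layer perceptron."""
    return np.concatenate([positional_encoding(x, 10),
                           positional_encoding(y, 10),
                           positional_encoding(d, 8)])
```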
步骤305,根据所述有效位置点的坐标,得到所述有效位置点的基础颜色值; Step 305, according to the coordinates of the effective location point, obtain the basic color value of the effective location point;
步骤306,根据所述相对方向和已训练得到的第二多层感知机,得到所述有效位置点的基函数。 Step 306 , according to the relative direction and the trained second multi-layer perceptron, obtain the basis function of the effective location point.
在一些实施例中,将所述相对方向映射为具有第二维度的向量;将所述具有第二维 度的向量输入至所述第二多层感知机中,得到所述有效位置点的基函数。In some embodiments, the relative direction is mapped to a vector with a second dimension; the vector with the second dimension is input into the second multi-layer perceptron to obtain the basis function of the effective position point .
例如,将有效位置点相对于目标视点的单位方向向量v=(v x,v y,v z)中的v x,v y分别带入上述公式(4),从而得到具有第二维度的向量。对于第二维度的大小可以是任意的,h值也可以任意设置。 For example, substituting v x and v y in the unit direction vector v=(v x , v y , v z ) of the effective position point relative to the target viewpoint into the above formula (4), so as to obtain a vector with the second dimension . The size of the second dimension can be arbitrary, and the value of h can also be set arbitrarily.
步骤307,根据所述有效位置点的颜色系数、基础颜色值和基函数,得到所述有效位置点从所述相对方向被观察到的颜色值。Step 307: Obtain the color value of the effective location point viewed from the relative direction according to the color coefficient, the basic color value and the basis function of the effective location point.
在一些实施例中,可以根据如下公式(5)得到该有效位置点P从相对方向v被观察到的颜色值C^P(v): In some embodiments, the color value C^P(v) of the effective position point P observed from the relative direction v can be obtained according to the following formula (5):

$$C^{P}(v) = k_0^{P} + \sum_{n=1}^{N} k_n^{P}\,H_n(v) \qquad (5)$$
其中,v代表点P相对于目标视点的单位方向向量,H_n(v)代表与v相关的基函数,例如基函数个数N=8。k_0^P代表P点的基础颜色值(例如RGB值,当然也不限于该颜色格式,还可以通过其他颜色格式表示),[k_1^P, ..., k_N^P]代表P点的颜色系数。[k_0^P, k_1^P, ..., k_N^P]只与P点的坐标相关,与v无关。 Wherein, v represents the unit direction vector of the point P relative to the target viewpoint, H_n(v) represents the basis functions related to v, and the number of basis functions is, for example, N=8. k_0^P represents the basic color value of point P (for example an RGB value; the color format is of course not limited thereto and other color formats may be used), and [k_1^P, ..., k_N^P] represents the color coefficients of point P. [k_0^P, k_1^P, ..., k_N^P] is only related to the coordinates of point P and is independent of v.
步骤308,将每一所述有效位置点的透明度和所述被观察到的颜色值进行合成,得到第m个第二像素点的合成颜色值。 Step 308 , combining the transparency of each effective position point with the observed color value to obtain the synthesized color value of the mth second pixel point.
例如,可以根据如下公式(6)计算得到第m个第二像素点的合成颜色值C_t:For example, the composite color value C_t of the m-th second pixel can be calculated according to the following formula (6):

$$C_t = \sum_{i=1}^{D} C_i\,\alpha_i \prod_{j=i+1}^{D} \bigl(1-\alpha_j\bigr) \qquad (6)$$
其中,i表示第i个有效位置点,D表示有效位置点的总数,C_i表示有效位置点从v方向被观察到的颜色值,α_i表示有效位置点的透明度。 Among them, i denotes the i-th effective position point, D denotes the total number of effective position points, C_i denotes the color value of the effective position point observed from the direction v, and α_i denotes the transparency of the effective position point.
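The following sketch strings together formulas (5) and (6) for a single pixel; the array shapes and names are assumptions, with the valid position points ordered from far to near.

```python
import numpy as np

def render_pixel_color(k0, k_coeffs, basis, alphas):
    """Evaluate formula (5) per point and composite with formula (6).

    k0       : (D, 3) base color values k_0 of the valid position points.
    k_coeffs : (D, N, 3) color coefficients [k_1, ..., k_N] of each point.
    basis    : (N,) basis function values H_n(v) for the viewing direction v.
    alphas   : (D,) transparencies, ordered far to near like k0 and k_coeffs.
    """
    colors = k0 + np.einsum('n,dnc->dc', basis, k_coeffs)    # formula (5), per point
    pixel = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors[::-1], alphas[::-1]):              # composite near to far
        pixel += a * transmittance * c                        # formula (6)
        transmittance *= (1.0 - a)
    return pixel
```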
步骤309,利用所述合成颜色值,渲染所述第m个第二像素点的颜色。 Step 309, using the composite color value to render the color of the mth second pixel.
在一些实施例中,所述方法还包括:在所述待渲染视图的每一第二像素点的颜色被渲染后,得到所述目标视点下的合成视图;获取所述目标视点下的真实视图;根据所述合成视图和所述真实视图,得到合成损失;根据所述合成损失,更新所述第一多层感知机和所述第二多层感知机的参数值;如此,使得第一多层感知机和第二多层感知机得到的结果更加准确,从而在下次对类似场景进行新视角合成时,能够得到图像质量更好的合成视图。In some embodiments, the method further includes: obtaining a synthetic view under the target viewpoint after the color of each second pixel of the view to be rendered is rendered; obtaining a real view under the target viewpoint ; According to the synthetic view and the real view, a synthetic loss is obtained; according to the synthetic loss, update the parameter values of the first multi-layer perceptron and the second multi-layer perceptron; thus, the first multi-layer perceptron The results obtained by the layer perceptron and the second multi-layer perceptron are more accurate, so that the next time a similar scene is synthesized from a new view, a synthetic view with better image quality can be obtained.
在一些实施例中,合成损失可以根据如下公式(7)计算得到:In some embodiments, the composite loss can be calculated according to the following formula (7):
$$L = L_{rec} + \gamma\,TV(k_0) \qquad (7)$$

其中,L_rec可以根据如下公式(8)计算得到;TV(k_0)为正则项总变差损失,γ表示正则项系数。Wherein, L_rec can be calculated according to the following formula (8), TV(k_0) is the total variation loss of the regularization term, and γ represents the coefficient of the regularization term.

$$L_{rec} = \bigl\|\hat{I} - I\bigr\|^{2} + \omega\,\bigl\|\nabla\hat{I} - \nabla I\bigr\|_{1} \qquad (8)$$

其中,$\hat{I}$是指合成视图,I是指真实视图,ω为平衡权重。Among them, $\hat{I}$ refers to the synthesized view, I refers to the real view, and ω is the balance weight.
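A hedged sketch of the training loss in formulas (7) and (8); the exact composition of L_rec (a pixel term plus an ω-weighted gradient term) follows the reconstruction above and should be treated as an assumption, as should the shapes and names below.

```python
import numpy as np

def total_variation(k0):
    """Total variation of the base color maps k0 with shape (..., H, W)."""
    return (np.abs(np.diff(k0, axis=-1)).mean()
            + np.abs(np.diff(k0, axis=-2)).mean())

def training_loss(synth, real, k0, omega=0.05, gamma=0.03):
    """Formulas (7)/(8): L = L_rec + gamma * TV(k0), with images shaped (3, H, W)."""
    l_rec = ((synth - real) ** 2).mean()
    # omega-weighted gradient (edge) difference term of L_rec
    gx = np.abs(np.diff(synth, axis=-1) - np.diff(real, axis=-1)).mean()
    gy = np.abs(np.diff(synth, axis=-2) - np.diff(real, axis=-2)).mean()
    l_rec += omega * (gx + gy)
    return l_rec + gamma * total_variation(k0)
```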
上述图像处理方法可以应用于在线使用阶段,也可以应用于离线训练阶段。对于离线训练阶段,在一些实施例中,所述方法还包括:利用更新后的第一多层感知机和更新后的第二多层感知机,重新渲染所述第m个第二像素点的颜色,直至得到的合成损失满足条件或者更新次数满足条件,得到可以在在线使用阶段使用的第一多层感知机和第二多层感知机。The above image processing method can be applied to the online use stage, and can also be applied to the offline training stage. For the offline training phase, in some embodiments, the method further includes: using the updated first multi-layer perceptron and the updated second multi-layer perceptron to re-render the mth second pixel color, until the obtained composite loss meets the condition or the number of updates meets the condition, and the first multilayer perceptron and the second multilayer perceptron that can be used in the online use stage are obtained.
前文提到,第一参考视点下的深度图可以通过各种方法获取得到。例如,深度图可 以通过解码编码端发送的码流得到,相应地,对于编码端的编码方法,在一些实施例中,编码装置可以根据第一参考视点下的深度图中第一像素点的深度,对所述深度图进行区域划分,得到至少一个区域;然后,将划分得到的区域总数和深度图进行编码,生成码流;从而,在解码端,解码装置可以通过解码码流得到的区域总数和深度图,然后将这些信息传输给图像处理装置,由图像处理装置根据所述区域总数,对深度图进行区域划分,从而得到所述至少一个区域,然后执行如上述图像处理方法中的其他内容,进而得到合成视图;以及,将合成视图传输给显示装置进行图像显示或播放。As mentioned above, the depth map under the first reference viewpoint can be obtained through various methods. For example, the depth map can be obtained by decoding the code stream sent by the encoding end. Correspondingly, for the encoding method of the encoding end, in some embodiments, the encoding device can, according to the depth of the first pixel in the depth map under the first reference viewpoint, The depth map is divided into regions to obtain at least one region; then, the total number of regions obtained by division and the depth map are encoded to generate a code stream; thus, at the decoding end, the decoding device can obtain the total number of regions and the depth map obtained by decoding the code stream. The depth map, and then transmit these information to the image processing device, and the image processing device divides the depth map into regions according to the total number of regions, so as to obtain the at least one region, and then perform other content as in the above image processing method, And then obtain the synthesized view; and, transmit the synthesized view to the display device for image display or play.
在另一些实施例中,对于编码端的编码方法,在另一些实施例中,编码装置可以将深度图进行编码,生成码流;从而,在解码端,解码装置可以通过解码码流得到深度图,然后将深度图传输给图像处理装置,由图像处理装置根据特定的区域划分算法对该深度图进行区域划分,从而得到所述至少一个区域,然后执行如上述图像处理方法中的其他内容,进而得到合成视图;以及,将合成视图传输给显示装置进行图像显示或播放。In other embodiments, for the encoding method at the encoding end, in other embodiments, the encoding device can encode the depth map to generate a code stream; thus, at the decoding end, the decoding device can obtain the depth map by decoding the code stream, Then the depth map is transmitted to the image processing device, and the image processing device performs region division on the depth map according to a specific region division algorithm, so as to obtain the at least one region, and then execute other content as in the above image processing method, and then obtain synthetic view; and, transmitting the synthetic view to a display device for image display or playback.
下面将说明本申请实施例在一个实际的应用场景中的示例性应用。An exemplary application of the embodiment of the present application in an actual application scenario will be described below.
场景新视角合成模型NeX基于MPI和基函数(Basis function)获得了较好的场景新视角渲染结果。NeX模型对MPI的颜色图(color frame)进行了改造,以此为MPI的颜色图增加了随视角变化的效果。结合基函数的MPI的表征如图4所示,其中,基函数结合在颜色图的RGB值上,结合方式如下公式(9)所示:The scene new perspective synthesis model NeX is based on MPI and basis function (Basis function) to obtain better scene new perspective rendering results. The NeX model modifies the color frame of MPI to add an effect that changes with the viewing angle for the color frame of MPI. The characterization of MPI combined with basis functions is shown in Figure 4, where the basis functions are combined on the RGB values of the color map, and the combination method is shown in the following formula (9):
$$C^{P}(v) = k_0^{P} + \sum_{n=1}^{N} k_n^{P}\,H_n(v) \qquad (9)$$
其中,P代表在第一参考视点为原点的空间坐标系下MPI中的点坐标位置(以下简称点P),v代表点P相对于目标视点的单位方向向量,则C^P(v)代表点P从v方向观察得到的颜色值(RGB格式)。H_n(v)代表与v相关的基函数,基函数个数N=8。k_0^P代表P点的基础RGB值,等效于原始MPI中的颜色值。[k_1^P, ..., k_N^P]代表P点的RGB系数。[k_0^P, k_1^P, ..., k_N^P]只与P点的坐标相关,与v无关。 Among them, P represents a point position in the MPI under the spatial coordinate system with the first reference viewpoint as the origin (hereinafter referred to as point P), v represents the unit direction vector of point P relative to the target viewpoint, and C^P(v) represents the color value (in RGB format) of point P observed from direction v. H_n(v) represents the basis functions related to v, and the number of basis functions is N=8. k_0^P represents the base RGB value of point P, which is equivalent to the color value in the original MPI. [k_1^P, ..., k_N^P] represents the RGB coefficients of point P. [k_0^P, k_1^P, ..., k_N^P] is only related to the coordinates of point P and is independent of v.
NeX模型以场景的稀疏视图作为输入,可以输出在输入视角附近的新视图。NeX模型的整体流程如图5所示:The NeX model takes as input a sparse view of the scene and can output new views around the input viewpoint. The overall flow of the NeX model is shown in Figure 5:
首先,对于P点的空间坐标(x, y, d)和单位方向向量v=(v_x, v_y, v_z),采用如下公式(10)对其分别进行位置编码,得到相应的位置编码向量: First, for the spatial coordinates (x, y, d) of point P and the unit direction vector v=(v_x, v_y, v_z), position encoding is performed on them respectively using the following formula (10) to obtain the corresponding position encoding vectors:

$$\gamma(p) = \bigl(\sin(2^{0}\pi p),\ \cos(2^{0}\pi p),\ \ldots,\ \sin(2^{h-1}\pi p),\ \cos(2^{h-1}\pi p)\bigr) \qquad (10)$$
将x,y,d分别归一化到[-1,1]范围。其中x,y分别通过公式(10)映射为20维向量(h设置为10),d通过公式(10)映射为16维向量(h设置为8)。三个向量按序拼接为56维向量,作为第一多层感知机F_θ的真实输入。Normalize x, y, d respectively to the range [-1, 1]. Among them, x and y are each mapped to a 20-dimensional vector (h set to 10) through formula (10), and d is mapped to a 16-dimensional vector (h set to 8) through formula (10). The three vectors are concatenated in order into a 56-dimensional vector, which is used as the actual input of the first multi-layer perceptron F_θ.
将(v_x, v_y, v_z)中的v_x, v_y通过公式(10)分别映射为6维向量(h设置为3),按序拼接为12维向量作为第二多层感知机G_φ的输入。Map v_x and v_y in (v_x, v_y, v_z) through formula (10) to 6-dimensional vectors (h set to 3) respectively, and concatenate them in order into a 12-dimensional vector as the input of the second multi-layer perceptron G_φ.
a.采用第一多层感知机F_θ以P点的空间坐标(x, y, d)的位置编码向量作为输入,学习P点在对应的MPI中的透明度图的α值和在对应的颜色图中的RGB系数[k_1^P, ..., k_N^P]; a. Use the first multi-layer perceptron F_θ, taking the position encoding vector of the spatial coordinates (x, y, d) of point P as input, to learn the α value of point P in the corresponding transparency map of the MPI and the RGB coefficients [k_1^P, ..., k_N^P] in the corresponding color map;
b.采用第二多层感知机G_φ以P点相对于目标视点(即观察点)的单位方向向量v中v_x, v_y的位置编码向量为输入,学习P点在MPI中的颜色图的基函数H_n(v); b. Use the second multi-layer perceptron G_φ, taking the position encoding vectors of v_x, v_y in the unit direction vector v of point P relative to the target viewpoint (i.e., the observation point) as input, to learn the basis functions H_n(v) of the color map of point P in the MPI;
c.采用显式存储训练的方式学习P点在MPI颜色图的基础RGB值k_0^P;c. Learn the base RGB value k_0^P of point P in the MPI color map by means of explicit storage training;
d.采用上式(9)所示的方法计算得到P点在MPI颜色图的RGB值;D. adopt the method shown in above formula (9) to calculate and obtain the RGB value of P point in MPI color map;
e.由MPI、第一参考视点对应的相机参数和目标视点对应的相机参数渲染得到目标视点下的新视图。渲染方式采用标准反向单应变换(Standard inverse homography),如下公式(11)所示:e. A new view under the target viewpoint is obtained by rendering the MPI, the camera parameters corresponding to the first reference viewpoint, and the camera parameters corresponding to the target viewpoint. The rendering method adopts the standard inverse homography (Standard inverse homography), as shown in the following formula (11):
$$\begin{bmatrix} u_s \\ v_s \\ 1 \end{bmatrix} \sim k_s \left( R^{T} + \frac{R^{T}\,t\,n^{T}R^{T}}{a - n^{T}R^{T}t} \right) k_t^{-1} \begin{bmatrix} u_t \\ v_t \\ 1 \end{bmatrix} \qquad (11)$$
其中,R和t是世界坐标系下从第一参考视点的相机坐标系到目标视点的相机坐标系的旋转矩阵和平移向量。a是MPI中平面深度值的负值。n=(0,0,1)是第一参考视点的相机坐标系下MPI平面的单位法向量。k s和k t是第一参考视点和目标视点分别对应的相机内参。(u t,v t,1)是目标视点下图像(即待渲染视图)中像素点的齐次坐标。(u s,v s,1)是第一参考视点下P点在对应平面中的齐次坐标。 Among them, R and t are the rotation matrix and translation vector from the camera coordinate system of the first reference viewpoint to the camera coordinate system of the target viewpoint in the world coordinate system. a is the negative of the plane depth value in MPI. n=(0,0,1) is the unit normal vector of the MPI plane in the camera coordinate system of the first reference viewpoint. k s and k t are camera internal parameters corresponding to the first reference viewpoint and the target viewpoint respectively. (u t ,v t ,1) are the homogeneous coordinates of the pixels in the image (that is, the view to be rendered) under the target viewpoint. (u s ,v s ,1) are the homogeneous coordinates of point P in the corresponding plane under the first reference viewpoint.
对于待渲染视图中的每一个像素点(u_t, v_t, 1),在MPI中的第i个平面(由远及近排序)中存在一个对应的点(u_si, v_si, 1)。如图6所示,假设MPI的平面数为D,则每一个像素点(u_t, v_t, 1)在MPI中存在D个对应的像素点。For each pixel point (u_t, v_t, 1) in the view to be rendered, there is a corresponding point (u_si, v_si, 1) in the i-th plane (sorted from far to near) of the MPI. As shown in FIG. 6, assuming that the number of planes of the MPI is D, each pixel point (u_t, v_t, 1) has D corresponding points in the MPI.
f.由步骤e中获得了与待渲染视图中的像素点(u_t, v_t, 1)相对应的一系列P点(u_si, v_si, 1)。由P点处的RGB值C_i和α值α_i,根据如下公式(12)计算得到像素点(u_t, v_t, 1)的RGB值C_t:f. A series of P points (u_si, v_si, 1) corresponding to the pixel point (u_t, v_t, 1) in the view to be rendered are obtained from step e. From the RGB values C_i and α values α_i at the P points, the RGB value C_t of the pixel point (u_t, v_t, 1) is calculated according to the following formula (12):

$$C_t = \sum_{i=1}^{D} C_i\,\alpha_i \prod_{j=i+1}^{D} \bigl(1-\alpha_j\bigr) \qquad (12)$$
g.在训练过程中将输出图像(即合成视图)与该目标视点下的真实视图相比较,以重建误差(Reconstruction loss)度量二者的区别。重建误差L rec根据如下公式(13)计算得到: g. During the training process, the output image (that is, the synthetic view) is compared with the real view under the target viewpoint, and the difference between the two is measured by reconstruction loss. The reconstruction error L rec is calculated according to the following formula (13):
$$L_{rec} = \bigl\|\hat{I} - I\bigr\|^{2} + \omega\,\bigl\|\nabla\hat{I} - \nabla I\bigr\|_{1} \qquad (13)$$

其中,$\hat{I}$是NeX模型的合成视图,I是该目标视点下的真实视图,平衡权重ω=0.05。Among them, $\hat{I}$ is the synthesized view of the NeX model, I is the real view under the target viewpoint, and the balance weight ω=0.05.
h.在训练过程中,为了保证输出图像的平滑性,引入正则项总变差损失TV(k_0)。二者一起组成了训练过程中的损失函数L,如下公式(14)所示: h. In the training process, in order to ensure the smoothness of the output image, the regularization total variation loss TV(k_0) is introduced. Together they constitute the loss function L used in training, as shown in the following formula (14):

$$L = L_{rec} + \gamma\,TV(k_0) \qquad (14)$$
其中,正则项系数γ=0.03。Among them, the coefficient of the regular term γ=0.03.
NeX模型采用的场景表征是结合了基函数的MPI表征。因此,NeX模型包含了MPI的缺陷。在现实场景中,大部分空间区域中是没有可见表面的。直观体现是MPI中的颜色图和透明度图大部分区域为无效值,即不包含可见信息,例如图7所示的MPI平面层实例,展示了第40个平面层(a)到第45个平面层(f)。第一行为颜色图,第二行为透明度图(黑色即为无效区域)。The scene representation adopted by the NeX model is the MPI representation combined with basis functions. Therefore, the NeX model incorporates the flaws of MPI. In real-world scenarios, most regions of space have no visible surfaces. Intuitively, most areas of the color map and transparency map in MPI are invalid values, that is, they do not contain visible information. For example, the MPI plane layer example shown in Figure 7 shows the 40th plane layer (a) to the 45th plane Layer (f). The first row is a color map, and the second row is a transparency map (black is the invalid area).
在最终的学习结果中,大部分的MPI区域是无效值。这是因为MPI的深度范围和平面的分布规律是提前给定的,忽视了场景中可见平面的位置信息。从采样的角度来看,MPI的采样位置与场景中的有效信息位置(有可见表面的位置)存在脱节的情况,从而导致MPI的采样率不高,进而表现为NeX的合成视图的细节缺失。In the final learning results, most of the MPI regions are invalid values. This is because the depth range of the MPI and the distribution of its planes are given in advance, ignoring the positions of the visible surfaces in the scene. From the sampling point of view, there is a disconnect between the sampling positions of the MPI and the positions of valid information in the scene (positions with visible surfaces), which leads to a low effective sampling rate of the MPI and in turn manifests as missing details in NeX's synthesized views.
进一步地,在一些实施例中,提供一种基于PMPI与基函数的场景视角渲染模型(以下简称PMPI模型)。分块多平面图像是在MPI基础上引入了场景的深度信息,由此得 到随场景深度自适应变化的形状(区域划分和每个区域的深度范围)。将PMPI和基函数作为场景表征,围绕PMPI的特点建立了完整的新视图渲染模型。该模型以场景的稀疏视图作为输入(无需输入场景的深度图),输出给定视角的新视图(即目标视点下的合成视图)。Further, in some embodiments, a scene perspective rendering model based on PMPI and basis functions (hereinafter referred to as PMPI model) is provided. The block multi-plane image introduces the depth information of the scene on the basis of MPI, and thus obtains the shape (region division and depth range of each region) that changes adaptively with the depth of the scene. Using PMPI and basis functions as scene representation, a complete new view rendering model is built around the characteristics of PMPI. The model takes a sparse view of the scene as input (no need to input a depth map of the scene), and outputs a new view of a given viewpoint (i.e., a synthetic view under the target viewpoint).
该模型的工作流程如图8所示,包括如下步骤d至步骤g:The workflow of the model is shown in Figure 8, including the following steps d to g:
d.由深度图确定PMPI的形状(区域边界及每个区域的深度范围),例如图9所示,其中,901为在区域数A小于10的情况下,通过Otsu算法对深度图进行区域划分得到的结果,例如最终划分为前景掩膜和背景掩膜。902为在区域数A大于或等于10的情况下,通过超像素分割算法对深度图进行区域划分得到的结果。PMPI的区域数A及最大深度d_max需要根据场景的复杂度和最大深度提前给定。在一些实施例中区域数A=2;d. Determine the shape of the PMPI (region boundaries and the depth range of each region) from the depth map, as shown in Figure 9 for example, where 901 is the result of dividing the depth map into regions by the Otsu algorithm when the region number A is less than 10, for example a final division into a foreground mask and a background mask, and 902 is the result of dividing the depth map into regions by a superpixel segmentation algorithm when the region number A is greater than or equal to 10. The region number A and the maximum depth d_max of the PMPI need to be given in advance according to the complexity and maximum depth of the scene. In some embodiments the region number A=2;
e.采用第一多层感知机MLP1以PMPI中点的空间坐标(x, y, d)(以下也简称为P点)的56维位置编码作为输入,学习P点在PMPI中的透明度图的α值和颜色图中的RGB系数[k_1^P, ..., k_N^P]; e. Use the first multi-layer perceptron MLP1, taking the 56-dimensional position encoding of the spatial coordinates (x, y, d) of a point in the PMPI (hereinafter also referred to as point P) as input, to learn the α value of point P in the transparency map of the PMPI and the RGB coefficients [k_1^P, ..., k_N^P] in the color map;
f.采用显式存储训练的方式学习P点在PMPI颜色图中的基础RGB值k_0^P;f. Learn the base RGB value k_0^P of point P in the PMPI color map by means of explicit storage training;
g.采用第二多层感知机MLP2以P点相对于目标视点(观察点)的单位方向向量v中v_x, v_y的位置编码向量为输入,学习P点在PMPI中颜色图中的基函数H_n(v),基函数数量N=8; g. Use the second multi-layer perceptron MLP2, taking the position encoding vectors of v_x, v_y in the unit direction vector v of point P relative to the target viewpoint (observation point) as input, to learn the basis functions H_n(v) of point P in the color map of the PMPI, with the number of basis functions N=8;
h.采用上述公式(9)所示的方法计算得到P点在PMPI颜色图中的RGB值;h. adopt the method shown in above-mentioned formula (9) to calculate and obtain the RGB value of P point in PMPI color map;
i.由PMPI、第一参考视点对应的相机参数和目标视点对应的相机参数渲染得到目标视点下的新视图。渲染方式仍然是如上述公式(11)所示的标准反向单应变换(Standard inverse homography)。i. A new view under the target viewpoint is obtained by rendering the PMPI, the camera parameters corresponding to the first reference viewpoint, and the camera parameters corresponding to the target viewpoint. The rendering method is still the standard inverse homography shown in the above formula (11).
对于待渲染视图的每一个像素点,使用上述公式(11)所示的标准反向单应变换得到其在每一个深度上的对应P点坐标。以区域数A=2,深度数为4的PMPI为例,可以得到7个PMPI对应点(PMPI有7个不同的深度,因为各区域的最远深度相同)。然而,如图10所示,与待渲染视图的像素点对应的P点数量不确定(小于等于7),而且所在的深度也会随着待渲染视图的像素点的变化而变化。在一些实施例中,使用深度图的区域分割得到的前景掩膜和背景掩膜,对7个点的坐标进行筛选,得到有效的P点(筛选的示意代码见本步骤列表之后);For each pixel of the view to be rendered, the standard inverse homography shown in the above formula (11) is used to obtain its corresponding P point coordinates at each depth. Taking a PMPI with region number A=2 and a depth number of 4 as an example, 7 PMPI corresponding points can be obtained (the PMPI has 7 distinct depths because the farthest depth of the regions is the same). However, as shown in FIG. 10, the number of P points corresponding to a pixel of the view to be rendered is not fixed (at most 7), and their depths also change from pixel to pixel. In some embodiments, the foreground mask and background mask obtained from the region segmentation of the depth map are used to screen the coordinates of the 7 points to obtain the valid P points (a sketch of this screening is given after this list of steps);
j.采用公式(12)合成待渲染视图的像素点的RGB值;j. Use formula (12) to synthesize the RGB values of the pixels of the view to be rendered;
k.根据目标视点下的合成视图和目标视点下的真实视图,计算重建损失,以公式(14)作为训练过程中的损失函数。k. Calculate the reconstruction loss from the synthesized view under the target viewpoint and the real view under the target viewpoint, and use formula (14) as the loss function in the training process.
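The candidate-point screening described in step i above can be sketched as follows (the names and the use of an integer label map in place of the foreground/background masks are assumptions); each warped candidate is kept only if it lands inside the region whose depth produced it.

```python
import numpy as np

def filter_valid_points(candidates, region_labels, region_of_depth):
    """Keep only warped candidates that land inside their own region (step i).

    candidates      : list of ((u_s, v_s), depth_index) produced by formula (11)
                      for one target pixel, one entry per PMPI depth.
    region_labels   : (H, W) integer label map from the depth-map division.
    region_of_depth : maps a depth index to the region label it belongs to.
    """
    valid = []
    h, w = region_labels.shape
    for (u, v), depth_idx in candidates:
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < h and 0 <= ui < w and region_labels[vi, ui] == region_of_depth[depth_idx]:
            valid.append(((u, v), depth_idx))
    return valid
```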
其中,深度图的确定流程如图11所示,其中,①:通过简化版(去除基函数)NeX模型获取MPI;②:由MPI的透明度图(α图)合成视差图;③:由视差图和colmap得到的稀疏点云计算深度图:Among them, the determination process of the depth map is shown in Figure 11, wherein, ①: obtain the MPI through the simplified version (remove the basis function) NeX model; ②: synthesize the disparity map from the transparency map (α map) of the MPI; ③: use the disparity map And the sparse point cloud computing depth map obtained by colmap:
a.对于场景的稀疏视图,使用colmap工具进行由运动到结构(SfM)的相机参数估计和多维立体重建(MVS),得到稀疏点云。稀疏点云的坐标是(x,y,d),其中d是相对于参考相机的深度;a. For the sparse view of the scene, use the colmap tool for camera parameter estimation and multidimensional stereo reconstruction (MVS) from motion to structure (SfM) to obtain a sparse point cloud. The coordinates of the sparse point cloud are (x,y,d), where d is the depth relative to the reference camera;
b.以场景的稀疏视图为输入,如图5使用NeX模型合成场景的MPI表征,但是其中基函数个数设置为0。即不考虑场景中的非朗伯面反射效果;b. Take the sparse view of the scene as input, as shown in Figure 5, use the NeX model to synthesize the MPI representation of the scene, but the number of basis functions is set to 0. That is, the non-Lambertian surface reflection effect in the scene is not considered;
c.由MPI的透明度层合成场景的视差图。由视差图和colmap得到的稀疏点云计算深度图,如图11中的③所示。c. The disparity map of the scene is synthesized by the transparency layer of MPI. The sparse point cloud computing depth map obtained from the disparity map and colmap is shown in ③ in Figure 11.
根据如下公式(15)合成视差图disparity(x, y):Synthesize the disparity map disparity(x, y) according to the following formula (15):

$$\mathrm{disparity}(x,y) = \sum_{i=1}^{D} \frac{1}{d_i}\,\alpha_i(x,y) \prod_{j=i+1}^{D} \bigl(1-\alpha_j(x,y)\bigr) \qquad (15)$$
其中,d i是MPI第i个平面(由远及近排序)的深度。视差和深度成反比例关系,反比例系数确定。反比例系数记为σ。在一些实施例中,如下公式(16)所示,通过最小化L2损失获得σ的最佳值σ′: Among them, d i is the depth of the ith plane of MPI (ordered from far to near). Parallax and depth are inversely proportional, and the inverse proportionality coefficient is determined. The inverse proportional coefficient is denoted as σ. In some embodiments, the optimal value σ' of σ is obtained by minimizing the L2 loss as shown in the following equation (16):
$$\sigma' = \arg\min_{\sigma} \sum_{(x,y,d)\in P_s} \Bigl(\mathrm{disparity}(x,y) - \frac{\sigma}{d}\Bigr)^{2} \qquad (16)$$
其中,P_s为稀疏点云,(x, y, d)是点在第一参考视点的相机坐标系中的坐标。由视差图和反比例系数即可计算得到深度图。 Among them, P_s is the sparse point cloud, and (x, y, d) are the coordinates of a point in the camera coordinate system of the first reference viewpoint. The depth map can then be calculated from the disparity map and the inverse proportional coefficient.
PMPI模型与NeX模型在真实场景fern和trex上进行了性能对比。输出图像尺寸固定为1008×756。PMPI模型合成MPI的过程中训练400轮次,合成PMPI训练4000轮次,NeX训练4000轮次。其余设置均相同。二者训练所耗时间几乎相同。二者在测试集上的数据比较如下表1所示:The PMPI model is compared with the NeX model on the real scene fern and trex. The output image size is fixed at 1008×756. The PMPI model is trained for 400 rounds in the process of synthesizing MPI, 4000 rounds of synthetic PMPI training, and 4000 rounds of NeX training. The rest of the settings are the same. The training time of the two is almost the same. The data comparison between the two on the test set is shown in Table 1 below:
表1 PMPI模型(以PMPI指代)与NeX模型的测试性能对比Table 1 Comparison of test performance between PMPI model (referred to as PMPI) and NeX model
合成的直观结果如图12和13所示,其中,图12为在fern场景的合成效果对比,红框121和122为NeX模型的合成结果,红框123和124为PMPI模型的合成结果。图13为在trex场景的合成效果对比,红框131和132为NeX模型的合成结果,红框133和134为PMPI模型的合成结果。The intuitive results of the synthesis are shown in Figures 12 and 13, where Figure 12 is the comparison of the synthesis effects in the fern scene, the red boxes 121 and 122 are the synthesis results of the NeX model, and the red boxes 123 and 124 are the synthesis results of the PMPI model. Figure 13 is a comparison of the synthesis effect in the trex scene, the red boxes 131 and 132 are the synthesis results of the NeX model, and the red boxes 133 and 134 are the synthesis results of the PMPI model.
可见,PMPI模型的合成结果优于NeX的合成效果。而且受益于PMPI在背景区域的优势,PMPI模型的合成结果相对于NeX模型在场景的背景的细节表现更加突出。It can be seen that the synthesis result of PMPI model is better than that of NeX. And benefiting from the advantages of PMPI in the background area, the synthetic results of the PMPI model are more prominent than the details of the NeX model in the background of the scene.
基于前述的实施例,本申请实施例提供的图像处理装置,包括所包括的各模块、以及各模块所包括的各单元,可以通过各种类型的处理器来实现;当然也可通过具体的逻辑电路实现。Based on the aforementioned embodiments, the image processing device provided by the embodiments of the present application, including the included modules and the units included in each module, can be implemented by various types of processors; of course, it can also be implemented by specific logic circuit implementation.
图14为本申请实施例图像处理装置的结构示意图,如图14所示,图像处理装置14包括:FIG. 14 is a schematic structural diagram of an image processing device according to an embodiment of the present application. As shown in FIG. 14 , the image processing device 14 includes:
区域划分模块141,用于根据第一参考视点下的深度图中第一像素点的深度,对所述深度图进行区域划分,得到至少一个区域;A region division module 141, configured to perform region division on the depth map according to the depth of the first pixel in the depth map under the first reference viewpoint, to obtain at least one region;
坐标反变换模块142,用于将目标视点下的待渲染视图的第m个第二像素点的坐标反变换至所述至少一个区域中的至少一个目标区域中,得到所述第m个第二像素点在所述至少一个目标区域中的存在位置点;其中,m大于0且小于或等于所述待渲染视图的总像素点数;A coordinate inverse transformation module 142, configured to inversely transform the coordinates of the m th second pixel point of the view to be rendered under the target viewpoint into at least one target area in the at least one area, to obtain the m th second pixel point Existing position points of pixels in the at least one target area; wherein, m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered;
渲染模块143,用于根据所述第m个第二像素点在所述至少一个目标区域中的存在位置点,渲染所述第m个第二像素点的颜色。The rendering module 143 is configured to render the color of the mth second pixel according to the position of the mth second pixel in the at least one target area.
在一些实施例中,区域划分模块141,用于:根据所述深度图中第一像素点的深度,确定所述第一像素点之间的深度关系;以及根据所述深度关系,对所述深度图进行区域划分,得到至少一个区域。In some embodiments, the area division module 141 is configured to: determine the depth relationship between the first pixels according to the depth of the first pixel in the depth map; and determine the depth relationship between the first pixels according to the depth relationship. The depth map is divided into regions to obtain at least one region.
在一些实施例中,区域划分模块141,用于:将深度相同或深度差在特定范围内的第一像素点划分在同一区域。In some embodiments, the area division module 141 is configured to: divide the first pixel points with the same depth or a depth difference within a specific range into the same area.
在一些实施例中,坐标反变换模块142,用于:确定所述第一参考视点所在的相机坐标系与所述目标视点所在的相机坐标的变换关系;获取所述第一参考视点对应的相机内参和所述目标视点对应的相机内参;确定所述至少一个目标区域的区域深度;根据所述变换关系、所述第一参考视点和所述目标视点分别对应的相机内参以及所述区域深度,对所述第m个第二像素点的齐次坐标进行反向单应变换,得到所述第m个第二像素点在所述至少一个目标区域中的存在位置点。In some embodiments, the coordinate inverse transformation module 142 is configured to: determine the transformation relationship between the camera coordinate system where the first reference viewpoint is located and the camera coordinates where the target viewpoint is located; obtain the camera corresponding to the first reference viewpoint The internal reference and the camera internal reference corresponding to the target viewpoint; determining the region depth of the at least one target region; according to the transformation relationship, the camera internal reference corresponding to the first reference viewpoint and the target viewpoint, and the region depth, Inverse homography transformation is performed on the coordinates of the m th second pixel to obtain the existing position of the m th second pixel in the at least one target area.
在一些实施例中,渲染模块143,用于:从所述第m个第二像素点在所述至少一个目标区域中的存在位置点中,筛选出满足条件的位置点作为有效位置点;以及根据所述有效位置点,渲染所述第m个第二像素点的颜色。In some embodiments, the rendering module 143 is configured to: select the position points satisfying the conditions from the position points of the mth second pixel point in the at least one target area as valid position points; and According to the effective position point, render the color of the mth second pixel point.
在一些实施例中,渲染模块143,用于:确定所述有效位置点的颜色系数、透明度、基础颜色值和基函数;其中,所述基函数的自变量为所述有效位置点与所述目标视点的相对方向;根据所述有效位置点的颜色系数、基础颜色值和基函数,得到所述有效位置点从所述相对方向被观察到的颜色值;将每一所述有效位置点的透明度和所述被观察到的颜色值进行合成,得到合成颜色值;利用所述合成颜色值,渲染所述第m个第二像素点的颜色。In some embodiments, the rendering module 143 is configured to: determine the color coefficient, transparency, basic color value and basis function of the effective position point; wherein, the argument of the basis function is the effective position point and the The relative direction of the target viewpoint; according to the color coefficient, basic color value and basis function of the effective position point, the observed color value of the effective position point from the relative direction is obtained; the effective position point of each Combining the transparency and the observed color value to obtain a composite color value; using the composite color value to render the color of the mth second pixel.
在一些实施例中,渲染模块143,用于:根据有效位置点的坐标和已训练得到的第一多层感知机,得到所述有效位置点的透明度和颜色系数;根据所述有效位置点的坐标,得到所述有效位置点的基础颜色值;根据所述相对方向和已训练得到的第二多层感知机,得到所述有效位置点的基函数。In some embodiments, the rendering module 143 is configured to: obtain the transparency and color coefficient of the effective location point according to the coordinates of the effective location point and the trained first multi-layer perceptron; Coordinates to obtain the basic color value of the effective location point; according to the relative direction and the trained second multi-layer perceptron, obtain the basis function of the effective location point.
在一些实施例中,渲染模块143,用于:将所述有效位置点的坐标映射为具有第一维度的向量;将所述具有第一维度的向量输入至所述第一多层感知机中,得到所述有效位置点的透明度和颜色系数。In some embodiments, the rendering module 143 is configured to: map the coordinates of the effective location point into a vector with the first dimension; input the vector with the first dimension into the first multi-layer perceptron , to obtain the transparency and color coefficient of the effective location point.
在一些实施例中,渲染模块143,用于:将所述相对方向映射为具有第二维度的向量;将所述具有第二维度的向量输入至所述第二多层感知机中,得到所述有效位置点的基函数。In some embodiments, the rendering module 143 is configured to: map the relative direction into a vector with a second dimension; input the vector with the second dimension into the second multi-layer perceptron to obtain the The basis function of the effective position points.
在一些实施例中,图像处理装置14还包括更新模块,用于:在所述待渲染视图的每一第二像素点的颜色被渲染后,得到所述目标视点下的合成视图;获取所述目标视点下的真实视图;根据所述合成视图和所述真实视图,得到合成损失;根据所述合成损失,更新所述第一多层感知机和所述第二多层感知机的参数值。In some embodiments, the image processing device 14 further includes an update module, configured to: obtain a synthetic view under the target viewpoint after the color of each second pixel of the view to be rendered is rendered; obtain the A real view under the target viewpoint; according to the synthetic view and the real view, a composite loss is obtained; according to the composite loss, parameter values of the first multilayer perceptron and the second multilayer perceptron are updated.
在一些实施例中,渲染模块143还用于:利用更新后的第一多层感知机和更新后的第二多层感知机,重新渲染所述第m个第二像素点的颜色,直至得到的合成损失满足条件或者更新次数满足条件。In some embodiments, the rendering module 143 is further configured to: use the updated first multi-layer perceptron and the updated second multi-layer perceptron to re-render the color of the mth second pixel until The synthetic loss of satisfies the condition or the number of updates satisfies the condition.
在一些实施例中,图像处理装置14还包括深度图获得模块,用于:根据至少一张第二参考视点下的视图对包括的场景进行三维重建,得到所述场景在第一参考视点的相机坐标系下的点云数据;确定所述场景的视差图;根据所述视差图和所述点云数据,得到所述第一参考视点下的深度图。In some embodiments, the image processing device 14 further includes a depth map obtaining module, configured to: perform three-dimensional reconstruction on the included scene according to at least one view under the second reference viewpoint, and obtain the camera of the scene at the first reference viewpoint point cloud data in a coordinate system; determining a disparity map of the scene; and obtaining a depth map at the first reference viewpoint according to the disparity map and the point cloud data.
In some embodiments, the depth map obtaining module is configured to: obtain a transparency map of at least one plane of the scene according to the at least one view under the second reference viewpoint; and synthesize the disparity map of the scene according to the transparency map of the at least one plane and the corresponding plane depth.
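One plausible realization of this synthesis is sketched below under the assumption that the planes are composited back to front and that each plane contributes its inverse depth where its transparency map is opaque; this compositing rule is an assumption and is not prescribed by this application.

    import numpy as np

    def synthesize_disparity(alpha_maps, plane_depths):
        # alpha_maps:   (D, H, W) transparency map of each plane
        # plane_depths: (D,)      depth of each plane, ordered back to front
        disparity = np.zeros(alpha_maps.shape[1:], dtype=np.float32)
        for alpha, depth in zip(alpha_maps, plane_depths):
            # Each plane contributes its inverse depth where it is opaque.
            disparity = alpha * (1.0 / depth) + (1.0 - alpha) * disparity
        return disparity  # synthesized disparity map of the scene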
In some embodiments, the depth map obtaining module is configured to: obtain an inverse proportionality coefficient between the disparity map and the depth map according to the disparity map and the point cloud data; and obtain the depth map under the first reference viewpoint according to the inverse proportionality coefficient and the disparity map.
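A minimal sketch of how the inverse proportionality coefficient could be estimated and applied is given below; the least-squares fit and the function names are assumptions made for illustration.

    import numpy as np

    def fit_inverse_coefficient(disparities, depths):
        # Estimate k in depth ≈ k / disparity from sparse correspondences
        # between the disparity map and point-cloud depths.
        d = np.asarray(disparities, dtype=np.float64)
        z = np.asarray(depths, dtype=np.float64)
        # Minimize sum (z - k/d)^2  ->  k = sum(z/d) / sum(1/d^2)
        return float(np.sum(z / d) / np.sum(1.0 / d ** 2))

    def disparity_to_depth(disparity_map, k, eps=1e-6):
        # Depth map under the first reference viewpoint from the fitted coefficient.
        return k / np.maximum(disparity_map, eps)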
The description of the above apparatus embodiments is similar to that of the above method embodiments and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the above method is implemented in the form of software functional modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, or the part thereof contributing to the related art, may essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides an electronic device. FIG. 15 is a schematic diagram of the hardware entities of the electronic device according to an embodiment of the present application. As shown in FIG. 15, the electronic device 15 includes a memory 151 and a processor 152. The memory 151 stores a computer program executable on the processor 152, and the processor 152 implements the steps of the methods provided in the above embodiments when executing the program.
It should be noted that the memory 151 is configured to store instructions and applications executable by the processor 152, and may also cache data to be processed or already processed by the processor 152 and the modules of the electronic device 15 (for example, image data, audio data, voice communication data, and video communication data). The memory 151 may be implemented by a flash memory (FLASH) or a random access memory (RAM).
In some embodiments, as shown in FIG. 16, the electronic device further includes a decoder 161 and a display apparatus 162. The decoder 161 is configured to decode a bitstream sent by an encoding end to obtain the depth map under the first reference viewpoint, and to transmit the depth map to the processor 152. The processor 152 is configured to perform, according to the depth map, the steps of the image processing method provided in the above embodiments, thereby finally obtaining the synthesized view under the target viewpoint, and to transmit the synthesized view to the display apparatus 162. The display apparatus 162 displays or plays the received synthesized view. The processor 152 may divide the depth map into regions according to a specific region division algorithm to obtain at least one region.
In some embodiments, the bitstream may further carry the total number of regions, so that the processor 152 can divide the depth map into regions according to the total number of regions decoded by the decoder 161.
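As an assumed illustration only, region division driven by a given total number of regions could group pixels whose depths fall into the same depth interval; the quantile-based intervals in the sketch below are one possible choice, not the specific region division algorithm of this application.

    import numpy as np

    def divide_depth_map(depth_map, num_regions):
        # Group first pixel points with equal or close depths into the same
        # region by binning depths into num_regions quantile intervals.
        quantiles = np.linspace(0.0, 1.0, num_regions + 1)[1:-1]
        edges = np.quantile(depth_map, quantiles)
        return np.digitize(depth_map, edges)  # (H, W) region labels in [0, num_regions)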
Correspondingly, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the methods provided in the above embodiments.
It should be pointed out here that the descriptions of the above storage medium and device embodiments are similar to those of the method embodiments and have similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be understood that references throughout this specification to "one embodiment", "an embodiment", or "some embodiments" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, appearances of "in one embodiment", "in an embodiment", or "in some embodiments" in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, in this document, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the modules is only a logical functional division, and there may be other division manners in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or in other forms.
The modules described above as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present application may all be integrated into one processing unit, or each module may serve as a separate unit, or two or more modules may be integrated into one unit. The above integrated modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by program instructions and related hardware. The aforementioned program may be stored in a computer-readable storage medium, and when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated units of the present application are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, or the part thereof contributing to the related art, may essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The methods disclosed in the several method embodiments provided in this application may be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several product embodiments provided in this application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or device embodiments provided in this application may be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.
The above are only implementations of the present application, but the scope of protection of the present application is not limited thereto. Any person skilled in the art could readily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall all fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. An image processing method, the method comprising:
    performing region division on a depth map under a first reference viewpoint according to the depth of first pixel points in the depth map to obtain at least one region;
    inversely transforming the coordinates of an mth second pixel point of a view to be rendered under a target viewpoint into at least one target region of the at least one region, to obtain existing position points of the mth second pixel point in the at least one target region, wherein m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered; and
    rendering the color of the mth second pixel point according to the existing position points of the mth second pixel point in the at least one target region.
  2. The method according to claim 1, wherein performing region division on the depth map according to the depth of the first pixel points in the depth map under the first reference viewpoint to obtain at least one region comprises:
    determining a depth relationship between the first pixel points according to the depths of the first pixel points in the depth map; and
    performing region division on the depth map according to the depth relationship to obtain at least one region.
  3. The method according to claim 2, wherein performing region division on the depth map according to the depth relationship to obtain at least one region comprises:
    dividing first pixel points having the same depth, or having a depth difference within a specific range, into the same region.
  4. The method according to any one of claims 1 to 3, wherein inversely transforming the coordinates of the mth second pixel point of the view to be rendered under the target viewpoint into the at least one target region of the at least one region to obtain the existing position points of the mth second pixel point in the at least one target region comprises:
    determining a transformation relationship between the camera coordinate system of the first reference viewpoint and the camera coordinate system of the target viewpoint;
    acquiring camera intrinsic parameters corresponding to the first reference viewpoint and camera intrinsic parameters corresponding to the target viewpoint;
    determining a region depth of the at least one target region; and
    performing an inverse homography transformation on the homogeneous coordinates of the mth second pixel point according to the transformation relationship, the camera intrinsic parameters respectively corresponding to the first reference viewpoint and the target viewpoint, and the region depth, to obtain the existing position points of the mth second pixel point in the at least one target region.
  5. The method according to claim 1, wherein rendering the color of the mth second pixel point according to the existing position points of the mth second pixel point in the at least one target region comprises:
    selecting, from the existing position points of the mth second pixel point in the at least one target region, position points satisfying a condition as valid position points; and
    rendering the color of the mth second pixel point according to the valid position points.
  6. The method according to claim 5, wherein rendering the color of the mth second pixel point according to the valid position points comprises:
    determining a color coefficient, a transparency, a base color value and a basis function of each valid position point, wherein an argument of the basis function is the relative direction between the valid position point and the target viewpoint;
    obtaining, according to the color coefficient, base color value and basis function of the valid position point, a color value of the valid position point as observed from the relative direction;
    compositing the transparency of each valid position point with the observed color value to obtain a composite color value; and
    rendering the color of the mth second pixel point using the composite color value.
  7. The method according to claim 6, wherein determining the color coefficient, transparency, base color value and basis function of the valid position point comprises:
    obtaining the transparency and color coefficient of the valid position point according to the coordinates of the valid position point and a trained first multi-layer perceptron;
    obtaining the base color value of the valid position point according to the coordinates of the valid position point; and
    obtaining the basis function of the valid position point according to the relative direction and a trained second multi-layer perceptron.
  8. The method according to claim 7, wherein obtaining the transparency and color coefficient of the valid position point according to the coordinates of the valid position point and the trained first multi-layer perceptron comprises:
    mapping the coordinates of the valid position point into a vector having a first dimension; and
    inputting the vector having the first dimension into the first multi-layer perceptron to obtain the transparency and color coefficient of the valid position point.
  9. The method according to claim 7, wherein obtaining the basis function of the valid position point according to the relative direction and the trained second multi-layer perceptron comprises:
    mapping the relative direction into a vector having a second dimension; and
    inputting the vector having the second dimension into the second multi-layer perceptron to obtain the basis function of the valid position point.
  10. The method according to claim 7, further comprising:
    obtaining a synthesized view under the target viewpoint after the color of each second pixel point of the view to be rendered has been rendered;
    acquiring a real view under the target viewpoint;
    obtaining a synthesis loss according to the synthesized view and the real view; and
    updating parameter values of the first multi-layer perceptron and the second multi-layer perceptron according to the synthesis loss.
  11. The method according to claim 10, further comprising:
    re-rendering the color of the mth second pixel point using the updated first multi-layer perceptron and the updated second multi-layer perceptron, until the obtained synthesis loss satisfies a condition or the number of updates satisfies a condition.
  12. The method according to claim 1, wherein the process of obtaining the depth map under the first reference viewpoint comprises:
    performing three-dimensional reconstruction of the scene included in at least one view under a second reference viewpoint to obtain point cloud data of the scene in the camera coordinate system of the first reference viewpoint;
    determining a disparity map of the scene; and
    obtaining the depth map under the first reference viewpoint according to the disparity map and the point cloud data.
  13. The method according to claim 12, wherein determining the disparity map of the scene comprises:
    obtaining a transparency map of at least one plane of the scene according to the at least one view under the second reference viewpoint; and
    synthesizing the disparity map of the scene according to the transparency map of the at least one plane and the corresponding plane depth.
  14. The method according to claim 12, wherein obtaining the depth map under the first reference viewpoint according to the disparity map and the point cloud data comprises:
    obtaining an inverse proportionality coefficient between the disparity map and the depth map according to the disparity map and the point cloud data; and
    obtaining the depth map under the first reference viewpoint according to the inverse proportionality coefficient and the disparity map.
  15. An image processing apparatus, comprising:
    a region division module, configured to perform region division on a depth map under a first reference viewpoint according to the depth of first pixel points in the depth map to obtain at least one region;
    a coordinate inverse transformation module, configured to inversely transform the coordinates of an mth second pixel point of a view to be rendered under a target viewpoint into at least one target region of the at least one region, to obtain existing position points of the mth second pixel point in the at least one target region, wherein m is greater than 0 and less than or equal to the total number of pixels of the view to be rendered; and
    a rendering module, configured to render the color of the mth second pixel point according to the existing position points of the mth second pixel point in the at least one target region.
  16. The apparatus according to claim 15, wherein the region division module is configured to:
    determine a depth relationship between the first pixel points according to the depths of the first pixel points in the depth map; and
    perform region division on the depth map according to the depth relationship to obtain at least one region.
  17. The apparatus according to claim 16, wherein the region division module is configured to divide first pixel points having the same depth, or having a depth difference within a specific range, into the same region.
  18. The apparatus according to any one of claims 15 to 17, wherein the rendering module is configured to:
    select, from the existing position points of the mth second pixel point in the at least one target region, position points satisfying a condition as valid position points; and
    render the color of the mth second pixel point according to the valid position points.
  19. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor implements the steps of the image processing method according to any one of claims 1 to 14 when executing the program.
  20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 14.
PCT/CN2021/103290 2021-06-29 2021-06-29 Image processing method and apparatus, device, and storage medium WO2023272531A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180100038.4A CN117730530A (en) 2021-06-29 2021-06-29 Image processing method and device, equipment and storage medium
PCT/CN2021/103290 WO2023272531A1 (en) 2021-06-29 2021-06-29 Image processing method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/103290 WO2023272531A1 (en) 2021-06-29 2021-06-29 Image processing method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023272531A1 true WO2023272531A1 (en) 2023-01-05

Family

ID=84690169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103290 WO2023272531A1 (en) 2021-06-29 2021-06-29 Image processing method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN117730530A (en)
WO (1) WO2023272531A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103945209A (en) * 2014-04-28 2014-07-23 华南理工大学 DIBR method based on block projection
CN104270624A (en) * 2014-10-08 2015-01-07 太原科技大学 Region-partitioning 3D video mapping method
CN104869386A (en) * 2015-04-09 2015-08-26 东南大学 Virtual viewpoint synthesizing method based on layered processing
US20190244379A1 (en) * 2018-02-07 2019-08-08 Fotonation Limited Systems and Methods for Depth Estimation Using Generative Models
US20200226816A1 (en) * 2019-01-14 2020-07-16 Fyusion, Inc. Free-viewpoint photorealistic view synthesis from casually captured video
US20200288102A1 (en) * 2019-03-06 2020-09-10 Electronics And Telecommunications Research Institute Image processing method and apparatus
CN112233165A (en) * 2020-10-15 2021-01-15 大连理工大学 Baseline extension implementation method based on multi-plane image learning view synthesis

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664741A (en) * 2023-06-13 2023-08-29 北京东方融创信息技术有限公司 Industrial configuration scene rendering method of high-definition pipeline
CN116664741B (en) * 2023-06-13 2024-01-19 北京东方融创信息技术有限公司 Industrial configuration scene rendering method of high-definition pipeline
CN117994444A (en) * 2024-04-03 2024-05-07 浙江华创视讯科技有限公司 Reconstruction method, device and storage medium of complex scene

Also Published As

Publication number Publication date
CN117730530A (en) 2024-03-19

Similar Documents

Publication Publication Date Title
US11288857B2 (en) Neural rerendering from 3D models
US10474227B2 (en) Generation of virtual reality with 6 degrees of freedom from limited viewer data
CN113811920A (en) Distributed pose estimation
WO2019238114A1 (en) Three-dimensional dynamic model reconstruction method, apparatus and device, and storage medium
WO2023272531A1 (en) Image processing method and apparatus, device, and storage medium
JP2017532847A (en) 3D recording and playback
JP2016522485A (en) Hidden reality effect and intermediary reality effect from reconstruction
CN115210532A (en) System and method for depth estimation by learning triangulation and densification of sparse points for multi-view stereo
WO2015188666A1 (en) Three-dimensional video filtering method and device
WO2023056840A1 (en) Method and apparatus for displaying three-dimensional object, and device and medium
WO2020184174A1 (en) Image processing device and image processing method
CN113129352A (en) Sparse light field reconstruction method and device
CN115797561A (en) Three-dimensional reconstruction method, device and readable storage medium
CN108305281A (en) Calibration method, device, storage medium, program product and the electronic equipment of image
KR20230133293A (en) Enhancement of 3D models using multi-view refinement
CN111161398A (en) Image generation method, device, equipment and storage medium
Li et al. Bringing instant neural graphics primitives to immersive virtual reality
Jin et al. From capture to display: A survey on volumetric video
Fachada et al. Chapter View Synthesis Tool for VR Immersive Video
US20120147008A1 (en) Non-uniformly sampled 3d information representation method
CN113592875B (en) Data processing method, image processing method, storage medium, and computing device
JP7264788B2 (en) Presentation system, server and terminal
CN113628190B (en) Depth map denoising method and device, electronic equipment and medium
CN116681818B (en) New view angle reconstruction method, training method and device of new view angle reconstruction network
JP7505481B2 (en) Image processing device and image processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21947487

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180100038.4

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE