WO2022042062A1 - Method, apparatus and device for three-dimensionalizing a two-dimensional image, and computer-readable storage medium - Google Patents

Method, apparatus and device for three-dimensionalizing a two-dimensional image, and computer-readable storage medium

Info

Publication number
WO2022042062A1
WO2022042062A1 · PCT/CN2021/104972 (CN2021104972W)
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
dimensional image
migration
image
pixels
Prior art date
Application number
PCT/CN2021/104972
Other languages
English (en)
French (fr)
Inventor
罗越
李昱
单瀛
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP21859902.5A (EP4099692A4)
Priority to JP2022559723A (JP7432005B2)
Publication of WO2022042062A1
Priority to US18/077,549 (US20230113902A1)


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
        • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
        • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
        • H04N 13/106: Processing image signals
        • H04N 13/128: Adjusting depth or disparity
        • H04N 13/15: Processing image signals for colour aspects of image signals
        • H04N 13/20: Image signal generators
        • H04N 13/257: Colour aspects
        • H04N 13/261: Image signal generators with monoscopic-to-stereoscopic image conversion
        • H04N 13/264: Monoscopic-to-stereoscopic conversion using the relative movement of objects in two video frames or fields
        • H04N 13/268: Monoscopic-to-stereoscopic conversion based on depth image-based rendering [DIBR]
        • H04N 13/286: Image signal generators having separate monoscopic and stereoscopic modes
        • H04N 13/289: Switching between monoscopic and stereoscopic modes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 5/70; G06T 5/77
        • G06T 7/00: Image analysis
        • G06T 7/20: Analysis of motion
        • G06T 7/50: Depth or shape recovery
        • G06T 2207/00: Indexing scheme for image analysis or image enhancement
        • G06T 2207/10: Image acquisition modality
        • G06T 2207/10024: Color image
        • G06T 2207/20: Special algorithmic details
        • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • the present application relates to image processing technology, and in particular, to a method, apparatus, electronic device, and computer-readable storage medium for three-dimensionalizing a two-dimensional image.
  • Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Image processing is a typical application of artificial intelligence.
  • More and more application products provide the function of diversified display of images.
  • Related application products can not only display two-dimensional images, but can also subject a two-dimensional image to three-dimensional processing, thereby displaying a three-dimensional video related to the content of the two-dimensional image.
  • However, the applicant found that the generation of such three-dimensional video mainly depends on three-dimensional modeling based on multi-angle cameras, which consumes a large amount of computing resources and time.
  • Embodiments of the present application provide a three-dimensionalization method, apparatus, electronic device, and computer-readable storage medium for a two-dimensional image, which can quickly and accurately generate a three-dimensional video based on a two-dimensional image.
  • An embodiment of the present application provides a method for three-dimensionalizing a two-dimensional image.
  • the method is executed by an electronic device, and the method includes:
  • performing depth perception processing on the two-dimensional image to obtain the depth value of each pixel in the two-dimensional image; performing migration processing of multiple viewing angles on the two-dimensional image to obtain a migration result of the two-dimensional image corresponding to each viewing angle; determining, based on the depth value of each pixel in the two-dimensional image and the migration result corresponding to each viewing angle, the color value of each pixel in the migration image corresponding to each viewing angle; generating the migration image corresponding to each viewing angle based on the color values of its pixels; and packaging the migration images of the multiple viewing angles in sequence to obtain a three-dimensional video.
  • An embodiment of the present application provides a three-dimensionalization device for a two-dimensional image, and the device includes:
  • a depth module configured to perform depth perception processing on the two-dimensional image to obtain the depth value of each pixel in the two-dimensional image;
  • a migration module configured to perform migration processing of multiple viewing angles on the two-dimensional image to obtain a migration result of the two-dimensional image corresponding to each viewing angle;
  • a color determination module configured to determine, based on the depth value of each pixel in the two-dimensional image and the migration result of the two-dimensional image corresponding to each viewing angle, the color value of each pixel in the migration image corresponding to each viewing angle;
  • a generation module configured to generate the migration image corresponding to each viewing angle based on the color value of each pixel in the migration image of that viewing angle; and
  • an encapsulation module configured to encapsulate the migration images of the multiple viewing angles in sequence to obtain a three-dimensional video.
  • An embodiment of the present application provides a method for three-dimensionalizing a two-dimensional image.
  • the method is executed by an electronic device, and the method includes: displaying a two-dimensional image on a human-computer interaction interface; and, in response to a three-dimensionalization operation for the two-dimensional image, playing a three-dimensional video generated based on the two-dimensional image; wherein the three-dimensional video is obtained by executing the method for three-dimensionalizing a two-dimensional image provided by the embodiments of the present application.
  • An embodiment of the present application provides a three-dimensionalization device for a two-dimensional image, and the device includes:
  • a display module configured to display a two-dimensional image on a human-computer interaction interface;
  • a playing module configured to play a three-dimensional video generated based on the two-dimensional image in response to a three-dimensionalization operation for the two-dimensional image
  • the video is obtained by executing the three-dimensionalization method for a two-dimensional image provided by the embodiment of the present application.
  • An embodiment of the present application provides an electronic device, and the electronic device includes:
  • a memory configured to store executable instructions; and a processor configured to implement the method for three-dimensionalizing a two-dimensional image provided by the embodiments of the present application when executing the executable instructions stored in the memory.
  • the embodiments of the present application provide a computer-readable storage medium storing executable instructions for implementing the three-dimensionalization method for a two-dimensional image provided by the embodiments of the present application when executed by a processor.
  • Through the embodiments of the present application, perspective transformation is realized at the two-dimensional image level, so that three-dimensionalization of the image is carried out entirely through two-dimensional image processing, replacing the three-dimensional scene modeling process. In this way, the two-dimensional image is accurately three-dimensionalized to generate a three-dimensional video, and the computing resource cost and time cost of the background or terminal are reduced.
  • FIG. 1 is a schematic structural diagram of a system for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of an electronic device applying the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIGS. 3A-3E are schematic flowcharts of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 5 is a depth map of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 6 is an edge marker diagram of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of movement parameter acquisition in the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 8 is a schematic diagram of contribution in the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a query in the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 10 is a schematic diagram of the effect of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 11 is a schematic diagram of packaging in the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application;
  • FIG. 12 is a schematic diagram of contribution in the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application.
  • The terms "first/second/third" are only used to distinguish similar objects and do not represent a specific ordering of objects. It can be understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
  • In 3D computer graphics and computer vision, a depth map is an image or image channel that contains information about the distance from the surfaces of scene objects to the viewpoint; it is used to simulate or reconstruct 3D shapes, and a depth map can be generated by a 3D scanner.
  • a digital image is a two-dimensional signal that records the grayscale or color of the image in the row and column directions.
  • a pixel is the smallest logical unit of a computer image.
  • Depth estimation: estimating depth information from an image. Depth estimation can be based on image content understanding, on focus, on defocus, or on light and dark (shading) changes; scenes may first be classified into blocks, and the depth information of each category of scene is then estimated by the method applicable to it.
  • In the related art, the main solution is to predict the depth information of the scene through deep learning, perform three-dimensional modeling of the scene, fill in and predict the occluded parts, then simulate camera movement to change the camera's perspective, and re-render the image from the new perspective to obtain the image from the new perspective, thereby displaying a video with a three-dimensional effect.
  • That is, the three-dimensional video generation method mainly predicts scene depth information through deep learning, constructs the scene through three-dimensional modeling, fills the occluded parts through deep learning, and re-renders images from new perspectives by simulating camera movement to obtain a video with a 3D effect. However, the 3D modeling process is complicated and time-consuming, so the entire calculation process is complicated and the time overhead is large, making it unsuitable for supporting low-latency online functions.
  • In view of this, the embodiments of the present application provide a method, apparatus, electronic device, and computer-readable storage medium for three-dimensionalizing a two-dimensional image, which can quickly and accurately reconstruct image scenes from different viewing angles to realize three-dimensional display of images.
  • The electronic devices provided in the embodiments of the present application may be implemented as various types of user terminals, such as notebook computers, tablet computers, desktop computers, set-top boxes, smart home devices (for example, smart TVs), and mobile devices (for example, mobile phones, portable music players, personal digital assistants, dedicated messaging devices, portable game devices), and may also be implemented as servers.
  • an exemplary application when the device is implemented as a server will be described.
  • FIG. 1 is a schematic structural diagram of a two-dimensional image three-dimensional system provided by an embodiment of the present application.
  • a terminal 400 is connected to a server 200 through a network 300.
  • the network 300 may be a wide area network or a local area network, or a combination of the two
  • the terminal 400 uploads the image to be processed to the server 200
  • the server 200 performs three-dimensional processing of the two-dimensional image to obtain a three-dimensional video corresponding to the image to be processed
  • the server 200 returns the three-dimensional video to the terminal 400, and the three-dimensional video is played on the terminal 400.
  • The terminal 400 may also send the rendering mode specified by the image three-dimensionalization operation to the server 200; the server 200 determines the corresponding multiple viewing angles and the packaging order of the migration images according to the rendering mode, obtains the migration images corresponding to the multiple viewing angles, and packages the migration images according to the packaging order to generate a three-dimensional video, which is sent back to the terminal 400 for playback.
  • The above takes the cooperation of the terminal 400 and the server 200 to complete the three-dimensional processing of the two-dimensional image as an example. It can be understood that, as an alternative, the terminal 400 can rely on its own capabilities to complete the three-dimensional processing of the two-dimensional image.
  • the terminal 400 implements the three-dimensionalization method for two-dimensional images provided by the embodiments of the present application by running a computer program.
  • The computer program may be a native program or software module in the operating system; it may be a native application (APP), that is, a program that needs to be installed in the operating system to run, such as a video APP or a live-broadcast APP; it may also be a mini program, that is, a program that only needs to be downloaded into a browser environment to run, for example a video mini program or a live-broadcast mini program that can be embedded into any APP. In general, the above computer program may be any form of application, module or plug-in.
  • The server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • the terminal 400 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device applying the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application, taking as an example the electronic device being a terminal 400 that independently completes the three-dimensional processing of a two-dimensional image depending on its own capabilities. The terminal 400 shown in FIG. 2 includes: at least one processor 410, memory 450, at least one network interface 420, and a user interface 430.
  • the various components in terminal 400 are coupled together by bus system 440 . It is understood that the bus system 440 is used to implement the connection communication between these components.
  • The bus system 440 also includes a power bus, a control bus, and a status signal bus; however, for the sake of clarity, the various buses are all labeled as bus system 440 in FIG. 2.
  • the processor 410 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where a general-purpose processor may be a microprocessor or any conventional processor or the like.
  • User interface 430 includes one or more output devices 431 that enable display of media content, including one or more speakers and/or one or more visual display screens.
  • User interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
  • Memory 450 may be removable, non-removable, or a combination thereof.
  • Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
  • Memory 450 optionally includes one or more storage devices that are physically remote from processor 410 .
  • Memory 450 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
  • the memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
  • memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
  • the operating system 451 includes system programs for processing various basic system services and performing hardware-related tasks, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and processing hardware-based tasks;
  • Display module 453 for enabling display of information (for example, a user interface for operating peripherals and displaying content and information) via one or more output devices 431 associated with the user interface 430 (for example, a display screen, speakers, etc.);
  • An input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
  • the three-dimensionalization device for two-dimensional images provided by the embodiments of the present application may be implemented in software.
  • FIG. 2 shows the three-dimensionalization device 455 for two-dimensional images stored in the memory 450, which may be software in the form of programs and plug-ins, including the following software modules: depth module 4551, migration module 4552, color determination module 4553, generation module 4554, encapsulation module 4555, display module 4556 and playback module 4557.
  • the implemented functions can be combined arbitrarily or further split, and the functions of each module will be described below.
  • the three-dimensionalization method of the two-dimensional image provided by the embodiment of the present application will be described in conjunction with the exemplary application and implementation of the electronic device provided by the embodiment of the present application.
  • The method may be completed by the terminal 400 independently, or completed cooperatively by the terminal 400 and the server 200 described above.
  • FIG. 3A is a schematic flowchart of a three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application, which will be described with reference to steps 101 to 105 shown in FIG. 3A .
  • step 101 depth perception processing is performed on the two-dimensional image to obtain the depth value of each pixel in the two-dimensional image.
  • the depth value of a pixel in a two-dimensional image is the depth value of the pixel perceived by the depth perception algorithm, that is, the original depth value below.
  • The problem of depth estimation in the field of computer vision belongs to three-dimensional reconstruction: the depth distance is derived from spatial geometry, from transformations between domains, and from focal length changes. Depth estimation can be used for depth perception in fields such as 3D modeling, scene understanding, and image synthesis. Deep-learning-based depth estimation of an image fits a function that maps the image to a depth map based on the relationship between pixels and depth values; monocular depth estimation usually takes image data from a single perspective as input and directly predicts the depth value corresponding to each pixel in the image.
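  • As a rough illustration of step 101, the depth perception stage can be thought of as a function mapping an H x W x 3 image to an H x W depth map. The sketch below assumes a hypothetical `depth_model` callable (for example, a trained monocular depth network); it is not the specific estimator used by the application.

```python
import numpy as np

def perceive_depth(image_rgb: np.ndarray, depth_model) -> np.ndarray:
    """Return a per-pixel depth map (H, W) for an RGB image (H, W, 3).

    `depth_model` is a placeholder for any monocular depth estimator
    (e.g. a trained neural network); it is assumed to map an image to a
    dense depth prediction of the same spatial size.
    """
    depth = depth_model(image_rgb)  # hypothetical call: dense depth prediction
    return np.asarray(depth, dtype=np.float32)
```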
  • step 102 the migration processing of multiple viewing angles is performed on the two-dimensional image to obtain a migration result corresponding to each viewing angle of the two-dimensional image.
  • The migration is to migrate each pixel of the two-dimensional image onto a canvas of the same size as the two-dimensional image; the migration result corresponding to each viewing angle includes the position of each pixel in the canvas of that viewing angle.
  • The three-dimensionalization of a two-dimensional image can follow different rendering styles, for example forming a three-dimensional video with a zoom-in effect, with a shaking-camera effect, or with a zoom-out effect. For different styles of three-dimensionalization, the corresponding viewing angles and the packaging order of the migration images need to be determined.
  • When the terminal receives a three-dimensionalization operation for a two-dimensional image, it determines the rendering style specified by the operation, and then determines the multiple viewing angles corresponding to the rendering style and the packaging order of the migration images of those viewing angles. Assuming that for a certain rendering style the migration images of two viewing angles need to be determined, migration of two viewing angles needs to be performed for each pixel of the two-dimensional image, and the migration results of the two viewing angles are obtained respectively.
  • step 103 based on the depth value of each pixel in the two-dimensional image and the migration result of the two-dimensional image corresponding to each viewing angle, the color value of each pixel in the migration image corresponding to each viewing angle is determined.
  • In some embodiments, before the color value of each pixel in the migration image corresponding to each viewing angle is determined, the following technical solution may be performed: taking the depth value of each pixel in the two-dimensional image obtained by the depth perception processing as the original depth value, performing depth repair processing on the original depth value of each pixel in the two-dimensional image to obtain the repaired depth value of each pixel, and replacing the corresponding original depth value with the repaired depth value of each pixel.
  • The technical solution of the above depth repair processing is mainly used to perform depth edge repair on the depth values obtained by depth perception, and is executed before step 103.
  • FIG. 5 is a depth map of the three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application.
  • The depth estimation result obtained by the depth perception processing in step 101 may produce uneven, discontinuous jumps at edges. In FIG. 5, each grid represents a pixel, the background object is a black grid 501, and the foreground object is a white grid 502. Ideally there would be no gray grid in the middle, which conforms to the principle that the same object has the same depth; however, a discontinuous jumping phenomenon exists, that is, step jumps between the black grid 501 and the gray grid 503, and between the gray grid 503 and the white grid 502, causing different depth estimation results for the same object. Therefore, depth inpainting needs to be performed for the edge, and the edge depth can be improved by means of fast median replacement.
  • In FIG. 5, the white grid 502 represents the foreground of the two-dimensional image, the black grid 501 represents the background of the two-dimensional image, the depth of the foreground is smaller than that of the background, and the depth value jumps greatly at the edge between the grids of the two different colors.
  • In some embodiments, the above depth repair processing performed on the original depth value of each pixel in the two-dimensional image to obtain the repaired depth value of each pixel can be achieved by the following technical solution: determining, based on the original depth value of each pixel in the two-dimensional image, the edge pixels and the non-edge pixels in the two-dimensional image; determining, based on the edge pixels, the pixels to be replaced that need median replacement and the reserved pixels that do not need median replacement; sorting the original depth values of all non-edge pixels in the connected region of each pixel to be replaced in descending order, and using the median of the sorted result as the repaired depth value of the pixel to be replaced; and keeping the original depth value of each reserved pixel as its repaired depth value.
  • In some embodiments, the above determination of the edge pixels and non-edge pixels in the two-dimensional image based on the original depth value of each pixel can be achieved by the following technical solution: for any pixel in the two-dimensional image, when the absolute difference between the regularization result of the original depth value of the pixel and the regularization result of the original depth value of at least one adjacent pixel is not less than a difference threshold, the pixel is determined as a non-edge pixel, where an adjacent pixel is a pixel located at a position adjacent to the pixel; when the absolute difference between the regularization result of the original depth value of the pixel and the regularization result of the original depth value of each adjacent pixel is less than the difference threshold, the pixel is determined as an edge pixel.
  • The depth map obtained by the depth perception processing is regularized, so that the value range of the depth map is reduced to the interval from 0 to 1, see formula (1):

    Norm(D) = (D - Dmin) / (Dmax - Dmin)    (1)

  • where Dmax refers to the maximum depth value among all pixels of the depth map, Dmin refers to the minimum depth value among all pixels of the depth map, and Norm(D) is the result of the regularization processing.
  • FIG. 6 is an edge marker diagram of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application. Assuming the regularized depth of a point is D(i, j), where i represents the horizontal position of the point and j represents its vertical position, the judgment is made according to formula (2):

    max( abs(D(i,j) - D(i+1,j)), abs(D(i,j) - D(i-1,j)), abs(D(i,j) - D(i,j+1)), abs(D(i,j) - D(i,j-1)) ) < difference threshold    (2)

  • where abs(D(i,j) - D(i+1,j)) is the absolute difference between the regularization result of the depth value of pixel (i, j) and that of pixel (i+1, j), and the other three terms are defined analogously for pixels (i-1, j), (i, j+1) and (i, j-1). When the maximum of the four absolute differences is less than the difference threshold (for example, a threshold of 0.04), the pixel is marked as 1 and determined to be an edge pixel; otherwise, the pixel is marked as 0 and determined to be a non-edge pixel.
  • In some embodiments, the above determination, based on the edge pixels, of the pixels to be replaced that need median replacement and the reserved pixels that do not can be achieved by the following technical solution: for any pixel in the two-dimensional image, when there is at least one edge pixel in the connected area of the pixel, the pixel is determined to be a pixel to be replaced; when there is no edge pixel in the connected area of the pixel, the pixel is determined to be a reserved pixel.
  • Exemplarily, a connected area is delineated with each pixel as the center (the central pixel); the connected area refers to a set of multiple pixels that have direct or indirect connected relationships with the pixel, and specifically may be a square of size k*k centered on the pixel. Taking the textured pixel shown in FIG. 6 as the central pixel 602 as an example, the connected area is shown as the dotted box 601 in FIG. 6, whose size is 3*3. If there are edge pixels (points marked as 1) in the connected area, median replacement processing is performed on the central pixel; otherwise, median replacement processing is not required.
  • The specific method of median replacement processing is as follows: first, determine the pixels that need median replacement processing; when there is at least one edge pixel in the connected area of a pixel (central pixel), the pixel is determined to be a pixel to be replaced, that is, a pixel that needs median replacement processing. The depth values of all non-edge pixels in the connected area are then obtained, these depth values are sorted, and the median value replaces the depth value of the central pixel. Each pixel in the two-dimensional image is used as the central pixel to perform the above processing, thereby traversing all pixels and completing the depth repair processing of the depth values of the two-dimensional image.
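  • A minimal sketch of this edge repair pipeline (regularization per formula (1), edge marking per formula (2), and median replacement over a k*k connected area) might look as follows, assuming a single-channel NumPy depth map; it is an illustration rather than the patent's exact implementation.

```python
import numpy as np

def repair_depth_edges(depth: np.ndarray, diff_thresh: float = 0.04, k: int = 3) -> np.ndarray:
    """Regularize the depth map, mark pixels per formula (2), and median-replace
    pixels whose k*k neighbourhood contains at least one marked pixel, using the
    median of the unmarked pixels in that neighbourhood."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # formula (1)
    h, w = d.shape
    pad = np.pad(d, 1, mode='edge')
    diffs = np.stack([np.abs(pad[1:-1, 1:-1] - pad[:-2, 1:-1]),
                      np.abs(pad[1:-1, 1:-1] - pad[2:, 1:-1]),
                      np.abs(pad[1:-1, 1:-1] - pad[1:-1, :-2]),
                      np.abs(pad[1:-1, 1:-1] - pad[1:-1, 2:])])
    marked = diffs.max(axis=0) < diff_thresh  # formula (2): all four differences small

    repaired = d.copy()
    r = k // 2
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - r), min(h, i + r + 1)
            j0, j1 = max(0, j - r), min(w, j + r + 1)
            if marked[i0:i1, j0:j1].any():  # connected area contains a marked pixel
                candidates = d[i0:i1, j0:j1][~marked[i0:i1, j0:j1]]
                if candidates.size:
                    repaired[i, j] = np.median(candidates)  # median of unmarked depths
    return repaired
```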
  • The greater the depth value, the farther the visual distance of the pixel.
  • the visual distance is used to represent the distance between the viewpoint and the object in the image.
  • the visual distance of distant objects is greater than that of close-range objects.
  • FIG. 3B is a schematic flowchart of a three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application.
  • In step 102, the two-dimensional image is subjected to migration processing of multiple viewing angles to obtain the migration result of the two-dimensional image corresponding to each viewing angle; this can be explained with reference to steps 1021-1023 shown in FIG. 3B.
  • step 1021 the depth value of each pixel in the two-dimensional image is updated to obtain the updated depth value of each pixel in the two-dimensional image.
  • The updated depth value has a negative correlation with the repaired depth value or the original depth value of the corresponding pixel. The depth value used for the update may be the original depth value or the depth value processed by depth inpainting: when depth repair processing has been performed, the depth value used for the update is the repaired depth value; otherwise, it is the original depth value.
  • The update processing inverts the regularized depth by a subtraction, and a new depth map is obtained after the update processing, see formula (3):

    D_updated = 0.5 - Norm(D)    (3)

  • where Norm(D) is the regularization result of the pixel and D_updated is the updated depth value; the parameter used for the subtraction is not limited to 0.5. The depth value obtained after the update processing with reference to formula (3) is between -0.5 and 0.5.
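  • As a minimal sketch of formula (3), assuming `norm_depth` holds the regularized depth map in [0, 1]:

```python
# Map the regularized depth in [0, 1] to [-0.5, 0.5] by subtraction, so larger
# original depths (farther objects) yield smaller updated values (0.5 is illustrative).
updated_depth = 0.5 - norm_depth
```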
  • the visual distance is used to represent the distance between the viewpoint and the object in the picture.
  • the visual distance of distant objects is greater than that of close-range objects.
  • step 1022 a plurality of movement parameters corresponding to the plurality of viewing angles in a one-to-one manner are determined.
  • the movement parameters include horizontal movement parameters and vertical movement parameters.
  • In some embodiments, the movement parameters may be acquired indiscriminately at fixed intervals on a circumference.
  • FIG. 7 is a schematic diagram of movement parameter acquisition in the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application. Let (v, u) represent a movement parameter, where v is the vertical movement parameter and u is the horizontal movement parameter. A point is collected at every fixed angle on a circle with a radius of 1, and the ordinate and abscissa of that point are taken as (v, u). Obtaining multiple groups of (v, u) over the entire circle makes it possible to render multiple migration images from different perspectives; then, when performing three-dimensional processing of different rendering styles on the same image, the migration images corresponding to the viewing angles of the relevant style can be obtained directly and encapsulated in the order of the corresponding rendering style.
  • In other embodiments, the movement parameters may be obtained individually on the circumference according to the viewing angles corresponding to the rendering style. Compared with obtaining the movement parameters indiscriminately, only the movement parameters of a few viewing angles need to be obtained, so that only the image migration and rendering of those viewing angles need to be performed.
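  • A small sketch of the sampling in FIG. 7 follows, assuming equally spaced angles on a unit circle; the number of viewing angles and the radius are illustrative parameters, not values fixed by the patent.

```python
import numpy as np

def sample_movement_parameters(num_views: int, radius: float = 1.0):
    """Sample (v, u) pairs at fixed angular intervals on a circle of the given radius.

    v is the vertical movement parameter and u the horizontal movement parameter;
    each pair corresponds to one viewing angle of the migration processing.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, num_views, endpoint=False)
    return [(radius * np.sin(a), radius * np.cos(a)) for a in angles]  # (v, u) per viewing angle
```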
  • In step 1023, the following processing is performed for each viewing angle: determining a horizontal movement vector that is positively correlated with the set movement sensitivity parameter, the updated depth value, the horizontal movement parameter, and the width of the two-dimensional image; determining a vertical movement vector that is positively correlated with the set movement sensitivity parameter, the updated depth value, the vertical movement parameter, and the height of the two-dimensional image; obtaining the original position corresponding to each pixel of the two-dimensional image in the migration image canvas of the viewing angle, and performing displacement processing with the original position as the starting point according to the horizontal movement vector and the vertical movement vector, to obtain the migration position of each pixel of the two-dimensional image in the migration image canvas.
  • The horizontal movement vector is calculated as follows: the horizontal movement parameter u, the movement sensitivity parameter scale, the updated depth value of the pixel, and the width w of the two-dimensional image are multiplied. When the multiplication result is negative, the pixel is moved in the negative horizontal direction; when it is positive, the pixel is moved in the positive horizontal direction. The vertical movement vector is calculated analogously: the vertical movement parameter v, the movement sensitivity parameter scale, the updated depth value of the pixel, and the height h of the two-dimensional image are multiplied; when the result is negative, the pixel is moved in the negative vertical direction, and when it is positive, in the positive vertical direction. For example, for the pixel (i, j) of the two-dimensional image, the horizontal migration position A is given by formula (4) and the vertical migration position B by formula (5):

    A = i + u * scale * D(i, j) * w    (4)
    B = j + v * scale * D(i, j) * h    (5)

  • where the movement sensitivity parameter scale is a preset constant (the larger it is set, the larger the movement range), D(i, j) is the updated depth value from the updated depth map and can range between -0.5 and 0.5, u is the horizontal movement parameter, w is the width of the two-dimensional image, v is the vertical movement parameter, and h is the height of the two-dimensional image.
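  • A minimal vectorized sketch of formulas (4) and (5) might look as follows, assuming `updated_depth` is the map from formula (3); the function and variable names are illustrative.

```python
import numpy as np

def migrate_pixels(updated_depth: np.ndarray, u: float, v: float, scale: float):
    """Shift every pixel by a depth-dependent vector and return the (non-integer)
    horizontal and vertical migration positions A and B of every pixel."""
    h, w = updated_depth.shape
    j, i = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')  # j: vertical, i: horizontal
    A = i + u * scale * updated_depth * w  # horizontal migration position, formula (4)
    B = j + v * scale * updated_depth * h  # vertical migration position, formula (5)
    return A, B
```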
  • In this way, foreground and background objects move in different directions, and objects closer to or farther from the viewpoint move farther; this movement mode satisfies the law of three-dimensional display.
  • The horizontal migration position and the vertical migration position are generally not integers, so the implementation in step 103 can be adopted to contribute color components to the surrounding positions.
  • FIG. 3C is a schematic flowchart of a three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application.
  • In step 103, based on the depth value of each pixel in the two-dimensional image and the migration result of the two-dimensional image corresponding to each viewing angle, the color value of each pixel in the migration image corresponding to each viewing angle is determined; this can be described with reference to steps 1031-1033 shown in FIG. 3C.
  • step 1031 the contributing pixels of the pixels to be dyed are determined.
  • a contributing pixel is a pixel whose migration position is located in the connected area of the pixel to be dyed in the two-dimensional image
  • the connected area refers to a collection of multiple pixels that have direct and indirect connections with the pixel to be dyed.
  • the connected area can be a 3*3 square centered on the pixels to be dyed
  • the migration image finally obtained by the migration image canvas is composed of the pixels to be dyed.
  • The migration result of each viewing angle obtained in step 102 includes: each pixel of the two-dimensional image is migrated to a migration position in the migration image canvas of that viewing angle, where the size of the migration image canvas is the same as that of the two-dimensional image.
  • FIG. 12 is a schematic diagram of the contribution of the three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application.
  • In FIG. 12, the migration image canvas includes 49 pixels to be dyed. FIG. 12 also shows the migration positions, in the migration image canvas, of three pixel points A, B and C of the two-dimensional image after image migration. For the pixel point 34 to be dyed, the area with the lattice pattern is the connected area determined with the pixel point 34 to be dyed as the center; since the three pixel points A, B and C of the two-dimensional image are all in this connected area, all three are contributing pixels of the pixel point 34 to be dyed.
  • step 1032 based on the migration position of each pixel in the two-dimensional image in the migration image canvas of the viewing angle and the depth value of each pixel in the two-dimensional image, determine the contribution weight of the contributing pixel corresponding to the pixel to be dyed .
  • In some embodiments, in step 1032, determining the contribution weight of a contributing pixel to the pixel to be dyed based on the migration position of each pixel of the two-dimensional image in the migration image canvas of the viewing angle and the depth value of each pixel of the two-dimensional image can be realized by the following technical solution: when the contributing pixel is located in the lower-right area or the directly-below area of the connected area of the pixel to be dyed, the migration result of the contributing pixel is rounded up, and a contribution weight is obtained that is positively correlated with the absolute difference between the migration result and the corresponding rounded-up result and with the updated depth value of the contributing pixel; when the contributing pixel is located in the upper-left area or the directly-above area of the connected area of the pixel to be dyed, the migration result of the contributing pixel is rounded down, and a contribution weight is obtained that is positively correlated with the absolute difference between the migration result and the corresponding rounded-down result and with the updated depth value of the contributing pixel; when the contributing pixel is located in the other areas of the connected area, mixed rounding of the horizontal and vertical migration positions is applied, as described below.
  • Depending on its position, the contribution weight of the color component that a contributing pixel (for example pixel B) contributes to the pixel to be dyed differs. The contribution weight of contributing pixel B to the pixel 34 to be dyed may be the product of the absolute values of the horizontal and vertical coordinate differences between contributing pixel B and the pixel 34 to be dyed; for example, it may be abs(i'-floor(i'))*abs(j'-floor(j')), where (i', j') is the migration position of contributing pixel B and floor(·) denotes the round-down operation.
  • The contribution weight may also be multiplied by exp(t*D(i, j)) to update the contribution weight (t may be 10), where D(i, j) is the updated depth value of the contributing pixel; in this way, the influence of the updated depth value of the contributing pixel on the pixel to be dyed is taken into account. After the depth value update, the larger D(i, j) is, the more the contribution weight increases exponentially, which conforms to the motion law in three-dimensional visual effects.
  • In some embodiments, when the contributing pixel is located in the lower-right area or the directly-below area of the connected area of the pixel to be dyed (for example, contributing pixel B is in the lower-right area and contributing pixel C is in the directly-below area), the migration results (horizontal migration position and vertical migration position) of the contributing pixel are rounded up, and a contribution weight is obtained that is positively correlated with the absolute difference between the migration result and the corresponding rounded-up result and with the updated depth value of the contributing pixel. The directly-below area means that the migration position of the contributing pixel in the connected area is directly below the pixel to be dyed, and the lower-right area means that the migration position of the contributing pixel in the connected area is in the fourth quadrant relative to the pixel to be dyed.
  • In some embodiments, when the contributing pixel is located in the upper-left area or the directly-above area of the connected area of the pixel to be dyed (for example, contributing pixel A), the migration results (horizontal migration position and vertical migration position) of the contributing pixel are rounded down, and a contribution weight is obtained that is positively correlated with the absolute difference between the migration result and the corresponding rounded-down result and with the updated depth value of the contributing pixel. The directly-above area means that the migration position of the contributing pixel in the connected area is directly above the pixel to be dyed, and the upper-left area means that the migration position of the contributing pixel in the connected area is in the second quadrant relative to the pixel to be dyed.
  • In some embodiments, when the contributing pixel is located in the directly-right area or the upper-right area of the connected area of the pixel to be dyed, the horizontal migration position of the contributing pixel is rounded up and the vertical migration position of the contributing pixel is rounded down, and a contribution weight is obtained that is positively correlated with the absolute difference between the migration result and the corresponding rounding result and with the updated depth value of the contributing pixel. The directly-right area means that the migration position of the contributing pixel in the connected area is directly to the right of the pixel to be dyed, and the upper-right area means that the migration position of the contributing pixel in the connected area is in the first quadrant relative to the pixel to be dyed.
  • In some embodiments, when the contributing pixel is located in the directly-left area or the lower-left area of the connected area of the pixel to be dyed, the horizontal migration position of the contributing pixel is rounded down and the vertical migration position of the contributing pixel is rounded up, and a contribution weight is obtained that is positively correlated with the absolute difference between the migration result and the corresponding rounding result and with the updated depth value of the contributing pixel. The directly-left area means that the migration position of the contributing pixel in the connected area is directly to the left of the pixel to be dyed, and the lower-left area means that the migration position of the contributing pixel in the connected area is in the third quadrant relative to the pixel to be dyed.
  • the center of the pixel to be dyed is taken as the origin, and vertical and horizontal coordinate axes are established in the manner shown in FIG. 12 , so as to obtain the above-mentioned first to fourth quadrants.
  • step 1033 the color value of each contributing pixel is weighted based on the contribution weight of each contributing pixel to obtain the color value of the pixel to be dyed.
  • Assume the contribution weights of contributing pixel A, contributing pixel B and contributing pixel C to the pixel to be dyed are weightA, weightB and weightC respectively; the RGB color value of contributing pixel A in the two-dimensional image is multiplied by weightA, the RGB color value of contributing pixel B is multiplied by weightB, the RGB color value of contributing pixel C is multiplied by weightC, and the weighted results are summed to obtain the color value of the pixel 34 to be dyed.
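  • A simplified sketch of steps 1031-1033 follows. For brevity it collapses the patent's quadrant-by-quadrant rounding rules into the usual bilinear-distance form while keeping the exp(t*D) depth weighting; it is an illustration under these assumptions, not the exact weighting of the claims.

```python
import numpy as np

def splat_colors(image: np.ndarray, updated_depth: np.ndarray,
                 A: np.ndarray, B: np.ndarray, t: float = 10.0):
    """Forward-splat source pixels onto the migration image canvas.

    Each source pixel contributes its color to the integer canvas positions
    around its (A, B) migration position, weighted by distance and by
    exp(t * updated_depth) so nearer (larger updated-depth) pixels dominate.
    """
    h, w, _ = image.shape
    canvas = np.zeros((h, w, 3), dtype=np.float64)
    weight_sum = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            a, b = A[y, x], B[y, x]
            depth_w = np.exp(t * updated_depth[y, x])
            for cx in (int(np.floor(a)), int(np.floor(a)) + 1):
                for cy in (int(np.floor(b)), int(np.floor(b)) + 1):
                    if 0 <= cx < w and 0 <= cy < h:
                        wgt = (1.0 - abs(a - cx)) * (1.0 - abs(b - cy)) * depth_w
                        canvas[cy, cx] += wgt * image[y, x]
                        weight_sum[cy, cx] += wgt
    filled = weight_sum > 0
    canvas[filled] /= weight_sum[filled][:, None]
    return canvas, filled  # `filled` marks canvas positions that received contributions
```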
  • step 104 a transition image corresponding to a viewing angle is generated based on the color value of each pixel in the transition image of each viewing angle.
  • step 105 the transition images of multiple viewing angles are packaged in sequence to obtain a three-dimensional video.
  • FIG. 3D is a schematic flowchart of the three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application.
  • In step 105, the migration images of the multiple viewing angles are packaged in sequence to obtain a three-dimensional video; this can be described with reference to steps 1051-1053 shown in FIG. 3D.
  • step 1051 based on the depth value of each pixel point in the two-dimensional image, the vacant pixel filling process is performed on the transition image of each viewing angle.
  • In some embodiments, in step 1051, performing vacant-pixel filling processing on the migration image of each viewing angle based on the depth value of each pixel in the two-dimensional image can be implemented by the following technical solution: for each pixel to be dyed, when there is no contributing pixel corresponding to the pixel to be dyed in its connected area, the position of the pixel to be dyed is determined as a vacant position; then, for each vacant position, with the vacant position as the center, the reference pixel of the pixel to be dyed is searched in the connected area of the vacant position based on the depth values of some pixels of the two-dimensional image, and the color value of the pixel to be dyed is filled in based on the color value of the reference pixel.
  • For example, if no contributing pixel falls in the connected area of the pixel 00 to be dyed, the pixel 00 to be dyed in the obtained migration image is actually blank, and its position is determined to be a vacant position. The pixel 00 to be dyed therefore needs to be filled, and the reference pixel corresponding to the pixel 00 to be dyed needs to be determined accordingly, so that the color value of the reference pixel in the migration image is filled into the vacant position of the pixel 00 to be dyed.
  • In some embodiments, querying the reference pixel of the pixel to be dyed in the connected area of the vacant position can be realized by the following technical solution: determining multiple sets of query directions with the vacant position as the starting point, where the first direction and the second direction included in each set of query directions are opposite; for each set of query directions, determining, in the first direction within the connected area of the vacant position, the pixel at the non-vacant position closest to the vacant position, and determining, in the second direction within the connected area of the vacant position, the pixel at the non-vacant position closest to the vacant position; determining the pixel distance between the pixel determined in the first direction and the pixel determined in the second direction; determining the two pixels corresponding to the minimum pixel distance among the multiple sets of query directions; determining the rendering depth values of the two pixels based on the depth values of some pixels of the two-dimensional image, and determining the pixel with the larger rendering depth value as the reference pixel of the pixel to be dyed.
  • For example, in FIG. 9, the pixel with the dot-matrix pattern in the middle is the pixel to be dyed that needs to be filled, and its position is the vacant position. Multiple sets of query directions starting from the vacant position can be obtained at a fixed angle: assuming there are two sets of query directions, the first direction included in the first set is opposite to its second direction, the first direction included in the second set is opposite to its second direction, and there is an included angle between the line on which the first set of query directions lies and the line on which the second set lies. Taking the first and second directions shown in FIG. 9 as an example, the pixels at the nearest non-vacant positions are pixels that make up the migration image; the connected area here can at most be the range of the whole image (that is, the entire migration image is regarded as the connected area), or a limited-range area can be designated as the connected area. The pixel distance between the pixels at the non-vacant positions determined in the first direction and in the second direction is determined, so as to obtain the first pixel distance of the first set of query directions and the second pixel distance of the second set of query directions. For the two pixels corresponding to the minimum of the first pixel distance and the second pixel distance (pixels at non-vacant positions, that is, the pixels with the striped pattern), the rendering depth values of the two pixels are determined based on the depth values of some pixels of the two-dimensional image, and the pixel with the larger rendering depth value is determined as the reference pixel of the pixel to be dyed.
  • The migration result of each viewing angle includes: each pixel of the two-dimensional image is migrated to a migration position in the migration image canvas of the viewing angle, where the size of the migration image canvas is the same as that of the two-dimensional image; some of the pixels are the contributing pixels of a target pixel. The above determination of the rendering depth values of the two pixels based on the depth values of some pixels of the two-dimensional image can be realized by the following technical solution: each of the two pixels is taken as the target pixel, and the following processing is performed: determining the contributing pixels of the target pixel, a contributing pixel being a pixel of the two-dimensional image whose migration position is located in the connected area of the target pixel; determining, based on the migration position in the migration image canvas, the contribution weight of the contributing pixel to the target pixel; and weighting the depth values of the contributing pixels based on their contribution weights to obtain the rendering depth value of the target pixel, thereby obtaining the rendering depth values of the two pixels.
  • As an example, the way of acquiring the rendering depth values of the two pixels of the migration image corresponding to the minimum pixel distance is similar to the way of acquiring the color values of the pixels to be dyed in step 103, except that the components being weighted and summed are the depth values of the contribution pixels rather than their RGB color values. Each of the two pixels is taken in turn as the target pixel to obtain its rendering depth value, i.e. the weighted summation is performed twice. The color value filling process can also be implemented using a deep learning model. A minimal sketch of the directional query and fill described above follows.
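  • The sketch below is an illustrative implementation of the directional query and fill described above, not the patent's own code. It assumes `color` is the partially rendered migration image (H x W x 3), `depth` is the rendered depth of the migration image (larger meaning farther, as in this document), and `filled` is a boolean mask of non-vacant positions; the number of direction groups, the step limit and all names are assumptions made for illustration:

```python
import numpy as np

def fill_vacancy(color, depth, filled, y, x, n_groups=4, max_steps=64):
    """Fill the vacant pixel at (y, x) from the background side of the
    closest pair of non-vacant pixels found along opposite directions."""
    h, w = filled.shape
    best_pair, best_dist = None, None
    for g in range(n_groups):
        angle = np.pi * g / n_groups              # groups spaced by a fixed angle
        dy, dx = np.sin(angle), np.cos(angle)
        pair = []
        for sign in (1, -1):                      # the two opposite directions of a group
            hit = None
            for step in range(1, max_steps):
                yy = int(round(y + sign * dy * step))
                xx = int(round(x + sign * dx * step))
                if not (0 <= yy < h and 0 <= xx < w):
                    break                         # left the image: no hit on this side
                if filled[yy, xx]:
                    hit = (yy, xx)
                    break
            if hit is None:                       # discard groups that miss on either side
                pair = []
                break
            pair.append(hit)
        if len(pair) == 2:
            (y1, x1), (y2, x2) = pair
            dist = np.hypot(y1 - y2, x1 - x2)
            if best_dist is None or dist < best_dist:
                best_dist, best_pair = dist, pair
    if best_pair is None:
        return color                              # nothing found within the search range
    # keep the candidate with the larger rendered depth (the background side)
    ref = max(best_pair, key=lambda p: depth[p])
    color[y, x] = color[ref]
    filled[y, x] = True
    return color
```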
  • In step 1052, Gaussian blur processing is performed on the vacancy-pixel filling result of the migration image of each viewing angle to obtain a Gaussian blurred image.
  • Gaussian blur processing can be understood as taking the color value of the target pixel of the Gaussian blur to be the average of the surrounding pixels: with the target pixel of the Gaussian blur as the center point, the average of the color values of the surrounding points (points closely surrounding the center point) is used as the color value of the center point. Numerically this is a smoothing operation; graphically it is equivalent to producing a blurring effect, and the center point loses detail. Here the target pixels of the Gaussian blur are the pixels that have been filled with color values.
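  • A minimal sketch of this step, assuming SciPy is available; only the pixels recorded as filled are replaced by their blurred values, and the sigma value is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_filled_pixels(image, was_vacant, sigma=1.5):
    """Smooth only the pixels that were filled in, so the filled patch blends
    with its surroundings instead of repeating them exactly."""
    # Blur each colour channel spatially; sigma=0 on the channel axis.
    blurred = gaussian_filter(image.astype(np.float32), sigma=(sigma, sigma, 0))
    out = image.astype(np.float32).copy()
    out[was_vacant] = blurred[was_vacant]     # was_vacant: HxW boolean mask
    return out.astype(image.dtype)
```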
  • In step 1053, the Gaussian blurred images of each viewing angle are encapsulated in order to obtain a three-dimensional video.
  • As an example, referring to FIG. 11, FIG. 11 is a schematic diagram of encapsulation in the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application. In response to the rendering style indicated in the three-dimensionalization operation, the encapsulation order of the Gaussian blurred images of each viewing angle is determined. If Gaussian blur processing is not performed, the encapsulation order of the color-value-filled migration images of each viewing angle is determined. In scenes with lower image-quality requirements, the migration images of each viewing angle can also be encapsulated directly, i.e. the encapsulation order of the migration images of each viewing angle is determined; in other words, the encapsulation order essentially corresponds to the viewing angles. When the viewing angles corresponding to the rendering style indicated in the three-dimensionalization operation are the first, second, third and fourth viewing angles, the new images of the first, second, third and fourth viewing angles are encapsulated according to the encapsulation order of the corresponding viewing angles indicated by the rendering style, so as to obtain a three-dimensional video with the corresponding rendering style.
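  • A minimal sketch of the encapsulation step, assuming the per-view images are already available as arrays and that imageio with an ffmpeg backend can be used to write the video; the file name, frame rate and ordering shown are illustrative:

```python
import imageio

def encapsulate_views(view_images, order, path="depth_3d_effect.mp4", fps=25):
    """Package the per-view images into a video, following the encapsulation
    order selected by the rendering style (e.g. near-to-far or far-to-near)."""
    with imageio.get_writer(path, fps=fps) as writer:   # requires an ffmpeg backend
        for view_index in order:
            writer.append_data(view_images[view_index])
    return path

# A pull-back style might simply replay the views from nearest to farthest:
# encapsulate_views(images, order=list(range(len(images))))
```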
  • FIG. 3E is a schematic flowchart of a three-dimensionalization method for a two-dimensional image provided by an embodiment of the present application, which will be described with reference to steps 201 - 209 shown in FIG. 3E .
  • In step 201, a two-dimensional image is displayed on the human-computer interaction interface.
  • In step 202, in response to a three-dimensionalization operation for the two-dimensional image, the two-dimensional image is sent to the server.
  • In step 203, the server performs depth perception processing on the two-dimensional image to obtain the depth value of each pixel in the two-dimensional image.
  • In step 204, the server performs migration processing of multiple viewing angles for each pixel in the two-dimensional image, to obtain a migration result corresponding to each viewing angle.
  • In step 205, the server determines the color value of each pixel in the migration image corresponding to each viewing angle based on the depth value of each pixel in the two-dimensional image and the migration result of each viewing angle.
  • In step 206, the server generates the migration image of the corresponding viewing angle based on the color value of each pixel in the migration image of that viewing angle.
  • In step 207, the server encapsulates the migration images of the multiple viewing angles in order to obtain a three-dimensional video.
  • In step 208, the server sends the three-dimensional video to the terminal.
  • In step 209, the terminal plays the three-dimensional video generated based on the two-dimensional image.
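  • The terminal side of this exchange might look like the following sketch; the endpoint URL, field name and response format are hypothetical and only illustrate steps 202, 208 and 209:

```python
import requests

# Hypothetical endpoint and field names, for illustration only.
SERVER_URL = "https://example.com/api/2d-to-3d"

def request_3d_video(image_path, out_path="result.mp4"):
    """Upload the 2D image, then save the 3D video the server sends back
    so it can be played on the terminal."""
    with open(image_path, "rb") as f:
        resp = requests.post(SERVER_URL, files={"image": f}, timeout=120)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)          # assumes the server returns the encoded video
    return out_path
```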
  • The electronic album client receives the user's selection operation on a photo and displays the target photo of the selection operation as the image to be processed. In response to a three-dimensionalization operation on the to-be-processed image, the electronic album client calls the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application to generate a preview of the three-dimensional video of the to-be-processed image. In response to the user's adjustment operation on the three-dimensional video, the electronic album client adjusts the encapsulation order of the images of the multiple viewing angles according to the adjustment mode specified in the adjustment operation; for example, the encapsulation order is adjusted so that the visual effect is pulled from the close view to the distant view, or from the distant view to the close view.
  • The method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application predicts the scene depth information through a deep learning model and then, through a unified electronic device architecture algorithm, uses the image processor to perform the transformation at the two-dimensional image level; after the transformation, filling and blurring allow images of new viewing angles to be acquired quickly, so as to achieve depth-based synthesis of multiple three-dimensional viewing angles. The processing of the three-dimensional transformation on the basis of the two-dimensional image includes: processing of the predicted depth map, re-rendering of the three-dimensional scene, and vacancy filling and blurring. The whole process can be processed in parallel on the graphics processor of the graphics card, which is fast, and obtains a three-dimensional video with an excellent effect while avoiding the three-dimensional scene modeling of the related art; it can therefore meet the requirement of terminals to obtain three-dimensional videos from two-dimensional images. Moreover, because both the re-rendering method and the filling method are based on the initial depth prediction result, the rendering and the filling conform to the laws of the three-dimensional scene.
  • The terminal uploads the to-be-processed image to the background, so that the background invokes the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application to perform three-dimensionalization on the to-be-processed image to obtain a three-dimensional video, which is then sent back to the terminal for display and playback. The method provided by the embodiment of the present application performs depth estimation on the image, depth edge repair, determination of multi-view images based on depth, and vacancy filling and blurring, so as to generate, from a single input two-dimensional image, two-dimensional images of the same scene under multiple viewing angles, which are combined into a three-dimensionally structured video.
  • Referring to FIG. 4, FIG. 4 is a schematic flowchart of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application. The processing flow includes a depth perception process, a depth edge repair process, a multi-view re-rendering process, a vacancy filling and blurring process, and a process of generating the three-dimensional video result. In the depth perception process, depth perception is performed on the input image to obtain a predicted depth map; depth edge repair is then performed on the obtained depth estimation result; the three-dimensional scene is re-rendered based on the repaired depth estimation result; the multi-view re-rendering results are subjected to vacancy filling and blurring; and finally the three-dimensional video result is generated from the filled and blurred results. The whole process can run in parallel on the image processor of the graphics card, so the processing speed is fast, and an excellent three-dimensional video effect can be obtained without three-dimensional modeling, which can meet the requirement of the terminal to obtain a three-dimensional video based on a two-dimensional image. Since both the re-rendering method and the filling method are based on the original depth estimation result, the rendering and the filling conform to the laws of the scene. A sketch of this overall flow is given below.
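  • The following outline mirrors the FIG. 4 flow only; the stage callables are assumed to be implemented elsewhere (for instance along the lines of the sketches later in this document), so the function merely fixes the order in which the stages run:

```python
def two_d_to_three_d(image, stages, views):
    """Outline of the FIG. 4 pipeline. `stages` bundles the callables for each
    stage (depth perception, edge repair, re-rendering, filling, blurring)."""
    depth = stages["depth_perception"](image)            # predicted depth map D = F(I)
    depth = stages["edge_repair"](depth)                 # fast median replacement at edges
    frames = []
    for v, u in views:                                   # one (v, u) pair per viewing angle
        color, rdepth, filled = stages["rerender"](image, depth, v, u)
        color = stages["fill_vacancies"](color, rdepth, filled)
        frames.append(stages["blur_filled"](color, ~filled))
    return frames                                        # then encapsulated into a video
```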
  • In the depth perception process, a deep learning model F performs depth estimation on the input image I to obtain a depth estimation result D with the same resolution as the input image, i.e. D = F(I). Each pixel value of the depth estimation result represents the depth of the corresponding pixel in the input image: the greater the depth, the farther away the pixel is.
  • Referring to FIG. 5, FIG. 5 is a depth map of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application. The depth estimation result obtained in the depth perception process produces uneven, discontinuous jumps at edges. Each grid cell in FIG. 5 represents a pixel; the background object consists of black cells and the foreground object of white cells. In theory there should be no intermediate gray cells, since the same object should have the same depth; however, FIG. 5 shows discontinuous jumps in which a black cell adjoins a gray cell and a gray cell adjoins a white cell, so that pixels belonging to the same object receive different depth estimates. Therefore, a fast median replacement method is used to improve the edge depth.
  • First, the depth map obtained in the depth perception process is normalized so that its values fall in the interval from 0 to 1: Norm(D) = (D − D.min) / (D.max − D.min), where D.max is the maximum depth value among all pixels of the depth map and D.min is the minimum depth value among all pixels of the depth map. For each pixel of the depth map, the absolute differences between the normalized depth value of the pixel and those of its four adjacent pixels (above, below, left and right) are computed to judge whether the pixel is in an edge area. Referring to FIG. 6, FIG. 6 is an edge marker diagram of the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application. If all the absolute differences (the above four absolute values) are less than the difference threshold, which is set to 0.04 for example, the pixel is marked as 1 and judged to be an edge pixel; otherwise it is marked as 0 and judged to be a non-edge pixel.
  • For example, let D(i, j) denote the normalized depth of a point, where i is the horizontal position of the point and j is the vertical position of the point. If max(abs(D(i,j) − D(i+1,j)), abs(D(i,j) − D(i−1,j)), abs(D(i,j) − D(i,j+1)), abs(D(i,j) − D(i,j−1))) < difference threshold, the pixel is marked as 1 and judged to be an edge pixel; otherwise the pixel is marked as 0 and judged to be a non-edge pixel.
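  • A vectorized sketch of the normalization and edge-marking rule above, following this document's convention that a pixel is marked 1 (edge) when all four neighbour differences stay below the threshold; border handling is simplified and the threshold defaults to the 0.04 example:

```python
import numpy as np

def mark_edge_pixels(depth, diff_threshold=0.04):
    """Normalize the depth map and mark a pixel as 1 when the largest absolute
    difference to its four neighbours is below the threshold."""
    d = (depth - depth.min()) / (depth.max() - depth.min())   # Norm(D)
    mark = np.zeros_like(d, dtype=np.uint8)
    diffs = np.stack([
        np.abs(d[1:-1, 1:-1] - d[2:, 1:-1]),    # neighbour below
        np.abs(d[1:-1, 1:-1] - d[:-2, 1:-1]),   # neighbour above
        np.abs(d[1:-1, 1:-1] - d[1:-1, 2:]),    # neighbour to the right
        np.abs(d[1:-1, 1:-1] - d[1:-1, :-2]),   # neighbour to the left
    ])
    mark[1:-1, 1:-1] = (diffs.max(axis=0) < diff_threshold).astype(np.uint8)
    return mark
```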
  • In the edge marker diagram, a square of size k*k is taken with each pixel as its center and is used for median replacement of the depth value of that center pixel. If the square contains a point whose mark is 1, median replacement needs to be performed on the center point; otherwise no median replacement is needed. The specific processing is as follows: from the k*k window of the depth map, the depth values of all non-edge pixels (i.e. the points whose value in the edge marker diagram is 0) are obtained and arranged from small to large, and the depth value of the center pixel is replaced with the median of the arranged values. After all pixels of the depth map have been processed, the median-replacement edge processing of the depth map is complete. For the resulting depth map, the greater the depth, the farther away the pixel it characterizes.
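  • A straightforward (unoptimized) sketch of the median replacement described above; `edge_mark` is assumed to be the 0/1 marker map from the previous step, `k` the window size, and np.median stands in for the middle value of the sorted non-edge depths:

```python
import numpy as np

def median_replace_edges(depth, edge_mark, k=5):
    """If any pixel in the k*k window around the centre is marked as an edge
    pixel, replace the centre depth with the median of the window's
    non-edge depth values."""
    h, w = depth.shape
    r = k // 2
    repaired = depth.copy()
    for i in range(h):
        for j in range(w):
            y0, y1 = max(0, i - r), min(h, i + r + 1)
            x0, x1 = max(0, j - r), min(w, j + r + 1)
            win_mark = edge_mark[y0:y1, x0:x1]
            if win_mark.any():                           # an edge pixel is present
                non_edge = depth[y0:y1, x0:x1][win_mark == 0]
                if non_edge.size:                        # guard: window may be all edges
                    repaired[i, j] = np.median(non_edge)
    return repaired
```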
  • Referring to FIG. 7, FIG. 7 is a schematic diagram of movement parameter acquisition in the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application. Assume the original color map A has size h*w, with a movement parameter v in the vertical direction and u in the horizontal direction. For the combination (v, u), a point is taken on a circle of radius 1 at every fixed angle, and its ordinate and abscissa are the values of (v, u); the multiple groups of (v, u) obtained on the whole circle can be used to render multiple new images with different viewing angles. Based on the edge-repaired depth map D, each pixel A(i, j) of the original color map A is moved to the target position A(i + u*scale*D(i,j)*w, j + v*scale*D(i,j)*h), where scale is a constant, and the larger it is set, the larger the movement range. The depth values of the depth map lie between -0.5 and 0.5, so that objects in front and objects behind move in different directions, and nearer or farther objects move over a larger distance; such a movement pattern satisfies the laws of three-dimensional display.
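  • A sketch of how the per-view movement parameters and target positions could be computed; `updated_depth` is assumed to be the inverted, normalized depth in roughly [-0.5, 0.5] described in this document, stored as an H x W array indexed [j, i], and the number of views and scale value are illustrative:

```python
import numpy as np

def migration_positions(updated_depth, v, u, scale=0.05):
    """Non-integer target position (i', j') of every source pixel for one
    viewing angle: i' = i + u*scale*D(i,j)*w, j' = j + v*scale*D(i,j)*h."""
    h, w = updated_depth.shape
    j, i = np.mgrid[0:h, 0:w]                 # j: vertical index, i: horizontal index
    i_new = i + u * scale * updated_depth * w
    j_new = j + v * scale * updated_depth * h
    return i_new, j_new

def view_parameters(n_views):
    """(v, u) pairs sampled on the unit circle at fixed angular steps."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)
    return [(np.sin(a), np.cos(a)) for a in angles]
```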
  • Since the target position i' = i + u*scale*D(i,j)*w, j' = j + v*scale*D(i,j)*h is generally not an integer, a splatting-based approach is used to contribute color components to the surrounding points. Referring to FIG. 8, FIG. 8 is a schematic diagram of contribution in the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application. A projected point on an edge refers, for example, to a projected point on the edge formed by (0, 0) and (0, 1); such a projected point affects the color values at the (0, 0) position and the (0, 1) position. A projected point (i', j') lying between four pixel positions contributes to each of those four pixels with a weight equal to the product of the absolute differences between its horizontal and vertical coordinates and those of the corresponding pixel, where the upper-left (LeftUp) and lower-right (RightDown) pixels are corresponding points of each other, and the upper-right (RightUp) and lower-left (LeftDown) pixels are corresponding points of each other.
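  • One common reading of this weighting is standard bilinear splatting, where each corner's weight is the product of the absolute coordinate differences to its corresponding (opposite) corner; the helper below is an illustrative sketch of that reading, not necessarily the patent's exact convention:

```python
import numpy as np

def corner_weights(i2, j2):
    """Contribution weights of a projected point (i2, j2) to the four
    surrounding integer pixels; each corner's weight uses the fractional
    offsets toward the opposite corner."""
    fi, fj = int(np.floor(i2)), int(np.floor(j2))
    di, dj = i2 - fi, j2 - fj
    return {
        (fi,     fj):     (1 - di) * (1 - dj),   # upper-left, paired with lower-right
        (fi + 1, fj):     di * (1 - dj),         # upper-right, paired with lower-left
        (fi,     fj + 1): (1 - di) * dj,         # lower-left
        (fi + 1, fj + 1): di * dj,               # lower-right
    }
```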
  • Because multiple pixels of the original color map A may, after moving, contribute color components to the same point of the new-view canvas, the influence of depth is taken into account: for a pixel to be dyed at a given position of the new-view canvas, a pixel close to the background contributes much less to that pixel than other pixels do. The contribution weight w of a pixel (i, j) of the original color map A to the four pixels near the corresponding position (i', j') of the re-rendered image A' is therefore multiplied by exp(t*D(i, j)), where t can take the value 10; since after the depth update a nearer pixel (close view) has a larger D(i, j), its contribution also grows exponentially. In the color rendering process, all pixels of the original color map A are moved based on (v, u) and contribute their RGB color components, according to the weights, to the pixels to be dyed at the corresponding points of the new rendered image A'; the color components are multiplied by the weights and accumulated separately, and finally the accumulated color components are divided by the accumulated weight component to obtain the color value of each pixel to be dyed, so as to avoid an excessive contribution of color components. Depth rendering is performed at the same time as color rendering: depth rendering similar to the above process is applied to the edge-repaired depth map to obtain a new multi-view-rendered depth map, and this new depth map still satisfies the rule that the farther the distance (distant view), the greater the depth value.
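  • A minimal sketch of the simultaneous color and depth re-rendering for one viewing angle under the weighting just described; `updated_depth` drives the movement and the exp(t*D) boost, while `repaired_depth` (larger meaning farther) is the value accumulated into the rendered depth map, and t, scale and the loop structure are illustrative rather than optimized:

```python
import numpy as np

def rerender_view(color, updated_depth, repaired_depth, v, u, scale=0.05, t=10.0):
    """Splat every source pixel's RGB and depth onto the four pixels around its
    migrated position, weighted by bilinear weights times exp(t * D), then
    normalize by the accumulated weights."""
    h, w = updated_depth.shape
    acc_rgb = np.zeros((h, w, 3), dtype=np.float64)
    acc_depth = np.zeros((h, w), dtype=np.float64)
    acc_w = np.zeros((h, w), dtype=np.float64)
    for y in range(h):                         # y: vertical index j, x: horizontal index i
        for x in range(w):
            d = updated_depth[y, x]
            x2 = x + u * scale * d * w
            y2 = y + v * scale * d * h
            fx, fy = int(np.floor(x2)), int(np.floor(y2))
            dx, dy = x2 - fx, y2 - fy
            boost = np.exp(t * d)              # nearer pixels contribute more
            for cx, cy, bw in ((fx, fy, (1 - dx) * (1 - dy)),
                               (fx + 1, fy, dx * (1 - dy)),
                               (fx, fy + 1, (1 - dx) * dy),
                               (fx + 1, fy + 1, dx * dy)):
                if 0 <= cx < w and 0 <= cy < h:
                    wgt = bw * boost
                    acc_rgb[cy, cx] += wgt * color[y, x]
                    acc_depth[cy, cx] += wgt * repaired_depth[y, x]
                    acc_w[cy, cx] += wgt
    filled = acc_w > 0                         # positions that received any contribution
    out_rgb = np.zeros_like(acc_rgb)
    out_depth = np.zeros_like(acc_depth)
    out_rgb[filled] = acc_rgb[filled] / acc_w[filled][:, None]
    out_depth[filled] = acc_depth[filled] / acc_w[filled]
    return out_rgb, out_depth, filled
```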
  • In the vacancy filling and blurring process, owing to occlusion, some positions in the multi-view images obtained by color rendering have no original image pixels around them contributing color components, so such a position is regarded as a vacant position that needs to be filled. Referring to FIG. 9, FIG. 9 is a query schematic diagram of the method for three-dimensionalizing a two-dimensional image provided by an embodiment of the present application. Starting from the vacant position, several different forward directions (first directions) are obtained; these forward directions, together with the opposite directions based on that point (second directions), form multiple groups of query objects. For each group of query objects, the nearest non-vacant pixels are sought in the two directions; if the search runs beyond the image range without finding one, that query object is discarded. Among all query objects that find target pixels at both ends, the group whose two pixels are the shortest distance apart is selected, and, based on the new-view rendered depth values obtained during the depth rendering process, the pixel with the larger depth value at the corresponding position (the pixel in the first direction or the pixel in the second direction, i.e. the one of the two opposite directions with the larger depth value) is taken, and its RGB color value is used to fill the vacant position. The principle underlying the filling process is to fill the target pixel (vacant position) using background pixels that are close to it.
  • For every filled pixel (a pixel at a formerly vacant position that is no longer vacant after filling), Gaussian blur processing is performed using its surrounding pixels, to prevent the filled pixel from repeating its surroundings too exactly and producing a poor visual effect. Referring to FIG. 10, FIG. 10 is a schematic diagram of the effect of the method for three-dimensionalizing a two-dimensional image provided by the embodiment of the present application: the boundary in the left image is too obvious, whereas in the right image, obtained after Gaussian blurring, the visual transition between the filled pixels and the surrounding pixels is not so sharp. All of the above steps can be run on the image processor, with high speed and low latency. Combining all the rendered new-view images into a video yields the three-dimensional video result.
  • According to the method for three-dimensionalizing a two-dimensional image provided by the embodiments of the present application, scenes under different viewing angles can be reasonably reconstructed at the two-dimensional image level based on the depth estimation result, and the processing time is reduced compared with other three-dimensional modeling methods, which makes it convenient for mobile terminals to quickly obtain the generated video result and display it.
  • The apparatus 455 for three-dimensionalizing a two-dimensional image may include: a depth module 4551, configured to perform depth perception processing on the two-dimensional image to obtain the depth value of each pixel in the two-dimensional image; a migration module 4552, configured to perform migration processing of multiple viewing angles on the two-dimensional image to obtain the migration result of the two-dimensional image corresponding to each viewing angle; a color determination module 4553, configured to determine, based on the depth value of each pixel in the two-dimensional image and the migration result corresponding to each viewing angle, the color value of each pixel in the migration image corresponding to each viewing angle; a generating module 4554, configured to generate the migration image of the corresponding viewing angle based on the color value of each pixel in the migration image of that viewing angle; and a packaging module 4555, configured to encapsulate the migration images of the multiple viewing angles in order to obtain a three-dimensional video.
  • The migration module 4552 is configured to, before the color value of each pixel in the migration image corresponding to each viewing angle is determined based on the depth value of each pixel in the two-dimensional image and the migration result of each viewing angle: take the depth value of each pixel in the two-dimensional image obtained through depth perception processing as the original depth value, perform depth repair processing on the original depth value of each pixel in the two-dimensional image to obtain the repaired depth value of each pixel in the two-dimensional image, and replace the corresponding original depth value with the repaired depth value of each pixel.
  • the migration module 4552 is configured to: determine edge pixels in the two-dimensional image and non-edge pixels in the two-dimensional image based on the original depth value of each pixel in the two-dimensional image; based on the edge pixels Determine the pixels to be replaced that need median replacement in the two-dimensional image, and the reserved pixels that do not need median replacement; sort the original depth values of all non-edge pixels in the connected region of the pixels to be replaced in descending order processing, and the median value of the descending sorting result is used as the repaired depth value of the pixel to be replaced; the original depth value of the reserved pixel is retained as the repaired depth value of the reserved pixel.
  • the migration module 4552 is configured to: perform the following processing for any pixel point in the two-dimensional image: when the regularization processing result of the original depth value of the pixel point is the same as the original depth value of at least one adjacent pixel point When the absolute difference between the results of the regularization processing is not less than the difference threshold, the pixel point is determined as a non-edge pixel point; among them, the adjacent pixel point is the pixel point located in the adjacent position of any pixel point; when the pixel point is When the absolute difference between the regularization processing result of the original depth value of , and the regularization processing result of the original depth value of each adjacent pixel point is less than the difference threshold, the pixel point is determined as an edge pixel point.
  • the migration module 4552 is configured to perform the following processing for any pixel in the two-dimensional image: when there is at least one edge pixel in the connected region of the pixel, determine that the pixel is the pixel to be replaced ; When there is no edge pixel in the connected area of the pixel, determine the pixel as the reserved pixel.
  • the migration module 4552 is configured to: update the depth value of each pixel in the two-dimensional image to obtain the updated depth value of each pixel in the two-dimensional image; wherein, the updated depth value and the corresponding pixel The repair depth value of the point has a negative correlation; determine a plurality of movement parameters corresponding to multiple viewing angles, wherein the movement parameters include a horizontal movement parameter and a vertical movement parameter; perform the following processing for each viewing angle: determine and set The movement sensitivity parameter, the update depth value, the horizontal movement parameter, and the horizontal movement vector positively correlated with the width of the two-dimensional image; determine the positive correlation with the set movement sensitivity parameter, the updated depth value, the vertical movement parameter, and the height of the two-dimensional image Vertical movement vector; at the original position corresponding to each pixel in the two-dimensional image in the migration image canvas of the viewing angle, start the displacement according to the horizontal movement vector and the vertical movement vector, and determine the migration of each pixel in the migration image canvas Location.
  • the migration result of each viewing angle includes: each pixel in the 2D image is migrated to a migration position in the migration image canvas of the viewing angle, wherein the size of the migration image canvas is the same as the size of the 2D image;
  • The color determination module 4553 is configured to: take each pixel in the migration image of each viewing angle as a pixel to be dyed, and perform the following processing for each pixel to be dyed in the migration image canvas of each viewing angle: determine the contribution pixels of the pixel to be dyed, where a contribution pixel is a pixel of the two-dimensional image whose migration position is located in the connected area of the pixel to be dyed; determine, based on the migration position of each pixel of the two-dimensional image in the migration image canvas of the viewing angle and the depth value of each pixel in the two-dimensional image, the contribution weight of each contribution pixel for the pixel to be dyed; and weight the color value of each contribution pixel by its contribution weight to obtain the color value of the pixel to be dyed.
  • the encapsulation module 4555 is configured to: based on the depth value of each pixel in the two-dimensional image, perform vacancy pixel filling processing on the migration image of each viewing angle; fill vacancy pixels for the migration image of each viewing angle The result is subjected to Gaussian blurring to obtain a Gaussian blurred image; the Gaussian blurred image of each viewing angle is encapsulated in order to obtain a three-dimensional video.
  • The encapsulation module 4555 is configured to: for each pixel to be dyed in the migration image of each viewing angle, perform the following processing: when there is no contribution pixel corresponding to the pixel to be dyed in the connected area of the pixel to be dyed, determine the position of the pixel to be dyed as a vacant position; and for each vacant position of the migration image, perform the following processing: taking the vacant position as the center, query the reference pixel of the pixel to be dyed in the connected area of the vacant position based on the depth values of some pixels in the two-dimensional image; and fill in the color value of the pixel to be dyed based on the color value of the reference pixel.
  • the encapsulation module 4555 is configured to: determine multiple sets of query directions starting from the vacancy position; wherein, the first direction included in each set of query directions is opposite to the second direction; for each set of query directions, execute the following Processing: In the first direction in the connected area of the vacancy position, determine the pixel point of the non-vacancy position closest to the vacancy position, and in the second direction in the connected area of the vacancy position, determine the non-vacancy position closest to the vacancy position The pixel point of the position; determine the pixel distance between the pixel point determined in the first direction and the pixel point determined in the second direction; determine the two pixel points corresponding to the minimum pixel distance in multiple groups of query directions; based on two The depth values of some pixels in the dimensional image are used to determine the rendering depth values of two pixels, and the pixel with the larger rendering depth value is determined as the reference pixel of the pixel to be dyed.
  • The migration result of each viewing angle includes: the migration position to which each pixel in the two-dimensional image is migrated in the migration image canvas of the viewing angle, where the size of the migration image canvas is the same as the size of the two-dimensional image, and some of the pixels are the contribution pixels of the target pixel. The encapsulation module 4555 is configured to: take either of the two pixels as the target pixel and perform the following processing: determine the contribution pixels of the target pixel, a contribution pixel being a pixel of the two-dimensional image whose migration position is located in the connected area of the target pixel; determine the contribution weight of each contribution pixel for the target pixel based on its migration position in the migration image canvas of the viewing angle; and weight the depth values of the contribution pixels by their contribution weights to obtain the rendering depth values of the two pixels.
  • Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the electronic device executes the method for three-dimensionalizing a two-dimensional image described above in the embodiments of the present application.
  • The embodiments of the present application provide a computer-readable storage medium storing executable instructions; when the executable instructions are executed by a processor, they cause the processor to execute the method provided by the embodiments of the present application, for example, the method for three-dimensionalizing a two-dimensional image shown in FIGS. 3A-3E.
  • The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or CD-ROM; it may also be any of various devices including one of, or any combination of, the foregoing memories.
  • Executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • As an example, executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (e.g., files that store one or more modules, subprograms, or code sections).
  • As an example, executable instructions may be deployed to be executed on one electronic device, on multiple electronic devices located at one site, or on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
  • In summary, by performing multi-view migration on the two-dimensional image and generating the migration images of the corresponding viewing angles, the perspective transformation process is realized at the two-dimensional image level, so that the image three-dimensionalization process is realized at the two-dimensional image processing level in place of a three-dimensional scene modeling process, which reduces the computing resource cost and time cost of the background or the terminal while accurately three-dimensionalizing the image to generate a three-dimensional video.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Analysis (AREA)

Abstract

本申请提供了一种二维图像的三维化方法、装置、电子设备及计算机可读存储介质;方法包括:对二维图像进行深度感知处理,得到所述二维图像中每个像素点的深度值;对所述二维图像中每个像素点分别进行多个视角的迁移处理,得到对应每个视角的迁移结果;基于所述二维图像中每个像素点的深度值以及每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值;基于每个视角的迁移图像中每个像素点的色彩值,生成对应视角的迁移图像;将多个视角的迁移图像按照顺序封装,得到三维化视频。通过本申请,能够基于一个二维图像快速且准确地生成三维化视频。

Description

二维图像的三维化方法、装置、设备及计算机可读存储介质
相关申请的交叉引用
本申请基于申请号为202010856161.6、申请日为2020年08月24日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请涉及图像处理技术,尤其涉及一种二维图像的三维化方法、装置、电子设备及计算机可读存储介质。
背景技术
人工智能(AI,Artificial Intelligence)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法和技术及应用系统。
图像处理是人工智能的典型应用,随着互联网技术的发展,越来越多的应用产品提供了对图像进行多样化显示的功能,相关应用产品不仅能够对二维图像进行显示,还能够对二维图像进行三维化处理,从而显示与二维图像内容相关的三维化视频,申请人在实施本申请实施例的过程中发现三维化视频生成主要依赖于基于多角度摄像头的三维建模,但是需要耗费大量的计算资源成本以及时间成本。
发明内容
本申请实施例提供一种二维图像的三维化方法、装置、电子设备及计算机可读存储介质,能够基于一个二维图像快速且准确地生成三维化视频。
本申请实施例的技术方案是这样实现的:
本申请实施例提供一种二维图像的三维化方法,所述方法由电子设备执行,所述方法包括:
对二维图像进行深度感知处理,得到所述二维图像中每个像素点的深度值;
对所述二维图像进行多个视角的迁移处理,得到所述二维图像对应每个视角的迁移结果;
基于所述二维图像中每个像素点的深度值以及所述二维图像对应所述每个视角的迁移结果,确定对应所述每个视角的迁移图像中每个像素点的色彩值;
基于所述每个视角的迁移图像中每个像素点的色彩值,生成对应所述视角的迁移图像;
将所述多个视角的迁移图像按照顺序封装,得到三维化视频。
本申请实施例提供一种二维图像的三维化装置,所述装置包括:
深度模块,配置为对二维图像进行深度感知处理,得到所述二维图像中每个像素点的深度值;
迁移模块,配置为对所述二维图像进行多个视角的迁移处理,得到所述二维图像对应每个视角的迁移结果;
色彩确定模块,配置为基于所述二维图像中每个像素点的深度值以及所述二维图像对应所述每个视角的迁移结果,确定对应所述每个视角的迁移图像中每个像素点的色彩值;
生成模块,配置为基于所述每个视角的迁移图像中每个像素点的色彩值,生成对应所述视角的迁移图像;
封装模块,配置为将多个所述视角的迁移图像按照顺序封装,得到三维化视频。
本申请实施例提供一种二维图像的三维化方法,所述方法由电子设备执行,所述方法包括:
在人机交互界面显示二维图像;
响应于针对所述二维图像的三维化操作,播放基于所述二维图像生成的三维化视频;
其中,所述视频是通过执行本申请实施例提供的二维图像的三维化方法得到的。
本申请实施例提供一种二维图像的三维化装置,所述装置包括:
显示模块,用于在人机交互界面显示二维图像;
播放模块,用于响应于针对所述二维图像的三维化操作,播放基于所述二维图像生成的三维化视频;
其中,所述视频是通过执行本申请实施例提供的二维图像的三维化方法得到的。
本申请实施例提供一种电子设备,所述电子设备包括:
存储器,用于存储可执行指令;
处理器,用于执行所述存储器中存储的可执行指令时,实现本申请实施例提供的二维图像的三维化方法。
本申请实施例提供一种计算机可读存储介质,存储有可执行指令,用于被处理器执行时,实现本申请实施例提供的二维图像的三维化方法。
本申请实施例具有以下有益效果:
通过对二维图像进行多视角迁移以及对应视角迁移图像生成处理,实现了二维图像层面的视角变换过程,从而在二维图像处理层面上实现了图像三维化过程,以替代三维场景建模过程,在准确进行二维图像三维化以生成三维化视频的同时降低了后台或者终端的计算资源成本以及耗时成本。
附图说明
图1是本申请实施例提供的二维图像的三维化系统的结构示意图;
图2是本申请实施例提供的应用二维图像的三维化方法的电子设备的结构示意图;
图3A-3E是本申请实施例提供的二维图像的三维化方法的流程示意图;
图4是本申请实施例提供的二维图像的三维化方法的流程示意图;
图5是本申请实施例提供的二维图像的三维化方法的深度图;
图6是本申请实施例提供的二维图像的三维化方法的边缘标记图;
图7是本申请实施例提供的二维图像的三维化方法的移动参数获取示意图;
图8是本申请实施例提供的二维图像的三维化方法的贡献示意图;
图9是本申请实施例提供的二维图像的三维化方法的查询示意图;
图10是本申请实施例提供的二维图像的三维化方法的效果示意图;
图11是本申请实施例提供的二维图像的三维化方法的封装示意图;
图12是本申请实施例提供的二维图像的三维化方法的贡献原理图。
具体实施方式
为了使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请作进一步地详细描述,所描述的实施例不应视为对本申请的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
对本申请实施例进行进一步详细说明之前,对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释。
1)深度图:在三维计算机图形和计算机视觉中,深度图是一种图像或图像通道,其中包含与场景对象的表面到视点的距离有关的信息,用于模拟三维形状或重建三维形状,深度图可以由三维扫描仪生成。
2)像素:数字图像是二维信号,记录了图像在行、列方向上的灰度或色彩,像素是一个计算机图像的最小逻辑单位。
3)深度估计:基于图像来估计它的深度信息,可以基于图像内容理解,基于聚焦,基于散焦,基于明暗变化进行深度估计,图像内容理解的深度估计方法主要是通过对图像中的各个景物分块进行分类,然后对每个类别的景物分别用各自适用的方法估计它们的深度信息。
相关技术中,针对基于二维图像生成含有三维视觉效果的视频的问题,主要解决方式是通过深度学习预测场景深度信息,对场景进行三维建模且对遮挡部分进行填补预测,再通过模拟相机的移动,改变相机的视角,重新渲染新视角下的图像以获取新视角下的图像,进而展示具有三维效果的视频,相关技术中三维化视频生成方法主要通过深度学习预测场景深度信息,再通过三维建模构建该场景,并对遮挡部分通过深度学习进行填充,通过模拟相机的移动,重新渲染新视角下的图像,以获取带有三维效果的视频,但是三维建模的过程复杂且耗时,导致整个计算过程复杂且时间开销大,从而不适合支持延时性低的在线功能。
本申请实施例提供一种二维图像的三维化方法、装置、电子设备和计算机可读存储介质,能够快速准确地重建不同视角下的图像场景以实现图像的三维化展示,下面说明本申请实施例提供的电子设备的示例性应用,本申请实施例提供的电子设备可以为笔记本电脑,平板电脑,台式计算机,机顶盒,智能家居如智能电视,移动设备(例如,移动电话,便携式音乐播放器,个人数字助理,专用消息设备,便携式游戏设备)等各种类型的用户终端,也可以为服务器。下面,将说明设备实施为服务器时的示例性应用。
参见图1,图1是本申请实施例提供的二维图像的三维化系统的结构示意图,为实现支撑一个图像编辑应用,终端400通过网络300连接服务器200,网络300可以是广域网或者局域网,又或者是二者的组合,终端400将待处理图像上传至服务器200,由服务器200对待图像处理进行二维图像的三维化处理,得到对应待处理图像的三维化视频,服务器200将三维化视频回传至终端400,并在终端400上进行播放。
在一些实施方式中,终端400将待处理图像上传至服务器200的同时还可以将图像三维化操作所指定的渲染模式返回至服务器200,由服务器200按照渲染模式确定出对应的多个视角,以及迁移图像的封装顺序,服务器200按照对应的视角获取对应多个视角的迁移图像,按照封装顺序对迁移图像进行封装,生成三维化视频,回传至终端400进行播放。
在图1中是以终端400和服务器200协同完成二维图像的三维化处理为例说明,可以理解地,作为替换方案,终端400可以依赖于自身的能力完成二维图像的三维化处理。
在一些实施例中,终端400通过运行计算机程序来实现本申请实施例提供的二维图像的三维化方法,例如,计算机程序可以是操作系统中的原生程序或软件模块;可以是本地(N ative)应用程序(APP,Application),即需要在操作系统中安装才能运行的程序,例如视频APP或直播APP;也可以是小程序,即只需要下载到浏览器环境中就可以运行的程序; 还可以是能够嵌入至任意APP中的视频小程序或直播小程序。总而言之,上述计算机程序可以是任意形式的应用程序、模块或插件。
在一些实施例中,服务器200可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。终端400可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等,但并不局限于此。终端以及服务器可以通过有线或无线通信方式进行直接或间接地连接,本申请实施例中不做限制。
参见图2,图2是本申请实施例提供的应用二维图像的三维化方法的电子设备的结构示意图,以电子设备是依赖于自身能力独立完成二维图像的三维化处理的终端400为例,图2所示的终端400包括:至少一个处理器410、存储器450、至少一个网络接口420和用户接口430。终端400中的各个组件通过总线系统440耦合在一起。可理解,总线系统440用于实现这些组件之间的连接通信。总线系统440除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图2中将各种总线都标为总线系统440。
处理器410可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
用户接口430包括使得能够显示媒体内容的一个或多个输出装置431,包括一个或多个扬声器和/或一个或多个视觉显示屏。用户接口430还包括一个或多个输入装置432,包括有助于用户输入的用户接口部件,比如键盘、鼠标、麦克风、触屏显示屏、摄像头、其他输入按钮和控件。
存储器450可以是可移除的,不可移除的或其组合。示例性的硬件设备包括固态存储器,硬盘驱动器,光盘驱动器等。存储器450可选地包括在物理位置上远离处理器410的一个或多个存储设备。
存储器450包括易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器(ROM,Read Only Memory),易失性存储器可以是随机存取存储器(RAM,Random Access Memory)。本申请实施例描述的存储器450旨在包括任意适合类型的存储器。
在一些实施例中,存储器450能够存储数据以支持各种操作,这些数据的示例包括程序、模块和数据结构或者其子集或超集,下面示例性说明。
操作系统451,包括用于处理各种基本系统服务和执行硬件相关任务的系统程序,例如框架层、核心库层、驱动层等,用于实现各种基础业务以及处理基于硬件的任务;
网络通信模块452,用于经由一个或多个(有线或无线)网络接口420到达其他电子设备,示例性的网络接口420包括:蓝牙、无线相容性认证(WiFi)、和通用串行总线(USB,Universal Serial Bus)等;
显示模块453,用于经由一个或多个与用户接口430相关联的输出装置431(例如,显示屏、扬声器等)使得能够显示信息(例如,用于操作外围设备和显示内容和信息的用户接口);
输入处理模块454,用于对一个或多个来自一个或多个输入装置432之一的一个或多个用户输入或互动进行检测以及翻译所检测的输入或互动。
在一些实施例中,本申请实施例提供的二维图像的三维化装置可以采用软件方式实现,图2示出了存储在存储器450中的二维图像的三维化装置455,其可以是程序和插件等形式的软件,包括以下软件模块:深度模块4551、迁移模块4552、色彩确定模块4553、生成模块4554、封装模块4555、显示模块4556和播放模块4557,这些模块是逻辑上的,因此根据所实现的功能可以进行任意的组合或进一步拆分,将在下文中说明各个模块的功能。
将结合本申请实施例提供的电子设备的示例性应用和实施,说明本申请实施例提供的二维图像的三维化方法,本申请实施例提供的二维图像的三维化方法可以由上文的终端400独立完成或者由上文所述的终端400和服务器200协同完成。
将结合本申请实施例提供的终端的示例性应用和实施,说明本申请实施例提供的二维图像的三维化方法。
参见图3A,图3A是本申请实施例提供的二维图像的三维化方法的流程示意图,将结合图3A示出的步骤101-105进行说明。
在步骤101中,对二维图像进行深度感知处理,得到二维图像中每个像素点的深度值。
作为示例,二维图像中像素点的深度值是使用深度感知算法所感知到的像素点的深度值,即下文的原始深度值,深度估计问题在计算机视觉领域属于三维重建,从空间几何,时域变换和焦距变化的关系推导深度距离,深度估计可以用于三维建模、场景理解、深度感知的图像合成等领域,基于深度学习的图像深度估计依据是通过像素深度值关系反映深度关系,通过拟合一个函数将图像映射成深度图,单目深度估计通常利用单一视角的图像数据作为输入,直接预测图像中每个像素对应的深度值。
在步骤102中,对二维图像进行多个视角的迁移处理,得到二维图像对应每个视角的迁移结果。
作为示例,迁移是将二维图像中每个像素点迁移到与二维图像相同尺寸的画布中,对应每个视角的迁移结果包括:每个像素点在每个视角的画布中的位置,对二维图像的三维化处理可以是基于不同风格的三维化处理,例如,形成镜头拉近的三维视频,形成镜头晃动的三维视频,形成镜头拉远的三维视频,针对于不同风格的三维化处理,需要确定出相应视角以及迁移图像的封装顺序,终端接收到针对二维图像的三维化操作时,确定出三维化操作所指定的渲染风格,并进而确定出对应渲染风格的多个视角以及对应视角的迁移图像的封装顺序,假设对应某一渲染风格而言,需要确定两个视角的迁移图像,即需要对二维图像中每个像素点分别进行两个视角的迁移处理,分别得到两个视角的迁移结果。
在步骤103中,基于二维图像中每个像素点的深度值以及二维图像对应每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值。
在一些实施例中,在执行步骤103中基于二维图像中每个像素点的深度值以及二维图像对应每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值之前,可以执行以下技术方案:将通过深度感知处理得到的二维图像中每个像素点的深度值作为原始深度值,对二维图像中每个像素点的原始深度值进行深度修复处理,得到二维图像中每个像素点的修复深度值,基于每个像素点的修复深度值替换对应的原始深度值。
作为示例,上述深度修复处理的技术方案主要是用于对深度感知得到的深度值进行深度边缘修复,上述对深度值进行深度修复的过程可以是在执行步骤102之前执行或者在执行步骤102之后且在执行步骤103之前执行。
作为示例,参见图5,图5是本申请实施例提供的二维图像的三维化方法的深度图,步骤101中深度感知过程中获得的深度估计结果在边缘处会产生不均匀的连续跳动,图5中每个格子代表一个像素点,背景对象是黑色格子501,前景对象是白色格子502,理论上不存在介于中间的灰色格子,才符合同一物体同一深度的原则,然而图5中存在不连续跳动现象,即黑格子501与灰格子503紧邻,灰格子503与白格子502紧邻的阶跃跳动,导致本来属于同一对象深度估计结果出现不同。因此,需要针对边缘进行深度修复,可以采取快速中值替换的方式对边缘深度进行改善。
作为示例,白格子502表征二维图像的前景图像,黑格子501表征二维图像的背景图像,前景图像的深度比背景图像的深度小,两个不同颜色格子的边缘处的深度值跳动较大。
在一些实施例中,上述对二维图像中每个像素点的原始深度值进行深度修复处理,得到二维图像中每个像素点的修复深度值,可以通过以下技术方案实现:基于二维图像中每个像 素点的原始深度值,确定二维图像中的边缘像素点以及二维图像中的非边缘像素点;基于边缘像素点确定二维图像中需要进行中值替换的待替换像素点、以及不需要进行中值替换的保留像素点;将待替换像素点的连通区域内所有非边缘像素点的原始深度值进行降序排序处理,并将降序排序结果的中值作为待替换像素点的修复深度值;保留保留像素点的原始深度值作为保留像素点的修复深度值。
在一些实施例中,上述基于二维图像中每个像素点的原始深度值,确定二维图像中的边缘像素点以及二维图像中的非边缘像素点,可以通过以下技术方案实现:针对二维图像中任意一个像素点执行以下处理:当像素点的原始深度值的正则化处理结果与至少一个相邻像素点的原始深度值的正则化处理结果之间的绝对差值不小于差值阈值时,将像素点确定为非边缘像素点;其中,相邻像素点为位于任意一个像素点的相邻位置的像素点;当像素点的原始深度值的正则化处理结果与每个相邻像素点的原始深度值的正则化处理结果之间的绝对差值均小于差值阈值时,将像素点确定为边缘像素点。
作为示例,将深度感知过程得到的深度图进行正则化处理,使得深度图的取值范围缩小至0至1的区间中,参见公式(1):
Norm(D)=(D-D.min)/(D.max–D.min)                  (1);
其中,D.max指的是深度图所有像素点中的最大深度值,Dmin指的是深度图所有像素点中的最小深度值,Norm(D)是正则化处理结果。
对深度图的每个像素,计算深度图中每个像素点与上、下、左、右相邻4个像素点的正则化处理结果的绝对差值,用以判断该像素点是否在边缘区域,参见图6,图6是本申请实施例提供的二维图像的三维化方法的边缘标记图,若所有绝对差值(上述四个绝对值)均小于差值阈值,例如,差值阈值被设置为0.04,则将该像素点标记为1,判别为边缘像素点,否则将该像素点标记为0,判别为非边缘像素点。例如,假设该点正则化后的深度为D(i,j),i表示该点的水平位置,j表示该点的垂直位置,参见公式(2)进行判断:
max(abs(D(i,j)-D(i+1,j)),abs(D(i,j)-D(i-1,j)),abs(D(i,j)-D(i,j+1)),abs(D(i,j)-D(i,j-1)))<差值阈值;
其中,abs(D(i,j)-D(i+1,j)是像素点(i,j)的深度值的正则化处理结果与像素点(i+1,j)的深度值的正则化处理结果之间的绝对差值,abs(D(i,j)-D(i-1,j))是像素点(i,j)的深度值的正则化处理结果与像素点(i-1,j)的深度值的正则化处理结果之间的绝对差值,abs(D(i,j)-D(i,j+1))是像素点(i,j)的深度值的正则化处理结果与像素点(i,j+1)的深度值的正则化处理结果之间的绝对差值,abs(D(i,j)-D(i,j-1))是像素点(i,j)的深度值的正则化处理结果与像素点(i,j-1)的深度值的正则化处理结果之间的绝对差值。
当上述四个绝对差值中的最大值小于差值阈值时,将该像素点标记为1,判别为边缘像素点,否则将该像素点标记为0,判别为非边缘像素点。
在一些实施例中,上述基于边缘像素点确定二维图像中需要进行中值替换的待替换像素点、以及不需要进行中值替换的保留像素点,可以通过以下技术方案实现:针对二维图像中任意一个像素点执行以下处理:当像素点的连通区域内中至少存在一个边缘像素点时,确定像素点为待替换像素点;当像素点的连通区域内中不存在边缘像素点时,确定像素点为保留像素点。
作为示例,在图6所示的边缘标记图中以每个像素为中心(中心像素)划定连通区域,连通区域指的是与该像素点存在直接和间接连通关系的多个像素点集合,具体该连通区域可以为以该像素为中心的k*k大小的正方形,以将图6中示出的带纹路像素点作为中心像素602为例进行说明,它的连通区域为图6中示出的尺寸为3*3的虚线框601,如果连通区域中存在边缘像素点(标记为1的点),则对该中心像素进行中值替换处理,否则不需要进行中值替换处理。
作为示例,中值替换处理的具体方式如下:首先确定出需要进行中值替换处理的像素点(中心像素),当像素点(中心像素)的连通区域内中至少存在一个边缘像素点时,将该像素点确定为待替换像素点,即需要进行中值替换处理的像素点,从连通区域中获取所有非边缘像素点的深度值,并对这些深度值进行小到大排列,利用排列后的中值对该中心像素的深度值进行替换,将二维图像中的每个像素均作为中心像素执行上述处理,从而完成对所有像素的遍历,即完成二维图像的深度值的深度修复处理,对于获得的经过修复的深度图而言,深度越大则表征该像素的视觉距离越远,视觉距离用于表征视点与图中对象的距离,远景对象的视觉距离大于近景对象的视觉距离。
基于图3A,参见图3B,图3B是本申请实施例提供的二维图像的三维化方法的流程示意图,步骤102中对二维图像进行多个视角的迁移处理,得到二维图像对应每个视角的迁移结果可以通过图3B示出的步骤1021-1023进行说明。
在步骤1021中,对二维图像中每个像素点的深度值进行更新,得到二维图像中每个像素点的更新深度值。
作为示例,更新深度值与对应像素点的修复深度值或者原始深度值成负相关关系,用于进行更新的深度值可以是原始深度值或者是经过深度修复处理的深度值,当深度修复方案是在执行步骤102之前执行,则用于进行更新的深度值为经过深度修复处理的修复深度值,当深度修复方案是在执行步骤102之后且在执行步骤103之前执行,即用于进行更新的深度值为原始深度值。
作为示例,更新处理可以是进行倒数计算,更新处理后获取新的深度图,参见公式(3):
D=(1/Norm(D))–0.5                      (3);
其中,Norm(D)为像素点的正则化处理结果,D为更新深度值,相减所使用的参数不局限于0.5,参考上述公式(3)进行更新处理后得到的深度值介于-0.5和0.5之间,更新深度值越大表征视觉距离越小,即像素的距离越近,视觉距离用于表征视点与图中对象的距离,远景对象的视觉距离大于近景对象的视觉距离。
在步骤1022中,确定分别与多个视角一一对应的多个移动参数。
作为示例,移动参数包括水平移动参数与垂直移动参数。
在一些实施例中,确定移动参数的过程可以是在圆周上按照间隔无差别获取移动参数,参见图7,图7是本申请实施例提供的二维图像的三维化方法的移动参数获取示意图,采取(v,u)表示移动参数,v为垂直移动参数,u为水平移动参数,在半径为1的圆周上每隔固定角度获取采集一个点,该点的纵坐标和横坐标即为(v,u)的取值,整个圆上获取多组(v,u)即可用于渲染多个不同视角的迁移图像,进而在针对相同图像进行不同渲染风格的三维化处理时,可以直接获取相应风格的迁移图像,进而按照对应渲染风格的顺序封装即可。
在一些实施例中,确定移动参数的过程可以是在圆周上按照渲染风格所对应的视角个性化获取移动参数,相较于无差别获取移动参数的实施方式,只需要获取若干个视角的移动参数,从而仅需要执行若干个视角的图像迁移与渲染。
在步骤1023中,针对每个视角执行以下处理:确定与设定移动灵敏参数、更新深度值、水平移动参数、以及二维图像的宽度正相关的水平移动矢量;确定与设定移动灵敏参数、更新深度值、垂直移动参数、以及二维图像的高度正相关的垂直移动矢量;获取二维图像中每个像素点在视角的迁移图像画布中对应的原始位置,以原始位置为起点按照水平移动矢量以及垂直移动矢量进行位移处理,得到二维图像中每个像素点在迁移图像画布中的迁移位置。
作为示例,水平移动矢量的具体计算方式如下:将水平移动参数u、移动灵敏参数scal e、像素的更新深度值、以及二维图像的宽度w相乘得到水平移动矢量,当相乘结果为负数时,则将像素在水平方向上向负方向移动,相乘结果为正数时,则将像素在水平方向上向正方向移动,垂直移动矢量的具体计算方式如下:将垂直移动参数v、移动灵敏参数scale、像素的更新深度值、以及二维图像的高度h相乘得到垂直移动矢量,当相乘结果为负数时,则将像素在垂直方向上向负方向移动,相乘结果为正数时,则将像素在垂直方向上向正方向移 动,例如,针对二维图像上的像素(i,j),水平的迁移位置A参见公式(4)和垂直的迁移位置B参见公式(5):
A=i+u*scale*D(i,j)*w               (4);
B=j+v*scale*D(i,j)*h               (5);
其中,移动灵敏参数scale为预设常量,设置越大则移动幅度越大,D(i,j)为更新深度图的更新深度值,取值范围可以在-0.5与0.5之间,u为水平移动参数,scale为移动灵敏参数,w为二维图像的宽度,v为垂直移动参数,h为二维图像的高度。
通过上述迁移过程可以实现以下效果:靠前和靠后的对象向不同的方向移动,且距离视点越近或者越远的对象的移动距离远大,这样的移动模式满足三维显示规律,鉴于水平的迁移位置和垂直的迁移位置并不为整数,从而可以采取步骤103中的实施方式向周边位置贡献色彩分量。
基于图3A,参见图3C,图3C是本申请实施例提供的二维图像的三维化方法的流程示意图,步骤103中基于二维图像中每个像素点的深度值以及二维图像对应每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值可以通过图3C示出的步骤1031-1033进行说明。
将每个视角的迁移图像中每个像素点作为待染色像素点,并针对每个视角的迁移图像画布中每个待染色像素点执行以下处理:
在步骤1031中,确定待染色像素点的贡献像素点。
作为示例,贡献像素点是二维图像中迁移位置位于待染色像素点的连通区域内的像素点,连通区域指的是与待染色像素点存在直接和间接联通关系的多个像素点集合,具体该连通区域可以为以待染色像素点为中心的3*3大小的正方形,迁移图像画布最终所得到的迁移图像由待染色像素点组成,步骤102中得到的每个视角的迁移结果包括:二维图像中每个像素点被迁移到视角的迁移图像画布中的迁移位置,其中,迁移图像画布的尺寸与二维图像的尺寸相同。
作为示例,参见图12,图12是本申请实施例提供的二维图像的三维化方法的贡献原理图,例如,迁移图像画布中包括49个待染色像素点,图12中还显示了二维图像中三个像素点A、B和C进行图像迁移后在迁移图像画布的迁移位置,针对于待染色像素点34而言,具有点阵图案的区域为以待染色像素点34为中心确定的连通区域,由于二维图像中三个像素点A、B和C均处于连通区域内,因此,二维图像中三个像素点A、B和C均为待染色像素点34的贡献像素点。
在步骤1032中,基于二维图像中每个像素点在视角的迁移图像画布中的迁移位置、以及二维图像中每个像素点的深度值,确定贡献像素点对应待染色像素点的贡献权重。
在一些实施例中,步骤1032中基于二维图像中每个像素点在视角的迁移图像画布中的迁移位置、以及二维图像中每个像素点的深度值,确定贡献像素点对应待染色像素点的贡献权重,可以通过以下技术方案实现:当贡献像素点位于待染色像素点的连通区域中的右下区域或者正下区域时,对贡献像素点的迁移结果进行向上取整处理,获取与迁移结果和对应向上取整结果之间的绝对差值、以及贡献像素点的更新深度值成正相关的贡献权重;当贡献像素点位于待染色像素点的连通区域中的左上区域或者正上区域时,对贡献像素点的迁移结果进行向下取整处理,获取与迁移结果和对应向下取整结果之间的绝对差值、以及贡献像素点的更新深度值成正相关的贡献权重;当贡献像素点位于待染色像素点的连通区域中的右上区域或者正右区域时,对贡献像素点的水平迁移结果进行向上取整处理,对贡献像素点的垂直迁移结果进行向下取整处理,获取与迁移结果和对应取整结果之间的绝对差值、以及贡献像素点的更新深度值成正相关的贡献权重;当贡献像素点位于待染色像素点的连通区域中的左下区域或者正左区域时,对贡献像素点的迁移结果进行向下取整处理,对贡献像素点的垂直迁移结果进行向上取整处理,获取与迁移结果和对应取整结果之间的绝对差值、以及贡献像素点的更新深度值成正相关的贡献权重。
作为示例,参见图12,由于贡献像素点A和贡献像素点B针对待染色像素点的相对距离不同以及贡献像素点A和贡献像素点B的更新深度值不同,从而贡献像素点A和贡献像素点B针对待染色像素点贡献色彩分量的贡献权重不同,以贡献像素点B为例,该贡献像素点B对待染色像素点34的贡献权重可以为贡献像素点B到待染色像素点34的横纵坐标差值绝对值的乘积,例如,贡献像素点B对待染色像素点34的贡献权重可以为abs(i’-floor(i’))*abs(j’-floor(j’)),其中,floor(i’)操作以及floor(j’)操作为向上取整操作,在计算贡献像素点针右下的待染色像素点的过程中取整操作为向下取整操作,在上述计算得到的贡献权重的基础上还可以将贡献权重乘以exp(t*D(i,j))以更新贡献权重(t可以取值为10),此处的D(i,j)为贡献像素点的更新深度值,从而将贡献像素点在二维图像中的更新深度值针对待染色像素点的影响考虑进来,由于更新深度值后,视觉距离越近的像素点(近景)的D(i,j)越大,其贡献权重也会指数型增大,符合三维视觉效果中的运动规律。
作为示例,当贡献像素点位于待染色像素点的连通区域中的右下区域或者正下区域时,例如,贡献像素点B即位于待染色像素点的连通区域中的右下区域,贡献像素点C即位于待染色像素点的连通区域中的正下区域,对贡献像素点的迁移结果(水平的迁移位置以及垂直的迁移位置)进行向上取整处理,获取与迁移结果和对应向上取整结果之间的绝对差值、以及所述贡献像素点的更新深度值成正相关的贡献权重,正下区域指的是在连通区域内贡献像素点的迁移位置处于待染色像素点的正下方,右下区域指的是在连通区域内贡献像素点的迁移位置处于相对于待染色像素点的第四象限。
作为示例,当贡献像素点位于待染色像素点的连通区域中的左上区域或者正上区域时,例如,贡献像素点A即位于待染色像素点的连通区域中的右下区域,对贡献像素点的迁移结果(水平的迁移位置以及垂直的迁移位置)进行向下取整处理,获取与迁移结果和对应向下取整结果之间的绝对差、以及所述贡献像素点的更新深度值成正相关的贡献权重。正上区域指的是在连通区域内贡献像素点的迁移位置处于待染色像素点的正上方,左上区域指的是在连通区域内贡献像素点的迁移位置处于相对于待染色像素点的第二象限。
作为示例,当贡献像素点位于待染色像素点的连通区域中的右上区域或者正右区域时,对贡献像素点的水平的迁移位置进行向上取整处理,对贡献像素点的垂直的迁移位置进行向下取整处理,获取与迁移结果和对应取整结果之间的绝对差值、以及所述贡献像素点的更新深度值成正相关的贡献权重。正右区域指的是在连通区域内贡献像素点的迁移位置处于待染色像素点的正右方,右上区域指的是在连通区域内贡献像素点的迁移位置处于相对于待染色像素点的第一象限。
作为示例,当贡献像素点位于待染色像素点的连通区域中的左下区域或者正左区域时,对贡献像素点的迁移结果进行向下取整处理,对贡献像素点的垂直迁移结果进行向上取整处理,获取与迁移结果和对应取整结果之间的绝对差值、以及所述贡献像素点的更新深度值成正相关的贡献权重。正左区域指的是在连通区域内贡献像素点的迁移位置处于待染色像素点的正左方,左下区域指的是在连通区域内贡献像素点的迁移位置处于相对于待染色像素点的第三象限。
作为示例,以待染色像素点的中心作为原点,按照图12所示的方式建立垂直与水平的坐标轴,从而得到上述第一象限至第四象限。
在步骤1033中,基于每个贡献像素点的贡献权重对每个贡献像素点的色彩值进行加权处理,得到待染色像素点的色彩值。
作为示例,贡献像素点A、贡献像素点B以及贡献像素点C分别针对待染色像素点的贡献权重weightA、weightB以及weightC,将weightA与贡献像素点A在二维图像中的RGB色彩值相乘,将weightB与贡献像素点B在二维图像中的RGB色彩值相乘,将weightC与贡献像素点C在二维图像中的RGB色彩值相乘,将相乘结果相加之后再除以weightA、weightB以及weightC的和,从而得到待染色像素点的色彩值。
在步骤104中,基于每个视角的迁移图像中每个像素点的色彩值,生成对应视角的迁移图像。
在步骤105中,将多个视角的迁移图像按照顺序封装,得到三维化视频。
基于图3A,参见图3D,图3D是本申请实施例提供的二维图像的三维化方法的流程示意图,步骤105中将多个视角的迁移图像按照顺序封装,得到三维化视频可以通过图3D示出的步骤1051-1053进行说明。
在步骤1051中,基于二维图像中每个像素点的深度值,对每个视角的迁移图像进行空缺像素填补处理。
在一些实施例中,步骤1051中基于二维图像中每个像素点的深度值,对每个视角的迁移图像进行空缺像素填补处理,可以通过以下技术方案实现:针对每个视角的迁移图像中的每个待染色像素点,执行以下处理:当待染色像素点的连通区域内不存在对应待染色像素点的贡献像素点时,将待染色像素点的位置确定为空缺位置;针对迁移图像的每个空缺位置执行以下处理:以空缺位置为中心,基于二维图像中部分像素点的深度值在空缺位置的连通区域内查询待染色像素点的参考像素点;基于参考像素点的色彩值,对待染色像素点进行颜色值填补处理。
作为示例,参见图12,图12中待染色像素点00的连通区域内并不存在贡献像素点,从而得到的迁移图像中待染色像素点00实际上是空白的,将将待染色像素点的位置确定为空缺位置,从而需要针对待染色像素点00进行填补处理,相应的需要确定对应待染色像素点00的参考像素点,从而将参考像素点在迁移图像中的色彩值填补在待染色像素点00的空缺位置中。
在一些实施例中,上述以空缺位置为中心,基于二维图像中部分像素点的深度值在空缺位置的连通区域内查询待染色像素点的参考像素点,可以通过以下技术方案实现:确定以空缺位置为起点的多组查询方向;其中,每组查询方向所包括的第一方向与第二方向相反;针对每组查询方向执行以下处理:在空缺位置的连通区域内的第一方向上,确定与空缺位置最近的非空缺位置的像素点,并在空缺位置的连通区域内的第二方向上,确定与空缺位置最近的非空缺位置的像素点;确定在第一方向上确定的像素点与在第二方向上确定的像素点之间的像素距离;确定多组查询方向中最小像素距离所对应的两个像素点;基于二维图像中部分像素点的深度值,确定两个像素点的渲染深度值,并将渲染深度值较大的像素点确定为待染色像素点的参考像素点。
作为示例,参见图9,假设图9中最中间的具有点阵图像的像素点为需要进行填补处理的待染色像素点,其所在的位置即为空缺位置,以空缺位置为起点的多组查询方向可以按照固定角度获取不同组的查询方向,假设存在两组查询方向,第一组查询方向所包括的第一方向与第二方向相反,第二组查询方向所包括的第一方向与第二方向相反,第一组查询方向所在的直线与第二组查询方向所在的直线之间存在夹角,以图9所示的第一方向和第二方向为例,以空缺位置为起点,在空缺位置的连通区域内的第一方向上,确定与空缺位置最近的非空缺位置的像素点(具有条纹图案的像素点),并在空缺位置的连通区域内的第二方向上,确定与空缺位置最近的非空缺位置的像素点(具有条纹图案的像素点),此处所确定的像素点即为组成迁移图像的像素点,此处的连通区域最大可以为图像范围内,即将整个迁移图像作为连通区域,或者指定一个有限范围的区域作为连通区域,确定在第一方向上与第二方向上确定的非空缺位置的像素点之间的像素距离,从而得到的第一组查询方向的第一像素距离、以及第二组查询方向的第二像素距离,将第一像素距离以及第二像素距离中最小像素距离所对应的两个像素点(非空缺位置的像素点,也即具有条纹图案的像素点),基于二维图像中部分像素点的深度值,确定两个像素点的渲染深度值,并将渲染深度值较大的像素点确定为待染色像素点的参考像素点,从而可以将距离空缺位置相近的背景像素点(深度值大的像素点)对空缺位置进行填补。
在一些实施例中,每个视角的迁移结果包括:二维图像中每个像素点被迁移到视角的迁移图像画布中的迁移位置,其中,迁移图像画布的尺寸与二维图像的尺寸相同;部分像素点为目标像素点的贡献像素点;上述基于二维图像中部分像素点的深度值,确定两个像素点的渲染深度值,可以通过以下技术方案实现:将两个像素点中任意一个像素点作为目标像素点,并执行以下处理:确定目标像素点的贡献像素点,贡献像素点是二维图像中与迁移位置位于目标像素点的连通区域内的像素点;基于贡献像素点在视角的迁移图像画布中的迁移位置,确定贡献像素点针对目标像素点的贡献权重;基于贡献像素点的贡献权重,对贡献像素点的深度值进行加权处理,得到两个像素点的渲染深度值。
作为示例,最小像素距离所对应的迁移图像中的两个像素点的渲染深度值的获取方式与步骤103中获取待染色像素点的色彩值类似,区别仅在于进行加权求和的分量是贡献像素点的深度值,而非RGB色彩值,将两个像素点分别作为目标像素点求取渲染深度值,从而得到两个像素点的渲染深度值,即分别进行两次加权求和处理,进行颜色值填补处理的过程还可以利用经过深度学习的模型实现。
在步骤1052中,针对每个视角的迁移图像的空缺像素填补结果进行高斯模糊处理,得到高斯模糊图像。
作为示例,高斯模糊处理可以理解成将高斯模糊的目标像素的色彩值取为周边像素的平均值,将高斯模糊的目标像素为中心点,将周围点(紧密围绕中心点的点)的色彩值的平均值作为中心点的色彩值,在数值上是平滑化处理,在图形上就相当于产生模糊效果,作为目标像素的中间点失去细节,高斯模糊的目标像素为经过颜色值填补处理的像素点。
在步骤1053中,将每个视角的高斯模糊图像按照顺序封装,得到三维化视频。
作为示例,参见图11,图11是本申请实施例提供的二维图像的三维化方法的封装示意图,响应于三维化操作中所指向的渲染风格,确定出每个视角的高斯模糊图像的封装顺序,若是不进行高斯模糊处理,则确定出每个视角的经过颜色值填补处理的迁移图像的封装顺序,在对图像质量要求较低的场景中,还可以直接将每个视角的迁移图像直接进行封装,即确定出每个视角的迁移图像的封装顺序,即封装顺序实质上与视角对应,当三维化操作中所指向的渲染风格所对应的视角为第一视角、第二视角、第三视角以及第四视角时,将第一视角、第二视角、第三视角以及第四视角的新图像按照渲染风格所指向的对应视角的封装顺序进行封装,从而得到具有对应渲染风格的三维化视频。
参见图3E,图3E是本申请实施例提供的二维图像的三维化方法的流程示意图,将结合图3E示出的步骤201-209进行说明。在步骤201中,在人机交互界面显示二维图像。在步骤202中,响应于针对二维图像的三维化操作,将二维图像发送至服务器。在步骤203中,服务器对二维图像进行深度感知处理,得到二维图像中每个像素点的深度值。在步骤204中,服务器对二维图像中每个像素点分别进行多个视角的迁移处理,得到对应每个视角的迁移结果。在步骤205中,服务器基于二维图像中每个像素点的深度值以及每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值。在步骤206中,服务器基于每个视角的迁移图像中每个像素点的色彩值,生成对应视角的迁移图像。在步骤207中,服务器将多个视角的迁移图像按照顺序封装,得到三维化视频。在步骤208中,服务器将三维化视频发送至终端。在步骤209中,终端播放基于二维图像生成的三维化视频。
下面,将说明本申请实施例提供的二维图像的三维化方法在一个实际的应用场景中的示例性应用。
电子相册客户端接收到用户针对照片的选择操作,显示选择操作的目标相片作为待处理图像,响应于针对待处理图像的三维化操作,电子相册客户端调用本申请实施例提供的二维图像的三维化方法生成待处理图像的三维化视频的预览,响应于用户针对三维化视频的调整操作,电子相册客户端按照调整操作中指定的调整方式调整多个视角的图像的封装顺序,例如,封装顺序调整为视觉效果由近景拉至远景,或者由远景拉至近景。
本申请实施例提供的二维图像的三维化方法通过深度学习模型预测出场景深度信息后,通过统一电子设备架构算法,使用图像处理器在二维图像层面上进行变换处理,在变换处理之后通过填补处理以及模糊处理可以快速获取新视角下的图像,以实现基于深度的多个三维视角合成。在二维图像的基础上进行三维变换的处理过程包括:预测深度图的处理,三维场景重新渲染,空缺填补和模糊,整个过程可以在显卡的图像处理器上并行处理,速度较快,并且在避免了相关技术中三维场景建模的前提下获取效果优异的三维化视频效果,可以满足终端通过二维图像获取三维化视频的要求,并且由于重渲染方式和填补方式都基于最初的深度预计结果,使得渲染和填补符合三维场景规律。
终端将待处理图像上传至后台,以使后台调用本申请实施例提供的二维图像的三维化方法对待处理图像进行三维化处理得到三维化视频,进而回传到终端进行展示播放。本申请实施例提供的二维图像的三维化方法通过对图像进行深度估计处理,深度边缘修复处理,基于深度确定多视角图像,进行空缺填补和模糊等处理,基于一张输入的二维图像生成同一场景多视角下的二维图像,从而组合而成三维结构化的视频。
参见图4,图4是本申请实施例提供的二维图像的三维化方法的流程示意图,处理流程中包括深度感知过程、深度边缘修复过程、多视角重新渲染过程、空缺填补与模糊过程、以及三维化视频结果生成过程,在深度感知过程中对输入图像进行深度感知从而获取预测的深度图,对得到的深度估计结果进行深度边缘修复处理,基于深度估计修复结果进行三维场景重新渲染,并基于多视角重新渲染结果进行空缺填补和模糊处理,进而基于空缺填补和模糊处理的结果生成三维化视频结果,上述整个过程可以在图像处理器以及显卡上并行处理,处理速度较快,通过上述过程在不进行三维建模处理也能获取效果优异的三维化视频效果,从而可以满足终端基于二维图像获取三维化视频的要求,由于重渲染的方式和填补方式都基于最初的深度估计结果,使得渲染处理和填补处理符合场景规律。
在深度感知过程中,通过深度学习模型F对输入图像I进行深度估计处理,得到与输入图像I分辨率一致的深度估计结果D,即D=F(I),深度估计结果(深度图D)的每个像素值表示输入图像中像素的深度,对于获得的深度图而言,深度越大表征该像素的距离越远,参见图5,图5是本申请实施例提供的二维图像的三维化方法的深度图,深度感知过程中获得的深度估计结果在边缘处会产生不均匀的连续跳动,图5中每个格子代表一个像素点,背景对象是黑色格子,前景对象是白色格子,理论上不存在介于中间的灰色格子,才符合同一物体同一深度的原则,然而图5中存在不连续跳动现象,即黑格子与灰格子紧邻,灰格子与白格子紧邻的阶跃跳动,导致本来属于同一对象深度估计结果出现不同。
因此,在深度边缘修复过程中采取快速中值替换的方式对边缘深度进行改善,首先将深度感知过程得到的深度图进行正则化处理,使得深度图的取值范围缩小至0至1的区间中,即Norm(D)=(D-D.min)/(D.max–D.min),D.max指的是深度图所有像素点中的最大深度值,D min指的是深度图所有像素点中的最小深度值,对深度图的每个像素,计算深度图中每个像素点与上、下、左、右相邻4个像素点的深度值的差值绝对值,用以判断该像素点是否在边缘区域,参见图6,图6是本申请实施例提供的二维图像的三维化方法的边缘标记图,若所有差值绝对值(上述四个绝对值)均小于差值阈值,例如,差值阈值被设置为0.04,则将该像素点标记为1,判别为边缘像素点,例如,假设该点正则化后的深度为D(i,j),i表示该点的水平位置,j表示该点的垂直位置,如果max(abs(D(i,j)-D(i+1,j)),abs(D(i,j)-D(i-1,j)),abs(D(i,j)-D(i,j+1)),abs(D(i,j)-D(i,j-1)))<差值阈值,则将该像素点标记为1,判别为边缘像素点,否则将该像素点标记为0,判别为非边缘像素点。
在边缘标记图中以每个像素为中心取一个k*k大小的正方形,用于对该中心像素的深度值进行中值替换,如果正方形中存在取值为1的点,则需要对该中心点进行中值替换处理,否则不需要进行中值替换处理,处理的具体方式是从存在取值为1的点的深度图(k*k的深度图)中获取所有的非边缘像素点(即边缘标记图中取值为0的点)的深度值,并从对这些 深度值进行小到大排列,利用排列后的中值对该中心像素的深度值进行替换,对深度图中所有的像素进行处理后,即完成深度图的中值替换边缘处理,对于获得的深度图而言,深度越大,则表征该像素的距离越远。
在进行多视角渲染的过程中,基于进行过深度边缘修复的深度图,对原有色彩图像进行多视角的重新渲染,先对经过深度边缘修复处理的深度图再次进行更新处理,获取新的深度图D=(1/Norm(D))–0.5,其中,Norm(D)为上述正则化处理,Norm(D)=(D-Dmin)/(D.max–D.min),该操作使得所有深度取值介于-0.5和0.5之间,新深度值越大表征视觉距离越小,即像素的距离越近。
假设原色彩图A的尺寸为h*w,在垂直方向和水平方向分别设置移动参数v和u,参见图7,图7是本申请实施例提供的二维图像的三维化方法的移动参数获取示意图,对于(v,u)的组合,基于半径为1的圆上,每隔固定角度获取一个点,其纵坐标和横坐标即为(v,u)的取值,整个圆上获取的多组(v,u)即可用于渲染多个不同视角的新图像,基于经过边缘修复处理的深度图D,原色彩图A的每个像素A(i,j),将会移动到目标位置A(i+u*scale*D(i,j)*w,j+v*scale*D(i,j)*h),其中,scale为一个常量,设置越大则移动幅度越大,深度图的深度值取值范围在-0.5与0.5之间,满足了靠前和靠后的物体向不同的方向移动,且越近或者越远的物体的移动距离远大,这样的移动模式满足三维显示规律,鉴于目标位置i’=i+u*scale*D(i,j)*w;j’=j+v*scale*D(i,j)*h并不为整数,采用基于喷溅法(Splatting)的方式,向周边的点贡献色彩分量,参见图8,图8是本申请实施例提供的二维图像的三维化方法的贡献示意图,在边上的投影点例如指的是(0,0)和(0,1)所构成的边上的投影点,该投影点会影响(0,0)位置和(0,1)位置的颜色值,图8中的投影点(i’,j’)介于四个像素位置之间,该投影点对四个像素的贡献权重为到对应四个像素点的横纵坐标差值绝对值的乘积,此处左上(LeftUp)和右下(RightDown)互为对应点,右上(RightUp)和左下(LeftDown)互为对应点,例如,投影点对左上的贡献权重为w=abs(i’-floor(i’))*abs(j’-floor(j’)),其中,floor操作为向上取整操作,在针对右下的贡献权重的计算中取整操作为向下取整操作,由于原色彩图A的多个像素点移动后有可能向新视角画布中同一位置的点贡献色彩分量,基于深度的影响,针对新视角画布中某一位置的待染色像素点,靠近背景的像素点对该待染色像素点产生的贡献相较于其他像素点对该待染色像素点产生的贡献要少得多,将原色彩图A中的像素点(i,j)对重渲染图像A’中对应位置(i’,j’)的附近四个像素点的贡献权重w乘以exp(t*D(i,j),t可以取值为10,由于更新后越近的像素点(近景)的D(i,j)越大,其贡献值也会指数型增大,在色彩渲染过程中,对原色彩图A中所有的像素基于(v,u)进行移动处理,并且向新的渲染图A’对应点按权重向待染色像素点贡献对应GRB色彩分量,将色彩分量和权重进行相乘后分别进行累加,最后将累积色彩分量除以累积权重分量得到待染色像素点的色彩值,以避免色彩分量贡献过大,在色彩渲染的同时进行深度渲染,对经过边缘修复处理后的深度图进行与上述过程类似的深度渲染,获取多视角渲染的新深度图,多视角渲染的新深度图仍然满足距离越远(远景),深度值越大的规律。
在空缺填补与模糊处理过程中,由于遮挡的原因导致色彩渲染得到的多视角图像中某些位置周围并不存在原来的图像像素点对其贡献色彩分量,从而将这个位置视为需要填补的空缺位置,针对对这些空缺位置的填补,执行以下处理:参见图9,图9是本申请实施例提供的二维图像的三维化方法的查询示意图,以该空缺位置为起点,获取多个不同的前进方向(第一方向),这些前进方向和基于该点的反方向(第二方向)一起组成多组查询对象,针对每组查询对象,向两个方向端寻找最邻近的不是空缺的像素,超出图像范围未搜索到则舍弃该查询对象,对于所有两端都搜寻到目标像素的查询对象,取两头像素距离最短的那组查询对象,并且基于深度渲染过程中得到的新视角渲染后的深度值,取对应位置(第一方向的像素点或者第二方向的像素点中)深度值大的像素(相反两个方向中深度值大的像素),利用该像素的RGB色彩值对该空缺位置进行填补,填补处理所基于的原理是使用离目标像素(空 缺位置)相近的背景像素点对其进行填补,对于所有填补后的像素点(空缺位置的像素点,经过填补处理后不再空缺),针对该点使用周围像素点进行高斯模糊处理,以避免填补缺失后的像素点与周围像素点的重复性过高,导致视觉效果不佳,参见图10,图10是本申请实施例提供的二维图像的三维化方法的效果示意图,左边的图中边界太过明显,在经过高斯模糊则后得到的右图中填补缺失后的像素点与周围像素点的视觉效果不那么锐利,以上步骤都可以在图像处理器上进行操作,速度较快且延时性低,将所有渲染得到的新视角的图像组合成视频,即可得到三维化的视频结果。
根据本申请实施例提供的二维图像的三维化方法,能够基于深度估计结果在二维图像层面上合理重建不同视角下的场景,相较于其他三维建模方法降低处理时间,便于移动端快速获取生成的视频结果并进行展示。
下面继续说明本申请实施例提供的二维图像的三维化装置455实施为软件模块的示例性结构,在一些实施例中,如图2所示,存储在存储器450的二维图像的三维化装置455中的软件模块可以包括:深度模块4551,配置为对二维图像进行深度感知处理,得到二维图像中每个像素点的深度值;迁移模块4552,配置为对二维图像进行多个视角的迁移处理,得到二维图像对应每个视角的迁移结果;色彩确定模块4553,配置为基于二维图像中每个像素点的深度值以及二维图像对应每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值;生成模块4554,配置为基于每个视角的迁移图像中每个像素点的色彩值,生成对应视角的迁移图像;封装模块4555,配置为将多个视角的迁移图像按照顺序封装,得到三维化视频。
在一些实施例中,迁移模块4552,配置为在基于二维图像中每个像素点的深度值以及每个视角的迁移结果,确定对应每个视角的迁移图像中每个像素点的色彩值之前:将通过深度感知处理得到的二维图像中每个像素点的深度值作为原始深度值,对二维图像中每个像素点的原始深度值进行深度修复处理,得到二维图像中每个像素点的修复深度值,基于每个像素点的修复深度值替换对应的原始深度值。
在一些实施例中,迁移模块4552,配置为:基于二维图像中每个像素点的原始深度值,确定二维图像中的边缘像素点以及二维图像中的非边缘像素点;基于边缘像素点确定二维图像中需要进行中值替换的待替换像素点、以及不需要进行中值替换的保留像素点;将待替换像素点的连通区域内所有非边缘像素点的原始深度值进行降序排序处理,并将降序排序结果的中值作为待替换像素点的修复深度值;保留保留像素点的原始深度值作为保留像素点的修复深度值。
在一些实施例中,迁移模块4552,配置为:针对二维图像中任意一个像素点执行以下处理:当像素点的原始深度值的正则化处理结果与至少一个相邻像素点的原始深度值的正则化处理结果之间的绝对差值不小于差值阈值时,将像素点确定为非边缘像素点;其中,相邻像素点为位于任意一个像素点的相邻位置的像素点;当像素点的原始深度值的正则化处理结果与每个相邻像素点的原始深度值的正则化处理结果之间的绝对差值均小于差值阈值时,将像素点确定为边缘像素点。
在一些实施例中,迁移模块4552,配置为:针对二维图像中任意一个像素点执行以下处理:当像素点的连通区域内中至少存在一个边缘像素点时,确定像素点为待替换像素点;当像素点的连通区域内中不存在边缘像素点时,确定像素点为保留像素点。
在一些实施例中,迁移模块4552,配置为:对二维图像中每个像素点的深度值进行更新,得到二维图像中每个像素点的更新深度值;其中,更新深度值与对应像素点的修复深度值成负相关关系;确定分别与多个视角一一对应的多个移动参数,其中,移动参数包括水平移动参数与垂直移动参数;针对每个视角执行以下处理:确定与设定移动灵敏参数、更新深度值、水平移动参数、以及二维图像的宽度正相关的水平移动矢量;确定与设定移动灵敏参数、更新深度值、垂直移动参数、以及二维图像的高度正相关的垂直移动矢量;在与视角的迁移图 像画布中与二维图像中每个像素点对应的原始位置,按照水平移动矢量以及垂直移动矢量开始进行位移,确定每个像素点在迁移图像画布中的迁移位置。
在一些实施例中,每个视角的迁移结果包括:二维图像中每个像素点被迁移到视角的迁移图像画布中的迁移位置,其中,迁移图像画布的尺寸与二维图像的尺寸相同;色彩确定模块4553,配置为:将每个视角的迁移图像中每个像素点作为待染色像素点,并针对每个视角的迁移图像画布中每个待染色像素点执行以下处理:确定待染色像素点的贡献像素点,其中,贡献像素点是二维图像中迁移位置位于待染色像素点的连通区域内的像素点;基于二维图像中每个像素点在视角的迁移图像画布中的迁移位置、以及二维图像中每个像素点的深度值,确定贡献像素点对应待染色像素点的贡献权重;基于每个贡献像素点的贡献权重对每个贡献像素点的色彩值进行加权处理,得到待染色像素点的色彩值。
在一些实施例中,封装模块4555,配置为:基于二维图像中每个像素点的深度值,对每个视角的迁移图像进行空缺像素填补处理;针对每个视角的迁移图像的空缺像素填补结果进行高斯模糊处理,得到高斯模糊图像;将每个视角的高斯模糊图像按照顺序封装,得到三维化视频。
在一些实施例中,封装模块4555,配置为:针对每个视角的迁移图像中的每个待染色像素点,执行以下处理:当待染色像素点的连通区域内不存在对应待染色像素点的贡献像素点时,将待染色像素点的位置确定为空缺位置;针对迁移图像的每个空缺位置执行以下处理:以空缺位置为中心,基于二维图像中部分像素点的深度值在空缺位置的连通区域内查询待染色像素点的参考像素点;基于参考像素点的色彩值,对待染色像素点进行颜色值填补处理。
在一些实施例中,封装模块4555,配置为:确定以空缺位置为起点的多组查询方向;其中,每组查询方向所包括的第一方向与第二方向相反;针对每组查询方向执行以下处理:在空缺位置的连通区域内的第一方向上,确定与空缺位置最近的非空缺位置的像素点,并在空缺位置的连通区域内的第二方向上,确定与空缺位置最近的非空缺位置的像素点;确定在第一方向上确定的像素点与在第二方向上确定的像素点之间的像素距离;确定多组查询方向中最小像素距离所对应的两个像素点;基于二维图像中部分像素点的深度值,确定两个像素点的渲染深度值,并将渲染深度值较大的像素点确定为待染色像素点的参考像素点。
在一些实施例中,每个视角的迁移结果包括:二维图像中每个像素点被迁移到视角的迁移图像画布中的迁移位置,其中,迁移图像画布的尺寸与二维图像的尺寸相同;部分像素点为目标像素点的贡献像素点;封装模块4555,配置为:将两个像素点中任意一个像素点作为目标像素点,并执行以下处理:确定目标像素点的贡献像素点,贡献像素点是二维图像中与迁移位置位于目标像素点的连通区域内的像素点;基于贡献像素点在视角的迁移图像画布中的迁移位置,确定贡献像素点针对目标像素点的贡献权重;基于贡献像素点的贡献权重,对贡献像素点的深度值进行加权处理,得到两个像素点的渲染深度值。
本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。电子设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该电子设备执行本申请实施例上述的二维图像的三维化方法。
本申请实施例提供一种存储有可执行指令的计算机可读存储介质,其中存储有可执行指令,当可执行指令被处理器执行时,将引起处理器执行本申请实施例提供的方法,例如,如图3A-3E示出的二维图像的三维化方法。
在一些实施例中,计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEP ROM、闪存、磁表面存储器、光盘、或CD-ROM等存储器;也可以是包括上述存储器之一或任意组合的各种设备。
在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其 可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。
作为示例,可执行指令可以但不一定对应于文件系统中的文件,可以可被存储在保存其它程序或数据的文件的一部分,例如,存储在超文本标记语言(HTML,Hyper Text Mar kup Language)文档中的一个或多个脚本中,存储在专用于所讨论的程序的单个文件中,或者,存储在多个协同文件(例如,存储一个或多个模块、子程序或代码部分的文件)中。
作为示例,可执行指令可被部署为在一个电子设备上执行,或者在位于一个地点的多个电子设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个电子设备上执行。
综上,通过本申请实施例对二维图像进行多视角迁移以及对应视角迁移图像生成处理,实现了二维图像层面的视角变换过程,从而在二维图像处理层面上实现了图像三维化过程,以替代三维场景建模过程,在准确进行图像三维化生成三维化视频的同时降低了后台计算资源成本以及耗时成本。
以上,仅为本申请的实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本申请的保护范围之内。

Claims (15)

  1. 一种二维图像的三维化方法,所述方法由电子设备执行,所述方法包括:
    对二维图像进行深度感知处理,得到所述二维图像中每个像素点的深度值;
    对所述二维图像进行多个视角的迁移处理,得到所述二维图像对应每个视角的迁移结果;
    基于所述二维图像中每个像素点的深度值以及所述二维图像对应所述每个视角的迁移结果,确定对应所述每个视角的迁移图像中每个像素点的色彩值;
    基于所述每个视角的迁移图像中每个像素点的色彩值,生成对应所述视角的迁移图像;
    将所述多个视角的迁移图像按照顺序封装,得到三维化视频。
  2. 根据权利要求1所述的方法,其中,所述基于所述二维图像中每个像素点的深度值以及所述二维图像对应所述每个视角的迁移结果,确定对应所述每个视角的迁移图像中每个像素点的色彩值之前,所述方法还包括:
    将通过所述深度感知处理得到的所述二维图像中每个像素点的深度值作为原始深度值;
    对所述二维图像中每个像素点的原始深度值进行深度修复处理,得到所述二维图像中每个像素点的修复深度值;
    基于每个像素点的修复深度值替换对应的原始深度值。
  3. 根据权利要求2所述的方法,其中,所述对所述二维图像中每个像素点的原始深度值进行深度修复处理,得到所述二维图像中每个像素点的修复深度值,包括:
    基于所述二维图像中每个像素点的原始深度值,确定所述二维图像中的边缘像素点以及所述二维图像中的非边缘像素点;
    基于所述边缘像素点确定所述二维图像中需要进行中值替换的待替换像素点、以及不需要进行中值替换的保留像素点;
    将所述待替换像素点的连通区域内所有非边缘像素点的原始深度值进行降序排序处理,并将降序排序结果的中值作为所述待替换像素点的修复深度值;
    保留所述保留像素点的原始深度值作为所述保留像素点的修复深度值。
  4. 根据权利要求3所述的方法,其中,所述基于所述二维图像中每个像素点的原始深度值,确定所述二维图像中的边缘像素点以及所述二维图像中的非边缘像素点,包括:
    针对所述二维图像中任意一个像素点执行以下处理:
    当所述像素点的原始深度值的正则化处理结果与至少一个相邻像素点的原始深度值的正则化处理结果之间的绝对差值不小于差值阈值时,将所述像素点确定为所述非边缘像素点;
    当所述像素点的原始深度值的正则化处理结果与每个相邻像素点的原始深度值的正则化处理结果之间的绝对差值均小于所述差值阈值时,将所述像素点确定为所述边缘像素点;
    其中,所述相邻像素点为位于所述任意一个像素点的相邻位置的像素点。
  5. 根据权利要求3所述的方法,其中,所述基于所述边缘像素点确定所述二维图像中需要进行中值替换的待替换像素点、以及不需要进行中值替换的保留像素点,包括:
    针对所述二维图像中任意一个像素点执行以下处理:
    当所述像素点的连通区域内中至少存在一个边缘像素点时,确定所述像素点为所述待替换像素点;
    当所述像素点的连通区域内中不存在边缘像素点时,确定所述像素点为所述保留像素点。
  6. 根据权利要求1所述的方法,其中,所述对所述二维图像进行多个视角的迁移处理,得到所述二维图像对应每个视角的迁移结果,包括:
    对所述二维图像中每个像素点的深度值进行更新,得到所述二维图像中每个像素点的更新深度值;
    其中,所述更新深度值与对应像素点的修复深度值成负相关关系;
    确定分别与所述多个视角一一对应的多个移动参数,其中,所述移动参数包括水平移动参数与垂直移动参数;
    针对所述每个视角执行以下处理:
    确定与设定移动灵敏参数、所述更新深度值、所述水平移动参数、以及所述二维图像的宽度正相关的水平移动矢量;
    确定与所述设定移动灵敏参数、所述更新深度值、所述垂直移动参数、以及所述二维图像的高度正相关的垂直移动矢量;
    获取所述二维图像中每个像素点在所述视角的迁移图像画布中对应的原始位置,以所述原始位置为起点按照所述水平移动矢量以及所述垂直移动矢量进行位移处理,得到所述二维图像中每个像素点在所述迁移图像画布中的迁移位置。
  7. 根据权利要求1所述的方法,其中,
    所述每个视角的迁移结果包括:所述二维图像中每个像素点被迁移到所述视角的迁移图像画布中的迁移位置,其中,所述迁移图像画布的尺寸与所述二维图像的尺寸相同;
    所述基于所述二维图像中每个像素点的深度值以及所述二维图像对应所述每个视角的迁移结果,确定对应所述每个视角的迁移图像中每个像素点的色彩值,包括:
    将所述每个视角的迁移图像中每个像素点作为待染色像素点,并针对所述每个视角的迁移图像画布中每个待染色像素点执行以下处理:
    确定所述待染色像素点的贡献像素点,其中,所述贡献像素点是所述二维图像中迁移位置位于所述待染色像素点的连通区域内的像素点;
    基于所述二维图像中每个像素点在所述视角的迁移图像画布中的迁移位置、以及所述二维图像中每个像素点的深度值,确定所述贡献像素点对应所述待染色像素点的贡献权重;
    基于每个所述贡献像素点的贡献权重对每个所述贡献像素点的色彩值进行加权处理,得到所述待染色像素点的色彩值。
  8. 根据权利要求1所述的方法,其中,所述将所述多个视角的迁移图像按照顺序封装,得到三维化视频,包括:
    基于所述二维图像中每个像素点的深度值,对所述每个视角的迁移图像进行空缺像素填补处理;
    针对所述每个视角的迁移图像的空缺像素填补结果进行高斯模糊处理,得到高斯模糊图像;
    将所述每个视角的高斯模糊图像按照顺序封装,得到三维化视频。
  9. 根据权利要求8所述的方法,其中,所述基于所述二维图像中每个像素点的深度值,对所述每个视角的迁移图像进行空缺像素填补处理,包括:
    针对所述每个视角的迁移图像中的每个待染色像素点,执行以下处理:
    当所述待染色像素点的连通区域内不存在对应所述待染色像素点的贡献像素点时,将所述待染色像素点的位置确定为空缺位置;
    针对所述迁移图像的每个空缺位置执行以下处理:
    以所述空缺位置为中心,基于所述二维图像中部分像素点的深度值在所述空缺位置的连通区域内查询所述待染色像素点的参考像素点;
    基于所述参考像素点的色彩值,对所述待染色像素点进行颜色值填补处理。
  10. 根据权利要求9所述的方法,其中,所述以所述空缺位置为中心,基于所述二维图像中部分像素点的深度值在所述空缺位置的连通区域内查询所述待染色像素点的参考像素点,包括:
    确定以所述空缺位置为起点的多组查询方向;
    其中,每组查询方向所包括的第一方向与第二方向相反;
    针对每组查询方向执行以下处理:
    在所述空缺位置的连通区域内的第一方向上,确定与所述空缺位置最近的非空缺位置的像素点,并在所述空缺位置的连通区域内的第二方向上,确定与所述空缺位置最近的非空缺位置的像素点;
    确定在所述第一方向上确定的像素点与在所述第二方向上确定的像素点之间的像素距离;
    确定多组查询方向中最小像素距离所对应的两个像素点;
    基于所述二维图像中部分像素点的深度值,确定所述两个像素点的渲染深度值,并将渲染深度值较大的像素点确定为所述待染色像素点的参考像素点。
  11. 根据权利要求10所述的方法,其中,
    所述每个视角的迁移结果包括:所述二维图像中每个像素点被迁移到所述视角的迁移图像画布中的迁移位置,其中,所述迁移图像画布的尺寸与所述二维图像的尺寸相同;
    所述部分像素点为目标像素点的贡献像素点;
    所述基于所述二维图像中部分像素点的深度值,确定所述两个像素点的渲染深度值,包括:
    将所述两个像素点中任意一个像素点作为目标像素点,并执行以下处理:
    确定所述目标像素点的贡献像素点,所述贡献像素点是所述二维图像中所述迁移位置位于所述目标像素点的连通区域内的像素点;
    基于所述贡献像素点在所述视角的迁移图像画布中的迁移位置,确定所述贡献像素点针对所述目标像素点的贡献权重;
    基于所述贡献像素点的贡献权重,对所述贡献像素点的深度值进行加权处理,得到所述两个像素点的渲染深度值。
  12. 一种二维图像的三维化方法,所述方法由电子设备执行,所述方法包括:
    在人机交互界面显示二维图像;
    响应于针对所述二维图像的三维化操作,播放基于所述二维图像生成的三维化视频;
    其中,所述视频是通过执行权利要求1至11任一项所述的二维图像的三维化方法得到的。
  13. An apparatus for three-dimensionalizing a two-dimensional image, comprising:
    a depth module, configured to perform depth perception processing on a two-dimensional image to obtain a depth value of each pixel in the two-dimensional image;
    a migration module, configured to perform migration processing of multiple viewing angles on the two-dimensional image to obtain a migration result of the two-dimensional image corresponding to each viewing angle;
    a color determination module, configured to determine a color value of each pixel in a migration image corresponding to each viewing angle based on the depth value of each pixel in the two-dimensional image and the migration result of the two-dimensional image corresponding to each viewing angle;
    a generation module, configured to generate the migration image corresponding to the viewing angle based on the color value of each pixel in the migration image of each viewing angle;
    an encapsulation module, configured to encapsulate the migration images of the multiple viewing angles in sequence to obtain a three-dimensionalized video.
  14. An electronic device, comprising:
    a memory, configured to store executable instructions;
    a processor, configured to implement, when executing the executable instructions stored in the memory, the method for three-dimensionalizing a two-dimensional image according to any one of claims 1 to 11 or claim 12.
  15. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the method for three-dimensionalizing a two-dimensional image according to any one of claims 1 to 11 or claim 12.
PCT/CN2021/104972 2020-08-24 2021-07-07 二维图像的三维化方法、装置、设备及计算机可读存储介质 WO2022042062A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21859902.5A EP4099692A4 (en) 2020-08-24 2021-07-07 THREE-DIMENSIONAL PROCESSING METHOD AND DEVICE FOR A TWO-DIMENSIONAL IMAGE, DEVICE AND COMPUTER-READABLE STORAGE MEDIUM
JP2022559723A JP7432005B2 (ja) 2020-08-24 2021-07-07 二次元画像の三次元化方法、装置、機器及びコンピュータプログラム
US18/077,549 US20230113902A1 (en) 2020-08-24 2022-12-08 Three-dimensionalization method and apparatus for two-dimensional image, device and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010856161.6 2020-08-24
CN202010856161.6A CN111970503B (zh) 2020-08-24 2020-08-24 二维图像的三维化方法、装置、设备及计算机可读存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/077,549 Continuation US20230113902A1 (en) 2020-08-24 2022-12-08 Three-dimensionalization method and apparatus for two-dimensional image, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022042062A1 true WO2022042062A1 (zh) 2022-03-03

Family

ID=73390078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104972 WO2022042062A1 (zh) 2020-08-24 2021-07-07 二维图像的三维化方法、装置、设备及计算机可读存储介质

Country Status (5)

Country Link
US (1) US20230113902A1 (zh)
EP (1) EP4099692A4 (zh)
JP (1) JP7432005B2 (zh)
CN (1) CN111970503B (zh)
WO (1) WO2022042062A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111970503B (zh) * 2020-08-24 2023-08-22 腾讯科技(深圳)有限公司 二维图像的三维化方法、装置、设备及计算机可读存储介质
CN113891057A (zh) * 2021-11-18 2022-01-04 北京字节跳动网络技术有限公司 视频的处理方法、装置、电子设备和存储介质
CN115205456A (zh) * 2022-07-01 2022-10-18 维沃移动通信有限公司 三维模型构建方法、装置、电子设备及存储介质
CN115861572B (zh) * 2023-02-24 2023-05-23 腾讯科技(深圳)有限公司 一种三维建模方法、装置、设备及存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7573475B2 (en) * 2006-06-01 2009-08-11 Industrial Light & Magic 2D to 3D image conversion
US8947422B2 (en) * 2009-09-30 2015-02-03 Disney Enterprises, Inc. Gradient modeling toolkit for sculpting stereoscopic depth models for converting 2-D images into stereoscopic 3-D images
JP2011139281A (ja) * 2009-12-28 2011-07-14 Sony Corp 三次元画像生成装置、三次元画像表示装置、三次元画像生成方法およびプログラム
US20110304618A1 (en) * 2010-06-14 2011-12-15 Qualcomm Incorporated Calculating disparity for three-dimensional images
US9210405B2 (en) * 2012-03-22 2015-12-08 Qualcomm Technologies, Inc. System and method for real time 2D to 3D conversion of video in a digital camera
CN102790896A (zh) * 2012-07-19 2012-11-21 彩虹集团公司 一种2d转3d的转换方法
US9736449B1 (en) * 2013-08-12 2017-08-15 Google Inc. Conversion of 2D image to 3D video
CN105513112B (zh) * 2014-10-16 2018-11-16 北京畅游天下网络技术有限公司 图像处理方法和装置
CN106341676B (zh) * 2016-09-29 2017-06-16 济南大学 基于超像素的深度图像预处理和深度空洞填充方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102724529A (zh) * 2012-05-28 2012-10-10 清华大学 虚拟视点视频序列的生成方法及生成装置
EP3057066A1 (en) * 2015-02-10 2016-08-17 DreamWorks Animation LLC Generation of three-dimensional imagery from a two-dimensional image using a depth map
CN108900825A (zh) * 2018-08-16 2018-11-27 电子科技大学 一种2d图像到3d图像的转换方法
CN111193919A (zh) * 2018-11-15 2020-05-22 中兴通讯股份有限公司 一种3d显示方法、装置、设备及计算机可读介质
CN110390712A (zh) * 2019-06-12 2019-10-29 阿里巴巴集团控股有限公司 图像渲染方法及装置、三维图像构建方法及装置
CN111970503A (zh) * 2020-08-24 2020-11-20 腾讯科技(深圳)有限公司 二维图像的三维化方法、装置、设备及计算机可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4099692A4 *

Also Published As

Publication number Publication date
US20230113902A1 (en) 2023-04-13
CN111970503A (zh) 2020-11-20
EP4099692A1 (en) 2022-12-07
JP2023519728A (ja) 2023-05-12
CN111970503B (zh) 2023-08-22
EP4099692A4 (en) 2023-08-30
JP7432005B2 (ja) 2024-02-15

Similar Documents

Publication Publication Date Title
WO2022042062A1 (zh) 二维图像的三维化方法、装置、设备及计算机可读存储介质
CN109771951B (zh) 游戏地图生成的方法、装置、存储介质和电子设备
US9299152B2 (en) Systems and methods for image depth map generation
CN107274476B (zh) 一种阴影图的生成方法及装置
WO2021030002A1 (en) Depth-aware photo editing
US9396530B2 (en) Low memory content aware image modification
JP2019525515A (ja) マルチビューシーンのセグメンテーションおよび伝播
EP3882862A1 (en) Picture rendering method and apparatus, and storage medium and electronic apparatus
US20110273466A1 (en) View-dependent rendering system with intuitive mixed reality
US11763479B2 (en) Automatic measurements based on object classification
KR20210137235A (ko) 임의의 뷰 생성
US9754398B1 (en) Animation curve reduction for mobile application user interface objects
Zha et al. A real-time global stereo-matching on FPGA
US20210407125A1 (en) Object recognition neural network for amodal center prediction
Griffiths et al. OutCast: Outdoor Single‐image Relighting with Cast Shadows
BR102020027013A2 (pt) Método para gerar uma imagem multiplano adaptativa a partir de uma única imagem de alta resolução
CN117372602B (zh) 一种异构三维多对象融合渲染方法、设备及系统
US20170148177A1 (en) Image processing apparatus, image processing method, and program
CN116980579A (zh) 一种基于图像虚化的图像立体成像方法及相关装置
JP7387029B2 (ja) ソフトレイヤ化および深度認識インペインティングを用いた単画像3d写真技術
US11636578B1 (en) Partial image completion
Yao et al. Real-time stereo to multi-view conversion system based on adaptive meshing
CN109729285B (zh) 熔线格特效生成方法、装置、电子设备及存储介质
Yan et al. Stereoscopic image generation from light field with disparity scaling and super-resolution
Lumentut et al. 6-DOF motion blur synthesis and performance evaluation of light field deblurring

Legal Events

Date Code Title Description
121   Ep: the epo has been informed by wipo that ep was designated in this application
      Ref document number: 21859902; Country of ref document: EP; Kind code of ref document: A1
ENP   Entry into the national phase
      Ref document number: 2021859902; Country of ref document: EP; Effective date: 20220901
ENP   Entry into the national phase
      Ref document number: 2022559723; Country of ref document: JP; Kind code of ref document: A
NENP  Non-entry into the national phase
      Ref country code: DE