CN116569218A - Image processing method and image processing apparatus - Google Patents


Info

Publication number
CN116569218A
CN116569218A (application CN202080107407.8A)
Authority
CN
China
Prior art keywords
information
neural network
network model
characteristic information
image
Prior art date
Legal status
Pending
Application number
CN202080107407.8A
Other languages
Chinese (zh)
Inventor
郑凯
李选富
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN116569218A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Abstract

An image processing method and an image processing device improve the ability to restore the color and detail of an image, thereby effectively improving image quality. The method comprises the following steps: acquiring spatial position information and view angle information of a picture in a camera (S101); inputting the spatial position information and the view angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the view angle information, where the spatial light field characteristic information comprises spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image (S102); and inputting the spatial light field characteristic information into a second neural network model, which is used for restoring color information and detail information of the image, to obtain a target image (S103).

Description

Image processing method and image processing apparatus
Technical Field
The present disclosure relates to the field of image technology, and in particular, to an image processing method and an image processing apparatus.
Background
With the development of virtual reality (VR) technology, VR devices are becoming simpler, easier to use, and more popular. However, unlike the rapidly improving devices, high-quality VR content remains very limited. The reason is that, unlike conventionally displayed 2-dimensional (2D) digital content, a VR device needs to acquire the three-dimensional light field content of a scene and render images from arbitrary viewing angles from that content in order to enhance the user's sense of immersion (for example, so that the displayed content changes as the user moves), and can thereby display high-quality VR content.
In order to display such high-quality VR content, one approach currently common in industry is as follows: the 5-dimensional (5D) coordinates of spatial points along camera rays are sampled by a neural radiance field (NeRF), the density and color at each coordinate are synthesized from that coordinate, and the final image is obtained using classical volume rendering techniques. Because this method generates the image through per-pixel computation, it ignores the correlation among pixels, and its ability to recover the color and detail of the image is insufficient.
Disclosure of Invention
The image processing method and the image processing apparatus provided in this application improve the ability to restore the color and detail of an image, thereby effectively improving image quality.
In a first aspect, there is provided an image processing method, including: acquiring spatial position information and visual angle information of a picture in a camera; inputting the spatial position information and the visual angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the visual angle information, wherein the spatial light field characteristic information comprises spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image; and inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
According to the image processing method, the target image at the current viewing angle is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and the color characteristic information of the target image, and the second neural network model effectively restores the color information and detail information of the image. Because the color-and-detail restoration task is decoupled from the spatial light field generation task, the ability to restore the color and detail of the image is improved, and the image quality is effectively improved.
It should be understood that the above spatial position information refers to the position of a ray in three-dimensional space and may be represented by three-dimensional coordinates (x, y, z). The view angle information refers to the direction, in three-dimensional space, of the ray emitted from that spatial position and may be represented by the two parameters (θ, φ). The spatial position information and the view angle information may also be collectively referred to as 5D coordinates, represented by (x, y, z, θ, φ).
It should be understood that the first neural network model may also be referred to as a three-dimensional characterization network model or a light field reconstruction network model; embodiments of the present application are not limited herein. The second neural network model may be a neural network model that processes image color information and detail information, such as a convolutional neural network (CNN) model.
With reference to the first aspect, in certain implementations of the first aspect, the second neural network model includes an encoding network and a decoding network; and the inputting the spatial light field characteristic information into a second neural network model to obtain a target image includes: inputting the spatial light field characteristic information into the decoding network to obtain the target image.
In this embodiment of the application, the spatial light field characteristic information includes spatial three-dimensional characteristic information and color characteristic information, and by using the first neural network model together with the decoding network in the second neural network model, the VR device improves its ability to recover the color information and detail information of the target image and improves the image quality.
With reference to the first aspect, in certain implementation manners of the first aspect, before the inputting the spatial location information and the perspective information into the first neural network model, the method further includes: inputting the color information of the sample image into the coding network to obtain the color characteristic information and detail characteristic information of the sample image; and training the first neural network model by using the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
In this embodiment, the VR device trains the first neural network model using the first intermediate representation, rather than the image itself, as the truth reference, and uses the spatial position information and view angle information of the corresponding image as inputs, so that the first neural network model can learn the first intermediate representation and the second intermediate representation output by the first neural network model is more accurate. The VR device may then pass the second intermediate representation through the decoding network, which can output a higher-quality image.
With reference to the first aspect, in certain implementation manners of the first aspect, training the first neural network model includes: and training the first neural network model by taking the color characteristic information and the detail characteristic information of the sample image as true values and taking the spatial position information corresponding to the sample image as input.
It should be appreciated that when the color feature information and detail feature information of the image generated by the encoding network in the second neural network model can be decoded by the decoding network in the second neural network model and accurately restored to a higher quality image, it indicates that the training of the second neural network model is completed.
According to the embodiment of the application, the second neural network model is trained first, using a segmented (two-stage) training method, so as to generate the first intermediate representation (high-dimensional characteristic information including color and detail); the first intermediate representation is then used as the true value to train the first neural network model, so that the first neural network model can learn the implicit representation of the light field and output a more accurate intermediate representation, namely the second intermediate representation. Compared with directly training the three-dimensional light field representation and the decoding network end to end, the segmented training method provided in the embodiment of the application converges more easily and has higher training efficiency.
In a second aspect, there is provided an image processing apparatus, including: an acquisition module and a processing module. The acquisition module is configured to acquire spatial position information and visual angle information of a picture in a camera. The processing module is configured to: input the spatial position information and the visual angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the visual angle information, where the spatial light field characteristic information includes spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image; and input the spatial light field characteristic information into a second neural network model to obtain a target image, where the second neural network model is used for restoring color information and detail information of the image.
With reference to the second aspect, in certain implementations of the second aspect, the second neural network model includes an encoding network and a decoding network; the processing module is specifically used for: and inputting the spatial light field characteristic information into the decoding network to obtain the target image.
With reference to the second aspect, in some implementations of the second aspect, the processing module is specifically configured to: before the space position information and the visual angle information are input into a first neural network model, color information of a sample image is input into the coding network, and color characteristic information and detail characteristic information of the sample image are obtained; and training the first neural network model by using the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
With reference to the second aspect, in some implementations of the second aspect, the processing module is specifically configured to: and training the first neural network model by taking the color characteristic information and the detail characteristic information of the sample image as true values and taking the spatial position information corresponding to the sample image as input.
In a third aspect, there is provided another image processing apparatus comprising: a processor coupled to the memory and operable to execute instructions in the memory to implement the method of any one of the possible implementations of the first aspect. Optionally, the apparatus further comprises a memory. Optionally, the apparatus further comprises a communication interface, the processor being coupled to the communication interface.
In a fourth aspect, there is provided a processor comprising: input circuit, output circuit and processing circuit. The processing circuitry is configured to receive signals via the input circuitry and to transmit signals via the output circuitry such that the processor performs the method of any one of the possible implementations of the first aspect described above.
In a specific implementation process, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be a transistor, a gate circuit, a flip-flop, various logic circuits, and the like. The input signal received by the input circuit may be, for example but not limited to, received and input by a receiver; the signal output by the output circuit may be, for example but not limited to, output to and transmitted by a transmitter; and the input circuit and the output circuit may be the same circuit, which functions as the input circuit and the output circuit at different times. The embodiments of the present application do not limit the specific implementation of the processor and the various circuits.
In a fifth aspect, a processing device is provided that includes a processor and a memory. The processor is configured to read instructions stored in the memory to perform the method according to any one of the possible implementations of the first aspect.
Optionally, the processor is one or more and the memory is one or more.
Alternatively, the memory may be integrated with the processor or the memory may be separate from the processor.
In a specific implementation process, the memory may be a non-transient (non-transitory) memory, for example, a Read Only Memory (ROM), which may be integrated on the same chip as the processor, or may be separately disposed on different chips.
The processing means in the fifth aspect may be a chip, and the processor may be implemented by hardware or by software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor, implemented by reading software code stored in a memory, which may be integrated in the processor, or may reside outside the processor, and exist separately.
In a sixth aspect, there is provided a computer program product comprising: a computer program (which may also be referred to as code, or instructions) which, when executed, causes a computer to perform the method of any one of the possible implementations of the first aspect.
In a seventh aspect, a computer readable storage medium is provided, which stores a computer program (which may also be referred to as code, or instructions) which, when run on a computer, causes the computer to perform the method of any one of the possible implementations of the first aspect.
Drawings
FIG. 1 is a schematic flow chart of an image processing method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an image processing procedure provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a training process of a first neural network model provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process for a second neural network model provided by an embodiment of the present application;
fig. 5 is a schematic block diagram of an image processing apparatus provided in an embodiment of the present application;
fig. 6 is a schematic block diagram of another image processing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
Virtual reality (VR), also known as a virtual environment or an artificial environment, refers to a technology that uses a computer to generate a virtual world that can directly provide visual, auditory, and tactile sensations to participants and allow them to observe and operate it interactively.
VR technology has very broad prospects, and VR devices are currently developing rapidly and becoming simple, easy to use, and popular. However, unlike the rapidly developing devices, high-quality VR digital content is very limited. Unlike conventionally displayed 2D digital content, a VR device needs to acquire the three-dimensional light field content of a scene in order to enhance the sense of immersion (e.g., the displayed content changes with the user's movement), and capturing the three-dimensional light field content of a scene requires very complex hardware, which limits the flexibility of acquiring three-dimensional light field content.
In order to enable a user to obtain accurate light field information for a VR device to display simply by moving the device around a scene, the industry generally adopts the following two approaches.
In a first implementation, the VR device may rely on image-based rendering (IBR) techniques, i.e., the ability to generate images at different viewing angles or different coordinates. The VR device can acquire information about the whole scene through IBR and generate images at arbitrary viewing angles in real time. However, for most scenes, IBR presents two significant challenges. First, IBR requires reconstructing a three-dimensional (3D) model, and the reconstructed 3D model must be sufficiently detailed and reflect the occlusion relationships of objects in the scene. Second, the object surface color and texture generated from the 3D model depend on the representational capability of the input images, but enlarging the input data set reduces the speed and performance of the model. Therefore, this method places certain requirements on the performance of the VR device, and its ability to recover information such as the color and detail of the image is insufficient.
In a second implementation, the VR device may synthesize a representation of a complex scene from a sparse picture data set through a neural radiance field (NeRF): the 5D coordinates (i.e., spatial location (x, y, z) and view direction (θ, φ)) are sampled to synthesize the density and color at the corresponding viewing angle. The VR device may then apply classical volume rendering techniques to the density and color at the new viewing angle to obtain the image corresponding to the 5D coordinates, thereby continuously generating new-view images that represent the entire scene. However, this method uses a fully connected deep learning network to process the data set pixel by pixel, does not exploit the correlation among pixels (the pixels are handled in isolation from each other), and its ability to recover detail is insufficient for certain scenes.
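For reference, the classical volume rendering step used by NeRF is commonly approximated by the following quadrature along each camera ray (this is the standard formulation from the NeRF literature and is given here only as background, not as text from the patent):

    \hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right),

where \sigma_i and \mathbf{c}_i are the density and color predicted at the i-th sample along ray \mathbf{r}, and \delta_i is the distance between adjacent samples.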
In view of this, the application provides an image processing method and an image processing device, which generate a target image under a current view angle by combining two trained neural network models, effectively reconstruct three-dimensional light field information and color characteristic information of the target image by using a first neural network model, and effectively restore color information and detail information of the image by using a second neural network model, thereby improving the restoration capability of the color and detail of the image and effectively improving the image quality.
Before describing the method and apparatus provided in the embodiments of the present application, the following description is made.
First, in the embodiments shown below, terms and English abbreviations, such as viewing angle information, color information, or spatial position information, are given as examples for convenience of description and should not constitute any limitation on the present application. This application does not exclude the possibility of defining, in existing or future protocols, other terms that perform the same or similar functions.
Second, the first, second and various numerical numbers in the embodiments shown below are merely for convenience of description and are not intended to limit the scope of the embodiments of the present application. For example, a first neural network model, a second neural network model, etc., distinguish between different neural networks, etc.
Third, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, and c may represent: a, b, or c, or a and b, or a and c, or b and c, or a, b, and c, where a, b, and c may each be singular or plural.
In order to make the purposes and technical solutions of the present application clearer and more intuitive, the image processing method and the image processing apparatus provided in the present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be understood that the method of the embodiments of the present application may be performed by a VR device provided with a camera, which may be, for example, VR glasses, a VR headset, or the like; embodiments of the present application are not limited thereto.
Fig. 1 is a schematic flow chart of an image processing method 100 in an embodiment of the present application. As shown in fig. 1, the method 100 may include the steps of:
s101, acquiring spatial position information and view angle information of a picture in a camera.
It should be understood that the above spatial position information refers to the position of a ray in three-dimensional space and may be represented by three-dimensional coordinates (x, y, z). The view angle information refers to the direction, in three-dimensional space, of the ray emitted from that spatial position and may be represented by the two parameters (θ, φ). The spatial position information and the view angle information may also be collectively referred to as 5D coordinates, represented by (x, y, z, θ, φ).
S102, inputting the spatial position information and the visual angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the visual angle information, wherein the spatial light field characteristic information comprises spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image.
It should be understood that the first neural network model may also be referred to as a three-dimensional characterization network model or a light field reconstruction network model; embodiments of the present application are not limited herein.
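As a concrete illustration of what such a light field reconstruction network could look like, the following PyTorch-style sketch maps positionally encoded 5D coordinates (x, y, z, θ, φ) to a spatial light field feature vector. The class name LightFieldMLP, the layer widths, and the feature dimension are illustrative assumptions and are not details taken from the patent.

    import torch
    import torch.nn as nn

    def positional_encoding(x, num_freqs=10):
        # Encode each coordinate as [sin(2^k * pi * x), cos(2^k * pi * x)], as in NeRF-style models.
        freqs = (2.0 ** torch.arange(num_freqs, dtype=torch.float32, device=x.device)) * torch.pi
        angles = x.unsqueeze(-1) * freqs                  # (..., 5, num_freqs)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(start_dim=-2)                  # (..., 5 * 2 * num_freqs)

    class LightFieldMLP(nn.Module):
        # Hypothetical first neural network model: 5D coordinates -> spatial light field features.
        def __init__(self, num_freqs=10, feature_dim=64):
            super().__init__()
            in_dim = 5 * 2 * num_freqs                    # encoded (x, y, z, theta, phi)
            self.net = nn.Sequential(
                nn.Linear(in_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, feature_dim),              # spatial 3D + color feature vector
            )

        def forward(self, coords_5d):
            # coords_5d: (num_rays, 5) tensor of (x, y, z, theta, phi)
            return self.net(positional_encoding(coords_5d))
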
S103, inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
In this embodiment of the present application, the color characteristic information included in the spatial light field characteristic information may be in RGB format (red-green-blue color mode) or in YUV format, where "Y" represents brightness (luminance or luma) and "U" and "V" represent chrominance (chroma), which describe the color and saturation of the image and are used to specify the color of a pixel. The VR device can adopt different processing modes for spatial light field characteristic information in different formats.
In one possible implementation manner, when the color characteristic information included in the spatial light field characteristic information is in RGB format, the VR device may input the spatial light field characteristic information including the color characteristic information in RGB format to the second neural network model, to obtain the target image.
In another possible implementation manner, when the color characteristic information included in the spatial light field characteristic information is in a YUV format, the VR device may convert the color characteristic information in the YUV format from the YUV format to an RGB format, and then input the spatial light field characteristic information including the converted color characteristic information into the second neural network model, so as to obtain the target image.
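For example, if the color characteristic information is carried in YUV form, the conversion to RGB before decoding could use the standard BT.601 full-range transform, as in the sketch below; the patent does not specify which YUV variant or conversion matrix is used, so the coefficients here are an assumption.

    import torch

    def yuv_to_rgb(yuv):
        # yuv: (..., 3) tensor with Y in [0, 1] and U, V centered on 0 (BT.601 full range assumed).
        y, u, v = yuv.unbind(dim=-1)
        r = y + 1.402 * v
        g = y - 0.344136 * u - 0.714136 * v
        b = y + 1.772 * u
        return torch.stack([r, g, b], dim=-1).clamp(0.0, 1.0)
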
It should be appreciated that the second neural network model described above may be a neural network model that processes image color information and detail information, such as a convolutional neural network (CNN) model.
According to the image processing method, the target image at the current viewing angle is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and the color characteristic information of the target image, and the second neural network model effectively restores the color information and detail information of the image. Because the color-and-detail restoration task is decoupled from the spatial light field generation task, the ability to restore the color and detail of the image is improved, and the image quality is effectively improved.
Assuming that the peak signal-to-noise ratio (PSNR) is used as the measure of image quality, the method of the embodiments of the present application increases the PSNR of the target image from 32 to 34 compared with the second implementation described above, i.e., the image quality is improved.
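The PSNR of a rendered image against a reference can be computed as in the following generic snippet (for images normalized to [0, 1]); the 32 and 34 figures above are the patent's own comparison and are not reproduced by this code.

    import torch

    def psnr(pred, target, max_val=1.0):
        # Peak signal-to-noise ratio in dB for images scaled to [0, max_val].
        mse = torch.mean((pred - target) ** 2)
        return 10.0 * torch.log10(max_val ** 2 / mse)
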
Optionally, the second neural network model is an RGB network specifically processing color detail information, and the first neural network model is a NeRF network specifically processing 3D information. According to the embodiment of the application, the two networks are combined, so that the overall output effect can be improved.
As an alternative embodiment, the second neural network model includes an encoding network and a decoding network; inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the method comprises the following steps of: and inputting the spatial light field characteristic information into the decoding network to obtain the target image.
Fig. 2 shows a processing procedure of the image processing method provided in the embodiment of the present application. As shown in fig. 2, the VR device may input the spatial position information and the perspective information of the image to the first neural network model to obtain spatial light field feature information, and then input the spatial light field feature information to the decoding network in the second neural network model to generate the target image.
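Read together with the LightFieldMLP sketch above, the inference path of fig. 2 could be expressed roughly as follows; the convolutional decoder architecture, the reshaping of per-ray features into a feature map, and all dimensions are illustrative assumptions rather than details disclosed in the patent.

    import torch
    import torch.nn as nn

    class ConvDecoder(nn.Module):
        # Hypothetical decoding network of the second neural network model.
        def __init__(self, feature_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(feature_dim, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),   # RGB image in [0, 1]
            )

        def forward(self, features):
            # features: (batch, feature_dim, H, W) spatial light field feature map
            return self.net(features)

    def render_view(coords_5d, light_field_mlp, decoder, height, width):
        # coords_5d: (H * W, 5) per-pixel (x, y, z, theta, phi) for the requested viewing angle.
        features = light_field_mlp(coords_5d)                    # (H * W, feature_dim)
        features = features.t().reshape(1, -1, height, width)    # -> (1, feature_dim, H, W)
        return decoder(features)                                 # (1, 3, H, W) target image
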
In this embodiment of the application, the spatial light field characteristic information includes spatial three-dimensional characteristic information and color characteristic information, and by using the first neural network model together with the decoding network in the second neural network model, the VR device improves its ability to recover the color information and detail information of the target image and improves the image quality.
The use of the neural network model provided in the embodiment of the present application is described in detail above with reference to fig. 1 and 2, and the training process of the neural network model will be described in detail below with reference to fig. 3 and 4. The training process includes training of a first neural network model and training of a second neural network model.
Training the first neural network model may include: inputting the color information of the sample image into the coding network of the second neural network model to obtain the color characteristic information and detail characteristic information of the sample image; and training the first neural network model by using the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
It should be appreciated that the above image color information and detail information may be collectively referred to as a first intermediate representation. The first intermediate representation contains information such as the color, detail, field, and correlation of the image, for example, the color, texture detail, and location information of objects in an image, and the colors, details, and positional relationships between different objects.
In one possible implementation, the VR device may map color feature information of the image to a high-dimensional feature space through the encoding network to obtain the first intermediate representation. The VR device may train the first neural network model using the spatial location information corresponding to the first intermediate representation and the sample image, and obtain a second intermediate representation. The VR device inputs the second intermediate representation to the decoding network to obtain a training result of the sample image.
In this embodiment of the present application, the VR device may train the first neural network model using spatial location information corresponding to the first intermediate representation and the sample image, so that the first neural network model learns parameters in the first intermediate representation.
As an alternative embodiment, the training the first neural network model includes: and training the first neural network model by taking the color characteristic information and the detail characteristic information of the sample image as true values and taking the spatial position information corresponding to the sample image as input.
Fig. 3 illustrates the training process of the first neural network model provided in an embodiment of the present application. As shown in fig. 3, the VR device may input the color information of the sample image into the encoding network in the second neural network model and extract features from it through the encoding network to generate the color feature information and detail feature information (i.e., the first intermediate representation). Then, the VR device inputs the spatial position information and view angle information of the sample image into the first neural network model and trains it with the color feature information and detail feature information as the true values, so that the first neural network model learns the information such as image color, detail, field, and correlation contained in the first intermediate representation and its output approaches the true values, thereby completing the training of the first neural network model.
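A minimal sketch of this fig. 3 training step, assuming the LightFieldMLP above, a frozen encoding network, and an MSE loss; the loss, the optimizer, and the assumption that the encoder output keeps the spatial resolution of the image are all illustrative choices, not details from the patent.

    import torch
    import torch.nn.functional as F

    def train_first_model_step(light_field_mlp, encoder, optimizer, sample_image, coords_5d):
        # sample_image: (1, 3, H, W); coords_5d: (H * W, 5) matching the sample's camera pose.
        with torch.no_grad():
            target = encoder(sample_image)                # first intermediate representation
            target = target.flatten(2).squeeze(0).t()     # -> (H * W, feature_dim), used as the true value
        pred = light_field_mlp(coords_5d)                 # second intermediate representation
        loss = F.mse_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
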
In this embodiment, the VR device trains the first neural network model using the first intermediate representation, rather than the image itself, as the truth reference, and uses the spatial position information and view angle information of the corresponding image as inputs, so that the first neural network model can learn the first intermediate representation and the second intermediate representation output by the first neural network model is more accurate. The VR device may then pass the second intermediate representation through the decoding network, which can output a higher-quality image.
It should be appreciated that the VR device may also train the second neural network model before the VR device uses the second neural network model, or before the VR device obtains the true values required for the training of the first neural network model via the encoding network of the second neural network model.
Since the second neural network model includes an encoding network and a decoding network, training of the second neural network model may be performed with the encoding network and the decoding network. FIG. 4 illustrates a training process for a second neural network model provided by an embodiment of the present application. As shown in fig. 4, the VR device may input color information of the sample image to an encoding network in the second neural network model, generate color feature information and detail feature information of the sample image through the encoding network, input the obtained color feature information and detail feature information to a decoding network in the second neural network model, and obtain a decoded image.
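As a rough sketch, this fig. 4 stage can be implemented as training an image autoencoder; the encoder architecture, the L1 reconstruction loss, and the optimizer below are assumptions made for illustration (the decoder is the ConvDecoder sketched earlier).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvEncoder(nn.Module):
        # Hypothetical encoding network: image -> color/detail feature map (first intermediate representation).
        def __init__(self, feature_dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, feature_dim, 3, padding=1),
            )

        def forward(self, image):
            return self.net(image)

    def train_second_model_step(encoder, decoder, optimizer, sample_image):
        # Reconstruct the sample image through the encoding and decoding networks and minimize the error.
        features = encoder(sample_image)
        decoded = decoder(features)
        loss = F.l1_loss(decoded, sample_image)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
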
It should be appreciated that when the color feature information and detail feature information of the image generated by the encoding network in the second neural network model can be decoded by the decoding network in the second neural network model and accurately restored to a higher quality image, it indicates that the training of the second neural network model is completed.
According to the embodiment of the application, the second neural network model is trained first, using a segmented (two-stage) training method, so as to generate the first intermediate representation (high-dimensional characteristic information including color and detail); the first intermediate representation is then used as the true value to train the first neural network model, so that the first neural network model can learn the implicit representation of the light field and output a more accurate intermediate representation, namely the second intermediate representation. Compared with directly training the three-dimensional light field representation and the decoding network end to end, the segmented training method provided in the embodiment of the application converges more easily and has higher training efficiency.
It should be understood that the sequence numbers of the above processes do not mean the order of execution, and the execution order of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The image processing method provided in the embodiment of the present application is described in detail above with reference to fig. 1 to 4, and the image processing apparatus provided in the embodiment of the present application will be described in detail below with reference to fig. 5 and 6.
Fig. 5 shows an image processing apparatus 500 provided in an embodiment of the present application, the apparatus 500 including: an acquisition module 501 and a processing module 502.
The acquiring module 501 is configured to acquire spatial position information and view angle information of a picture in the camera; the processing module 502 is configured to input the spatial location information and the perspective information into a first neural network model, obtain spatial light field feature information corresponding to the spatial location information and the perspective information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is configured to process three-dimensional information of an image; and inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
Optionally, the second neural network model includes an encoding network and a decoding network; the processing module 502 is configured to input the spatial light field characteristic information into the decoding network to obtain the target image.
Optionally, the processing module 502 is configured to input color information of a sample image to the encoding network before inputting the spatial location information and the perspective information to the first neural network model, to obtain color feature information and detail feature information of the sample image; and training the first neural network model by using the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
Optionally, the processing module 502 is configured to train the first neural network model by using color feature information and detail feature information of the sample image as true values and using spatial location information corresponding to the sample image as input.
It should be appreciated that the apparatus 500 herein is embodied in the form of functional modules. The term module herein may refer to an application specific integrated circuit (application specific integrated circuit, ASIC), an electronic circuit, a processor (e.g., a shared, dedicated, or group processor, etc.) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality. In an alternative example, it may be understood by those skilled in the art that the apparatus 500 may be specifically a VR device in the foregoing embodiment, or the functions of the VR device in the foregoing embodiment may be integrated in the apparatus 500, and the apparatus 500 may be configured to execute each flow and/or step corresponding to the VR device in the foregoing method embodiment, which is not repeated herein.
The apparatus 500 has functions to implement the corresponding steps performed by the VR device in the method described above; the above functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In an embodiment of the present application, the apparatus 500 in fig. 5 may also be a chip or a chip system, for example: system on chip (SoC).
Fig. 6 shows another image processing apparatus 600 provided in an embodiment of the present application. The apparatus 600 includes a processor 601, a transceiver 602, and a memory 603. Wherein the processor 601, the transceiver 602 and the memory 603 communicate with each other through an internal connection path, the memory 603 is used for storing instructions, and the processor 601 is used for executing the instructions stored in the memory 603 to control the transceiver 602 to transmit signals and/or receive signals.
The transceiver 602 is configured to obtain spatial position information and view angle information of a picture in the camera; a processor 601, configured to input the spatial location information and the perspective information into a first neural network model, to obtain spatial light field feature information corresponding to the spatial location information and the perspective information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is configured to process three-dimensional information of an image; and inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
Optionally, the second neural network model includes an encoding network and a decoding network; the processor 601 is configured to input the spatial light field characteristic information into the decoding network to obtain the target image.
Optionally, the processor 601 is configured to input color information of a sample image to the encoding network before inputting the spatial location information and the perspective information to the first neural network model, to obtain color feature information and detail feature information of the sample image; and training the first neural network model by using the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
Optionally, the processor 601 is configured to train the first neural network model by using color feature information and detail feature information of the sample image as true values and using spatial location information corresponding to the sample image as input.
It should be understood that the apparatus 600 may be specifically a VR device in the foregoing embodiment, or the functions of the VR device in the foregoing embodiment may be integrated in the apparatus 600, and the apparatus 600 may be configured to perform the steps and/or flows corresponding to the VR device in the foregoing method embodiment. The memory 603 may optionally include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type. The processor 601 may be configured to execute instructions stored in the memory, and when the processor executes the instructions, the processor may perform the steps and/or flows corresponding to VR devices in the above-described method embodiments.
It should be appreciated that in embodiments of the present application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor executes instructions in the memory to perform the steps of the method described above in conjunction with its hardware. To avoid repetition, a detailed description is not provided herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially, or in a part contributing to the prior art, or in part, in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

  1. An image processing method, comprising:
    acquiring spatial position information and visual angle information of a picture in a camera;
    inputting the spatial position information and the visual angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the visual angle information, wherein the spatial light field characteristic information comprises spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image;
    and inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
  2. The method of claim 1, wherein the second neural network model comprises an encoding network and a decoding network;
    the step of inputting the spatial light field characteristic information into a second neural network model to obtain a target image comprises the following steps:
    and inputting the spatial light field characteristic information into the decoding network to obtain the target image.
  3. The method of claim 2, wherein prior to said inputting the spatial location information and the perspective information to a first neural network model, the method further comprises:
    inputting the color information of the sample image into the coding network to obtain the color characteristic information and detail characteristic information of the sample image;
    and training the first neural network model by utilizing the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
  4. The method of claim 3, wherein the training the first neural network model comprises:
    and training the first neural network model by taking the color characteristic information and the detail characteristic information of the sample image as true values and taking the spatial position information corresponding to the sample image as input.
  5. An image processing apparatus, comprising:
    the acquisition module is used for acquiring the spatial position information and the visual angle information of the picture in the camera;
    The processing module is used for inputting the spatial position information and the visual angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the visual angle information, wherein the spatial light field characteristic information comprises spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image; and inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
  6. The apparatus of claim 5, wherein the second neural network model comprises an encoding network and a decoding network;
    the processing module is specifically configured to:
    and inputting the spatial light field characteristic information into the decoding network to obtain the target image.
  7. The apparatus of claim 6, wherein the processing module is specifically configured to:
    before the spatial position information and the visual angle information are input into a first neural network model, color information of a sample image is input into the coding network, and color characteristic information and detail characteristic information of the sample image are obtained;
    and training the first neural network model by utilizing the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
  8. The apparatus of claim 7, wherein the processing module is specifically configured to:
    and training the first neural network model by taking the color characteristic information and the detail characteristic information of the sample image as true values and taking the spatial position information corresponding to the sample image as input.
  9. An image processing apparatus, comprising: a processor coupled to the memory, the processor configured to execute instructions stored in the memory to perform the steps of:
    acquiring spatial position information and visual angle information of a picture in a camera;
    inputting the spatial position information and the visual angle information into a first neural network model to obtain spatial light field characteristic information corresponding to the spatial position information and the visual angle information, wherein the spatial light field characteristic information comprises spatial three-dimensional characteristic information and color characteristic information, and the first neural network model is used for processing three-dimensional information of an image;
    and inputting the spatial light field characteristic information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of the image.
  10. The apparatus of claim 9, wherein the second neural network model comprises an encoding network and a decoding network;
    the processor is specifically configured to:
    and inputting the spatial light field characteristic information into the decoding network to obtain the target image.
  11. The apparatus of claim 10, wherein the processor is specifically configured to:
    before the spatial position information and the visual angle information are input into a first neural network model, color information of a sample image is input into the coding network, and color characteristic information and detail characteristic information of the sample image are obtained;
    and training the first neural network model by utilizing the color characteristic information and the detail characteristic information of the sample image and the space position information corresponding to the sample image.
  12. The apparatus of claim 11, wherein the processor is specifically configured to:
    and training the first neural network model by taking the color characteristic information and the detail characteristic information of the sample image as true values and taking the spatial position information corresponding to the sample image as input.
  13. A computer readable storage medium storing a computer program comprising instructions for implementing the method of any one of claims 1 to 4.
  14. A chip system, comprising: a processor for calling and running a computer program from a memory, causing a communication device in which the chip system is installed to perform the method of any one of claims 1 to 4.
CN202080107407.8A 2020-12-24 2020-12-24 Image processing method and image processing apparatus Pending CN116569218A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/139145 WO2022133944A1 (en) 2020-12-24 2020-12-24 Image processing method and image processing apparatus

Publications (1)

Publication Number Publication Date
CN116569218A (en) 2023-08-08

Family

ID=82157246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080107407.8A Pending CN116569218A (en) 2020-12-24 2020-12-24 Image processing method and image processing apparatus

Country Status (2)

Country Link
CN (1) CN116569218A (en)
WO (1) WO2022133944A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272575B (en) * 2022-07-28 2024-03-29 中国电信股份有限公司 Image generation method and device, storage medium and electronic equipment
CN115714888B (en) * 2022-10-09 2023-08-29 名之梦(上海)科技有限公司 Video generation method, device, equipment and computer readable storage medium
CN116071484B (en) 2023-03-07 2023-06-20 清华大学 Billion-pixel-level large scene light field intelligent reconstruction method and billion-pixel-level large scene light field intelligent reconstruction device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2579903C (en) * 2004-09-17 2012-03-13 Cyberextruder.Com, Inc. System, method, and apparatus for generating a three-dimensional representation from one or more two-dimensional images
CN108510573B (en) * 2018-04-03 2021-07-30 南京大学 Multi-view face three-dimensional model reconstruction method based on deep learning
CN109255843A (en) * 2018-09-26 2019-01-22 联想(北京)有限公司 Three-dimensional rebuilding method, device and augmented reality AR equipment
CN110163953B (en) * 2019-03-11 2023-08-25 腾讯科技(深圳)有限公司 Three-dimensional face reconstruction method and device, storage medium and electronic device
CN110400337B (en) * 2019-07-10 2021-10-26 北京达佳互联信息技术有限公司 Image processing method, image processing device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022133944A1 (en) 2022-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination