WO2022133944A1 - Image processing method and image processing apparatus - Google Patents

Image processing method and image processing apparatus

Info

Publication number
WO2022133944A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
neural network
network model
feature information
image
Prior art date
Application number
PCT/CN2020/139145
Other languages
French (fr)
Chinese (zh)
Inventor
郑凯
李选富
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2020/139145
Priority to CN202080107407.8A
Publication of WO2022133944A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present application relates to the field of image technology, and in particular, to an image processing method and an image processing device.
  • VR: virtual reality
  • 3D: three-dimensional
  • a method commonly used in the industry is to sample the five-dimensional (5D) coordinates of spatial points along camera rays with a neural radiance field (NeRF), synthesize the density and color of the image at those coordinates, and obtain the final image using classical volume rendering techniques.
  • because this method computes pixel by pixel, it ignores the correlation between pixels and cannot adequately restore the color and details of the image.
  • the present application provides an image processing method and an image processing device, which improve the ability to restore the color and details of the image, thereby effectively improving the image quality.
  • an image processing method, comprising: acquiring spatial position information and viewing angle information of a picture in a camera; inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and inputting the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.
  • the target image under the current viewing angle is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image. This decouples the task of optimizing color details from the task of generating the spatial light field, thereby improving the ability to restore image color and detail and effectively improving image quality.
  • the spatial position information refers to the position of a light ray in three-dimensional space, which may be represented by the three-dimensional coordinates (x, y, z).
  • the viewing angle information refers to the direction in three-dimensional space of the ray emitted from that spatial position, which can be represented by the two parameters (θ, φ).
  • the spatial position information and viewing angle information can also be collectively referred to as 5D coordinates, expressed as (x, y, z, θ, φ).
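  • As an illustration of this 5D parameterization, the sketch below converts the two viewing angle parameters into a unit direction vector. The spherical-coordinate convention (θ as polar angle, φ as azimuth) is an assumption, since the application does not fix one.

```python
import numpy as np

def ray_direction(theta: float, phi: float) -> np.ndarray:
    """Unit view direction from the two viewing angle parameters.

    Assumes theta is the polar angle from the z-axis and phi the
    azimuth in the x-y plane (a convention not fixed by the application).
    """
    return np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])

# A 5D sample: spatial position (x, y, z) plus viewing angles (theta, phi).
x, y, z, theta, phi = 0.0, 0.0, 1.5, np.pi / 4, np.pi / 2
sample_5d = np.array([x, y, z, theta, phi])
direction = ray_direction(theta, phi)
```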
  • the first neural network model may also be referred to as a three-dimensional representation network model or a light field reconstruction network model, which is not limited in this embodiment of the present application.
  • the second neural network model may be a neural network model for processing image color information and detail information, such as a convolutional neural network (CNN) model.
  • CNN: convolutional neural network
  • the second neural network model includes an encoding network and a decoding network; inputting the spatial light field feature information into the second neural network model to obtain a target image includes: inputting the spatial light field feature information into the decoding network to obtain the target image.
  • the spatial light field feature information includes spatial three-dimensional feature information and color feature information; through the first neural network model and the decoding network of the second neural network model, the VR device improves its ability to restore the color information and detail information of the target image, which improves image quality.
  • before the spatial position information and the viewing angle information are input into the first neural network model, the method further includes: inputting the color information of a sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and training the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
  • the VR device no longer trains the first neural network model with an image as the ground-truth reference; instead, it uses the first intermediate representation generated by the encoding network as the ground truth and the spatial position information and viewing angle information of the corresponding image as input. The first neural network model can thus learn the first intermediate representation, making the second intermediate representation it outputs more accurate. The VR device can then pass the second intermediate representation through the decoding network to output higher-quality images.
  • training the first neural network model includes: taking the color feature information and detail feature information of the sample image as the ground truth and the spatial position information corresponding to the sample image as the input, and training the first neural network model.
  • the embodiment of the present application adopts staged training: the second neural network model can be trained first to generate the above first intermediate representation (high-dimensional feature information containing color details), and the first intermediate representation is then used as the ground truth to train the first neural network model, so that the first neural network model learns the implicit representation of the light field and outputs a more accurate intermediate representation, namely the above second intermediate representation.
  • compared with end-to-end training of the three-dimensional light field representation and decoding networks, the staged training method of the embodiment of the present application converges more easily and trains more efficiently.
  • an image processing apparatus, comprising an acquisition module and a processing module, where the acquisition module is used to: acquire spatial position information and viewing angle information of a picture in a camera; input the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and input the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.
  • the second neural network model includes an encoding network and a decoding network; the processing module is specifically configured to: input the spatial light field feature information into the decoding network to obtain the target image.
  • the processing module is specifically configured to: before inputting the spatial position information and the viewing angle information into the first neural network model, input the color information of a sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and train the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
  • the processing module is specifically configured to: take the color feature information and detail feature information of the sample image as the ground truth and the spatial position information corresponding to the sample image as the input, and train the first neural network model.
  • another image processing apparatus, comprising a processor coupled to a memory and configured to execute instructions in the memory, so as to implement the method in any possible implementation of the first aspect.
  • the apparatus further includes a memory.
  • the apparatus further includes a communication interface to which the processor is coupled.
  • a processor including: an input circuit, an output circuit, and a processing circuit.
  • the processing circuit is configured to receive the signal through the input circuit and transmit the signal through the output circuit, so that the processor executes the method in any one of the possible implementation manners of the above first aspect.
  • in a specific implementation, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be transistors, gate circuits, flip-flops, and various logic circuits.
  • the input signal received by the input circuit may be received and input by, for example but not limited to, a receiver; the signal output by the output circuit may be, for example but not limited to, output to and transmitted by a transmitter; and the input circuit and the output circuit may be the same circuit, used as the input circuit and the output circuit at different times.
  • the embodiments of the present application do not limit the specific implementation manners of the processor and various circuits.
  • a processing apparatus including a processor and a memory.
  • the processor is configured to read the instructions stored in the memory, so as to execute the method in any one of the possible implementation manners of the first aspect.
  • optionally, there are one or more processors and one or more memories.
  • the memory may be integrated with the processor, or the memory may be provided separately from the processor.
  • the memory may be a non-transitory memory, such as a read-only memory (ROM), and may be integrated with the processor on the same chip or placed on separate chips; the embodiment of the present application does not limit the type of the memory or the arrangement of the memory and the processor.
  • ROM: read-only memory
  • the processing apparatus in the fifth aspect may be a chip, and the processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor realized by reading software code stored in a memory, and the memory may be integrated in the processor or located outside the processor and exist independently.
  • a computer program product, comprising a computer program (also referred to as code or instructions) which, when run, causes a computer to execute the method in any possible implementation of the first aspect.
  • a computer-readable storage medium, which stores a computer program (also referred to as code or instructions) that, when run on a computer, causes the computer to execute the method in any possible implementation of the first aspect.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an image processing process provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a training process of a first neural network model provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a training process of a second neural network model provided by an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
  • VR: virtual reality
  • VR, also known as a virtual environment, spiritual environment, or artificial environment, refers to the use of computers to generate a virtual world that can directly impart visual, auditory, and tactile sensations to participants and allow them to observe and interact with it.
  • the application prospects of VR technology are very broad. At present, VR equipment is developing rapidly and becoming simple, easy to use, and popular. Unlike the rapid development of VR equipment, however, high-quality VR digital content is very limited. Different from traditionally displayed 2D digital content, in order to enhance the immersive experience (for example, having the displayed content change with the user's movement), the VR device needs to acquire the 3D light field content of the scene, and capturing the 3D light field content of a scene requires very complicated hardware, which limits the flexibility of 3D light field content acquisition.
  • the VR device can rely on image-based rendering (IBR) technology, that is, the ability to generate images from different viewing angles or different coordinates.
  • IBR: image-based rendering
  • VR devices can obtain information about the entire scene through IBR and generate images from any viewing angle in real time.
  • however, IBR presents two major challenges.
  • first, IBR needs to reconstruct a three-dimensional (3D) model, and the reconstructed 3D model must be detailed enough to show the occlusion relationships of objects in the scene.
  • second, the surface color and material generated from the 3D model rely on the representational power of the input images, but enlarging the input data set reduces the speed and performance of the model. This method therefore places certain requirements on the performance of the VR device and cannot adequately restore the color, detail, and other information of the image.
  • the VR device can use a neural radiance field (NeRF) to synthesize a representation of a complex scene from a sparse image data set, sampling the 5D coordinates of spatial points on the camera rays (e.g., the spatial position (x, y, z) and viewing direction (θ, φ)) to synthesize the density and color at the corresponding viewing angle. The VR device can then apply classical volume rendering to the density and color of the new viewing angle to obtain the image corresponding to the 5D coordinates, thereby continuously representing new viewing angles of the entire scene.
  • this method uses a fully connected deep learning network to perform pixel-by-pixel computation on the data set and does not use the correlation between pixels; the pixels are isolated from each other, and the ability to restore details in some scenes is insufficient. A sketch of the volume rendering step is given below.
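  • For context, the classical volume rendering step that such NeRF-style methods apply to the sampled densities and colors can be sketched as follows; the alpha-compositing quadrature is the standard formulation, and all names and shapes are illustrative rather than taken from the application.

```python
import numpy as np

def composite_ray(densities: np.ndarray, colors: np.ndarray,
                  deltas: np.ndarray) -> np.ndarray:
    """Classical volume rendering along one camera ray.

    densities: (N,) non-negative density at each sample
    colors:    (N, 3) RGB color at each sample
    deltas:    (N,) distance between consecutive samples
    """
    # Per-sample opacity: alpha_i = 1 - exp(-sigma_i * delta_i).
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: product of (1 - alpha) over all earlier samples.
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = transmittance * alphas
    # Pixel color is the weighted sum of the sampled colors.
    return (weights[:, None] * colors).sum(axis=0)
```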
  • the present application provides an image processing method and an image processing device that generate the target image under the current viewing angle by combining two trained neural network models: the first neural network model is used to effectively reconstruct the three-dimensional light field information and color feature information of the target image, and the second neural network model is used to effectively restore the color information and detail information of the image, thereby improving the ability to restore image color and detail and effectively improving image quality.
  • the terms "first", "second", and other ordinal numbers are only for convenience of description and are not intended to limit the scope of the embodiments of the present application; for example, they distinguish different neural network models such as the first neural network model and the second neural network model.
  • "at least one" means one or more, and "a plurality of" means two or more.
  • "and/or" describes the association relationship of associated objects and indicates that three relationships can exist; for example, "A and/or B" can indicate: A alone, both A and B, and B alone, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
  • "at least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items. For example, "at least one of a, b, and c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
  • the method in the embodiment of the present application may be performed by a VR device provided with a camera, and the VR device may be, for example, VR glasses, a VR headset, etc., which is not limited in the embodiment of the present application.
  • FIG. 1 is a schematic flowchart of an image processing method 100 in an embodiment of the present application. As shown in FIG. 1, the method 100 may include the following steps:
  • S101: Acquire the spatial position information and viewing angle information of a picture in the camera.
  • the spatial position information refers to the position of a light ray in three-dimensional space, which may be represented by the three-dimensional coordinates (x, y, z).
  • the viewing angle information refers to the direction in three-dimensional space of the ray emitted from that spatial position, which can be represented by the two parameters (θ, φ).
  • the spatial position information and viewing angle information can also be collectively referred to as 5D coordinates, expressed as (x, y, z, θ, φ).
  • S102: Input the spatial position information and the viewing angle information into the first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image.
  • the first neural network model may also be referred to as a three-dimensional representation network model or a light field reconstruction network model, which is not limited in this embodiment of the present application.
  • the color feature information included in the spatial light field feature information may be in the three-primary-color format (red-green-blue, RGB) or the YUV format, where "Y" represents brightness (luminance or luma) and "U" and "V" represent chrominance (chrominance or chroma), which describes the color and saturation of the image and specifies the color of a pixel.
  • the VR device may adopt different processing methods for the spatial light field feature information in different formats.
  • for example, when the color feature information included in the spatial light field feature information is in RGB format, the VR device may input the spatial light field feature information directly into the second neural network model to obtain the target image.
  • when the color feature information included in the spatial light field feature information is in YUV format, the VR device can convert the color feature information from YUV format to RGB format, and then input the spatial light field feature information containing the converted color feature information into the second neural network model to obtain the target image.
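  • A minimal sketch of this YUV-to-RGB conversion step is given below. The application does not specify which YUV variant or conversion matrix is used, so the BT.601-style coefficients here are an assumption.

```python
import numpy as np

# BT.601-style YUV -> RGB matrix (U and V centered on 0); the exact
# variant is an assumption, as the application does not specify one.
_YUV_TO_RGB = np.array([
    [1.0,  0.0,      1.13983],   # R = Y + 1.13983 * V
    [1.0, -0.39465, -0.58060],   # G = Y - 0.39465 * U - 0.58060 * V
    [1.0,  2.03211,  0.0],       # B = Y + 2.03211 * U
])

def yuv_to_rgb(yuv: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) YUV image to RGB, clipped to [0, 1]."""
    return np.clip(yuv @ _YUV_TO_RGB.T, 0.0, 1.0)
```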
  • the second neural network model may be a neural network model for processing image color information and detail information, such as a convolutional neural network (CNN) model.
  • CNN: convolutional neural network
  • the target image under the current viewing angle is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image. This decouples the task of optimizing color details from the task of generating the spatial light field, thereby improving the ability to restore image color and detail and effectively improving image quality.
  • compared with the second implementation manner in the above prior art, the method of the embodiment of the present application improves the peak signal-to-noise ratio (PSNR) of the target image from 32 to 34, improving image quality.
  • PSNR: peak signal-to-noise ratio
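  • The PSNR metric cited above can be computed as follows for images scaled to a known peak value; this is the standard definition rather than code from the application.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images in the range [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```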
  • for example, the second neural network model is an RGB network specializing in color and detail information, and the first neural network model is a NeRF network specializing in 3D information.
  • optionally, the second neural network model includes an encoding network and a decoding network; inputting the spatial light field feature information into the second neural network model to obtain a target image includes: inputting the spatial light field feature information into the decoding network to obtain the target image.
  • FIG. 2 shows a processing process of the image processing method provided by the embodiment of the present application.
  • the VR device can input the spatial position information and viewing angle information of the image into the first neural network model to obtain the spatial light field feature information, and then input the spatial light field feature information into the decoding network of the second neural network model to generate the target image.
  • the spatial light field feature information includes spatial three-dimensional feature information and color feature information; through the first neural network model and the decoding network of the second neural network model, the VR device improves its ability to restore the color information and detail information of the target image, which improves image quality.
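  • The inference pipeline of FIG. 2 can be sketched as follows. The layer sizes, feature dimension, and module structure are illustrative assumptions; the application does not disclose a concrete architecture.

```python
import torch
import torch.nn as nn

class LightFieldMLP(nn.Module):
    """First neural network model: 5D coordinates -> spatial light field features."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, feature_dim),
        )

    def forward(self, coords_5d: torch.Tensor) -> torch.Tensor:
        return self.net(coords_5d)

class Decoder(nn.Module):
    """Decoding network of the second model: feature map -> RGB image."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feature_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)

# Inference: per-pixel 5D coordinates -> features -> decoded target image.
H, W = 32, 32
coords = torch.rand(H * W, 5)                  # (x, y, z, theta, phi) per pixel
features = LightFieldMLP()(coords)             # second intermediate representation
feature_map = features.T.reshape(1, 64, H, W)  # lay features out on the image grid
target_image = Decoder()(feature_map)          # shape (1, 3, H, W)
```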
  • the use of the neural network models provided by the embodiments of the present application is described in detail above with reference to FIG. 1 and FIG. 2; the training process of the neural network models is described in detail below with reference to FIG. 3 and FIG. 4.
  • the training process includes training of the first neural network model and training of the second neural network model.
  • the training of the first neural network model may include: inputting the color information of a sample image into the encoding network of the second neural network model to obtain the color feature information and detail feature information of the sample image; and training the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
  • the first intermediate representation contains information such as the color, detail, neighborhood, and correlation of the image: for example, the color, texture details, and position information of things in the image, as well as the relationships between the colors, details, and positions of different things.
  • the VR device may map the color feature information of the image to a high-dimensional feature space through the encoding network to obtain the first intermediate representation.
  • the VR device can use the first intermediate representation and the spatial position information corresponding to the sample image to train the first neural network model and obtain the second intermediate representation.
  • the VR device inputs the second intermediate representation into the above-mentioned decoding network to obtain the training result of the sample image.
  • the VR device may use the first intermediate representation and the spatial position information corresponding to the sample image to train the first neural network model, so that the first neural network model can learn the parameters in the first intermediate representation.
  • training the first neural network model includes: taking the color feature information and detail feature information of the sample image as the ground truth and the spatial position information corresponding to the sample image as the input, and training the first neural network model.
  • FIG. 3 shows the training process of the first neural network model provided by the embodiment of the present application.
  • the VR device can input the color information of the sample image into the encoding network of the second neural network model to generate the color feature information and detail feature information of the sample image (i.e., the first intermediate representation described above). The VR device then inputs the spatial position information and viewing angle information of the sample image into the first neural network model and uses the color feature information and detail feature information as the ground truth to train the first neural network model, so that the first neural network model can learn the image color, detail, neighborhood, correlation, and other information included in the first intermediate representation and its output can approach the ground truth, thereby completing the training of the first neural network model.
  • the VR device no longer trains the first neural network model with an image as the ground-truth reference; instead, it uses the first intermediate representation generated by the encoding network as the ground truth and the spatial position information and viewing angle information of the corresponding image as input to train the first neural network model, so that the first neural network model learns the first intermediate representation and the second intermediate representation it outputs is more accurate. The VR device can then pass the second intermediate representation through the decoding network to output higher-quality images. A sketch of this training stage follows.
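  • The hedged sketch below shows this second training stage: the frozen encoding network supplies the first intermediate representation as the ground truth for the first neural network model. The Adam optimizer, L2 loss, and tensor shapes are assumptions, not taken from the application.

```python
import torch
import torch.nn as nn

def train_first_model(first_model: nn.Module, encoder: nn.Module,
                      coords_5d: torch.Tensor, sample_colors: torch.Tensor,
                      steps: int = 1000, lr: float = 1e-4) -> None:
    """Fit the first neural network model to the first intermediate
    representation produced by the frozen encoding network.

    Assumes encoder(sample_colors) and first_model(coords_5d) emit
    feature tensors of matching shape.
    """
    encoder.eval()
    with torch.no_grad():
        # First intermediate representation, used as the ground truth.
        target = encoder(sample_colors)
    optimizer = torch.optim.Adam(first_model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        # Second intermediate representation output by the first model.
        predicted = first_model(coords_5d)
        loss = loss_fn(predicted, target)
        loss.backward()
        optimizer.step()
```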
  • the VR device can also train the second neural network model.
  • FIG. 4 shows the training process of the second neural network model provided by the embodiment of the present application.
  • the VR device can input the color information of the sample image into the encoding network of the second neural network model, generate the color feature information and detail feature information of the sample image through the encoding network, and then input the obtained color feature information and detail feature information into the decoding network of the second neural network model to obtain the decoded image. This stage is sketched below.
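  • This first training stage amounts to training the encoding and decoding networks as an image autoencoder, which can be sketched as follows; the L1 reconstruction loss and Adam optimizer are assumptions rather than details from the application.

```python
import torch
import torch.nn as nn

def train_second_model(encoder: nn.Module, decoder: nn.Module,
                       sample_images: torch.Tensor,
                       steps: int = 1000, lr: float = 1e-4) -> None:
    """Train the encoding and decoding networks to reconstruct the
    sample images (an assumed autoencoder-style objective)."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(steps):
        optimizer.zero_grad()
        features = encoder(sample_images)   # color/detail feature information
        decoded = decoder(features)         # decoded image
        loss = loss_fn(decoded, sample_images)
        loss.backward()
        optimizer.step()
```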
  • the embodiment of the present application adopts staged training: the second neural network model can be trained first to generate the above first intermediate representation (high-dimensional feature information containing color details), and the first intermediate representation is then used as the ground truth to train the first neural network model, so that the first neural network model can learn the implicit representation of the light field and output a more accurate intermediate representation, namely the above second intermediate representation.
  • compared with end-to-end training of the three-dimensional light field representation and decoding networks, the staged training method of the embodiment of the present application converges more easily and trains more efficiently.
  • FIG. 5 shows an image processing apparatus 500 provided by an embodiment of the present application.
  • the apparatus 500 includes: an acquisition module 501 and a processing module 502 .
  • the acquisition module 501 is used to acquire the spatial position information and viewing angle information of the picture in the camera;
  • the processing module 502 is used to: input the spatial position information and the viewing angle information into the first neural network model to obtain the spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and input the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.
  • the above-mentioned second neural network model includes an encoding network and a decoding network; the processing module 502 is configured to input the above-mentioned spatial light field feature information into the above-mentioned decoding network to obtain the above-mentioned target image.
  • the processing module 502 is used to: before inputting the spatial position information and the viewing angle information into the first neural network model, input the color information of the sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and use the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image to train the first neural network model.
  • the processing module 502 is configured to use the color feature information and detail feature information of the sample image as true values, and use the spatial position information corresponding to the sample image as input to train the first neural network model.
  • the apparatus 500 here is embodied in the form of functional modules.
  • the term "module" as used herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory for executing one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions.
  • ASIC: application-specific integrated circuit
  • the apparatus 500 may specifically be the VR device in the foregoing embodiments, or the functions of the VR device in the foregoing embodiments may be integrated into the apparatus 500; the apparatus 500 may be used to execute the processes and/or steps corresponding to the VR device in the above method embodiments, which are not repeated here to avoid repetition.
  • the apparatus 500 has the function of implementing the corresponding steps performed by the VR device in the above method; the functions may be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the apparatus 500 in FIG. 5 may also be a chip or a system of chips, such as a system on chip (system on chip, SoC).
  • SoC: system on chip
  • FIG. 6 shows another image processing apparatus 600 provided by an embodiment of the present application.
  • the apparatus 600 includes a processor 601, a transceiver 602, and a memory 603.
  • the processor 601, the transceiver 602, and the memory 603 communicate with each other through an internal connection path; the memory 603 is used to store instructions, and the processor 601 is used to execute the instructions stored in the memory 603 to control the transceiver 602 to send and/or receive signals.
  • the transceiver 602 is used to obtain the spatial position information and viewing angle information of the picture in the camera; the processor 601 is used to input the spatial position information and the viewing angle information into the first neural network model to obtain the spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and the spatial light field feature information is input into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.
  • the above-mentioned second neural network model includes an encoding network and a decoding network; the processor 601 is configured to input the above-mentioned spatial light field feature information into the above-mentioned decoding network to obtain the above-mentioned target image.
  • the processor 601 is used to: before inputting the spatial position information and the viewing angle information into the first neural network model, input the color information of the sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and use the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image to train the first neural network model.
  • the processor 601 is configured to use the color feature information and detail feature information of the sample image as true values, and use the spatial position information corresponding to the sample image as input to train the first neural network model.
  • the apparatus 600 may specifically be the VR device in the above embodiments, or the functions of the VR device in the above embodiments may be integrated into the apparatus 600; the apparatus 600 may be used to execute the steps and/or processes corresponding to the VR device in the above method embodiments.
  • the memory 603 may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory.
  • the memory may also store device type information.
  • the processor 601 may be configured to execute the instructions stored in the memory, and when the processor executes the instructions, the processor may execute various steps and/or processes corresponding to the VR device in the foregoing method embodiments.
  • the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • CPU: central processing unit
  • DSP: digital signal processor
  • ASIC: application-specific integrated circuit
  • FPGA: field-programmable gate array
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor executes the instructions in the memory and completes the steps of the above method in combination with its hardware; to avoid repetition, detailed description is omitted here.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method and an image processing apparatus, which improve the capability of recovering image color and detail, thereby effectively improving image quality. The method comprises: acquiring spatial position information and viewing angle information of a picture in a camera (S101); inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, wherein the spatial light field feature information comprises spatial three-dimensional feature information and color feature information, and the first neural network model is used for processing three-dimensional information of an image (S102); and inputting the spatial light field feature information into a second neural network model to obtain a target image, wherein the second neural network model is used for restoring color information and detail information of an image (S103).

Description

Image processing method and image processing device

Technical Field

The present application relates to the field of image technology, and in particular, to an image processing method and an image processing device.

Background

With the development of virtual reality (VR) technology, VR equipment is becoming simpler, easier to use, and more popular. However, unlike the rapid improvement of VR equipment, high-quality VR content is still very limited. The reason is that, unlike traditionally displayed two-dimensional (2D) digital content, in order to enhance the user's immersive experience (for example, having the displayed content change with the user's movement), a VR device needs to obtain the three-dimensional light field content of the scene and render images from any viewing angle from that content, thereby displaying high-quality VR content.

To display such high-quality VR content, a method commonly used in the industry is to sample the five-dimensional (5D) coordinates of spatial points along camera rays with a neural radiance field (NeRF), synthesize the density and color of the image at those coordinates, and obtain the final image using classical volume rendering techniques. When generating an image, this method computes pixel by pixel, ignoring the correlation between pixels, and cannot adequately restore the color and details of the image.
Summary of the Invention

The present application provides an image processing method and an image processing device, which improve the ability to restore image color and detail, thereby effectively improving image quality.

In a first aspect, an image processing method is provided, comprising: acquiring spatial position information and viewing angle information of a picture in a camera; inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and inputting the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.

In the image processing method of the embodiment of the present application, the target image under the current viewing angle is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image. This decouples the task of optimizing color details from the task of generating the spatial light field, thereby improving the ability to restore image color and detail and effectively improving image quality.
It should be understood that the above spatial position information refers to the position of a light ray in three-dimensional space, which may be represented by the three-dimensional coordinates (x, y, z). The viewing angle information refers to the direction in three-dimensional space of the ray emitted from that spatial position, which can be represented by the two parameters (θ, φ). The spatial position information and viewing angle information can also be collectively referred to as 5D coordinates, expressed as (x, y, z, θ, φ).
It should be understood that the first neural network model may also be referred to as a three-dimensional representation network model or a light field reconstruction network model, which is not limited in this embodiment of the present application. The second neural network model may be a neural network model for processing image color information and detail information, such as a convolutional neural network (CNN) model.

With reference to the first aspect, in some implementations of the first aspect, the second neural network model includes an encoding network and a decoding network, and inputting the spatial light field feature information into the second neural network model to obtain the target image includes: inputting the spatial light field feature information into the decoding network to obtain the target image.

In the embodiment of the present application, the spatial light field feature information includes spatial three-dimensional feature information and color feature information; through the first neural network model and the decoding network of the second neural network model, the VR device improves its ability to restore the color information and detail information of the target image, which improves image quality.

With reference to the first aspect, in some implementations of the first aspect, before the spatial position information and the viewing angle information are input into the first neural network model, the method further includes: inputting the color information of a sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and training the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.

In the embodiment of the present application, the VR device no longer trains the first neural network model with an image as the ground-truth reference; instead, it uses the first intermediate representation generated by the encoding network as the ground truth and the spatial position information and viewing angle information of the corresponding image as input. The first neural network model can thus learn the first intermediate representation, making the second intermediate representation it outputs more accurate. The VR device can then pass the second intermediate representation through the decoding network to output higher-quality images.

With reference to the first aspect, in some implementations of the first aspect, training the first neural network model includes: taking the color feature information and detail feature information of the sample image as the ground truth and the spatial position information corresponding to the sample image as the input, and training the first neural network model.

It should be understood that when the color feature information and detail feature information generated by the encoding network of the second neural network model can be decoded by the decoding network and accurately restored into a higher-quality image, the training of the second neural network model is complete.

The embodiment of the present application adopts staged training: the second neural network model can be trained first to generate the above first intermediate representation (high-dimensional feature information containing color details), and the first intermediate representation is then used as the ground truth to train the first neural network model, so that the first neural network model learns the implicit representation of the light field and outputs a more accurate intermediate representation, namely the above second intermediate representation. Compared with end-to-end training of the three-dimensional light field representation and decoding networks, the staged training method of the embodiment of the present application converges more easily and trains more efficiently.
In a second aspect, an image processing apparatus is provided, comprising an acquisition module and a processing module, where the acquisition module is used to: acquire spatial position information and viewing angle information of a picture in a camera; input the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and input the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.

With reference to the second aspect, in some implementations of the second aspect, the second neural network model includes an encoding network and a decoding network, and the processing module is specifically configured to: input the spatial light field feature information into the decoding network to obtain the target image.

With reference to the second aspect, in some implementations of the second aspect, the processing module is specifically configured to: before the spatial position information and the viewing angle information are input into the first neural network model, input the color information of a sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and train the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.

With reference to the second aspect, in some implementations of the second aspect, the processing module is specifically configured to: take the color feature information and detail feature information of the sample image as the ground truth and the spatial position information corresponding to the sample image as the input, and train the first neural network model.
In a third aspect, another image processing apparatus is provided, including a processor. The processor is coupled to a memory and may be configured to execute instructions in the memory to implement the method in any one of the possible implementations of the first aspect. Optionally, the apparatus further includes the memory. Optionally, the apparatus further includes a communication interface, and the processor is coupled to the communication interface.
In a fourth aspect, a processor is provided, including an input circuit, an output circuit, and a processing circuit. The processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor performs the method in any one of the possible implementations of the first aspect.
In a specific implementation process, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be a transistor, a gate circuit, a flip-flop, various logic circuits, or the like. An input signal received by the input circuit may be, for example but not limited to, received and input by a receiver; a signal output by the output circuit may be, for example but not limited to, output to a transmitter and transmitted by the transmitter; and the input circuit and the output circuit may be the same circuit, which serves as the input circuit and the output circuit at different times. The embodiments of this application do not limit the specific implementations of the processor and the circuits.
In a fifth aspect, a processing apparatus is provided, including a processor and a memory. The processor is configured to read instructions stored in the memory to perform the method in any one of the possible implementations of the first aspect.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be disposed separately.
In a specific implementation process, the memory may be a non-transitory memory, for example a read-only memory (ROM); the memory may be integrated with the processor on the same chip, or disposed on a different chip. The embodiments of this application do not limit the type of the memory or the manner in which the memory and the processor are disposed.
The processing apparatus in the fifth aspect may be a chip, and the processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, and the memory may be integrated in the processor or located outside the processor and exist independently.
In a sixth aspect, a computer program product is provided. The computer program product includes a computer program (which may also be referred to as code or instructions). When the computer program is run, a computer is caused to perform the method in any one of the possible implementations of the first aspect.
In a seventh aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program (which may also be referred to as code or instructions). When the computer program is run on a computer, the computer is caused to perform the method in any one of the possible implementations of the first aspect.
Description of Drawings
FIG. 1 is a schematic flowchart of an image processing method according to an embodiment of this application;
FIG. 2 is a schematic diagram of an image processing process according to an embodiment of this application;
FIG. 3 is a schematic diagram of a training process of a first neural network model according to an embodiment of this application;
FIG. 4 is a schematic diagram of a training process of a second neural network model according to an embodiment of this application;
FIG. 5 is a schematic block diagram of an image processing apparatus according to an embodiment of this application;
FIG. 6 is a schematic block diagram of another image processing apparatus according to an embodiment of this application.
Description of Embodiments
The technical solutions in this application are described below with reference to the accompanying drawings.
Virtual reality (VR), also known as a virtual environment or an artificial environment, is a technology that uses a computer to generate a virtual world that directly applies visual, auditory, and tactile sensations to a participant and allows the participant to observe and operate it interactively.
VR technology has a very broad prospect. At present, VR devices are developing rapidly and becoming simple, easy to use, and widespread. However, unlike the rapid development of VR devices, high-quality VR digital content is very limited. Different from conventionally displayed 2D digital content, to enhance the immersive experience (for example, the displayed content changes as the viewer moves), a VR device needs to obtain the three-dimensional light field content of a scene, and capturing that content requires very complex hardware, which limits the flexibility of acquiring three-dimensional light field content.
To enable a user to obtain accurate light field information for display on a VR device simply by carrying the VR device around a scene once, the industry usually adopts the following two implementations.
In a first implementation, the VR device may use image-based rendering (IBR) technology, that is, the capability of generating images at different viewing angles or different coordinates. The VR device may obtain information about the entire scene through IBR and generate an image from any viewing angle in real time. However, for most scenes, IBR faces two major challenges. First, IBR needs to reconstruct a three-dimensional (3D) model, and the reconstructed 3D model must be sufficiently detailed and must represent the occlusion relationships of objects in the scene. Second, the surface colors and materials of the objects generated by the 3D model depend on the representation capability of the input images, but enlarging the input data set reduces the speed and performance of the model. Therefore, this method places certain requirements on the performance of the VR device and has insufficient capability to restore information such as the color and details of an image.
In a second implementation, the VR device may use a neural radiance field (NeRF) to synthesize a representation of a complex scene from a sparse set of images, sampling the 5D coordinates of spatial points on each camera ray, that is, the spatial position (x, y, z) and the viewing direction (θ, φ), to synthesize the density and color at the corresponding viewing angle. The VR device may then apply classical volume rendering to the density and color at the new viewing angle to obtain the image corresponding to the 5D coordinates, thereby accomplishing the task of continuously generating new-view images that represent the entire scene. However, this method uses a fully connected deep learning network to compute the data set pixel by pixel; it does not exploit the correlation between pixels, the pixels are isolated from one another, and its capability to restore details in some scenes is insufficient.
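For context, the volume rendering step referred to above is commonly written as the following integral in the NeRF literature; the formula below is background material and is not part of this disclosure:

```latex
% Expected color of camera ray r(t) = o + t*d between near/far bounds t_n, t_f:
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt,
\qquad
T(t) = \exp\left( -\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds \right)
```

Here σ is the volume density, c is the view-dependent color, and T(t) is the transmittance accumulated along the ray up to depth t.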
In view of this, this application provides an image processing method and an image processing apparatus that generate a target image at the current viewing angle by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image, thereby improving the capability to restore image color and details and effectively improving image quality.
Before the method and apparatus provided in the embodiments of this application are described, the following points are noted.
First, in the embodiments shown below, terms and English abbreviations, such as viewing angle information, color information, and spatial position information, are exemplary examples given for ease of description and shall not constitute any limitation on this application. This application does not exclude the possibility that other terms capable of achieving the same or similar functions may be defined in existing or future protocols.
Second, in the embodiments shown below, "first", "second", and various numerals are merely distinctions made for ease of description and are not intended to limit the scope of the embodiments of this application; for example, they distinguish different neural networks, such as the first neural network model and the second neural network model.
Third, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate that only A exists, both A and B exist, or only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of a single item or a plurality of items. For example, at least one of a, b, and c may indicate a, b, c, a and b, a and c, b and c, or a, b, and c, where each of a, b, and c may be singular or plural.
To make the objectives and technical solutions of this application clearer and more intuitive, the image processing method and the image processing apparatus provided in this application are described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain this application, not to limit it.
It should be understood that the method in the embodiments of this application may be performed by a VR device provided with a camera, for example, VR glasses or a VR headset; this is not limited in the embodiments of this application.
FIG. 1 is a schematic flowchart of an image processing method 100 according to an embodiment of this application. As shown in FIG. 1, the method 100 may include the following steps.
S101: Acquire spatial position information and viewing angle information of a picture in a camera.
It should be understood that the spatial position information refers to the position of a light ray in three-dimensional space and may be represented by three-dimensional coordinates (x, y, z). The viewing angle information refers to the direction, in three-dimensional space, of the ray emitted from that spatial position and may be represented by the two parameters (θ, φ). Together, the spatial position information and the viewing angle information may also be referred to as 5D coordinates, denoted (x, y, z, θ, φ).
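As a minimal illustration of how such 5D coordinates can be assembled for the sample points along one camera ray, consider the sketch below; the function name, the spherical-angle convention, and the use of NumPy are assumptions made for illustration and are not part of the original disclosure:

```python
import numpy as np

def ray_samples_5d(origin, theta, phi, t_vals):
    """Build (x, y, z, theta, phi) samples along one camera ray.

    origin: (3,) ray origin in world space
    theta, phi: viewing direction angles in radians (assumed convention:
                theta measured from the z-axis, phi in the x-y plane)
    t_vals: (N,) sample depths along the ray
    """
    # Unit direction vector derived from the two viewing angles.
    direction = np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])
    points = origin[None, :] + t_vals[:, None] * direction[None, :]  # (N, 3)
    angles = np.tile([theta, phi], (len(t_vals), 1))                 # (N, 2)
    return np.concatenate([points, angles], axis=1)                  # (N, 5)

samples = ray_samples_5d(np.zeros(3), 0.3, 1.2, np.linspace(2.0, 6.0, 64))
```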
S102: Input the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image.
It should be understood that the first neural network model may also be referred to as a three-dimensional representation network model or a light field reconstruction network model; this is not limited in the embodiments of this application.
S103: Input the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore color information and detail information of an image.
In the embodiments of this application, the color feature information included in the spatial light field feature information may be in the red-green-blue (RGB) format or the YUV format, where "Y" denotes luminance (luma) and "U" and "V" denote chrominance (chroma), which describe the color and saturation of an image and specify the color of a pixel. The VR device may process the spatial light field feature information differently depending on its format.
In a possible implementation, when the color feature information included in the spatial light field feature information is in the RGB format, the VR device may input the spatial light field feature information including the RGB-format color feature information into the second neural network model to obtain the target image.
In another possible implementation, when the color feature information included in the spatial light field feature information is in the YUV format, the VR device may convert the color feature information from the YUV format to the RGB format, and then input the spatial light field feature information including the converted color feature information into the second neural network model to obtain the target image.
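A minimal sketch of such a YUV-to-RGB conversion is shown below; it assumes full-range BT.601 coefficients with U and V centered at zero, whereas the original text does not specify which standard applies (BT.709, for example, uses different coefficients):

```python
import numpy as np

def yuv_to_rgb(yuv):
    """Convert an (H, W, 3) float array from YUV to RGB.

    Assumes full-range BT.601 with U and V centered at 0; clip keeps
    out-of-gamut values inside [0, 1].
    """
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    r = y + 1.13983 * v
    g = y - 0.39465 * u - 0.58060 * v
    b = y + 2.03211 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 1.0)
```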
It should be understood that the second neural network model may be a neural network model that processes image color information and detail information, such as a convolutional neural network (CNN) model.
In the image processing method of the embodiments of this application, the target image at the current viewing angle is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image, decoupling the task of optimizing color details from the task of generating a spatial light field, thereby improving the capability to restore image color and details and effectively improving image quality.
Assuming that the peak signal-to-noise ratio (PSNR) is used as the measure of image quality, compared with the second implementation of the prior art described above, the method of the embodiments of this application improves the PSNR of the target image from 32 to 34, that is, the image quality is improved.
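For reference, PSNR is a standard metric and can be computed as follows; the sketch assumes floating-point images in [0, 1] and is not specific to this application:

```python
import numpy as np

def psnr(reference, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer images."""
    mse = np.mean((reference - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```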
Optionally, the second neural network model is an RGB network dedicated to processing color detail information, and the first neural network model is a NeRF network dedicated to processing 3D information. By combining the two networks, the embodiments of this application can improve the overall output quality.
As an optional embodiment, the second neural network model includes an encoding network and a decoding network, and inputting the spatial light field feature information into the second neural network model to obtain the target image includes: inputting the spatial light field feature information into the decoding network to obtain the target image.
FIG. 2 shows the processing procedure of the image processing method according to an embodiment of this application. As shown in FIG. 2, the VR device may input the spatial position information and viewing angle information of the picture into the first neural network model to obtain the spatial light field feature information, and then input the spatial light field feature information into the decoding network of the second neural network model to generate the target image.
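A minimal PyTorch-style sketch of this inference pipeline follows; the class names, layer sizes, and the 64-dimensional feature width are illustrative assumptions, since the original text does not specify the network architectures:

```python
import torch
import torch.nn as nn

class LightFieldMLP(nn.Module):
    """First model: maps 5D coordinates to light field features (sketch)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(5, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, coords_5d):   # (N, 5) -> (N, feat_dim)
        return self.net(coords_5d)

class Decoder(nn.Module):
    """Decoding half of the second model: features -> RGB image (sketch)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, feat_map):    # (B, feat_dim, H, W) -> (B, 3, H, W)
        return self.net(feat_map)

# Inference: 5D coordinates -> spatial light field features -> target image.
H = W = 32
coords = torch.rand(H * W, 5)                 # one coordinate per pixel
features = LightFieldMLP()(coords)            # per-ray features, (H*W, 64)
feat_map = features.t().reshape(1, 64, H, W)  # arrange features as a map
target_image = Decoder()(feat_map)            # (1, 3, H, W)
```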
In the embodiments of this application, the spatial light field feature information includes spatial three-dimensional feature information and color feature information. Through the first neural network model and the decoding network of the second neural network model, the VR device improves its capability to restore the color information and detail information of the target image, thereby improving image quality.
The use of the neural network models provided in the embodiments of this application is described in detail above with reference to FIG. 1 and FIG. 2. The training process of the neural network models is described in detail below with reference to FIG. 3 and FIG. 4. The training process includes the training of the first neural network model and the training of the second neural network model.
The training of the first neural network model may include: inputting color information of a sample image into the encoding network of the second neural network model to obtain color feature information and detail feature information of the sample image; and training the first neural network model by using the color feature information and the detail feature information of the sample image and spatial position information corresponding to the sample image.
It should be understood that the image color information and detail information may be collectively referred to as a first intermediate representation. The first intermediate representation contains information such as the color, details, neighborhood, and correlations of the image, for example, the color, texture details, and position information of objects in the image, as well as the relationships between the colors, details, and positions of different objects.
In a possible implementation, the VR device may map the color feature information of an image to a high-dimensional feature space through the encoding network to obtain the first intermediate representation. The VR device may train the first neural network model by using the first intermediate representation and the spatial position information corresponding to the sample image, and obtain a second intermediate representation. The VR device inputs the second intermediate representation into the decoding network to obtain the training result for the sample image.
In the embodiments of this application, the VR device may train the first neural network model by using the first intermediate representation and the spatial position information corresponding to the sample image, so that the first neural network model learns the parameters in the first intermediate representation.
As an optional embodiment, the training of the first neural network model includes: training the first neural network model by using the color feature information and the detail feature information of the sample image as ground truth and the spatial position information corresponding to the sample image as input.
FIG. 3 shows the training process of the first neural network model according to an embodiment of this application. As shown in FIG. 3, the VR device may input the color information of a sample image into the encoding network of the second neural network model and extract color feature information and detail feature information (that is, the first intermediate representation) from the color information through the encoding network. The VR device then inputs the spatial position information and viewing angle information of the sample image into the first neural network model and trains the first neural network model with the color feature information and the detail feature information as ground truth, so that the first neural network model learns the image color, detail, neighborhood, and correlation information contained in the first intermediate representation and its output approaches the ground truth, thereby completing the training of the first neural network model.
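A minimal sketch of this training step, reusing the LightFieldMLP module assumed in the earlier sketch, might look as follows; the stand-in encoder, the MSE loss, and the data loader are assumptions for illustration, none of which are specified in the original text:

```python
import torch
import torch.nn as nn

first_model = LightFieldMLP(feat_dim=64)           # model being trained
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))  # stand-in encoder
encoder.requires_grad_(False)  # the ground-truth provider stays frozen

optimizer = torch.optim.Adam(first_model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for sample_rgb, coords_5d in loader:  # assumed loader: images + 5D coords
    with torch.no_grad():
        # First intermediate representation from the encoding network.
        target_feat = encoder(sample_rgb)          # (B, 64, H, W)
        target_feat = target_feat.flatten(2).transpose(1, 2).reshape(-1, 64)
    pred_feat = first_model(coords_5d)             # (B*H*W, 64)
    loss = loss_fn(pred_feat, target_feat)         # features as ground truth
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```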
In the embodiments of this application, the VR device no longer uses images as the ground-truth reference when training the first neural network model. Instead, it uses the first intermediate representation generated by the encoding network as ground truth, with the spatial position information and viewing angle information of the corresponding image as input, so that the first neural network model learns the first intermediate representation and the second intermediate representation it outputs is more accurate. The VR device may pass the second intermediate representation through the decoding network to output a higher-quality image.
It should be understood that before the VR device uses the second neural network model, or before the VR device obtains, through the encoding network of the second neural network model, the ground truth required for training the first neural network model, the VR device may also train the second neural network model.
Because the second neural network model includes an encoding network and a decoding network, the second neural network model may be trained with the encoding network and the decoding network together. FIG. 4 shows the training process of the second neural network model according to an embodiment of this application. As shown in FIG. 4, the VR device may input the color information of a sample image into the encoding network of the second neural network model, generate the color feature information and the detail feature information of the sample image through the encoding network, input the obtained color feature information and detail feature information into the decoding network of the second neural network model, and obtain a decoded image.
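This stage is effectively the training of an image autoencoder; a minimal sketch under the same illustrative assumptions (layer shapes, MSE reconstruction loss, and an assumed data loader) is:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 64, 3, padding=1))
decoder = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.MSELoss()

for sample_rgb in image_loader:          # assumed loader of (B, 3, H, W)
    features = encoder(sample_rgb)       # first intermediate representation
    decoded = decoder(features)          # reconstructed image
    loss = loss_fn(decoded, sample_rgb)  # encoder and decoder train jointly
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```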
It should be understood that when the color feature information and the detail feature information generated by the encoding network of the second neural network model can be decoded by the decoding network of the second neural network model and accurately restored to a relatively high-quality image, the training of the second neural network model is complete.
The embodiments of this application adopt a staged training method: the second neural network model may be trained first to generate the first intermediate representation (high-dimensional feature information containing color details), and the first intermediate representation is then used as ground truth to train the first neural network model, so that the first neural network model learns an implicit representation of the light field and outputs a more accurate intermediate representation, that is, the second intermediate representation. Compared with end-to-end training of a single model combining the three-dimensional light field representation and the decoding network, the staged training method of the embodiments of this application converges more easily and trains more efficiently.
It should be understood that the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic and shall not constitute any limitation on the implementation of the embodiments of this application.
The image processing method provided in the embodiments of this application is described in detail above with reference to FIG. 1 to FIG. 4. The image processing apparatus provided in the embodiments of this application is described in detail below with reference to FIG. 5 and FIG. 6.
FIG. 5 shows an image processing apparatus 500 according to an embodiment of this application. The apparatus 500 includes an acquisition module 501 and a processing module 502.
The acquisition module 501 is configured to acquire spatial position information and viewing angle information of a picture in a camera. The processing module 502 is configured to: input the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image; and input the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore color information and detail information of an image.
Optionally, the second neural network model includes an encoding network and a decoding network, and the processing module 502 is configured to input the spatial light field feature information into the decoding network to obtain the target image.
Optionally, the processing module 502 is configured to: before the spatial position information and the viewing angle information are input into the first neural network model, input color information of a sample image into the encoding network to obtain color feature information and detail feature information of the sample image; and train the first neural network model by using the color feature information and the detail feature information of the sample image and spatial position information corresponding to the sample image.
Optionally, the processing module 502 is configured to train the first neural network model by using the color feature information and the detail feature information of the sample image as ground truth and the spatial position information corresponding to the sample image as input.
It should be understood that the apparatus 500 here is embodied in the form of functional modules. The term "module" may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functions. In an optional example, a person skilled in the art may understand that the apparatus 500 may be specifically the VR device in the foregoing embodiments, or the functions of the VR device in the foregoing embodiments may be integrated into the apparatus 500, and the apparatus 500 may be configured to perform the processes and/or steps corresponding to the VR device in the foregoing method embodiments; to avoid repetition, details are not described here again.
The apparatus 500 has the functions of implementing the corresponding steps performed by the VR device in the foregoing method. The functions may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the foregoing functions.
In the embodiments of this application, the apparatus 500 in FIG. 5 may also be a chip or a chip system, for example, a system on chip (SoC).
FIG. 6 shows another image processing apparatus 600 according to an embodiment of this application. The apparatus 600 includes a processor 601, a transceiver 602, and a memory 603. The processor 601, the transceiver 602, and the memory 603 communicate with one another through an internal connection path. The memory 603 is configured to store instructions, and the processor 601 is configured to execute the instructions stored in the memory 603 to control the transceiver 602 to send and/or receive signals.
The transceiver 602 is configured to acquire spatial position information and viewing angle information of a picture in a camera. The processor 601 is configured to: input the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image; and input the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore color information and detail information of an image.
Optionally, the second neural network model includes an encoding network and a decoding network, and the processor 601 is configured to input the spatial light field feature information into the decoding network to obtain the target image.
Optionally, the processor 601 is configured to: before the spatial position information and the viewing angle information are input into the first neural network model, input color information of a sample image into the encoding network to obtain color feature information and detail feature information of the sample image; and train the first neural network model by using the color feature information and the detail feature information of the sample image and spatial position information corresponding to the sample image.
Optionally, the processor 601 is configured to train the first neural network model by using the color feature information and the detail feature information of the sample image as ground truth and the spatial position information corresponding to the sample image as input.
It should be understood that the apparatus 600 may be specifically the VR device in the foregoing embodiments, or the functions of the VR device in the foregoing embodiments may be integrated into the apparatus 600, and the apparatus 600 may be configured to perform the steps and/or processes corresponding to the VR device in the foregoing method embodiments. Optionally, the memory 603 may include a read-only memory and a random access memory and provide instructions and data to the processor. A part of the memory may further include a non-volatile random access memory; for example, the memory may further store device type information. The processor 601 may be configured to execute the instructions stored in the memory, and when the processor executes the instructions, the processor may perform the steps and/or processes corresponding to the VR device in the foregoing method embodiments.
It should be understood that in the embodiments of this application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The steps of the methods disclosed with reference to the embodiments of this application may be directly performed by a hardware processor, or performed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor executes the instructions in the memory and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation shall not be considered beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the foregoing systems, apparatuses, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division; in actual implementation, there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. An image processing method, comprising:
    acquiring spatial position information and viewing angle information of a picture in a camera;
    inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, wherein the spatial light field feature information comprises spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image; and
    inputting the spatial light field feature information into a second neural network model to obtain a target image, wherein the second neural network model is used to restore color information and detail information of an image.
  2. The method according to claim 1, wherein the second neural network model comprises an encoding network and a decoding network; and
    the inputting the spatial light field feature information into a second neural network model to obtain a target image comprises:
    inputting the spatial light field feature information into the decoding network to obtain the target image.
  3. The method according to claim 2, wherein before the inputting the spatial position information and the viewing angle information into a first neural network model, the method further comprises:
    inputting color information of a sample image into the encoding network to obtain color feature information and detail feature information of the sample image; and
    training the first neural network model by using the color feature information and the detail feature information of the sample image and spatial position information corresponding to the sample image.
  4. The method according to claim 3, wherein the training the first neural network model comprises:
    training the first neural network model by using the color feature information and the detail feature information of the sample image as ground truth and the spatial position information corresponding to the sample image as input.
  5. An image processing apparatus, comprising:
    an acquisition module, configured to acquire spatial position information and viewing angle information of a picture in a camera; and
    a processing module, configured to: input the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, wherein the spatial light field feature information comprises spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image; and input the spatial light field feature information into a second neural network model to obtain a target image, wherein the second neural network model is used to restore color information and detail information of an image.
  6. The apparatus according to claim 5, wherein the second neural network model comprises an encoding network and a decoding network; and
    the processing module is specifically configured to:
    input the spatial light field feature information into the decoding network to obtain the target image.
  7. The apparatus according to claim 6, wherein the processing module is specifically configured to:
    before the spatial position information and the viewing angle information are input into the first neural network model, input color information of a sample image into the encoding network to obtain color feature information and detail feature information of the sample image; and
    train the first neural network model by using the color feature information and the detail feature information of the sample image and spatial position information corresponding to the sample image.
  8. The apparatus according to claim 7, wherein the processing module is specifically configured to:
    train the first neural network model by using the color feature information and the detail feature information of the sample image as ground truth and the spatial position information corresponding to the sample image as input.
  9. An image processing apparatus, comprising a processor, wherein the processor is coupled to a memory and is configured to execute instructions stored in the memory to perform the following steps:
    acquiring spatial position information and viewing angle information of a picture in a camera;
    inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, wherein the spatial light field feature information comprises spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image; and
    inputting the spatial light field feature information into a second neural network model to obtain a target image, wherein the second neural network model is used to restore color information and detail information of an image.
  10. The apparatus according to claim 9, wherein the second neural network model comprises an encoding network and a decoding network; and
    the processor is specifically configured to:
    input the spatial light field feature information into the decoding network to obtain the target image.
  11. The apparatus according to claim 10, wherein the processor is specifically configured to:
    before the spatial position information and the viewing angle information are input into the first neural network model, input color information of a sample image into the encoding network to obtain color feature information and detail feature information of the sample image; and
    train the first neural network model by using the color feature information and the detail feature information of the sample image and spatial position information corresponding to the sample image.
  12. The apparatus according to claim 11, wherein the processor is specifically configured to:
    train the first neural network model by using the color feature information and the detail feature information of the sample image as ground truth and the spatial position information corresponding to the sample image as input.
  13. A computer-readable storage medium, configured to store a computer program, wherein the computer program comprises instructions for implementing the method according to any one of claims 1 to 4.
  14. A chip system, comprising a processor, configured to call and run a computer program from a memory, so that a communication device on which the chip system is installed performs the method according to any one of claims 1 to 4.
PCT/CN2020/139145 2020-12-24 2020-12-24 Image processing method and image processing apparatus WO2022133944A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/139145 WO2022133944A1 (en) 2020-12-24 2020-12-24 Image processing method and image processing apparatus
CN202080107407.8A CN116569218A (en) 2020-12-24 2020-12-24 Image processing method and image processing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/139145 WO2022133944A1 (en) 2020-12-24 2020-12-24 Image processing method and image processing apparatus

Publications (1)

Publication Number Publication Date
WO2022133944A1 true WO2022133944A1 (en) 2022-06-30

Family

ID=82157246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139145 WO2022133944A1 (en) 2020-12-24 2020-12-24 Image processing method and image processing apparatus

Country Status (2)

Country Link
CN (1) CN116569218A (en)
WO (1) WO2022133944A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272575A (en) * 2022-07-28 2022-11-01 中国电信股份有限公司 Image generation method and device, storage medium and electronic equipment
CN115714888A (en) * 2022-10-09 2023-02-24 名之梦(上海)科技有限公司 Video generation method, device, equipment and computer readable storage medium
CN116071484A (en) * 2023-03-07 2023-05-05 清华大学 Billion pixel-level intelligent reconstruction method and device for large-scene sparse light field

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100189342A1 (en) * 2000-03-08 2010-07-29 Cyberextruder.Com, Inc. System, method, and apparatus for generating a three-dimensional representation from one or more two-dimensional images
CN108510573A (en) * 2018-04-03 2018-09-07 南京大学 A method of the multiple views human face three-dimensional model based on deep learning is rebuild
CN109255843A (en) * 2018-09-26 2019-01-22 联想(北京)有限公司 Three-dimensional rebuilding method, device and augmented reality AR equipment
CN110163953A (en) * 2019-03-11 2019-08-23 腾讯科技(深圳)有限公司 Three-dimensional facial reconstruction method, device, storage medium and electronic device
CN110400337A (en) * 2019-07-10 2019-11-01 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272575A (en) * 2022-07-28 2022-11-01 中国电信股份有限公司 Image generation method and device, storage medium and electronic equipment
CN115272575B (en) * 2022-07-28 2024-03-29 中国电信股份有限公司 Image generation method and device, storage medium and electronic equipment
CN115714888A (en) * 2022-10-09 2023-02-24 名之梦(上海)科技有限公司 Video generation method, device, equipment and computer readable storage medium
CN115714888B (en) * 2022-10-09 2023-08-29 名之梦(上海)科技有限公司 Video generation method, device, equipment and computer readable storage medium
CN116071484A (en) * 2023-03-07 2023-05-05 清华大学 Billion pixel-level intelligent reconstruction method and device for large-scene sparse light field
US11908067B1 (en) 2023-03-07 2024-02-20 Tsinghua University Method and device for gigapixel-level light field intelligent reconstruction of large-scale scene

Also Published As

Publication number Publication date
CN116569218A (en) 2023-08-08

Similar Documents

Publication Publication Date Title
WO2022133944A1 (en) Image processing method and image processing apparatus
US10474227B2 (en) Generation of virtual reality with 6 degrees of freedom from limited viewer data
US20230377269A1 (en) Methods and systems for producing content in multiple reality environments
US20230377183A1 (en) Depth-Aware Photo Editing
WO2020192568A1 (en) Facial image generation method and apparatus, device and storage medium
CN112166604B (en) Volume capture of objects with a single RGBD camera
US20220245912A1 (en) Image display method and device
US10444931B2 (en) Vantage generation and interactive playback
JP2023521270A (en) Learning lighting from various portraits
US10521892B2 (en) Image lighting transfer via multi-dimensional histogram matching
WO2023207379A1 (en) Image processing method and apparatus, device and storage medium
WO2023241459A1 (en) Data communication method and system, and electronic device and storage medium
CN110533773A Three-dimensional face reconstruction method, apparatus, and related device
CN109658488B (en) Method for accelerating decoding of camera video stream through programmable GPU in virtual-real fusion system
CN116740261A (en) Image reconstruction method and device and training method and device of image reconstruction model
WO2022179087A1 (en) Video processing method and apparatus
KR20230149093A (en) Image processing method, training method for image processing, and image processing apparatus
WO2021173489A1 (en) Apparatus, method, and system for providing a three-dimensional texture using uv representation
Li et al. Dynamic View Synthesis with Spatio-Temporal Feature Warping from Sparse Views
Bai et al. Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction
Zhang et al. Survey on controllable image synthesis with deep learning
WO2024119997A1 (en) Illumination estimation method and apparatus
Han et al. Learning residual color for novel view synthesis
US20240096041A1 (en) Avatar generation based on driving views
CN116681818B (en) New view angle reconstruction method, training method and device of new view angle reconstruction network

Legal Events

Code Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 20966516; Country of ref document: EP; Kind code of ref document: A1
WWE  Wipo information: entry into national phase
     Ref document number: 202080107407.8; Country of ref document: CN
NENP Non-entry into the national phase
     Ref country code: DE
122  Ep: pct application non-entry in european phase
     Ref document number: 20966516; Country of ref document: EP; Kind code of ref document: A1