WO2022133944A1 - Image processing method and image processing apparatus - Google Patents
Image processing method and image processing apparatus
- Publication number
- WO2022133944A1 (PCT/CN2020/139145)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- neural network
- network model
- feature information
- image
- Prior art date
Links
- 238000012545 processing Methods 0.000 title claims abstract description 43
- 238000003672 processing method Methods 0.000 title claims abstract description 15
- 238000003062 neural network model Methods 0.000 claims abstract description 146
- 238000000034 method Methods 0.000 claims abstract description 69
- 230000015654 memory Effects 0.000 claims description 35
- 238000012549 training Methods 0.000 claims description 29
- 238000004590 computer program Methods 0.000 claims description 8
- 238000004891 communication Methods 0.000 claims description 4
- 230000006870 function Effects 0.000 description 11
- 238000013528 artificial neural network Methods 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 238000009877 rendering Methods 0.000 description 4
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 238000011084 recovery Methods 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000011218 segmentation Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 239000011521 glass Substances 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000035807 sensation Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
Definitions
- the present application relates to the field of image technology, and in particular, to an image processing method and an image processing device.
- VR virtual reality
- 3D three-dimensional light field content
- a method commonly used in the industry is: sampling the 5-dimensional (5D) coordinates of spatial points on the camera rays through a neural radiance field (NeRF), synthesizing the density and color at each coordinate, and obtaining the final image using classical volume rendering techniques.
- however, this method computes pixel by pixel, ignores the correlation between pixels, and has insufficient ability to restore the color and details of the image.
- the present application provides an image processing method and an image processing device, which improve the ability to restore the color and details of the image, thereby effectively improving the image quality.
- an image processing method is provided, comprising: acquiring spatial position information and viewing angle information of a picture in a camera; inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image; and inputting the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.
- the target image under the current perspective is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image. This decouples the task of optimizing color details from the task of generating a spatial light field, thereby improving the ability to restore the color and details of the image and effectively improving the image quality.
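- As an illustration of this combination, a minimal end-to-end sketch is given below, assuming PyTorch and illustrative module names and layer sizes (`LightFieldMLP` for the first neural network model, `Decoder` for the decoding network of the second neural network model); the embodiment does not fix concrete architectures.

```python
# Minimal sketch of the two-model inference pipeline (assumptions: PyTorch,
# illustrative layer sizes; the embodiment does not fix concrete architectures).
import torch
import torch.nn as nn

class LightFieldMLP(nn.Module):
    """First neural network model: maps 5D coordinates (x, y, z, theta, phi)
    to spatial light field feature information."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, coords_5d: torch.Tensor) -> torch.Tensor:
        return self.mlp(coords_5d)          # (N, feat_dim) per-ray features

class Decoder(nn.Module):
    """Decoding network of the second neural network model: restores color
    and detail information from the spatial light field features."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        return self.net(feat_map)           # (B, 3, H, W) restored target image

# Inference: per-pixel 5D coords -> features -> reshape to a feature map -> decode.
H, W, feat_dim = 120, 160, 64
coords = torch.rand(H * W, 5)                       # spatial position + viewing angles
features = LightFieldMLP(feat_dim)(coords)          # spatial light field feature information
feat_map = features.view(1, H, W, feat_dim).permute(0, 3, 1, 2)
target_image = Decoder(feat_dim)(feat_map)          # color and detail restored image
```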
- the above-mentioned spatial position information refers to the position of the light ray in three-dimensional space, which may be represented by three-dimensional coordinates (x, y, z).
- the viewing angle information refers to the direction in three-dimensional space of the ray emitted from the above-mentioned spatial position, which can be represented by two parameters (θ, φ).
- the above-mentioned spatial position information and viewing angle information can also be collectively referred to as 5D coordinates, expressed as (x, y, z, θ, φ).
- the above-mentioned first neural network may also be referred to as a three-dimensional representation network model or a light field reconstruction network model, which is not limited in this embodiment of the present application.
- the above-mentioned second neural network model may be a neural network model for processing image color information and detail information, such as a convolutional neural network (CNN) model.
- the second neural network model includes an encoding network and a decoding network; inputting the above-mentioned spatial light field feature information into the second neural network model to obtain the target image includes: inputting the spatial light field feature information into the decoding network to obtain the target image.
- the spatial light field feature information includes spatial three-dimensional feature information and color feature information
- through the first neural network model and the decoding network in the second neural network model, the VR device improves its ability to restore the color information and detail information of the target image, thereby improving the image quality.
- before inputting the above-mentioned spatial position information and perspective information into the first neural network model, the above-mentioned method further includes: inputting the color information of a sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and training the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
- when training the first neural network model, the VR device no longer uses the image itself as the ground-truth reference, but uses the first intermediate representation generated by the encoding network as the ground truth, and uses the spatial position information and perspective information of the corresponding image as input, so that the first neural network model can learn the first intermediate representation and the second intermediate representation output by the first neural network model is more accurate.
- the VR device can then pass the second intermediate representation through the decoding network and output higher-quality images.
- the above-mentioned training of the first neural network model includes: taking the color feature information and detail feature information of the above-mentioned sample image as true values, using the spatial position information corresponding to the sample image as input, and training the first neural network model.
- the embodiment of the present application adopts the method of segmented training, and the second neural network model can be trained first, so as to generate the above-mentioned first intermediate representation (high-dimensional feature information containing color details), and then the first intermediate representation is used as the true value to train the first neural network model, so that the first neural network model can learn the implicit representation of the light field, and let the first neural network model output a more accurate intermediate representation, that is, the above-mentioned second intermediate representation.
- the segmented training method of the embodiment of the present application converges more easily, and the training efficiency is higher.
- an image processing apparatus is provided, comprising: an acquisition module and a processing module; the acquisition module is used for acquiring spatial position information and viewing angle information of a picture in a camera; the processing module is used for inputting the spatial position information and the viewing angle information into the first neural network model to obtain the spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information and the first neural network model is used to process the three-dimensional information of the image; and inputting the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore the color information and detail information of the image.
- the above-mentioned second neural network model includes an encoding network and a decoding network; the above-mentioned processing module is specifically configured to: input the above-mentioned spatial light field feature information into the above-mentioned decoding network to obtain The above target image.
- the above-mentioned processing module is specifically configured to: before inputting the above-mentioned spatial position information and perspective information into the first neural network model, input the color information of the sample image into the encoding network to obtain the color feature information and detail feature information of the sample image; and train the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
- the above-mentioned processing module is specifically configured to: take the color feature information and detail feature information of the above-mentioned sample image as the true value, and take the spatial position information corresponding to the above-mentioned sample image as the input , train the first neural network model above.
- another image processing apparatus comprising: a processor, which is coupled to a memory and can be configured to execute instructions in the memory, so as to implement the method in any possible implementation manner of the first aspect.
- the apparatus further includes a memory.
- the apparatus further includes a communication interface to which the processor is coupled.
- a processor including: an input circuit, an output circuit, and a processing circuit.
- the processing circuit is configured to receive the signal through the input circuit and transmit the signal through the output circuit, so that the processor executes the method in any one of the possible implementation manners of the above first aspect.
- the above-mentioned processor may be a chip
- the input circuit may be an input pin
- the output circuit may be an output pin
- the processing circuit may be a transistor, a gate circuit, a flip-flop, and various logic circuits.
- the input signal received by the input circuit may be received and input by, for example, but not limited to, a receiver
- the signal output by the output circuit may be, for example, but not limited to, output to and transmitted by a transmitter
- the circuit can be the same circuit that acts as an input circuit and an output circuit at different times.
- the embodiments of the present application do not limit the specific implementation manners of the processor and various circuits.
- a processing apparatus including a processor and a memory.
- the processor is configured to read the instructions stored in the memory, so as to execute the method in any one of the possible implementation manners of the first aspect.
- there are one or more processors and one or more memories.
- the memory may be integrated with the processor, or the memory may be provided separately from the processor.
- the memory can be a non-transitory memory, such as a read-only memory (ROM), which can be integrated with the processor on the same chip or provided separately on different chips; the embodiment of the present application does not limit the type of the memory or the manner in which the memory and the processor are arranged.
- the processing device in the fifth aspect may be a chip, and the processor may be implemented by hardware or software.
- when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor can be a general-purpose processor, implemented by reading software code stored in a memory, and the memory can be integrated in the processor or located outside the processor and exist independently.
- a computer program product includes a computer program (also referred to as code or instructions) which, when executed, enables the computer to execute the method in any one of the possible implementations of the above-mentioned first aspect.
- a computer-readable storage medium stores a computer program (also referred to as code or instructions) which, when run on a computer, causes the computer to execute the method in any one of the possible implementations of the above-mentioned first aspect.
- FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application
- FIG. 2 is a schematic diagram of an image processing process provided by an embodiment of the present application.
- FIG. 3 is a schematic diagram of a training process of a first neural network model provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of a training process of a second neural network model provided by an embodiment of the present application.
- FIG. 5 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application.
- FIG. 6 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
- VR virtual reality
- virtual reality (VR) is also known as a virtual environment, a spiritual environment or an artificial environment.
- VR refers to the use of computers to generate a virtual world that can directly apply visual, auditory and tactile sensations to participants and allow them to observe and operate interactively.
- the application of VR technology is very broad. At present, VR equipment is developing rapidly and is becoming simple, easy to use and popular. However, unlike the rapid development of VR equipment, high-quality VR digital content is very limited. Different from traditionally displayed 2D digital content, in order to enhance the immersive experience (for example, the display content changes with the movement of people), the VR device needs to acquire the 3D light field content of the scene, and capturing the 3D light field content of a scene requires very complicated hardware devices, which limits the flexibility of 3D light field content acquisition.
- the VR device can be based on image-based rendering (IBR) technology, that is, it can generate images from different viewing angles or different coordinates.
- VR devices can obtain information of the entire scene through IBR, and generate images from any perspective in real time.
- IBR presents two huge challenges.
- IBR needs to reconstruct a 3-dimensional (three dimensions, 3D) model, but the reconstructed 3D model must be detailed enough and show the occlusion relationship of objects in the scene.
- the surface color and material of the object generated by the 3D model need to rely on the representation ability of the input image, but the increase of the input data set will reduce the speed and performance of the model. Therefore, this method has certain requirements on the performance of the VR device, and has insufficient ability to restore the color, details and other information of the image.
- the VR device can use a neural radiance field (NeRF) to synthesize a complex scene representation from a sparse image data set, sampling the 5D coordinates of the spatial points on the camera rays (e.g., the spatial position (x, y, z) and viewing direction (θ, φ)) to synthesize the density and color at the corresponding viewing angle. Then, the VR device can apply classic volume rendering to the density and color of the new perspective to obtain the image corresponding to the 5D coordinates, so as to continuously represent new perspectives of the entire scene.
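- For reference, the classical volume rendering step applied by such NeRF-style pipelines to the sampled densities and colors can be sketched as follows (a simplified, per-ray illustration of the baseline, not the method of the present application):

```python
# Simplified sketch of classical volume rendering along one camera ray,
# as used by NeRF-style pipelines (illustration of the baseline only).
import torch

def render_ray(densities: torch.Tensor, colors: torch.Tensor,
               deltas: torch.Tensor) -> torch.Tensor:
    """densities: (S,) sigma at S samples; colors: (S, 3); deltas: (S,) segment lengths.
    Returns the composited RGB value for a single pixel."""
    alpha = 1.0 - torch.exp(-densities * deltas)            # opacity of each segment
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = trans * alpha                                  # contribution of each sample
    return (weights.unsqueeze(-1) * colors).sum(dim=0)       # (3,) pixel color

sigma, rgb, d = torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.02)
pixel = render_ray(sigma, rgb, d)   # repeated independently for every pixel
```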
- This method uses a fully-connected deep learning network to perform pixel-by-pixel computation on the dataset, and does not use the correlation between pixels. The pixels are isolated from each other, and the ability to restore details in some scenes is insufficient.
- the present application provides an image processing method and an image processing device: by combining two trained neural network models to generate a target image under the current perspective, the first neural network model is used to effectively reconstruct the three-dimensional light field information and color feature information of the target image, and the second neural network model is used to effectively restore the color information and detail information of the image, thereby improving the ability to restore the color and detail of the image and effectively improving the image quality.
- the first, the second, and various numeral numbers are only for the convenience of description, and are not used to limit the scope of the embodiments of the present application.
- the first neural network model, the second neural network model, etc. distinguish different neural networks and the like.
- "at least one" means one or more, and "plurality" means two or more.
- "and/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural.
- the character "/" generally indicates that the associated objects are in an "or" relationship.
- "at least one of the following items" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
- for example, at least one of a, b and c may represent: a, or b, or c, or a and b, or a and c, or b and c, or a, b and c, where a, b and c can each be singular or plural.
- the method in the embodiment of the present application may be performed by a VR device provided with a camera, and the VR device may be, for example, VR glasses, a VR headset, etc., which is not limited in the embodiment of the present application.
- FIG. 1 is a schematic flowchart of an image processing method 100 in an embodiment of the present application. As shown in FIG. 1, the method 100 may include the following steps:
- S101: Acquire the spatial position information and viewing angle information of a picture in the camera.
- the above-mentioned spatial position information refers to the position of the light ray in three-dimensional space, which may be represented by three-dimensional coordinates (x, y, z).
- the viewing angle information refers to the direction in three-dimensional space of the ray emitted from the above-mentioned spatial position, which can be represented by two parameters (θ, φ).
- the above-mentioned spatial position information and viewing angle information can also be collectively referred to as 5D coordinates, expressed as (x, y, z, θ, φ).
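- For illustration only, the 5D coordinates can be assembled from a ray origin and a unit viewing direction as in the following sketch; the spherical-angle convention is an assumption, not specified by this embodiment.

```python
# Sketch: packing spatial position (x, y, z) and viewing angles (theta, phi)
# into 5D coordinates for one camera ray (angle convention is an assumption).
import torch

def make_5d_coords(origin: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """origin: (3,) ray position; direction: (3,) viewing direction."""
    d = direction / direction.norm()
    theta = torch.acos(d[2].clamp(-1.0, 1.0))     # polar angle
    phi = torch.atan2(d[1], d[0])                 # azimuth angle
    return torch.cat([origin, theta.view(1), phi.view(1)])   # (x, y, z, theta, phi)

coords = make_5d_coords(torch.tensor([0.0, 0.0, 2.0]),
                        torch.tensor([0.0, 0.0, -1.0]))
```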
- S102: Input the above-mentioned spatial position information and perspective information into the first neural network model, and obtain the spatial light field feature information corresponding to the spatial position information and the perspective information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image.
- the first neural network model may also be referred to as a three-dimensional representation network model or a light field reconstruction network model, which is not limited in this embodiment of the present application.
- the color feature information included in the above-mentioned spatial light field feature information may be in the three-primary-color (red green blue, RGB) format or the YUV format, where "Y" represents brightness (luminance or luma), and "U" and "V" represent chrominance (chrominance or chroma), which describes the color and saturation of the image and is used to specify the color of a pixel.
- S103: Input the above-mentioned spatial light field feature information into the second neural network model to obtain the target image, where the second neural network model is used to restore the color information and detail information of the image.
- the VR device may adopt different processing methods for the spatial light field feature information in different formats.
- when the color feature information included in the spatial light field feature information is in RGB format, the VR device may input the spatial light field feature information into the second neural network model to obtain the target image.
- when the color feature information included in the spatial light field feature information is in YUV format, the VR device can first convert the color feature information from YUV format to RGB format, and then input the spatial light field feature information including the converted color feature information into the second neural network model to obtain the target image.
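- A minimal sketch of such a YUV-to-RGB conversion is given below; the BT.601 full-range coefficients are an assumption, since the embodiment does not specify which conversion matrix is used.

```python
# Sketch: converting YUV color feature information to RGB before it is fed to
# the second neural network model (BT.601 full-range matrix is an assumption).
import torch

def yuv_to_rgb(yuv: torch.Tensor) -> torch.Tensor:
    """yuv: (..., 3) with Y in [0, 1] and U, V centered on 0. Returns (..., 3) RGB."""
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return torch.stack([r, g, b], dim=-1).clamp(0.0, 1.0)
```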
- the above-mentioned second neural network model may be a neural network model for processing image color information and detail information, such as a convolutional neural network (CNN) model.
- the target image under the current perspective is generated by combining two trained neural network models: the first neural network model effectively reconstructs the three-dimensional light field information and color feature information of the target image, and the second neural network model effectively restores the color information and detail information of the image. This decouples the task of optimizing color details from the task of generating a spatial light field, thereby improving the ability to restore the color and details of the image and effectively improving the image quality.
- compared with the second implementation manner in the above-mentioned prior art, the method of the embodiment of the present application improves the peak signal-to-noise ratio (PSNR) of the target image from 32 to 34, improving the image quality.
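- For reference, PSNR between a reconstructed image and its ground truth can be computed as in the following sketch, assuming images normalized to [0, 1].

```python
# Sketch: peak signal-to-noise ratio (PSNR) for images normalized to [0, 1].
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(1.0 / mse)    # higher is better
```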
- the above-mentioned second neural network model is an RGB network dedicated to processing color and detail information, and the above-mentioned first neural network model is a NeRF network dedicated to processing 3D information.
- the above-mentioned second neural network model includes an encoding network and a decoding network; inputting the above-mentioned spatial light field feature information into the second neural network model to obtain the target image includes: inputting the spatial light field feature information into the decoding network to obtain the target image.
- FIG. 2 shows a processing process of the image processing method provided by the embodiment of the present application.
- the VR device can input the spatial position information and perspective information of the above image into the first neural network model to obtain the spatial light field feature information, and then input the spatial light field feature information into the decoding network of the second neural network model to generate the target image.
- the spatial light field feature information includes spatial three-dimensional feature information and color feature information
- through the first neural network model and the decoding network in the second neural network model, the VR device improves its ability to restore the color information and detail information of the target image, thereby improving the image quality.
- the use of the neural network model provided by the embodiments of the present application is described in detail above with reference to FIG. 1 and FIG. 2 , and the training process of the neural network model will be described in detail below with reference to FIG. 3 and FIG. 4 .
- the training process includes training of the first neural network model and training of the second neural network model.
- the training of the first neural network model may include: inputting the color information of the sample image into the encoding network of the above-mentioned second neural network model to obtain the color feature information and detail feature information of the sample image; and training the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
- the first intermediate representation contains information such as color, detail, domain, and relevance of the image. For example, the color, texture details, and position information of things in the image, as well as the relationship between colors, details, and positions between different things, and so on.
- the VR device may map the color feature information of the image to a high-dimensional feature space through the encoding network to obtain the first intermediate representation.
- the VR device can use the first intermediate representation and the spatial position information corresponding to the sample image to train the first neural network model and obtain the second intermediate representation.
- the VR device inputs the second intermediate representation into the above-mentioned decoding network to obtain the training result of the sample image.
- the VR device may use the first intermediate representation and the spatial position information corresponding to the sample image to train the first neural network model, so that the first neural network model can learn the parameters in the first intermediate representation.
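- A hedged sketch of this feature-space supervision is given below, reusing the illustrative module names from the earlier sketch; the MSE loss and the batch size of 1 are assumptions.

```python
# Sketch of stage-two training: the first neural network model learns to predict
# the first intermediate representation produced by the (frozen) encoding network.
# Assumptions: batch size 1, MSE loss, module names from the earlier sketches.
import torch
import torch.nn as nn

def train_first_model(first_model: nn.Module, encoder: nn.Module,
                      coords_5d: torch.Tensor, sample_image: torch.Tensor,
                      steps: int = 1000, lr: float = 1e-3) -> None:
    optimizer = torch.optim.Adam(first_model.parameters(), lr=lr)
    encoder.eval()                                       # encoder was trained in stage one
    with torch.no_grad():
        target = encoder(sample_image)                   # first intermediate representation (ground truth)
    for _ in range(steps):
        pred = first_model(coords_5d)                    # (H*W, C) second intermediate representation
        pred_map = pred.t().reshape(target.shape)        # match the encoder's (1, C, H, W) layout
        loss = nn.functional.mse_loss(pred_map, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```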
- the above-mentioned training of the first neural network model includes: taking the color feature information and detail feature information of the above-mentioned sample image as true values, using the spatial position information corresponding to the sample image as input, and training the first neural network model.
- FIG. 3 shows the training process of the first neural network model provided by the embodiment of the present application.
- the VR device inputs the color information of the sample image into the encoding network in the second neural network model to generate the color feature information and detail feature information of the sample image (i.e., the first intermediate representation described above). Then, the VR device inputs the spatial position information and perspective information of the sample image into the first neural network model, and uses the color feature information and detail feature information as true values to train the first neural network model, so that the first neural network model can learn information such as image color, details, domain, and correlation included in the first intermediate representation and its output can approach the true value, thereby completing the training of the first neural network model.
- when training the first neural network model, the VR device no longer uses the image itself as the ground-truth reference, but uses the first intermediate representation generated by the encoding network as the ground truth, and uses the spatial position information and perspective information of the corresponding image as input, so that the first neural network model can learn the first intermediate representation and the second intermediate representation output by the first neural network model is more accurate.
- the VR device can then pass the second intermediate representation through the decoding network and output higher-quality images.
- the VR device can also train the second neural network model.
- FIG. 4 shows the training process of the second neural network model provided by the embodiment of the present application.
- the VR device can input the color information of the sample image into the encoding network in the second neural network model, generate the color feature information and detail feature information of the sample image through the encoding network, and then input the obtained color feature information and detail feature information into the decoding network in the second neural network model to obtain the decoded image.
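- A minimal sketch of this first training stage, in which the encoding network and decoding network are trained together as an image autoencoder, is shown below; the `Encoder` architecture and the L1 reconstruction loss are assumptions, and `Decoder` refers to the illustrative module from the earlier sketch.

```python
# Sketch: stage-one training of the second neural network model (encoder + decoder)
# as an image autoencoder; the L1 reconstruction loss is an assumption.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encoding network: maps sample-image color information to the
    first intermediate representation (color + detail feature information)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)

def train_second_model(encoder: nn.Module, decoder: nn.Module,
                       images: torch.Tensor, steps: int = 1000) -> None:
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)
    for _ in range(steps):
        features = encoder(images)                       # first intermediate representation
        decoded = decoder(features)                      # decoded image
        loss = nn.functional.l1_loss(decoded, images)    # compare with the sample images
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```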
- the embodiment of the present application adopts the method of segmented training, and the second neural network model can be trained first, so as to generate the above-mentioned first intermediate representation (high-dimensional feature information containing color details), and then the first intermediate representation is used as the true value to train the first neural network model, so that the first neural network model can learn the implicit representation of the light field, and let the first neural network model output a more accurate intermediate representation, that is, the above-mentioned second intermediate representation.
- the segmented training method of the embodiment of the present application converges more easily, and the training efficiency is higher.
- FIG. 5 shows an image processing apparatus 500 provided by an embodiment of the present application.
- the apparatus 500 includes: an acquisition module 501 and a processing module 502 .
- the acquisition module 501 is used to acquire the spatial position information and the perspective information of the picture in the camera;
- the processing module 502 is used to input the spatial position information and the perspective information into the first neural network model, and obtain the spatial light field feature information corresponding to the spatial position information and the perspective information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image;
- the spatial light field feature information is input to a second neural network model to obtain a target image, and the second neural network model is used to restore the color information and detail information of the image.
- the above-mentioned second neural network model includes an encoding network and a decoding network; the processing module 502 is configured to input the above-mentioned spatial light field feature information into the above-mentioned decoding network to obtain the above-mentioned target image.
- the processing module 502 is used to input the color information of the sample image into the encoding network before inputting the above-mentioned spatial position information and perspective information into the first neural network model, to obtain the color feature information and detail feature information of the sample image; and to train the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
- the processing module 502 is configured to use the color feature information and detail feature information of the sample image as true values, and use the spatial position information corresponding to the sample image as input to train the first neural network model.
- the apparatus 500 here is embodied in the form of functional modules.
- the term "module" as used herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (e.g., a shared processor, a dedicated processor, or a group of processors) and memory, a merged logic circuit, and/or other suitable components that support the described functions.
- the apparatus 500 may be specifically the VR device in the foregoing embodiments, or the functions of the VR device in the foregoing embodiments may be integrated in the apparatus 500, and the apparatus 500 may be used to execute the various processes and/or steps corresponding to the VR device in the above method embodiments; to avoid repetition, details are not repeated here.
- the above-mentioned apparatus 500 has a function of implementing the corresponding steps performed by the VR device in the above-mentioned method; the above-mentioned functions may be implemented by hardware, or by executing corresponding software by hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- the apparatus 500 in FIG. 5 may also be a chip or a system of chips, such as a system on chip (system on chip, SoC).
- FIG. 6 shows another image processing apparatus 600 provided by an embodiment of the present application.
- the apparatus 600 includes a processor 601 , a transceiver 602 and a memory 603 .
- the processor 601, the transceiver 602 and the memory 603 communicate with each other through an internal connection path, the memory 603 is used to store instructions, and the processor 601 is used to execute the instructions stored in the memory 603 to control the transceiver 602 to send signals and / or receive signals.
- the transceiver 602 is used to obtain the spatial position information and viewing angle information of the picture in the camera; the processor 601 is used to input the spatial position information and the viewing angle information into the first neural network model, and obtain the spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process the three-dimensional information of the image;
- the spatial light field feature information is input to a second neural network model to obtain a target image, and the second neural network model is used to restore the color information and detail information of the image.
- the above-mentioned second neural network model includes an encoding network and a decoding network; the processor 601 is configured to input the above-mentioned spatial light field feature information into the above-mentioned decoding network to obtain the above-mentioned target image.
- the processor 601 is used to input the color information of the sample image into the encoding network before inputting the above-mentioned spatial position information and perspective information into the first neural network model, to obtain the color feature information and detail feature information of the sample image; and to train the first neural network model using the color feature information and detail feature information of the sample image and the spatial position information corresponding to the sample image.
- the processor 601 is configured to use the color feature information and detail feature information of the sample image as true values, and use the spatial position information corresponding to the sample image as input to train the first neural network model.
- the apparatus 600 may be specifically the VR device in the above embodiments, or the functions of the VR device in the above embodiments may be integrated in the apparatus 600, and the apparatus 600 may be used to execute each of the above method embodiments corresponding to the VR device steps and/or processes.
- the memory 603 may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory.
- the memory may also store device type information.
- the processor 601 may be configured to execute the instructions stored in the memory, and when the processor executes the instructions, the processor may execute various steps and/or processes corresponding to the VR device in the foregoing method embodiments.
- the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
- the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
- the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
- the storage medium is located in the memory, and the processor executes the instructions in the memory, and completes the steps of the above method in combination with its hardware. To avoid repetition, detailed description is omitted here.
- the disclosed system, apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of the units is only a logical function division; in actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
Image processing method and image processing apparatus, which improve the ability to restore image color and detail, thereby effectively improving image quality. The method comprises: acquiring spatial position information and viewing angle information of a picture in a camera (S101); inputting the spatial position information and the viewing angle information into a first neural network model to obtain spatial light field feature information corresponding to the spatial position information and the viewing angle information, where the spatial light field feature information includes spatial three-dimensional feature information and color feature information, and the first neural network model is used to process three-dimensional information of an image (S102); and inputting the spatial light field feature information into a second neural network model to obtain a target image, where the second neural network model is used to restore color information and detail information of an image (S103).
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080107407.8A CN116569218B (zh) | 2020-12-24 | 2020-12-24 | 图像处理方法和图像处理装置 |
PCT/CN2020/139145 WO2022133944A1 (fr) | 2020-12-24 | 2020-12-24 | Procédé de traitement d'images et appareil de traitement d'images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/139145 WO2022133944A1 (fr) | 2020-12-24 | 2020-12-24 | Procédé de traitement d'images et appareil de traitement d'images |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022133944A1 true WO2022133944A1 (fr) | 2022-06-30 |
Family
ID=82157246
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/139145 WO2022133944A1 (fr) | 2020-12-24 | 2020-12-24 | Procédé de traitement d'images et appareil de traitement d'images |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116569218B (fr) |
WO (1) | WO2022133944A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272575A (zh) * | 2022-07-28 | 2022-11-01 | 中国电信股份有限公司 | 图像生成方法及装置、存储介质和电子设备 |
CN115714888A (zh) * | 2022-10-09 | 2023-02-24 | 名之梦(上海)科技有限公司 | 视频生成方法、装置、设备与计算机可读存储介质 |
CN116071484A (zh) * | 2023-03-07 | 2023-05-05 | 清华大学 | 一种大场景稀疏光场十亿像素级智能重建方法及装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100189342A1 (en) * | 2000-03-08 | 2010-07-29 | Cyberextruder.Com, Inc. | System, method, and apparatus for generating a three-dimensional representation from one or more two-dimensional images |
CN108510573A (zh) * | 2018-04-03 | 2018-09-07 | 南京大学 | 一种基于深度学习的多视点人脸三维模型重建的方法 |
CN109255843A (zh) * | 2018-09-26 | 2019-01-22 | 联想(北京)有限公司 | 三维重建方法、装置及增强现实ar设备 |
CN110163953A (zh) * | 2019-03-11 | 2019-08-23 | 腾讯科技(深圳)有限公司 | 三维人脸重建方法、装置、存储介质和电子装置 |
CN110400337A (zh) * | 2019-07-10 | 2019-11-01 | 北京达佳互联信息技术有限公司 | 图像处理方法、装置、电子设备及存储介质 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020040521A1 (fr) * | 2018-08-21 | 2020-02-27 | 삼성전자 주식회사 | Procédé de synthèse de vue intermédiaire de champ lumineux, système de synthèse de vue intermédiaire de champ lumineux, et procédé de compression de champ lumineux |
CN111310775B (zh) * | 2018-12-11 | 2023-08-25 | Tcl科技集团股份有限公司 | 数据训练方法、装置、终端设备及计算机可读存储介质 |
CN109903393B (zh) * | 2019-02-22 | 2021-03-16 | 清华大学 | 基于深度学习的新视角场景合成方法和装置 |
CN112017252B (zh) * | 2019-05-31 | 2024-06-11 | 华为技术有限公司 | 一种图像处理方法和相关设备 |
CN111382712B (zh) * | 2020-03-12 | 2023-06-02 | 厦门熵基科技有限公司 | 一种手掌图像识别方法、系统及设备 |
CN112102165B (zh) * | 2020-08-18 | 2022-12-06 | 北京航空航天大学 | 一种基于零样本学习的光场图像角域超分辨系统及方法 |
-
2020
- 2020-12-24 WO PCT/CN2020/139145 patent/WO2022133944A1/fr active Application Filing
- 2020-12-24 CN CN202080107407.8A patent/CN116569218B/zh active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100189342A1 (en) * | 2000-03-08 | 2010-07-29 | Cyberextruder.Com, Inc. | System, method, and apparatus for generating a three-dimensional representation from one or more two-dimensional images |
CN108510573A (zh) * | 2018-04-03 | 2018-09-07 | 南京大学 | 一种基于深度学习的多视点人脸三维模型重建的方法 |
CN109255843A (zh) * | 2018-09-26 | 2019-01-22 | 联想(北京)有限公司 | 三维重建方法、装置及增强现实ar设备 |
CN110163953A (zh) * | 2019-03-11 | 2019-08-23 | 腾讯科技(深圳)有限公司 | 三维人脸重建方法、装置、存储介质和电子装置 |
CN110400337A (zh) * | 2019-07-10 | 2019-11-01 | 北京达佳互联信息技术有限公司 | 图像处理方法、装置、电子设备及存储介质 |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272575A (zh) * | 2022-07-28 | 2022-11-01 | 中国电信股份有限公司 | 图像生成方法及装置、存储介质和电子设备 |
CN115272575B (zh) * | 2022-07-28 | 2024-03-29 | 中国电信股份有限公司 | 图像生成方法及装置、存储介质和电子设备 |
CN115714888A (zh) * | 2022-10-09 | 2023-02-24 | 名之梦(上海)科技有限公司 | 视频生成方法、装置、设备与计算机可读存储介质 |
CN115714888B (zh) * | 2022-10-09 | 2023-08-29 | 名之梦(上海)科技有限公司 | 视频生成方法、装置、设备与计算机可读存储介质 |
CN116071484A (zh) * | 2023-03-07 | 2023-05-05 | 清华大学 | 一种大场景稀疏光场十亿像素级智能重建方法及装置 |
US11908067B1 (en) | 2023-03-07 | 2024-02-20 | Tsinghua University | Method and device for gigapixel-level light field intelligent reconstruction of large-scale scene |
Also Published As
Publication number | Publication date |
---|---|
CN116569218A (zh) | 2023-08-08 |
CN116569218B (zh) | 2024-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022133944A1 (fr) | Procédé de traitement d'images et appareil de traitement d'images | |
US10474227B2 (en) | Generation of virtual reality with 6 degrees of freedom from limited viewer data | |
US20230377183A1 (en) | Depth-Aware Photo Editing | |
US20230377269A1 (en) | Methods and systems for producing content in multiple reality environments | |
JP7135125B2 (ja) | 近赤外画像の生成方法、近赤外画像の生成装置、生成ネットワークの訓練方法、生成ネットワークの訓練装置、電子機器、記憶媒体及びコンピュータプログラム | |
WO2020192568A1 (fr) | Procédé et appareil de génération d'images faciales, dispositif et support de stockage | |
CN112166604B (zh) | 具有单个rgbd相机的对象的体积捕获 | |
US20220245912A1 (en) | Image display method and device | |
JP7566028B2 (ja) | 多様なポートレートから照明を学習すること | |
US10444931B2 (en) | Vantage generation and interactive playback | |
WO2023207379A1 (fr) | Procédé et appareil de traitement d'images, dispositif et support de stockage | |
US10521892B2 (en) | Image lighting transfer via multi-dimensional histogram matching | |
WO2023241459A1 (fr) | Procédé et système de communication de données, dispositif électronique et support de stockage | |
CN110533773A (zh) | 一种三维人脸重建方法、装置及相关设备 | |
CN116740261B (zh) | 图像重建方法和装置、图像重建模型的训练方法和装置 | |
KR20230149093A (ko) | 영상 처리 방법, 영상 처리를 위한 트레이닝 방법, 및 영상 처리 장치 | |
CN109658488B (zh) | 一种虚实融合系统中通过可编程gpu加速解码摄像头视频流的方法 | |
WO2024119997A1 (fr) | Procédé et appareil d'estimation d'éclairage | |
WO2021173489A1 (fr) | Appareil, procédé et système de fourniture de texture tridimensionnelle utilisant une représentation uv | |
Li et al. | Dynamic View Synthesis with Spatio-Temporal Feature Warping from Sparse Views | |
WO2024077791A1 (fr) | Procédé et appareil de génération vidéo, dispositif et support de stockage lisible par ordinateur | |
CN116630744A (zh) | 图像生成模型训练方法及图像生成方法、装置及介质 | |
Bai et al. | Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction | |
Han et al. | Learning residual color for novel view synthesis | |
Zhang et al. | Survey on controlable image synthesis with deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20966516 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202080107407.8 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20966516 Country of ref document: EP Kind code of ref document: A1 |