CN114693551A - Image processing method, device, equipment and readable storage medium - Google Patents

Image processing method, device, equipment and readable storage medium

Info

Publication number
CN114693551A
Authority
CN
China
Prior art keywords
image
target
feature
sample
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210300650.2A
Other languages
Chinese (zh)
Inventor
袁红亮
张博宇
王珏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210300650.2A
Publication of CN114693551A
Legal status: Pending (current)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 5/00 Image enhancement or restoration
                    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
                    • G06T 5/70 Denoising; Smoothing
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]
                        • G06T 2207/20212 Image combination
                            • G06T 2207/20221 Image fusion; Image merging
                    • G06T 2207/30 Subject of image; Context of image processing
                        • G06T 2207/30168 Image quality inspection
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 18/00 Pattern recognition
                    • G06F 18/20 Analysing
                        • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
                        • G06F 18/25 Fusion techniques
                            • G06F 18/253 Fusion techniques of extracted features
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                        • G06N 3/04 Architecture, e.g. interconnection topology
                            • G06N 3/045 Combinations of networks
                        • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method, apparatus, device and readable storage medium. The method comprises: acquiring a target image and a historical image in a target image sequence; acquiring target image association features corresponding to the target image and historical image association features corresponding to the historical image; generating a first mixed embedding feature for the target image according to the target image association features, generating a second mixed embedding feature for the historical image according to the target image association features and the historical image association features, and performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image; and determining a reconstruction result image corresponding to the target image according to the target image association features, the target accumulated feature and the historical image. With the method and the device, the image quality of reconstructed images in an image rendering service can be improved.

Description

Image processing method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a readable storage medium.
Background
With the rapid development of computer technology, game production, animation production and virtual reality technology have gradually emerged, and the demand for rendered images (reconstructed noise-free images) of virtual scenes (such as game scenes, animation scenes, social scenes and the like) keeps growing. However, in actual rendering, the rendered image often carries considerable noise, so the image needs to be denoised in image space and the rendering result needs to be reconstructed.
Existing denoising (i.e. removing noise from an image) and reconstruction methods are directed at images with a high Samples Per Pixel (SPP) count (e.g. SPP greater than 1), and mainly use neural network models (e.g. Transformer network models) to denoise the rendered image. For example, in the prior art, a scene rendering image with a high SPP is input into a neural network model, features of the current scene image are extracted through the neural network model, inference is then performed based on the extracted features, and a noise-free reconstructed image corresponding to the current image is output.
It can be seen that the existing methods are highly specialized: for an image with a low SPP, the image quality obtained by the existing methods is not high (the noise removal effect is not ideal). Meanwhile, the features of the current image extracted by the neural network model through automatic learning are limited and can only characterize one aspect of the current image, and an image output based on such simple features may fail to meet the image quality requirement.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, image processing equipment and a readable storage medium, which can improve the image quality of a reconstructed image in an image rendering service.
An embodiment of the present application provides an image processing method, including:
acquiring a target image and a historical image in a target image sequence; the target image sequence is an image sequence obtained by performing image acquisition on a target scene based on a target unit pixel sampling number; the historical image is the image immediately preceding the target image in the target image sequence;
acquiring target image correlation characteristics corresponding to a target image and historical image correlation characteristics corresponding to a historical image;
generating a first mixed embedding feature aiming at the target image according to the target image association feature, generating a second mixed embedding feature aiming at the historical image according to the target image association feature and the historical image association feature, and performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image;
and determining a reconstruction result image corresponding to the target image according to the target image correlation characteristic, the target accumulation characteristic and the historical image.
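For illustration, a minimal Python sketch of the above processing flow is given below. All helper names (extract_association_features, first_mixed_embedding, second_mixed_embedding, accumulate, reconstruct) are hypothetical stand-ins for the steps named in the method and are not names used in this application.

    # Hypothetical sketch of the claimed processing flow; all helper names are illustrative only.
    def process_target_image(target_image, history_image, models):
        # Step 1: association features of the two images (e.g. normal and depth dimension features).
        target_assoc = models.extract_association_features(target_image)
        history_assoc = models.extract_association_features(history_image)

        # Step 2: mixed embedding features for the target image and the historical image.
        first_mixed = models.first_mixed_embedding(target_assoc)
        second_mixed = models.second_mixed_embedding(target_assoc, history_assoc)

        # Step 3: feature accumulation of the two images.
        target_accumulated = models.accumulate(target_image, history_image, first_mixed, second_mixed)

        # Step 4: reconstruction result image from the association features,
        # the target accumulated feature and the historical image.
        return models.reconstruct(target_assoc, target_accumulated, history_image)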
An aspect of the present embodiment provides another image processing method, including:
acquiring a target sample noise image and a historical sample noise image in a sample image sequence; the sample image sequence is an image sequence obtained by carrying out image acquisition on a sample scene based on the first unit pixel sampling number; the historical sample noise image is a previous sample noise image of the target sample noise image in the sample image sequence; the first unit pixel sampling number is less than the unit pixel sampling number threshold;
acquiring target sample image correlation characteristics corresponding to the target sample noise images and historical sample image correlation characteristics corresponding to the historical sample noise images;
generating a first sample mixed embedding feature aiming at the target sample noise image according to the target sample image association feature, generating a second sample mixed embedding feature aiming at the historical sample noise image according to the target sample image association feature and the historical sample image association feature, and performing feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature to obtain a target sample accumulated feature corresponding to the target sample noise image;
determining target sample fusion cascade characteristics corresponding to the target sample noise images according to the target sample image correlation characteristics, the target sample accumulation characteristics and the historical sample noise images;
inputting the target sample fusion cascade characteristic into an image reconstruction model, and outputting a target sample reconstruction image corresponding to a target sample noise image according to the target sample fusion cascade characteristic in the image reconstruction model;
acquiring a target label sampling image corresponding to the target sample noise image; the target label sampling image is an image obtained after image acquisition is carried out on a sample scene based on the second unit pixel sampling number; the first unit pixel sampling number is less than the second unit pixel sampling number;
and training the image reconstruction model according to the target label sampling image and the target sample reconstruction image to obtain a target image reconstruction model for performing image reconstruction processing on the target image in the target image sequence.
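As a rough illustration of the training procedure described above, the following sketch shows a single training step in Python. The helper build_sample_fused_cascade, the optimizer usage and the single loss term are assumptions; this application combines several loss values, as described later.

    # Illustrative training step only; helper names and the single loss term are assumptions.
    def train_step(reconstruction_model, optimizer, loss_fn,
                   target_sample_noise, history_sample_noise, target_label_image):
        # Build the target sample fused cascade feature from the two low-SPP noisy frames
        # (hypothetical helper covering feature extraction, mixed embedding and accumulation).
        fused_cascade = build_sample_fused_cascade(target_sample_noise, history_sample_noise)

        # Reconstruct the target sample image and compare it with the high-SPP label image.
        target_sample_reconstructed = reconstruction_model(fused_cascade)
        loss = loss_fn(target_sample_reconstructed, target_label_image)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()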
An aspect of an embodiment of the present application provides an image processing apparatus, including:
the image acquisition module is used for acquiring a target image and a historical image in a target image sequence; the target image sequence is an image sequence obtained by performing image acquisition on a target scene based on a target unit pixel sampling number; the historical image is the image immediately preceding the target image in the target image sequence;
the characteristic acquisition module is used for acquiring target image correlation characteristics corresponding to a target image and historical image correlation characteristics corresponding to a historical image;
the feature generation module is used for generating a first mixed embedding feature aiming at the target image according to the target image correlation feature;
the characteristic generating module is further used for generating a second mixed embedding characteristic aiming at the historical image according to the target image correlation characteristic and the historical image correlation characteristic;
the feature accumulation module is used for performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image;
and the image reconstruction module is used for determining a reconstruction result image corresponding to the target image according to the target image correlation characteristic, the target accumulation characteristic and the historical image.
In one embodiment, the target image association features include a first normal dimension feature and a first depth dimension feature corresponding to the target image;
the feature generation module includes:
a first feature input unit, configured to input a first normal dimension feature and a first depth dimension feature to the time sequence accumulation model;
the first feature convolution unit is used for performing convolution processing on the first normal dimension feature and the first depth dimension feature through a first convolution network layer of the time sequence accumulation model to obtain a first image embedding feature corresponding to the target image;
the first feature convolution unit is further used for performing convolution processing on the first normal dimension feature and the first depth dimension feature through a second convolution network layer of the time sequence accumulation model to obtain a second image embedding feature corresponding to the target image;
and the first feature operation unit is used for carrying out pixel multiplication operation processing on the first image embedding feature and the second image embedding feature to obtain a first mixed embedding feature aiming at the target image.
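A minimal PyTorch sketch of this unit follows. Only the two-convolution-plus-pixel-multiplication structure comes from the description above; concatenating the normal and depth dimension features into a single input tensor and the channel sizes are assumptions.

    # Sketch of the first mixed embedding feature; channel sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class FirstMixedEmbedding(nn.Module):
        def __init__(self, in_channels=4, embed_channels=32):
            super().__init__()
            # First and second convolution network layers of the time sequence accumulation model.
            self.conv1 = nn.Conv2d(in_channels, embed_channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(in_channels, embed_channels, kernel_size=3, padding=1)

        def forward(self, first_normal_feat, first_depth_feat):
            x = torch.cat([first_normal_feat, first_depth_feat], dim=1)  # e.g. 3 normal + 1 depth channels
            first_image_embedding = self.conv1(x)   # first image embedding feature
            second_image_embedding = self.conv2(x)  # second image embedding feature
            # Pixel multiplication operation of the two embedding features.
            return first_image_embedding * second_image_embedding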
In one embodiment, the target image association features comprise first normal dimension features and first depth dimension features corresponding to the target image, and the historical image association features comprise second normal dimension features and second depth dimension features corresponding to the historical image;
the feature generation module includes:
the feature transformation unit is used for performing affine transformation on the second normal dimension feature and the second depth dimension feature respectively to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature;
the second feature input unit is used for inputting the first normal dimension feature, the first depth dimension feature, the normal transformation feature and the depth transformation feature into the time sequence accumulation model;
the second feature convolution unit is used for performing convolution processing on the first normal dimension feature and the first depth dimension feature through a first convolution network layer of the time sequence accumulation model to obtain a target image embedding feature corresponding to the target image;
the second feature convolution unit is also used for performing convolution processing on the normal transformation feature and the depth transformation feature through a second convolution network layer of the time sequence accumulation model to obtain a historical image embedding feature corresponding to the historical image;
and the second feature operation unit is used for carrying out pixel multiplication operation processing on the target image embedding feature and the historical image embedding feature to obtain a second mixed embedding feature aiming at the historical image.
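A corresponding sketch for the second mixed embedding feature follows. Here the affine transformation of the historical features is assumed to be a warp of the history frame into the target frame through a precomputed sampling grid (e.g. derived from motion vectors); this interpretation, and the use of torch.nn.functional.grid_sample, are assumptions rather than something stated in this application.

    # Sketch only; the grid-based warp stands in for the affine transformation of the history features.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SecondMixedEmbedding(nn.Module):
        def __init__(self, in_channels=4, embed_channels=32):
            super().__init__()
            self.conv_target = nn.Conv2d(in_channels, embed_channels, kernel_size=3, padding=1)   # first conv layer
            self.conv_history = nn.Conv2d(in_channels, embed_channels, kernel_size=3, padding=1)  # second conv layer

        def forward(self, first_normal, first_depth, second_normal, second_depth, grid):
            # Affine transformation of the historical normal/depth dimension features
            # (assumed here to be a reprojection into the target frame).
            normal_transformed = F.grid_sample(second_normal, grid, align_corners=False)
            depth_transformed = F.grid_sample(second_depth, grid, align_corners=False)

            target_embedding = self.conv_target(torch.cat([first_normal, first_depth], dim=1))
            history_embedding = self.conv_history(torch.cat([normal_transformed, depth_transformed], dim=1))
            # Pixel multiplication operation gives the second mixed embedding feature.
            return target_embedding * history_embedding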
In one embodiment, the feature accumulation module comprises:
the fusion coefficient determining unit is used for acquiring a logistic regression function and determining a first feature fusion coefficient aiming at the target image through the logistic regression function and the first mixed embedding feature;
the fusion coefficient determining unit is further used for determining a second feature fusion coefficient aiming at the historical image through a logistic regression function and the second mixed embedding feature;
and the feature accumulation unit is used for performing feature accumulation on the target image and the historical image according to the first feature fusion coefficient and the second feature fusion coefficient to obtain a target accumulated feature corresponding to the target image.
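Interpreting the logistic regression function as a sigmoid applied to the mixed embedding features, a small sketch of the fusion coefficient computation could look as follows; reducing each embedding to a single-channel coefficient map by a channel mean is an assumption.

    # Sketch: fusion coefficients from the mixed embedding features via a sigmoid.
    import torch

    def fusion_coefficients(first_mixed_embedding, second_mixed_embedding):
        # First feature fusion coefficient, for the target image.
        first_coef = torch.sigmoid(first_mixed_embedding.mean(dim=1, keepdim=True))
        # Second feature fusion coefficient, for the historical image.
        second_coef = torch.sigmoid(second_mixed_embedding.mean(dim=1, keepdim=True))
        return first_coef, second_coef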
In one embodiment, the feature accumulation unit includes:
the image characteristic acquiring subunit is used for acquiring a target noise image characteristic, a target shadow characteristic and a target albedo characteristic corresponding to the target image, and acquiring a historical accumulated noise image characteristic, a historical accumulated shadow characteristic and a historical accumulated albedo characteristic corresponding to the historical image;
the shadow accumulation subunit is used for accumulating the target shadow features and the historical accumulated shadow features according to the first feature fusion coefficient and the second feature fusion coefficient to obtain target accumulated shadow features corresponding to the target image;
the noise accumulation subunit is used for accumulating the target noise image characteristics and the historical accumulated noise image characteristics according to the first feature fusion coefficient and the second feature fusion coefficient to obtain target accumulated noise image characteristics corresponding to the target image;
the albedo accumulation subunit is used for accumulating the target albedo characteristic and the historical accumulated albedo characteristic according to the first characteristic fusion coefficient and the second characteristic fusion coefficient to obtain a target accumulated albedo characteristic corresponding to the target image;
and the cumulative characteristic determining subunit is used for determining the target cumulative shadow characteristic, the target cumulative noise image characteristic and the target cumulative albedo characteristic as the target cumulative characteristic corresponding to the target image.
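The accumulation of the three feature groups listed above can then be sketched as a per-pixel weighted blend with the two fusion coefficients; the plain linear blend used here is an assumption about how the coefficients combine.

    # Sketch of accumulating the noise image, shadow and albedo features of the two frames.
    def accumulate_target_features(first_coef, second_coef,
                                   target_noise, target_shadow, target_albedo,
                                   hist_noise, hist_shadow, hist_albedo):
        def blend(current, history):
            return first_coef * current + second_coef * history

        target_accumulated_shadow = blend(target_shadow, hist_shadow)
        target_accumulated_noise = blend(target_noise, hist_noise)
        target_accumulated_albedo = blend(target_albedo, hist_albedo)
        # Together these form the target accumulated feature of the target image.
        return target_accumulated_noise, target_accumulated_shadow, target_accumulated_albedo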
In one embodiment, the image reconstruction module comprises:
the cascade characteristic determining unit is used for determining a target fusion cascade characteristic corresponding to the target image according to the target image correlation characteristic, the target accumulation characteristic and the historical image;
and the reconstructed image determining unit is used for inputting the target fusion cascade characteristic into the target image reconstruction model and outputting a reconstruction result image corresponding to the target image according to the target fusion cascade characteristic in the target image reconstruction model.
In one embodiment, the target image association feature comprises a first normal dimension feature and a first depth dimension feature; the historical image association features comprise second normal dimension features and second depth dimension features corresponding to the historical images;
the cascade characteristic determination unit includes:
the feature fusion subunit is used for acquiring auxiliary features corresponding to the target image; the auxiliary features comprise the first normal dimension feature and the first depth dimension feature;
the feature fusion subunit is further configured to determine the dimension features in the auxiliary features, other than the first normal dimension feature and the first depth dimension feature, as the remaining dimension features;
the feature fusion subunit is further configured to perform feature fusion on the target cumulative feature, the first normal dimension feature, the first depth dimension feature and the remaining dimension feature to obtain an image fusion feature corresponding to the target image;
the feature transformation processing subunit is configured to perform affine transformation on the second normal dimension feature and the second depth dimension feature respectively to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature;
and the characteristic cascading subunit is used for cascading the image fusion characteristic, the normal transformation characteristic and the depth transformation characteristic to obtain a target fusion cascading characteristic corresponding to the target image.
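A sketch of assembling the target fusion cascade feature follows; treating both the feature fusion and the cascading as channel-wise concatenation, and representing the affine transformation by a hypothetical warp function, are assumptions.

    # Sketch: build the target fusion cascade feature from accumulated and auxiliary features.
    import torch

    def fusion_cascade_feature(target_accumulated, first_normal, first_depth, remaining_dims,
                               second_normal, second_depth, warp):
        # Feature fusion of the accumulated feature with the target auxiliary features.
        image_fusion = torch.cat([target_accumulated, first_normal, first_depth, remaining_dims], dim=1)
        # Affine transformation of the historical normal/depth dimension features.
        normal_transformed = warp(second_normal)
        depth_transformed = warp(second_depth)
        # Cascade everything into the target fusion cascade feature.
        return torch.cat([image_fusion, normal_transformed, depth_transformed], dim=1)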
In one embodiment, the reconstructed image determination unit comprises:
the parameter output subunit is used for determining, through the target image reconstruction model, a predicted high-quality image, a predicted low-quality image, a first image reconstruction parameter and a second image reconstruction parameter corresponding to the target fusion cascade feature; the predicted high-quality image does not contain noise, and the resolution of the predicted high-quality image is greater than that of the predicted low-quality image;
the image fusion subunit is used for carrying out image fusion on the predicted high-quality image and the predicted low-quality image in the target image reconstruction model through the first image reconstruction parameter to obtain an initial reconstruction image corresponding to the target image;
and the image fusion subunit is further configured to acquire a historical reconstruction image corresponding to the historical image, and perform image fusion on the initial reconstruction image and the historical reconstruction image through the second image reconstruction parameter to obtain a reconstruction result image corresponding to the target image.
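The two-stage image fusion performed by the reconstructed image determination unit could be sketched as below. The model is assumed to return the four outputs as separate tensors; the per-pixel linear blend and the upsampling of the predicted low-quality image before blending are assumptions made because this application only states that its resolution is lower.

    # Sketch of blending the model outputs into the reconstruction result image.
    import torch.nn.functional as F

    def reconstruct_result_image(target_model, fusion_cascade_feature, history_reconstructed):
        predicted_high, predicted_low, recon_param_1, recon_param_2 = target_model(fusion_cascade_feature)
        predicted_low_up = F.interpolate(predicted_low, size=predicted_high.shape[-2:],
                                         mode="bilinear", align_corners=False)
        # Initial reconstructed image: fuse the two predictions with the first reconstruction parameter.
        initial_reconstructed = recon_param_1 * predicted_high + (1.0 - recon_param_1) * predicted_low_up
        # Reconstruction result image: fuse with the historical reconstructed image
        # through the second reconstruction parameter.
        return recon_param_2 * initial_reconstructed + (1.0 - recon_param_2) * history_reconstructed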
An aspect of an embodiment of the present application provides another image processing apparatus, including:
the sample image acquisition module is used for acquiring a target sample noise image and a historical sample noise image in a sample image sequence; the sample image sequence is an image sequence obtained by performing image acquisition on a sample scene based on the first unit pixel sampling number; the historical sample noise image is the sample noise image immediately preceding the target sample noise image in the sample image sequence; the first unit pixel sampling number is less than the unit pixel sampling number threshold;
the sample characteristic acquisition module is used for acquiring target sample image correlation characteristics corresponding to the target sample noise images and historical sample image correlation characteristics corresponding to the historical sample noise images;
the sample feature generation module is used for generating a first sample mixed embedding feature aiming at the target sample noise image according to the target sample image correlation feature;
the sample feature generation module is further used for generating a second sample mixed embedding feature aiming at the historical sample noise image according to the target sample image correlation feature and the historical sample image correlation feature;
the sample feature accumulation module is used for carrying out feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature to obtain a target sample accumulated feature corresponding to the target sample noise image;
the sample characteristic cascading module is used for determining target sample fusion cascading characteristics corresponding to the target sample noise images according to the target sample image correlation characteristics, the target sample accumulation characteristics and the historical sample noise images;
the sample reconstruction image output module is used for inputting the target sample fusion cascade characteristics into an image reconstruction model, and outputting a target sample reconstruction image corresponding to the target sample noise image according to the target sample fusion cascade characteristics in the image reconstruction model;
the label image acquisition module is used for acquiring a target label sampling image corresponding to the target sample noise image; the target label sampling image is an image obtained after image acquisition is carried out on a sample scene based on the second unit pixel sampling number; the first unit pixel sampling number is less than the second unit pixel sampling number;
and the model training module is used for training the image reconstruction model according to the target label sampling image and the target sample reconstruction image to obtain a target image reconstruction model for performing image reconstruction processing on the target image in the target image sequence.
In one embodiment, the model training module comprises:
the to-be-operated image determining unit is used for determining a sample reconstructed image corresponding to the residual sample noise image in the sample image sequence as a to-be-operated reconstructed image and determining a label sampling image corresponding to the residual sample noise image as a to-be-operated label sampling image; the residual sample noise image is a sample noise image except the target sample noise image in the sample image sequence;
the loss value determining unit is used for determining a target loss value aiming at the image reconstruction model according to the reconstructed image to be operated, the label sampling image to be operated, the target label sampling image and the target sample reconstructed image;
and the model training unit is used for training the image reconstruction model according to the target loss value to obtain the target image reconstruction model.
In one embodiment, the loss value determination unit includes:
the loss value determining subunit is used for determining a spatial loss value for the image reconstruction model according to the reconstructed image to be operated, the label sampled image to be operated, the target label sampled image and the target sample reconstructed image;
the loss value determining subunit is further configured to determine a time sequence loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image and the target sample reconstructed image;
the loss value determining subunit is further configured to determine a relative edge loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image and the target sample reconstructed image;
and the target loss value determining subunit is further used for determining a target loss value for the image reconstruction model according to the spatial loss value, the time sequence loss value and the relative edge loss value.
In an embodiment, the loss value determining subunit is further specifically configured to obtain a spatial loss function, and determine a first spatial sub-loss value for the residual sample noise image according to the spatial loss function, the reconstructed image to be computed, and the tag sample image to be computed;
the loss value determining subunit is further specifically configured to determine a second spatial sub-loss value for the target sample noise image according to the spatial loss function, the target label sampled image, and the target sample reconstructed image;
and the loss value determining subunit is further specifically configured to fuse the first spatial sub-loss value and the second spatial sub-loss value to obtain a spatial loss value of the image reconstruction model.
In one embodiment, the loss value determination subunit is further specifically configured to obtain a first training weight for the residual sample noise image and a second training weight for the target sample noise image;
the loss value determining subunit is further specifically configured to perform operation processing on the first training weight and the first spatial sub-loss value to obtain a first operation spatial sub-loss value;
the loss value determining subunit is further specifically configured to perform operation processing on the second training weight and the second spatial sub-loss value to obtain a second operation spatial sub-loss value;
and the loss value determining subunit is further specifically configured to add the first operation space sub-loss value and the second operation space sub-loss value to obtain a space loss value of the image reconstruction model.
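A sketch of the spatial loss assembly described in this embodiment is given below; the L1 distance is used as the spatial loss function purely for illustration, since the application does not fix a particular function here.

    # Sketch of the weighted spatial loss over the remaining samples and the target sample.
    import torch.nn.functional as F

    def spatial_loss(remaining_reconstructed, remaining_labels,
                     target_reconstructed, target_label,
                     first_training_weight=1.0, second_training_weight=1.0):
        # First spatial sub-loss value, for the remaining sample noise images.
        first_sub_loss = F.l1_loss(remaining_reconstructed, remaining_labels)
        # Second spatial sub-loss value, for the target sample noise image.
        second_sub_loss = F.l1_loss(target_reconstructed, target_label)
        # Weight each sub-loss with its training weight and add the results.
        return first_training_weight * first_sub_loss + second_training_weight * second_sub_loss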
In one embodiment, the target sample noise image is the last sample noise image in the sequence of sample images;
the target loss value determining subunit is further specifically configured to obtain a history sample reconstructed image and a history label sampling image corresponding to the history sample noise image, and determine an affine transformation loss value for the image reconstruction model according to the history sample reconstructed image, the target sample reconstructed image, the history label sampling image and the target label sampling image;
the target loss value determining subunit is further specifically configured to obtain a target sample albedo feature corresponding to the target sample noise image, obtain, in the target sample accumulation feature, a target sample accumulation albedo feature corresponding to the target sample noise image, and determine an albedo loss value for the image reconstruction model according to the albedo loss function, the target sample albedo feature, and the target sample accumulation albedo feature;
and the target loss value determining subunit is further specifically configured to perform loss value fusion on the spatial loss value, the timing loss value, the relative edge loss value, the affine transformation loss value and the albedo loss value to obtain a target loss value of the image reconstruction model.
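Finally, the fusion of the five loss values into the target loss value could be as simple as a weighted sum; the weights below are assumptions, since this application does not specify how the loss values are fused.

    # Sketch: fuse the loss terms into the target loss value of the image reconstruction model.
    def target_loss_value(spatial, temporal, relative_edge, affine_transform, albedo,
                          w_spatial=1.0, w_temporal=1.0, w_edge=1.0, w_affine=1.0, w_albedo=1.0):
        return (w_spatial * spatial + w_temporal * temporal + w_edge * relative_edge
                + w_affine * affine_transform + w_albedo * albedo)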
An aspect of an embodiment of the present application provides a computer device, including: a processor and a memory;
the memory stores a computer program that, when executed by the processor, causes the processor to perform the method in the embodiments of the present application.
An aspect of the embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program includes program instructions, and the program instructions, when executed by a processor, perform the method in the embodiments of the present application.
In one aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by one aspect of the embodiments of the present application.
In the embodiment of the application, for a target image sequence obtained after image acquisition is performed on a target scene with a target unit pixel sampling number, whether the pixel sampling number is high or low, mixed embedding features can be computed based on the image association features of two images in the target image sequence (which may be called a target image and a historical image, where the target image may be the current image and the historical image is the image preceding the current image), and feature accumulation can be performed on the two images based on the mixed embedding features, so that an accumulated feature is obtained as the target accumulated feature corresponding to the target image; then, according to the image association features of the target image, the target accumulated feature and the historical image, a reconstruction result image corresponding to the target image can be determined.
It can be understood that, according to the method and the device, the mixed embedding features respectively corresponding to the two images can be computed from the association features of the current image and the previous frame, and feature accumulation can be performed on the two images based on the mixed embedding features; the obtained target accumulated feature fuses the features of the two frames, the fused feature can characterize the correlation between the two frames, and the feature covers a wider range of dimensions. Meanwhile, the reconstruction result image of the target image is determined jointly from the target image association features, the target accumulated feature and the previous frame (the historical image); that is, not only the features of the current frame but also the features of the earlier historical frame are taken into account, so that the time-domain information of the current frame can be fully utilized, the obtained reconstruction result image has higher time-domain stability, and the image quality is higher.
In summary, regardless of whether an image is obtained with a high or a low pixel sampling number, the method and the device can perform feature accumulation according to the image association features of two consecutive images in the target image sequence, and determine the reconstruction result image of the target image jointly from the accumulated feature and the historical image, and therefore have universality; by computing the accumulated feature of the two consecutive frames and performing image reconstruction based on the accumulated feature, the time-domain information of the images can be fully utilized, so that the image quality of the reconstructed image can be effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is apparent that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a diagram of a network architecture provided by an embodiment of the present application;
fig. 2 is a schematic view of a scene for reconstructing an image according to an embodiment of the present application;
fig. 3 is a flowchart of a method of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of model training provided by an embodiment of the present application;
FIG. 5 is a diagram of a system architecture provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The present application relates to artificial intelligence and other related technologies. For ease of understanding, related concepts such as artificial intelligence are described first below.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
The scheme provided by the embodiment of the application belongs to Computer Vision technology (CV) and Machine Learning (ML) belonging to the field of artificial intelligence.
Computer Vision technology (CV): computer vision is a science that studies how to make machines "see"; more specifically, it refers to using a camera and a computer to perform machine vision tasks such as recognition and measurement on a target, and further performing image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques and attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
Furthermore, before further detailed description of the embodiments of the present application, some terms and expressions referred to in the embodiments of the present application will be described below, and the terms and expressions referred to in the embodiments of the present application are applicable to the following explanation.
1) Three-dimensional scene: based on a scene in a three-dimensional space constructed by a three-dimensional modeling technology, objects (i.e., three-dimensional objects) in the three-dimensional scene can be described by three-dimensional coordinates. The three-dimensional coordinates may refer to coordinates in a three-dimensional coordinate system including an x-axis, a y-axis, and a z-axis. In some embodiments, the three-dimensional scene may be a virtual scene (or called three-dimensional virtual scene), and the virtual scene is a scene different from the real world output by an electronic device, and a visual perception of the virtual scene can be formed by naked eyes or assistance of a specific device. The virtual scene can be a simulation environment of a real world, a semi-simulation semi-fictional virtual environment, or a pure fictional virtual environment.
2) Rendering (Render): in the embodiments of the present application, this refers to the process of rendering an image to a human-computer interaction interface (such as one provided by a browser) so as to present the image in that interface.
3) Pixel (Pixel): refers to the smallest element of an image that cannot be further divided.
4) Image channel: used to describe the pixels in an image. In the embodiments of the present application, an image channel may include at least one of a color channel and a transparency channel (which may also be referred to as an Alpha channel). Color channels are used to describe the colors of pixels, and the color channels differ between different color spaces (also called color modes); for example, the color channels in the RGB color space include a Red channel, a Green channel and a Blue channel; for another example, the color channels in the HSL color space include a Hue channel, a Saturation channel and a Lightness channel. The transparency channel is used to describe the transparency of a pixel.
Referring to fig. 1, fig. 1 is a diagram of a network architecture according to an embodiment of the present disclosure. As shown in fig. 1, the network architecture may include a service server 1000 and a terminal device cluster. The terminal device cluster may include one or more terminal devices, and the number of terminal devices is not limited here. As shown in fig. 1, the plurality of terminal devices may specifically include a terminal device 100a, a terminal device 100b, terminal devices 100c, …, and a terminal device 100n. As shown in fig. 1, the terminal device 100a, the terminal device 100b, the terminal devices 100c, …, and the terminal device 100n may each establish a network connection with the service server 1000, so that each terminal device can exchange data with the service server 1000 through the network connection. The manner of the network connection is not limited here: the connection may be made directly or indirectly through wired communication, directly or indirectly through wireless communication, or in another manner, which is not limited herein.
Each terminal device shown in fig. 1 may have a target application integrally installed. When the target application runs in a terminal device, the background server corresponding to the terminal device may store the service data in the application and exchange data with the service server 1000 shown in fig. 1. The target application may include an application having a function of displaying data information such as text, images, audio and video. For example, the application may be a multimedia application (e.g., a video application), which can be used by a user to upload pictures or videos, and can also be used by the user to play and view images or videos uploaded by others; the application may also be an entertainment application (e.g., a game application) that can be used by a user to play games. The application may also be another application with a data information processing function, such as a browser application, a real-scene simulation application, a social application and the like, which will not be enumerated here. The target application may also be an applet, that is, a program that only needs to be downloaded into a browser environment to run; of course, the target application may be an independent application, or a sub-application (e.g., an applet) embedded in another application, and the sub-application may be run or closed under user control. In general, the target application may be any form of application, module or plug-in. The game application may be any one of a first-person game (such as a First-Person Shooting (FPS) game), a third-person game, a multiplayer online tactical competitive game (MOBA), a multiplayer live game and the like, which is not limited here.
In the embodiment of the present application, one terminal device may be selected from the plurality of terminal devices as a target terminal device. The terminal device may include, but is not limited to, a smart terminal carrying a data processing function (e.g., an image data processing function), such as a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart television, a smart speaker, a smart watch or a smart vehicle. For example, the terminal device 100a shown in fig. 1 may be used as the target terminal device, and the target terminal device may have the target application integrated; at this time, the target terminal device may exchange data with the service server 1000.
The service server 1000 in the present application may interact with a terminal device (e.g., a target terminal device) through these applications. For example, after the target terminal device performs data acquisition processing on the three-dimensional scene to obtain data to be rendered, the service server 1000 may store the data to be rendered into an image buffer of the service server 1000. Then, the service server 1000 may send the data to be rendered to the target terminal device, the target terminal device may perform rendering processing on the received data to be rendered, the service server 1000 may send the rendered image obtained after the rendering processing to the target terminal device, and the target terminal device may perform noise removal and reconstruction processing on the received rendered image, so as to present a reconstructed image in a human-computer interaction interface of the target terminal device, and a user may view the reconstructed image through the human-computer interaction interface. Of course, the service server 1000 may also perform noise removal and reconstruction processing on the rendered image after the rendering processing, and then send the reconstructed image to the target terminal device, so that the target terminal device may directly present the reconstructed image in the human-computer interaction interface without performing reconstruction processing again.
In some embodiments, the terminal device may calculate data required for display through image calculation hardware (e.g., a GPU), complete loading, parsing, rendering, and reconstruction of display data (e.g., a scene image), and output an image capable of forming a visual perception of a three-dimensional scene through image output hardware (e.g., a screen), for example, render and reconstruct the scene image on a display screen of a smartphone. Taking the target terminal device as an example, after the target terminal device acquires and processes an image of a three-dimensional scene to obtain data to be rendered, the target terminal device may perform rendering and reconstruction on the data to be rendered, and present a reconstructed scene image (i.e., a reconstructed image) in a human-computer interaction interface.
In some embodiments, the service server 1000 or the terminal device may perform data acquisition processing on the three-dimensional scene when the three-dimensional scene meets the visual change condition, so as to obtain data to be rendered of the three-dimensional scene. The visual change condition is not limited, and for example, the visual change condition may be determined to be satisfied when a trigger operation for a three-dimensional scene is received in the human-computer interaction interface. When the visual change condition is satisfied, the terminal device may perform data acquisition processing on the three-dimensional scene being presented, or may also transmit the relevant data of the three-dimensional scene being presented to the service server 1000, so that the service server performs data acquisition processing on the three-dimensional scene being presented. The trigger operation here may be a contact operation, such as a click operation or a long-time press operation; and may also be a non-contact operation such as a voice input operation or a gesture input operation, etc. That is, the data to be rendered (also referred to as service data) may be service data triggered by a user through a trigger operation in the target application. For example, when a user uses a target application (e.g., a game application) in a target terminal device, the user generates a trigger operation for opening a parachute control in the game application, the trigger operation satisfies a visual change condition, and the terminal device may obtain service data triggered by opening the parachute control according to the trigger action for opening the parachute control, where the service data is to-be-rendered data corresponding to opening the parachute control in the game scene. The terminal device or the service server may perform rendering processing on the image to obtain a rendering layer (which may also be referred to as a rendering image, for example, an image after the parachute is opened) with a certain image quality.
It can be understood that a group of rendered image sequences can be generally obtained through image rendering, and when a current frame in the image sequences is reconstructed, a reconstruction result image of the current frame can be determined according to the current frame and a previous frame in the present application, and a specific implementation manner of the present application can be referred to in the following description in an embodiment corresponding to fig. 3.
It is understood that the method provided by the embodiment of the present application may be executed by a computer device, which includes but is not limited to a terminal device or a service server. The service server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platforms.
The terminal device and the service server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
Alternatively, it is understood that the computer device (the service server 1000, the terminal device 100a, the terminal device 100b, and the like) may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The P2P (Peer To Peer) protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any form of computer device, such as a service server or an electronic device such as a terminal device, may become a node in the blockchain system by joining the peer-to-peer network. For ease of understanding, the concept of the blockchain is explained below: the blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms, and is mainly used for organizing data in chronological order into an encrypted ledger, so that the data cannot be tampered with or forged, while the data can still be verified, stored and updated. When the computer device is a blockchain node, due to the tamper-proof and anti-counterfeiting properties of the blockchain, the data in the present application (such as the collected data to be rendered, the rendered scene image, the reconstructed image and the like) can have authenticity and security, so that the results obtained after performing relevant data processing based on these data are more reliable.
The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent traffic, driving assistance and the like. For ease of understanding, please refer to fig. 2 together, and fig. 2 is a schematic view of a scene for reconstructing an image according to an embodiment of the present application. The service server 2000 shown in fig. 2 may be the service server 1000 shown in fig. 1, and the terminal device 2a shown in fig. 2 may be any one terminal device selected from the terminal device cluster in the embodiment corresponding to fig. 1, for example, the terminal device may be the terminal device 100 b.
As shown in fig. 2, a user a may be a target user, and the user a may start running a target application (here, taking a multiplayer competitive game application as an example) through a terminal device 2a, and the terminal device 2a needs to present a game scene screen of the multiplayer competitive game (which will be referred to as a target game hereinafter) in a display interface. As shown in fig. 2, when the terminal device 2a collects and renders the data of the scene of the target game, the rendered images are usually accompanied by noise, and in order to improve the quality of the game image to be presented, a series of image processing such as noise removal processing and reconstruction of the rendering result may be performed on the rendered images. In this embodiment, the terminal device 2a may send the rendered image sequence to the service server 2000, and the service server 2000 performs noise removal, reconstruction, and other processing on the images.
As shown in fig. 2, the image sequence 200 may be an image sequence obtained by the terminal device 2a after performing image acquisition and rendering on the target game scene. The terminal device 2a may send the image sequence 200 to the service server 2000, and the service server 2000 may perform reconstruction processing on the image sequence. When each image in the image sequence is reconstructed, a reconstruction result image of that image can be jointly predicted and output based on the image features of that image and of the image preceding it. For ease of understanding, when a certain image is being reconstructed, the image may be referred to as the current frame (or the current image, or the target image), and the image preceding the current frame in the image sequence may be referred to as the history frame (or the history image).
As shown in fig. 2, taking an image 20a in the image sequence 200 as the target image and the previous image 20b of the image 20a as the history image as an example, a specific process for predicting and outputting a reconstruction result image of the target image 20a may be as follows. First, the service server 2000 may acquire an image association feature of the target image 20a (which may be referred to as a target image association feature). It can be understood that when the target game scene is acquired and rendered, auxiliary features corresponding to each image (specifically, dimensional features such as a normal feature, a depth feature, an albedo feature, a motion vector feature, a metalness feature and a roughness feature) may be obtained at the same time during the rendering process. The image association feature may refer to a part of these auxiliary features obtained in the rendering process (such as the normal dimension feature and the depth dimension feature), so the target image association feature may specifically include the normal dimension feature and the depth dimension feature corresponding to the target image 20a. The service server 2000 may also obtain an image association feature of the history image 20b (for ease of distinction, referred to as a historical image association feature); similarly, the historical image association feature may specifically include the normal dimension feature and the depth dimension feature corresponding to the history image 20b.
Further, the service server 2000 may generate a mixed embedding feature for the target image 20a (which may be referred to as a first mixed embedding feature) according to the target image association feature, and may generate a mixed embedding feature for the history image 20b (for ease of distinction, referred to as a second mixed embedding feature) according to the historical image association feature together with the target image association feature. Based on the first and second mixed embedding features, the service server 2000 may perform feature accumulation on the target image 20a and the history image 20b; the accumulated feature obtained in this way (which may be referred to as the target accumulated feature of the target image 20a) merges the features of the history image with those of the target image, and can therefore reflect the correlation between the two images to some extent. A reconstruction result image of the target image 20a may then be determined based on the target accumulated feature, the target image association feature and the history image. For specific implementations of determining the mixed embedding features, performing feature accumulation based on them, and determining the reconstruction result image based on the target accumulated feature, reference may be made to the description in the embodiment corresponding to fig. 3 below.
The above describes, taking the image 20a as the current frame, how the reconstruction result image of a single current frame is predicted. For each image in the image sequence 200, the service server 2000 can determine a corresponding reconstruction result image in the same way and return the reconstruction result images to the terminal device 2a, and the terminal device 2a can display them one after another in chronological order (in practice the reconstruction result images may form a reconstructed video, so what the terminal device 2a actually displays can be understood as a high-quality game video).
Further, please refer to fig. 3; fig. 3 is a flowchart of an image processing method according to an embodiment of the present application. The method may be executed by a terminal device (e.g., the terminal device shown in fig. 1 or fig. 2), may be executed by a service server (e.g., the service server 1000 in the embodiment corresponding to fig. 1), or may be executed jointly by the terminal device and the service server. For ease of understanding, this embodiment is described by taking the case where the method is executed by the service server as an example. The image processing method at least comprises the following steps S101-S104:
step S101, acquiring a target image and a historical image in a target image sequence; the target image sequence is an image sequence obtained after image acquisition is carried out on a target scene based on the sampling number of target unit pixels; the history image is an image that is previous to the target image in the target image sequence.
In the present application, a target scene may refer to a certain virtual scene, where the virtual scene may be a simulation environment of a real world, a semi-simulation semi-fictional virtual environment, or a pure fictional virtual environment. For example, a game virtual scene of a certain game application may be taken as a target scene.
In the present application, data collection may be performed on the target scene, and the collected data to be rendered may then be rendered. The target image sequence in the present application may refer to the rendered image sequence obtained after the acquired data to be rendered is rendered. The data acquisition of the target scene may include: collecting data of a target scene presented in a human-computer interaction interface (for example, a terminal display interface presenting a game scene in a terminal device), so as to obtain the data to be rendered within the current visual field range (that is, the visual field range being presented) or within a specific visual field range; or, in the case that the target scene is not being presented, acquiring the target scene according to a specific visual field range to obtain the data to be rendered, where the visual field range used in the acquisition process may be preset or determined automatically. For example, when the target application is a game application, the user may launch the target application by clicking a launch control of the target application. When the user clicks the launch control, the terminal device may respond to the click trigger operation for the launch control to acquire the service data corresponding to the launch control (i.e., the data to be rendered, for example, home page data of the game application), where the data to be rendered is obtained by performing data acquisition on the game scene.
It is understood that, during rendering, the present application may perform sampling rendering on the target scene based on a target unit pixel sampling number, where the unit pixel sampling number refers to the number of samples per pixel (SPP). The target unit pixel sampling number in this application may refer to a value lower than a unit pixel sampling number threshold, where the threshold may be an artificially defined value (e.g., 0.3, 0.5, ...). When the threshold is set to a small value (e.g., less than 1), performing image acquisition and rendering on the target scene based on the target unit pixel sampling number can in fact be understood as sparse sampling. That is, the target image sequence in the present application may refer to the rendered image sequence obtained after sampling and rendering the target scene by sparse sampling. When the unit pixel sampling number threshold is set to a value less than 1, the target unit pixel sampling number may itself be a value less than 1; it should be noted that a target unit pixel sampling number less than 1 means that some pixels of the target scene are not sampled at all.
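For intuition, the following Python sketch (illustrative only; the helper name and the Poisson draw are assumptions, not part of the application) shows what a target unit pixel sampling number of 0.5 can mean in practice: on average half a sample per pixel, so some pixels receive no sample at all.

```python
import numpy as np

def sparse_sample_counts(height, width, spp=0.5, seed=0):
    """Draw a per-pixel sample count whose mean is roughly `spp` (illustrative only)."""
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=spp, size=(height, width))

counts = sparse_sample_counts(4, 4, spp=0.5)
print(counts)         # entries equal to 0 are pixels that were never sampled
print(counts.mean())  # close to the target unit pixel sampling number (0.5)
```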
The target image in the present application may refer to any image in the target image sequence, and the history image may refer to a previous image of the target image in the target image sequence. When the target image is the first image in the target image sequence, the target image can be considered to have no history image.
Step S102, acquiring the target image correlation characteristics corresponding to the target image and the historical image correlation characteristics corresponding to the historical image.
In the present application, a renderer (e.g., a hybrid ray renderer) may be used to render the target scene and generate the target image sequence. During rendering, rasterization may be performed on the scene, so that an image auxiliary feature corresponding to each rendered image can be obtained; the image auxiliary feature may specifically include a normal dimension feature, a depth dimension feature, a motion vector, an albedo dimension feature, a metalness dimension feature, a roughness dimension feature, a shadow feature, a transparency dimension feature, and the like. The image association feature here may refer to a part of the image auxiliary feature (for example, the normal dimension feature and the depth dimension feature); accordingly, the image association feature corresponding to the target image may be referred to as the target image association feature, and the image association feature corresponding to the historical image may be referred to as the historical image association feature. That is to say, in the present application, the image association feature of a certain image is determined based on its image auxiliary feature, and may be a part of that auxiliary feature (the features it includes may be defined as needed, for example the normal dimension feature and the depth dimension feature) or all of it; the image auxiliary feature, in turn, refers to the image features of each rendered image obtained in the process of performing image acquisition and rendering on the target scene. Taking the target image as an example, the target image may be obtained by rendering the target scene; in the rendering process, the image features corresponding to the target image (such as the above-mentioned normal dimension feature, depth dimension feature and metalness dimension feature) can also be obtained. These image features obtained in the rendering process may be used as the image auxiliary feature of the target image (which may be referred to as the target image auxiliary feature), and the target image association feature may be a part (or all) of this image auxiliary feature.
Step S103, generating a first mixed embedding feature aiming at the target image according to the target image association feature, generating a second mixed embedding feature aiming at the historical image according to the target image association feature and the historical image association feature, and performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image.
In the present application, as can be seen from the above description, the image association features may include a normal dimension feature and a depth dimension feature, and for convenience of distinction, the normal dimension feature of the target image may be referred to as a first normal dimension feature, and the depth dimension feature of the target image may be referred to as a first depth dimension feature, so that the target image association features may include the first normal dimension feature and the first depth dimension feature corresponding to the target image. The specific implementation manner for generating the first hybrid embedded feature for the target image according to the target image associated feature may be as follows: the first normal dimension feature and the first depth dimension feature may be input to a time series accumulation model; performing convolution processing on the first normal dimension characteristic and the first depth dimension characteristic through a first convolution network layer of the time sequence accumulation model to obtain a first image embedding characteristic corresponding to the target image; performing convolution processing on the first normal dimension characteristic and the first depth dimension characteristic through a second convolution network layer of the time sequence accumulation model to obtain a second image embedding characteristic corresponding to the target image; subsequently, the first image embedding feature and the second image embedding feature may be subjected to pixel multiplication operation processing, resulting in a first mixed embedding feature for the target image.
Similarly, the normal dimension feature of the history image may be referred to as a second normal dimension feature, and the depth dimension feature of the history image may be referred to as a second depth dimension feature, so that the history image association feature may include the second normal dimension feature and the second depth dimension feature corresponding to the history image. The specific implementation manner of generating the second hybrid embedded feature for the historical image according to the target image associated feature and the historical image associated feature may be as follows: affine transformation processing can be respectively carried out on the second normal dimension characteristic and the second depth dimension characteristic, so that a normal transformation characteristic corresponding to the second normal dimension characteristic and a depth transformation characteristic corresponding to the second depth dimension characteristic are obtained; subsequently, the first normal dimension feature, the first depth dimension feature, the normal transformation feature, and the depth transformation feature may be input to a temporal accumulation model; performing convolution processing on the first normal dimension feature and the first depth dimension feature through a first convolution network layer of the time sequence accumulation model to obtain a target image embedding feature corresponding to the target image (actually, the target image embedding feature and the first image embedding feature may be the same feature); the normal transformation characteristic and the depth transformation characteristic can be subjected to convolution processing through a second convolution network layer of the time sequence accumulation model, and historical image embedding characteristics corresponding to the historical image are obtained; subsequently, the target image embedding feature and the history image embedding feature may be subjected to pixel multiplication operation processing, resulting in a second mixed embedding feature for the history image.
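As a concrete illustration of the two convolution branches described above, the following Python (PyTorch) sketch is a minimal, assumed implementation of the time sequence accumulation model: the channel counts, layer sizes and the fact that the history features arrive already warped are assumptions for illustration; only the overall flow (one branch for the target features, one branch shared by the target features and the warped history features, then pixel-wise multiplication) follows the description above.

```python
import torch
import torch.nn as nn

class TemporalAccumulator(nn.Module):
    """Minimal sketch of the time sequence accumulation model (assumed sizes)."""
    def __init__(self, in_ch=4, emb_ch=8):           # normal (3 ch) + depth (1 ch) assumed
        super().__init__()
        self.branch1 = nn.Sequential(                 # "first convolution network layer"
            nn.Conv2d(in_ch, emb_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(emb_ch, emb_ch, 3, padding=1))
        self.branch2 = nn.Sequential(                 # "second convolution network layer"
            nn.Conv2d(in_ch, emb_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(emb_ch, emb_ch, 3, padding=1))

    def forward(self, target_feat, history_feat_warped):
        e1 = self.branch1(target_feat)                # first image embedding feature
        e2 = self.branch2(target_feat)                # second image embedding feature
        e_hist = self.branch2(history_feat_warped)    # historical image embedding feature
        first_mixed = e1 * e2                         # pixel multiplication -> first mixed embedding feature
        second_mixed = e1 * e_hist                    # pixel multiplication -> second mixed embedding feature
        return first_mixed, second_mixed

# Example usage with random normal+depth features (history assumed already affine-transformed/warped)
acc = TemporalAccumulator()
cur = torch.randn(1, 4, 64, 64)
hist = torch.randn(1, 4, 64, 64)
first_mixed, second_mixed = acc(cur, hist)
```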
It can be understood that, the present application may calculate, based on the normal dimension feature and the depth dimension feature respectively corresponding to a current frame image (i.e., a target image) and a previous frame image (i.e., a history image), a mixed embedding feature (a new mixed embedding feature, that is, a mixed embedding feature obtained after feature mixing different embedding features) respectively corresponding to the two images, and perform feature accumulation on the two images based on the two mixed embedding features.
The specific implementation manner of performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain the target accumulated feature corresponding to the target image may be as follows: a logistic regression function can be obtained, and a first feature fusion coefficient for the target image can be determined through the logistic regression function and the first mixed embedding feature; a second feature fusion coefficient for the historical image can be determined by the logistic regression function and the second mixed embedding feature; and performing feature accumulation on the target image and the historical image according to the first feature fusion coefficient and the second feature fusion coefficient to obtain a target accumulated feature corresponding to the target image.
The specific implementation manner of performing feature accumulation on the target image and the historical image according to the first feature fusion coefficient and the second feature fusion coefficient to obtain the target accumulated feature corresponding to the target image may be as follows: obtaining the target noise image feature, the target shadow feature and the target albedo feature corresponding to the target image, and obtaining the historical accumulated noise image feature, the historical accumulated shadow feature and the historical accumulated albedo feature corresponding to the historical image; according to the first feature fusion coefficient and the second feature fusion coefficient, the target shadow feature and the historical accumulated shadow feature can be accumulated to obtain the target accumulated shadow feature corresponding to the target image; according to the first feature fusion coefficient and the second feature fusion coefficient, the target noise image feature and the historical accumulated noise image feature can be accumulated to obtain the target accumulated noise image feature corresponding to the target image; according to the first feature fusion coefficient and the second feature fusion coefficient, the target albedo feature and the historical accumulated albedo feature can be accumulated to obtain the target accumulated albedo feature corresponding to the target image; subsequently, the target accumulated shadow feature, the target accumulated noise image feature and the target accumulated albedo feature may be determined as the target accumulated feature corresponding to the target image. That is, the target accumulated feature may include the target accumulated shadow feature, the target accumulated noise image feature and the target accumulated albedo feature.
It is understood that the time sequence accumulation model in this application may refer to a Temporal Accumulator. The temporal accumulator may contain two neural networks, each of which may consist of two convolutional network (CNN) layers; the first convolution network layer may refer to one of the neural networks in the temporal accumulator, and the second convolution network layer may refer to the other. After the target image association feature and the historical image association feature are obtained, the target image association feature (i.e., the first normal dimension feature and the first depth dimension feature) may be input into one neural network of the temporal accumulator (e.g., the first convolution network layer), which may output an embedding feature for the target image association feature; for ease of distinction, the embedding feature output by the first convolution network layer may be referred to as the first image embedding feature corresponding to the target image. For the other neural network of the temporal accumulator (the second convolution network layer), image affine transformation (warping) is first performed on the second normal dimension feature and the second depth dimension feature of the historical image, so as to obtain the normal transformation feature and the depth transformation feature; subsequently, the first normal dimension feature, the first depth dimension feature, the normal transformation feature and the depth transformation feature may be input into the second convolution network layer. The second convolution network layer may output an embedding feature corresponding to the first normal dimension feature and the first depth dimension feature (it should be noted that, since the network structures of the two neural networks are different, the embedding features they output for the same input may still differ; this embedding feature may be referred to as the second image embedding feature corresponding to the target image), and may also output an embedding feature corresponding to the normal transformation feature and the depth transformation feature (which may be referred to as the historical image embedding feature of the historical image).
Further, the first image embedding feature and the second image embedding feature may be multiplied at the pixel level (i.e., pixel multiplication operation processing, which can be understood as pixel-by-pixel multiplication, so as to mix the two image embedding features), and the obtained result may be used as a new mixed embedding feature corresponding to the target image (i.e., the first mixed embedding feature). That is to say, the two convolution network layers of the time sequence accumulation model can output two different (or identical) embedding features for the target image association feature, and these two embedding features can be mixed by pixel-level calculation to obtain a new mixed embedding feature; since this mixed embedding feature is a mixing result, it may be referred to as the first mixed embedding feature, i.e., the first mixed embedding feature is the mixed embedding feature obtained by feature-mixing the first image embedding feature (generated based on the target image association feature) and the second image embedding feature (also generated based on the target image association feature). Similarly, the first image embedding feature and the historical image embedding feature may be multiplied at the pixel level (pixel multiplication operation processing), and the obtained result may be used as a new mixed embedding feature corresponding to the historical image (i.e., the second mixed embedding feature). In other words, one convolution network layer of the time sequence accumulation model can output the embedding features corresponding to the target image association feature and the historical image association feature respectively, and these two embedding features can be mixed by pixel-level calculation to obtain a new mixed embedding feature; since it is a mixing result, it may be referred to as the second mixed embedding feature, i.e., the second mixed embedding feature is the mixed embedding feature obtained by feature-mixing the target image embedding feature (generated based on the target image association feature; the target image embedding feature and the first image embedding feature may be the same embedding feature) and the historical image embedding feature. It should be understood that when feature accumulation is subsequently performed on the target image and the historical image, mixing factors are needed for the accumulation, and the first and second mixed embedding features are mainly used for determining these mixing factors. For example, a logistic regression function (e.g., a softmax function) may be called, and logistic regression processing may be performed on the first mixed embedding feature and the second mixed embedding feature respectively through this function; the result obtained by performing logistic regression on the first mixed embedding feature may be used as the mixing factor corresponding to the target image (which may be referred to as the first feature fusion coefficient), and the result obtained by performing logistic regression on the second mixed embedding feature may be used as the mixing factor corresponding to the historical image (which may be referred to as the second feature fusion coefficient).
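Continuing the sketch above (same imports), the two mixed embedding features could then be turned into the per-pixel mixing factors by a softmax, as described; reducing the embedding channels by a mean is an assumption made purely for illustration.

```python
def fusion_coefficients(first_mixed, second_mixed):
    """Per-pixel mixing factors derived from the two mixed embedding features (sketch)."""
    logits = torch.stack([first_mixed.mean(dim=1),    # one score per pixel for the target image (assumed reduction)
                          second_mixed.mean(dim=1)],  # one score per pixel for the history image
                         dim=0)
    weights = torch.softmax(logits, dim=0)            # logistic-regression-style normalization
    alpha, beta = weights[0], weights[1]              # first / second feature fusion coefficients
    return alpha, beta

alpha, beta = fusion_coefficients(first_mixed, second_mixed)
```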
According to the first feature fusion coefficient and the second feature fusion coefficient, feature accumulation can be performed on the target image and the historical image, and a target accumulated feature corresponding to the target image is obtained.
When the features of the target image and the historical image are accumulated, shadow features, noisy images and albedo can be accumulated. The specific implementation manner of performing feature accumulation based on the first feature fusion coefficient and the second feature fusion coefficient may be as shown in formula (1):
$\bar{f}^{\,t}_{s} = \alpha \cdot f_{s} + \beta \cdot W\!\left(\bar{f}^{\,t-1}_{s}\right)$    formula (1)

Wherein, in formula (1), $\bar{f}^{\,t}_{s}$ can be used for representing the accumulated feature corresponding to the current frame image (the target image, namely the t-th frame), which may be the target accumulated shadow feature, the target accumulated noise image feature, or the target accumulated albedo feature. $\bar{f}^{\,t-1}_{s}$ may refer to the accumulated feature up to the previous frame image (i.e., the history image, the (t-1)-th frame): when $\bar{f}^{\,t}_{s}$ is the target accumulated shadow feature, $\bar{f}^{\,t-1}_{s}$ refers to the historical accumulated shadow feature up to frame t-1; when $\bar{f}^{\,t}_{s}$ is the target accumulated noise image feature, $\bar{f}^{\,t-1}_{s}$ refers to the historical accumulated noise image feature up to frame t-1; and when $\bar{f}^{\,t}_{s}$ is the target accumulated albedo feature, $\bar{f}^{\,t-1}_{s}$ refers to the historical accumulated albedo feature up to frame t-1. $f_{s}$ can be used for characterizing the corresponding feature of the current frame image: when $\bar{f}^{\,t}_{s}$ is the target accumulated shadow feature, $f_{s}$ refers to the target shadow feature of the t-th frame; when $\bar{f}^{\,t}_{s}$ is the target accumulated noise image feature, $f_{s}$ refers to the target noise image feature of the t-th frame; and when $\bar{f}^{\,t}_{s}$ is the target accumulated albedo feature, $f_{s}$ refers to the target albedo feature of the t-th frame. $\alpha$ in formula (1) may refer to the first feature fusion coefficient, and $\beta$ may refer to the second feature fusion coefficient; $W(\cdot)$ may be a warping operator that re-projects the previous frame image onto the current frame image using motion vectors. In the process of rendering the images, the image auxiliary features of each image can be obtained, so the target shadow feature, the target albedo feature and the target noise image feature can be taken directly from the image auxiliary features of the target image. It should be noted that, when the target image is an image containing noise, the target noise image feature may refer to the target image itself. And when the target image is the first image in the target image sequence, $\bar{f}^{\,t}_{s} = f_{s}$, i.e., the accumulated feature is the image's own feature; for example, the target accumulated shadow feature is then the target shadow feature of the target image.
It can be understood that, through the above formula (1), the target accumulated albedo feature, the target accumulated shadow feature and the target accumulated noise image feature corresponding to the target image can be obtained.
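Under the same assumptions as the earlier sketches, formula (1) reduces to a per-pixel linear blend; the warping operator W is taken here as a given callable that re-projects the previous frame with the motion vectors (its implementation is not spelled out in this sketch).

```python
def accumulate(cur_feature, prev_accumulated, alpha, beta, warp):
    """Formula (1): accumulated = alpha * current feature + beta * W(previous accumulated feature).

    cur_feature may be the target shadow, noise image, or albedo feature;
    prev_accumulated is the corresponding historical accumulated feature.
    """
    return alpha.unsqueeze(1) * cur_feature + beta.unsqueeze(1) * warp(prev_accumulated)
```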
And step S104, determining a reconstruction result image corresponding to the target image according to the target image association characteristic, the target accumulation characteristic and the historical image.
In this application, determining a reconstruction result image corresponding to the target image according to the target image association feature, the target accumulation feature and the historical image may be understood as: and performing image reconstruction processing on the target image according to the target image correlation characteristic, the target accumulation characteristic and the historical image to obtain a reconstruction result image (namely, the reconstruction result image is actually the image obtained after performing the image reconstruction processing on the target image). That is, in the present application, in the process of performing image reconstruction processing on a target image, not only the target image-related feature and the target cumulative feature of the target image itself but also information related to a history image is required. For a reconstructed image corresponding to a target image according to the target image association feature, the target accumulation feature and the historical image, a specific implementation manner may be: determining target fusion cascade characteristics corresponding to the target image according to the target image association characteristics, the target accumulation characteristics and the historical image; subsequently, the target fusion cascade feature may be input into the target image reconstruction model, and in the target image reconstruction model, a reconstruction result image corresponding to the target image may be output according to the target fusion cascade feature.
The specific implementation manner of determining the target fusion cascade feature corresponding to the target image according to the target image association feature, the target accumulation feature and the historical image may be as follows: auxiliary features corresponding to the target image can be obtained; the auxiliary features comprise first normal dimension features and first depth dimension features; then, dimension features, except the first normal dimension feature and the first depth dimension feature, in the auxiliary features can be determined as remaining dimension features; subsequently, feature fusion can be performed on the target accumulated feature, the first normal dimension feature, the first depth dimension feature and the remaining dimension feature, so that an image fusion feature corresponding to the target image can be obtained; subsequently, affine transformation processing may be performed on the second normal dimension feature and the second depth dimension feature, so as to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature; the image fusion feature, the normal transformation feature and the depth transformation feature can be subjected to cascade processing to obtain a target fusion cascade feature corresponding to the target image.
It can be understood that after the noise-containing image feature (noise image feature), the shadow feature and the albedo are accumulated, the resulting target accumulated noise image feature, target accumulated shadow feature and target accumulated albedo feature, together with dimension features of the target image such as the normal dimension feature, the depth dimension feature, the transparency dimension feature, the metalness dimension feature and the roughness dimension feature (among the auxiliary features of the target image, the dimension features other than the normal dimension feature and the depth dimension feature, such as the transparency, metalness and roughness dimension features, are collectively referred to as the remaining dimension features), may be feature-fused by a feature fusion module (Feature Fusion), so as to obtain the image fusion feature corresponding to the target image. It should be understood that when the target unit pixel sampling number is small, the sampling rendering of the target scene can actually be understood as sparse sampling rendering, so the obtained target image and each feature (such as the target accumulated shadow feature) are also sparse; by fusing the features of each dimension through the feature fusion module, the obtained image fusion feature can perceive information in different spaces. The means of feature fusion includes, but is not limited to, stitching in the channel dimension. Further, the image fusion feature corresponding to the target image, the warped normal dimension feature of the historical image (namely the normal transformation feature) and the warped depth dimension feature (the depth transformation feature) may be cascaded, so as to obtain a cascade feature (which may be referred to as the target fusion cascade feature).
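Assuming that both the feature fusion and the cascade processing are realized as concatenation along the channel dimension (the text notes that fusion includes, but is not limited to, stitching in the channel dimension), the target fusion cascade feature could be assembled as in the following sketch; the tensor names are illustrative only.

```python
def build_target_fusion_cascade(target_accumulated, first_normal, first_depth,
                                remaining_dims, normal_transformed, depth_transformed):
    # Feature fusion: accumulated features + current-frame auxiliary features
    image_fusion = torch.cat([target_accumulated, first_normal, first_depth, remaining_dims], dim=1)
    # Cascade with the warped normal/depth features of the history image
    return torch.cat([image_fusion, normal_transformed, depth_transformed], dim=1)
```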
Further, the target fusion cascade feature can be input into a target image reconstruction model, and a reconstruction result image corresponding to the target image can be output through the target image reconstruction model. The specific implementation mode can be as follows: through the target image reconstruction model, a predicted high-quality image, a predicted low-quality image, a first image reconstruction parameter and a second image reconstruction parameter corresponding to the target fusion cascade feature can be determined; the high-quality image is predicted to contain no noise, and the resolution of the high-quality image is higher than that of the low-quality image; then, image fusion can be carried out on the predicted high-quality image and the predicted low-quality image in the target image reconstruction model through the first image reconstruction parameters, and an initial reconstruction image corresponding to the target image is obtained; and then, a historical reconstruction image corresponding to the historical image can be obtained, and the initial reconstruction image and the historical reconstruction image are subjected to image fusion through the second image reconstruction parameter, so that a reconstruction result image corresponding to the target image can be obtained.
It will be appreciated that a specific implementation for determining an initial reconstructed image of the target image by the target image reconstruction model can be as shown in equation (2):
$O_{p} = d_{f,p} - \alpha_{s,p} \cdot U\!\left(D(d_{f})\right)_{p} + \alpha_{s,p} \cdot U\!\left(d_{c}\right)_{p}$    formula (2)

Wherein, in formula (2), $O_{p}$ can be used for representing the initial reconstructed image corresponding to the target image; $d_{f}$ can be used for representing the predicted high-quality image (high-resolution image) of the target image predicted by the reconstruction model; $d_{c}$ can be used for representing the predicted low-quality image (low-resolution image) of the target image predicted by the reconstruction model; $D$ and $U$ can be used for representing the 2X down-sampling and nearest-neighbour up-sampling operations respectively; $\alpha_{s}$ can be used for representing the above first image reconstruction parameter. The subscript $p$ of each term in formula (2) can be used to indicate that the operation is performed per pixel. As shown in formula (2), the method mainly mixes the high-resolution image and the low-resolution image in a multi-scale reconstruction manner to obtain the initial reconstructed image corresponding to the target image.

It can be understood that a higher-resolution, noise-removed image corresponding to the target image can be predicted by the target image reconstruction model, and this image can be used as the predicted high-quality image. The target image reconstruction model can also output two channel parameters, $\alpha_{s}$ and $\alpha_{t}$, for image reconstruction; the parameter $\alpha_{s}$ may be referred to as the first image reconstruction parameter, and the parameter $\alpha_{t}$ may be referred to as the second image reconstruction parameter. The target image reconstruction model can further predict, from the last image layer (for example, the last layer used for predicting the high-quality image), a lower-resolution image obtained after 2X down-sampling, and this image can be used as the predicted low-quality image. Through the parameter $\alpha_{s}$ and formula (2), the high-resolution image and the low-resolution image can be mixed, and the mixing result can be used as the initial reconstructed image of the target image.
Further, based on the other parameter $\alpha_{t}$ (i.e., the second image reconstruction parameter), the final reconstruction result image of the target image may be determined. The specific implementation manner can be shown as formula (3):

$O_{t} = \alpha_{t} \cdot O_{p} + (1.0 - \alpha_{t}) \cdot O_{t-1}$    formula (3)

Wherein, in formula (3), $O_{t}$ can be used for representing the reconstruction result image corresponding to the target image; $\alpha_{t}$ can be used for representing the second image reconstruction parameter; $O_{t-1}$ can be used for representing the reconstructed image corresponding to the historical image (i.e., the historical reconstructed image). Through the second image reconstruction parameter, the current output $O_{p}$ can be linearly mixed with the historical reconstructed image, with $\alpha_{t}$ as the mixing factor.
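Taken together, formulas (2) and (3) can be sketched as the following pixel-wise blends. The choice of average pooling for the 2X down-sampling operator D, and the tensor shapes, are assumptions; the nearest-neighbour up-sampling U and the two blend parameters follow the description above.

```python
import torch.nn.functional as F

def reconstruct(d_f, d_c, alpha_s, alpha_t, prev_output):
    """Sketch of formula (2) (multi-scale blend) followed by formula (3) (temporal blend)."""
    up = lambda x: F.interpolate(x, scale_factor=2, mode="nearest")   # U: nearest-neighbour up-sampling
    down = F.avg_pool2d(d_f, kernel_size=2)                           # D: 2X down-sampling (filter assumed)
    o_p = d_f - alpha_s * up(down) + alpha_s * up(d_c)                # formula (2), applied per pixel
    o_t = alpha_t * o_p + (1.0 - alpha_t) * prev_output               # formula (3)
    return o_t
```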
It should be understood that the embodiment of the present application may calculate the per-pixel correlation between the normal and depth features of the current frame and those of the previous frame, calculate weighting factors (the first feature fusion coefficient and the second feature fusion coefficient) based on this correlation between the two frames, and perform feature fusion and feature accumulation on the features of the two frames based on these weighting factors, so that the feature finally input to the reconstruction model contains the common features of the preceding and following frames, with richer feature dimensions and deeper content. An interpolated multi-scale image (the initial reconstructed image) can be obtained based on the fusion feature and the accumulated feature, and the reconstructed image of the current frame is then reconstructed from this initial reconstructed image together with the historical frame image, so that the information of the historical image can be further utilized. Even if the target unit pixel sampling number is small, the temporal information of images sampled at a low SPP can be fully utilized by the above method, and an image obtained by sparse sampling can be reconstructed in real time. This makes it possible to output a high-quality reconstruction result image when outputting an image (i.e., the reconstruction result image serves as the output image). For example, when an image of the target scene (such as a game scene) is output in the human-computer interaction interface of the terminal device, the high-quality reconstruction result image obtained after the image reconstruction processing is output.
In the embodiment of the application, the mixed embedding features respectively corresponding to the current image and its previous frame image can be calculated from their association features, and feature accumulation can be performed on the two images based on these mixed embedding features; the obtained target accumulated feature thus merges the features of the two frames, can represent the correlation between them, and covers a wider range of feature dimensions. Meanwhile, the reconstruction result image of the target image is determined jointly from the target image association feature of the target image, the target accumulated feature and the previous frame image (the historical image); that is, when determining the reconstruction result image, not only the features of the current frame but also the features of the earlier historical frame are considered, so that the temporal information of the current frame can be fully utilized, and the obtained reconstruction result image has higher temporal stability and higher image quality. In summary, whether an image is obtained with a high or a low pixel sampling number, the present application can perform feature accumulation according to the image association features of two consecutive frames in the target image sequence and determine the reconstruction result image of the target image jointly from the accumulated feature and the historical image, and therefore has universality; by calculating the accumulated feature of the two consecutive frames and performing image reconstruction based on it, the temporal information of the images can be fully utilized, so that the image quality of the reconstructed image can be effectively improved.
Therefore, after the target fusion cascade feature of the target image is determined, the target fusion cascade feature can be input into the target image reconstruction model, and the predicted reconstructed image corresponding to the target fusion cascade feature (i.e., the reconstruction result image corresponding to the target image) is determined through the target image reconstruction model. In order to optimize the reconstruction effect of the target image reconstruction model (i.e., improve the image quality of the images output by the model), an initial image reconstruction model can be trained and adjusted, so that the target image reconstruction model obtained after training and adjustment can be used to perform image reconstruction. For ease of understanding, please refer to fig. 4 together; fig. 4 is a schematic flowchart of a model training process provided in an embodiment of the present application. The process may be a specific process of training and adjusting an image reconstruction model to obtain the target image reconstruction model. As shown in fig. 4, the flow may include at least the following steps S301 to S307:
step S301, acquiring a target sample noise image and a historical sample noise image in a sample image sequence; the sample image sequence is an image sequence obtained by carrying out image acquisition on a sample scene based on the first unit pixel sampling number; the historical sample noise image is the last sample noise image of the target sample noise image in the sample image sequence; the first number of unit pixel samples is less than the threshold number of unit pixel samples.
Specifically, the sample scene may refer to a certain virtual scene; for example, a game virtual scene of a certain game application may be used as the sample scene. Data acquisition can be performed on the sample scene, and rendering processing can be performed on the acquired sample data to be rendered. The sample image sequence in this application may refer to the noise-containing rendered image sequence (which may be referred to as a sample noise image sequence) obtained by rendering the acquired sample data to be rendered. The first unit pixel sampling number here may likewise refer to the number of samples per pixel (SPP). The first unit pixel sampling number in this application may refer to a value lower than the unit pixel sampling number threshold, where the threshold may be an artificially defined value (e.g., 0.3, 0.5, ...); when the threshold is set to a small value (e.g., less than 1), performing image acquisition and rendering on the sample scene based on the first unit pixel sampling number can in fact be understood as sparse sampling. That is, the sample image sequence in the present application may refer to the noise-containing rendered image sequence obtained after sampling and rendering the sample scene by sparse sampling. When the unit pixel sampling number threshold is set to a value less than 1, the first unit pixel sampling number may itself be a value less than 1; it should be noted that a first unit pixel sampling number less than 1 means that some pixels of the sample scene are not sampled at all. In a possible embodiment, the first unit pixel sampling number and the target unit pixel sampling number may be the same value.
The target sample noise image may refer to any sample noise image in the sample image sequence, and the historical sample noise image may refer to the sample noise image immediately preceding the target sample noise image in the sample image sequence.
Step S302, acquiring the target sample image correlation characteristics corresponding to the target sample noise image and the historical sample image correlation characteristics corresponding to the historical sample noise image.
Specifically, as with the target image correlation features and the historical image correlation features, the target sample image correlation features may include normal dimension features (which may be referred to as first sample normal dimension features) and depth dimension features (which may be referred to as first sample depth dimension features) corresponding to the target sample noise image; the historical sample image associated features may include a normal dimension feature (which may be referred to as a second sample normal dimension feature) and a depth dimension feature (which may be referred to as a second sample depth dimension feature) corresponding to the historical sample noise image.
For a specific manner of obtaining the target sample image associated feature and the historical sample image associated feature, reference may be made to the embodiment corresponding to fig. 3, and details of obtaining the target image associated feature and the historical image associated feature will not be repeated here.
Step S303, generating a first sample mixed embedding feature aiming at the target sample noise image according to the target sample image association feature, generating a second sample mixed embedding feature aiming at the historical sample noise image according to the target sample image association feature and the historical sample image association feature, and performing feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature to obtain a target sample accumulated feature corresponding to the target sample noise image.
Specifically, the second sample normal dimension characteristic and the second sample depth dimension characteristic can be subjected to image affine transformation processing to obtain a sample normal transformation characteristic and a sample depth transformation characteristic; the first sample normal dimension characteristic and the first sample depth dimension characteristic can be input into a time sequence accumulation model, and an embedding characteristic (which can be called as a first sample image embedding characteristic) can be output through a first convolution network layer of the time sequence accumulation model; the sample normal transformation feature, the sample depth transformation feature, the first sample normal dimension feature and the first sample depth dimension feature may also be input into the time sequence accumulation model, and through a second convolution network layer of the time sequence accumulation model, one embedding feature (which may be referred to as a historical sample image embedding feature) corresponding to the sample normal transformation feature and the sample depth transformation feature may be output, and one embedding feature (which may be referred to as a second sample image embedding feature) corresponding to the first sample normal dimension feature and the first sample depth dimension feature may also be output; similarly, based on the first sample image embedding feature and the second sample image embedding feature, a first sample mixed embedding feature of the target sample noise image can be determined; based on the first sample image embedding features and the historical sample image embedding features, second sample mixed embedding features of the historical sample noise image can also be determined. And based on the first sample mixed embedding feature and the second sample mixed embedding feature, the target sample noise image and the historical sample noise image can be subjected to feature accumulation to obtain a target sample accumulation feature corresponding to the target sample noise image.
For a specific process of determining the first sample hybrid embedding feature, refer to the above description of determining the first hybrid embedding feature in the embodiment corresponding to fig. 3; for a specific process of determining the second sample mixed embedding feature, reference may be made to the description of determining the second mixed embedding feature in the embodiment corresponding to fig. 3 above; for a specific method for performing feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature to obtain a target sample accumulated feature corresponding to the target sample noise image, reference may be made to the above-mentioned embodiment corresponding to fig. 3, and for a description that the target accumulated feature is obtained by performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature, which will not be described again here.
And step S304, determining target sample fusion cascade characteristics corresponding to the target sample noise image according to the target sample image correlation characteristics, the target sample accumulation characteristics and the historical sample noise image.
Specifically, the target sample accumulated feature may likewise include a target sample accumulated shadow feature, a target sample accumulated noise image feature and a target sample accumulated albedo feature. For the specific method of determining the target sample fusion cascade feature corresponding to the target sample noise image according to the target sample image association feature, the target sample accumulated feature and the historical sample noise image, reference may be made to the description, in the embodiment corresponding to fig. 3, of determining the target fusion cascade feature according to the target image association feature, the target accumulated feature and the historical image, which will not be repeated here.
Step S305, inputting the target sample fusion cascade characteristic into an image reconstruction model, and outputting a target sample reconstruction image corresponding to the target sample noise image according to the target sample fusion cascade characteristic in the image reconstruction model.
Specifically, the image reconstruction model may refer to a model to be trained; in the image reconstruction model, a target sample reconstructed image corresponding to the target sample noise image may be output based on the model parameters to be optimized. For the specific method of outputting, in the image reconstruction model, the target sample reconstructed image corresponding to the target sample noise image according to the target sample fusion cascade feature, reference may be made to the description, in the embodiment corresponding to fig. 3, of outputting the reconstruction result image corresponding to the target image according to the target fusion cascade feature in the target image reconstruction model, which will not be repeated here.
Step S306, acquiring a target label sampling image corresponding to the target sample noise image; the target label sampling image is an image obtained after image acquisition is carried out on a sample scene based on the second unit pixel sampling number; the first unit pixel sample count is less than the second unit pixel sample count.
Specifically, the target label sampled image may be an image obtained after image acquisition is performed on the sample scene based on the second unit pixel sampling number. As described above, the first unit pixel sampling number may be a value lower than the unit pixel sampling number threshold, where the threshold may be a specified value (e.g., 0.3, 0.5, ...); when the threshold is set to a small value (e.g., less than 1), image acquisition and rendering performed on the sample scene based on the first unit pixel sampling number can be understood as sparse sampling, whereas image acquisition and rendering performed on the sample scene based on the second unit pixel sampling number can be understood as ordinary sampling (or conventional sampling). It should be understood that sparse sampling may mean that each pixel is sampled at most once (some pixels may not be sampled at all), while ordinary sampling may mean that each pixel is sampled at least once (some pixels may even be sampled multiple times), and the noise-free high-resolution image rendered by ordinary sampling can be used as the label sampled image (i.e., the reference image) of the present application; the target label sampled image may be the reference image corresponding to the target sample noise image, and the reference image is a high-SPP image (a noise-free high-resolution image).
Step S307, training the image reconstruction model according to the target label sampling image and the target sample reconstruction image to obtain a target image reconstruction model for performing image reconstruction processing on the target image in the target image sequence.
Specifically, the image reconstruction model may be trained based on the target label sampled image and the target sample reconstructed image output by the image reconstruction model. The specific mode can be as follows: in the sample image sequence, a sample reconstructed image corresponding to a residual sample noise image can be determined as a reconstructed image to be operated, and a label sampled image corresponding to the residual sample noise image can be determined as a label sampled image to be operated; the residual sample noise image is a sample noise image except the target sample noise image in the sample image sequence; then, according to the reconstructed image to be operated, the label sampling image to be operated, the target label sampling image and the target sample reconstructed image, a target loss value aiming at the image reconstruction model can be determined; and training the image reconstruction model according to the target loss value to obtain the target image reconstruction model.
The specific implementation manner for determining the target loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image and the target sample reconstructed image may be as follows: according to the reconstructed image to be operated, the label sampling image to be operated, the target label sampling image and the target sample reconstructed image, a space loss value aiming at the image reconstruction model can be determined; according to the reconstructed image to be operated, the label sampling image to be operated, the target label sampling image and the target sample reconstructed image, a time sequence loss value aiming at the image reconstruction model can be determined; according to the reconstructed image to be operated, the label sampling image to be operated, the target label sampling image and the target sample reconstructed image, a relative edge loss value aiming at the image reconstruction model can be determined; from the spatial loss value, the temporal loss value, and the relative edge loss value, a target loss value for the image reconstruction model may be determined.
The specific implementation manner of determining the spatial loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image and the target sample reconstructed image may be as follows: a space loss function can be obtained, and a first space sub-loss value aiming at the residual sample noise image can be determined according to the space loss function, the reconstructed image to be operated and the label sampling image to be operated; a second spatial sub-loss value for the target sample noise image may be determined from the spatial loss function, the target label sampled image, and the target sample reconstructed image; subsequently, the first spatial sub-loss value and the second spatial sub-loss value may be fused, so as to obtain a spatial loss value of the image reconstruction model. The specific implementation manner of obtaining the spatial loss value of the image reconstruction model by fusing the first spatial sub-loss value and the second spatial sub-loss value may be as follows: a first training weight for the remaining sample noise image and a second training weight for the target sample noise image may be obtained; the first training weight and the first spatial sub-loss value may be subjected to operation processing to obtain a first operation spatial sub-loss value; the second training weight and the second spatial sub-loss value can be subjected to operation processing to obtain a second operation spatial sub-loss value; and then, adding the first operation space sub-loss value and the second operation space sub-loss value to obtain a space loss value of the image reconstruction model.
For ease of understanding, please refer to equation (4), which may be one specific way to determine the spatial loss value:

$l(r,d)=\frac{1}{N}\sum_{p}\sum_{c}\frac{\left|d_{p,c}-r_{p,c}\right|}{\left|d_{p,c}\right|+\left|r_{p,c}\right|+\varepsilon}$  (4)

the spatial loss value, the timing loss value, and the relative edge loss value may be calculated over a sample image sequence of N consecutive frames (N is a positive integer), while equation (4) above may be a specific way of determining the spatial sub-loss value of a single frame in the sample image sequence. l(r, d) in equation (4) can be used to characterize the spatial sub-loss value of a single frame (e.g., the target sample noise image); p may be used to characterize a pixel, and N may be the total number of image pixels; c may be used to characterize an image color channel (e.g., for an RGB image, there are three color channels); d can be used to characterize a sample reconstructed image output by the image reconstruction model (e.g., the target sample reconstructed image); r may be used to characterize a label sampled image (e.g., the target label sampled image); ε can be used to characterize a loss parameter and may be taken as 10^-2. It should be understood that after the spatial sub-loss value corresponding to each sample noise image in the sample image sequence is obtained, the spatial sub-loss values may be added up, and the result may be used as the final spatial loss value for training the model. Alternatively, when the spatial sub-loss values are summed, different weights may be assigned in advance to the different sample noise images in the sample image sequence, and the spatial sub-loss values may be weighted and summed based on these weights. For example, taking a sample image sequence of 5 frames, weights may be assigned to the 5 sample noise images according to the distribution [0.05, 0.25, 0.5, 0.75, 1] (i.e., as the training weights); after the spatial sub-loss values corresponding to the 5 sample images are obtained, each training weight may be multiplied by the corresponding spatial sub-loss value to obtain the operation spatial sub-loss value of each frame; the operation spatial sub-loss values may then be added up to obtain the final spatial loss value. It should be appreciated that the training weight corresponding to the target sample noise image may be referred to as the second training weight, and the training weights corresponding to the remaining sample noise images may be referred to as the first training weights.
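As an aid to understanding, the following is a minimal sketch of how the spatial loss could be computed, assuming a PyTorch-style implementation in which each image is a floating-point tensor and assuming the relative (SMAPE-style) form reconstructed for equation (4) above; the function names and the averaging over channels are illustrative assumptions rather than the patent's reference implementation.

```python
import torch

def spatial_sub_loss(d, r, eps=1e-2):
    # Per-frame relative spatial loss in the spirit of equation (4): the absolute
    # difference between the sample reconstructed image d and the label sampled
    # image r, normalised by their magnitudes plus eps, averaged over pixels and channels.
    return torch.mean(torch.abs(d - r) / (torch.abs(d) + torch.abs(r) + eps))

def spatial_loss(reconstructed_frames, label_frames,
                 frame_weights=(0.05, 0.25, 0.5, 0.75, 1.0)):
    # Weighted sum of the per-frame spatial sub-loss values over the sample
    # image sequence (here with the 5-frame example weights given above).
    total = torch.zeros(())
    for d, r, w in zip(reconstructed_frames, label_frames, frame_weights):
        total = total + w * spatial_sub_loss(d, r)
    return total
```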
The timing loss value and the relative edge loss value may be obtained in the same way as the spatial loss value: the timing sub-loss value (or the relative edge sub-loss value) corresponding to each sample noise image in the sample image sequence may be calculated in turn, and then the timing sub-loss values (or the relative edge sub-loss values) of the sample images may be weighted and summed (the training weight of each sample noise image may be assigned in advance) to obtain the final timing loss value (or relative edge loss value).
For ease of understanding, please refer to equation (5), which may be one specific way to determine the timing loss value:

$l(\Delta r,\Delta d)=\frac{1}{N}\sum_{p}\sum_{c}\frac{\left|\Delta d_{p,c}-\Delta r_{p,c}\right|}{\left|\Delta d_{p,c}\right|+\left|\Delta r_{p,c}\right|+\varepsilon}$  (5)

wherein l(Δr, Δd) in equation (5) can be used to characterize the timing sub-loss value of a single frame (e.g., the target sample noise image); the meanings of r, d, p, c and ε are as described for equation (4) above. Δ can be used to characterize a temporal gradient calculated between two consecutive frames (i.e., Δd can be used to characterize the temporal gradient calculated from the sample reconstructed image of the current frame and the sample reconstructed image of the previous frame, and Δr can be used to characterize the temporal gradient calculated from the two corresponding consecutive label sampled images).
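Under the same assumptions as the spatial sketch above, the timing term of equation (5) could be evaluated on frame-to-frame differences; the function signature is illustrative.

```python
import torch

def timing_sub_loss(d_curr, d_prev, r_curr, r_prev, eps=1e-2):
    # Temporal gradients between two consecutive frames, then the same relative
    # form as equation (4) applied to those gradients (equation (5)).
    delta_d = d_curr - d_prev   # gradient of the sample reconstructed images
    delta_r = r_curr - r_prev   # gradient of the label sampled images
    return torch.mean(torch.abs(delta_d - delta_r) /
                      (torch.abs(delta_d) + torch.abs(delta_r) + eps))
```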
For ease of understanding, please refer to equation (6), which may be one specific way to determine the relative edge loss value:

$l(\nabla r,\nabla d)=\frac{1}{N}\sum_{p}\sum_{c}\frac{\left|\nabla d_{p,c}-\nabla r_{p,c}\right|}{\left|\nabla d_{p,c}\right|+\left|\nabla r_{p,c}\right|+\varepsilon}$  (6)

wherein l(∇r, ∇d) in equation (6) may be used to characterize the relative edge sub-loss value of a single frame (e.g., the target sample noise image); the meanings of r, d, p, c and ε are as described for equation (4) above. ∇ can be used to characterize a gradient calculated using the High Frequency Error Norm (HFEN); ∇d can be used to characterize the HFEN gradient calculated from the sample reconstructed image of the current frame and the sample reconstructed image of the previous frame, and ∇r can be used to characterize the HFEN gradient calculated from the two corresponding label sampled images.
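The following is a hedged sketch of the relative edge term, assuming HFEN is realised as a Laplacian-of-Gaussian (LoG) filter; the kernel size, sigma and normalisation are illustrative choices, and how the gradients are formed (per frame or across the two consecutive frames described above) is left to the caller.

```python
import torch
import torch.nn.functional as F

def log_kernel(size=5, sigma=1.5):
    # Laplacian-of-Gaussian kernel, the filter typically used by the High
    # Frequency Error Norm (HFEN); size and sigma are illustrative choices.
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2.0
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    r2 = xx ** 2 + yy ** 2
    g = torch.exp(-r2 / (2.0 * sigma ** 2))
    log = (r2 - 2.0 * sigma ** 2) / (sigma ** 4) * g
    log = log - log.mean()                      # zero-sum normalisation
    return log.view(1, 1, size, size)

def hfen_gradient(img, kernel):
    # Apply the LoG filter channel-wise to an image tensor of shape (C, H, W).
    c = img.shape[0]
    weight = kernel.repeat(c, 1, 1, 1)
    pad = kernel.shape[-1] // 2
    return F.conv2d(img.unsqueeze(0), weight, padding=pad, groups=c)[0]

def relative_edge_sub_loss(grad_d, grad_r, eps=1e-2):
    # Equation (6): the relative form of equation (4) applied to HFEN gradients
    # precomputed from the reconstructed images and the label sampled images.
    return torch.mean(torch.abs(grad_d - grad_r) /
                      (torch.abs(grad_d) + torch.abs(grad_r) + eps))
```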
The spatial loss value, the timing loss value and the relative edge loss value may form one part of the loss values used for training the image reconstruction model; another part of loss values may also be used, and the two parts of loss values train the image reconstruction model together. The other part of loss values may include an affine transformation loss value and an albedo loss value, that is, the target loss value may be determined based on the spatial loss value, the timing loss value, the relative edge loss value, the affine transformation loss value, and the albedo loss value. The affine transformation loss value is calculated only on the last two frames in the sample image sequence, and the albedo loss value is calculated only on the last image in the sample image sequence. Here, taking the case where the target sample noise image is the last sample noise image in the sample image sequence (so that the historical sample noise image and the target sample noise image are the last two images in the sample image sequence) as an example, a specific implementation manner for determining the target loss value may be: acquiring a historical sample reconstructed image and a historical label sampled image corresponding to the historical sample noise image, and determining an affine transformation loss value for the image reconstruction model according to the historical sample reconstructed image, the target sample reconstructed image, the historical label sampled image and the target label sampled image; then acquiring a target sample albedo feature corresponding to the target sample noise image, acquiring a target sample accumulated albedo feature corresponding to the target sample noise image in the target sample accumulated features, and determining an albedo loss value for the image reconstruction model according to an albedo loss function, the target sample albedo feature and the target sample accumulated albedo feature; and performing loss value fusion on the spatial loss value, the timing loss value, the relative edge loss value, the affine transformation loss value and the albedo loss value to obtain the target loss value of the image reconstruction model.
For ease of understanding, please refer to equation (7), which may be one specific way to determine the affine transformation loss value:

$l(\omega r,\omega d)=\frac{1}{N}\sum_{p}\sum_{c}\frac{\left|\omega d_{p,c}-\omega r_{p,c}\right|}{\left|\omega d_{p,c}\right|+\left|\omega r_{p,c}\right|+\varepsilon}$  (7)

wherein l(ωr, ωd) in equation (7) can be used to characterize the affine transformation loss value; the meanings of r, d, p, c and ε are as described for equation (4) above. ωr can be used to characterize $r_4-W(r_3)$, where $r_4$ is the last frame image (corresponding to the target sample noise image), $r_3$ can be used to characterize the previous frame image (corresponding to the historical sample noise image), and W is a warping operator that re-projects the previous frame onto the current frame; ωd is defined analogously on the sample reconstructed images.
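A minimal sketch of the affine transformation (warping) loss, assuming the warping operator W is realised with a motion-vector-based sampling grid and torch.nn.functional.grid_sample; the grid construction and the relative loss form follow the assumptions used above.

```python
import torch
import torch.nn.functional as F

def warp_previous(prev, sampling_grid):
    # W(.) in equation (7): re-project the previous frame onto the current one.
    # prev: (C, H, W); sampling_grid: (1, H, W, 2) normalised coordinates that are
    # assumed to be derived from the renderer's motion vectors.
    return F.grid_sample(prev.unsqueeze(0), sampling_grid, align_corners=True)[0]

def affine_transformation_loss(d_last, d_prev, r_last, r_prev, sampling_grid, eps=1e-2):
    # Equation (7): omega-r = r4 - W(r3) on the label sampled images, the analogous
    # omega-d on the sample reconstructed images, then the same relative form as equation (4).
    w_r = r_last - warp_previous(r_prev, sampling_grid)
    w_d = d_last - warp_previous(d_prev, sampling_grid)
    return torch.mean(torch.abs(w_d - w_r) / (torch.abs(w_d) + torch.abs(w_r) + eps))
```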
For ease of understanding, please refer to equation (8), which may be one specific way to determine the albedo loss value:

$l(a_{acc},a_{r})=\frac{1}{N}\sum_{p}\sum_{c}\frac{\left|a_{acc,p,c}-a_{r,p,c}\right|}{\left|a_{acc,p,c}\right|+\left|a_{r,p,c}\right|+\varepsilon}$  (8)

wherein l(a_acc, a_r) in equation (8) can be used to characterize the albedo loss value; the meanings of p, c and ε are as described for equation (4) above; a_acc can be used to characterize the accumulated albedo feature (the target sample accumulated albedo feature); a_r can be used to characterize the albedo feature of the last image.
Further, the above loss values may be weighted and summed to obtain the target loss value finally used for training the image reconstruction model; the target loss value may be determined as shown in equation (9):

$l = 0.7\,l_s + 0.1\,l_t + 0.2\,l_e + 0.4\,l_{wt} + 5.0\,l_a$  (9)

wherein $l_s$ can be used to characterize the spatial loss value; $l_t$ can be used to characterize the timing loss value; $l_e$ can be used to characterize the relative edge loss value; $l_{wt}$ can be used to characterize the affine transformation loss value; $l_a$ can be used to characterize the albedo loss value. The weight of each loss value (e.g., 0.7, 0.1, 0.2, 0.4, 5.0) is a preset value and is not limited to the above; other weight values determined through experimental tests may also be used.
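A one-line sketch of the weighting in equation (9); the weights are the example values stated above.

```python
def target_loss_value(l_s, l_t, l_e, l_wt, l_a):
    # Equation (9): preset weighting of the five loss terms; the weights are the
    # example values from the text and could be replaced by tuned ones.
    return 0.7 * l_s + 0.1 * l_t + 0.2 * l_e + 0.4 * l_wt + 5.0 * l_a
```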
In summary, the target loss value l may be used to measure the difference between the sample reconstructed images and the reference images, and the model parameters of the image reconstruction model may be adjusted by means of the target loss value l to obtain the optimal estimated parameters. A specific implementation manner may be as shown in equation (10):

$\hat{\theta}=\arg\min_{\theta}\; l\big(\{f_{\theta}(c_i,F_i)\}_{i=1}^{N},\ \{r_i\}_{i=1}^{N}\big)$  (10)

wherein $f_{\theta}$ in equation (10) can be used to characterize the neural network function, which is mainly used to reconstruct images that do not contain noise; $\hat{\theta}$ can be used to characterize the optimal estimated parameters; N may be used to characterize the total number of images in the sample image sequence; $c_i$ can be used to characterize a sample noise image; $F_i$ can be used to characterize the auxiliary features obtained during the rendering process; $r_i$ can be used to characterize the high-SPP reference image. l can be used to characterize the target loss value; the deep neural network can be trained with a gradient descent algorithm by minimizing the loss value l over a sample image sequence, so as to obtain the optimal estimated parameters $\hat{\theta}$. The trained deep neural network $f_{\hat{\theta}}$ can then be deployed in the target image reconstruction model for image reconstruction processing (i.e., it is equivalent to the target image reconstruction model).
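A minimal sketch of one training iteration in the spirit of equation (10), assuming a PyTorch-style model and optimizer; the model signature (noise image plus auxiliary features) and the sequence-level loss callable are assumptions.

```python
def train_step(model, optimizer, noise_frames, aux_features, reference_frames, sequence_loss):
    # One gradient-descent update: reconstruct every frame of the sample sequence,
    # evaluate the target loss l against the high-SPP references, and update the
    # network parameters theta.
    optimizer.zero_grad()
    reconstructions = [model(c_i, f_i) for c_i, f_i in zip(noise_frames, aux_features)]
    loss = sequence_loss(reconstructions, reference_frames)
    loss.backward()
    optimizer.step()
    return loss.item()
```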
In the embodiment of the present application, the mixed embedding features respectively corresponding to the two images can be calculated according to the association features of the current image and the previous frame image, and feature accumulation is performed on the two images based on the mixed embedding features; the obtained target accumulated feature therefore contains features obtained by fusing the previous and the current frame, the fused features can represent the association between the two frames, and the dimensions covered by the features are wider. Meanwhile, the reconstruction result image of the target image is determined jointly from the target image association feature of the target image, the target accumulated feature and the previous frame image (the historical image); in other words, when the reconstruction result image of the target image is determined, not only the features of the current frame but also the features of the earlier historical frame are considered, so that the time-domain information of the current frame can be fully utilized, the obtained reconstruction result image has higher time-domain stability, and the image quality is higher. In summary, no matter whether an image is obtained with a high or a low pixel sampling number, the present application can perform feature accumulation according to the image association features of two consecutive frames in the target image sequence and determine the reconstruction result image of the target image jointly from the accumulated features and the historical image, and therefore has universality; by calculating the accumulated features of the two consecutive frames and performing image reconstruction based on the accumulated features, the time-domain information of the images can be fully utilized, so that the image quality of the reconstructed image can be well improved.
Further, please refer to fig. 5, wherein fig. 5 is a system architecture diagram according to an embodiment of the present application. As shown in fig. 5, the system architecture may include a temporal accumulator (also referred to as a time sequence accumulation model), a feature fuser, and a target image reconstruction model (also referred to as a reconstruction network). The temporal accumulator may be composed of a first convolutional network layer and a second convolutional network layer, and each of the two convolutional network layers may include two Convolutional Neural Networks (CNNs). The feature fuser may also include a plurality of Convolutional Neural Networks (CNNs). The target image reconstruction model may be designed as a U-Net structure containing residual modules, that is, a network structure in which a plurality of convolutional layers (conv layers) are stacked to form an encoding and decoding module, and residual modules are inserted into the encoding and decoding module.
It should be understood that the image-related features (including the normal feature and the depth feature) of the current frame and the image-related features (including the normal feature and the depth feature) of the historical frame may be input into the temporal accumulator, and the temporal accumulator may perform time-domain accumulation on the noisy image feature, the shadow feature and the albedo feature of the two frames to obtain the accumulated feature corresponding to each of these features. The accumulated features obtained by the temporal accumulator are input into the feature fuser, and the feature fuser may perform feature fusion on the accumulated features and the other auxiliary features (such as the normal feature, the transparency dimension feature, the roughness dimension feature and the like) to obtain the image fusion feature of the current frame. Then, the fusion feature of the current frame may be cascaded with the normal feature and the depth feature of the historical frame after warping and denoising, so that the fusion cascade feature can be obtained; the fusion cascade feature can be input into the reconstruction network (i.e., the target image reconstruction model), and the reconstruction result image of the current frame can be output through the reconstruction network.
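The following sketch illustrates this data flow, assuming PyTorch-style callables for the temporal accumulator, the feature fuser and the reconstruction network; all tensor layouts and module signatures are assumptions rather than the patent's reference implementation.

```python
import torch

def reconstruct_current_frame(temporal_accumulator, feature_fuser, reconstruction_net,
                              curr_assoc_feats, hist_assoc_feats,
                              curr_pixel_feats, hist_accumulated_feats,
                              other_aux_feats, warped_hist_normal_depth):
    # High-level data flow of the Fig. 5 architecture: the temporal accumulator blends
    # the noisy-image, shadow and albedo features of the two frames, the feature fuser
    # merges the accumulated features with the remaining auxiliary features, and the
    # fused features are concatenated with the warped historical normal/depth features
    # before entering the U-Net reconstruction network.
    accumulated = temporal_accumulator(curr_assoc_feats, hist_assoc_feats,
                                       curr_pixel_feats, hist_accumulated_feats)
    fused = feature_fuser(torch.cat([accumulated, other_aux_feats], dim=0))
    net_input = torch.cat([fused, warped_hist_normal_depth], dim=0)
    return reconstruction_net(net_input.unsqueeze(0))[0]
```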
It should be understood that the target scene can be rendered with a lower pixel sampling number, a large amount of calculation overhead required for rendering an image with a high pixel sampling number is reduced by a deep learning method, and the sparsely sampled image can be reconstructed in real time under high resolution, so that the target scene can be rendered by a sparsely sampled rendering mode under the condition of ensuring good resolution and frame rate, and the calculation cost can be well reduced.
Further, please refer to fig. 6, where fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may be a computer program (including program code) running on a computer device, for example the image processing apparatus being an application software; the image processing apparatus may be adapted to perform the method shown in fig. 3. As shown in fig. 6, the image processing apparatus 1 may include: an image acquisition module 11, a feature acquisition module 12, a feature generation module 13, a feature accumulation module 14, and an image reconstruction module 15.
An image obtaining module 11, configured to obtain a target image and a history image in a target image sequence; the target image sequence is an image sequence obtained after image acquisition is carried out on a target scene based on the sampling number of target unit pixels; the historical image is the last image of the target image in the target image sequence;
the feature obtaining module 12 is configured to obtain a target image correlation feature corresponding to a target image and a historical image correlation feature corresponding to a historical image;
a feature generation module 13, configured to generate a first mixed embedding feature for the target image according to the target image associated feature;
the feature generation module 13 is further configured to generate a second mixed embedded feature for the historical image according to the target image associated feature and the historical image associated feature;
the feature accumulation module 14 is configured to perform feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image;
and the image reconstruction module 15 is configured to determine a reconstruction result image corresponding to the target image according to the target image association feature, the target accumulation feature and the historical image.
For specific implementation manners of the image obtaining module 11, the feature obtaining module 12, the feature generating module 13, the feature accumulating module 14, and the image reconstructing module 15, reference may be made to the description of step S101 to step S104 in the embodiment corresponding to fig. 3, and details will not be described here.
In one embodiment, the target image association features include a first normal dimension feature and a first depth dimension feature corresponding to the target image;
the feature generation module 13 may include: a first feature input unit 131, a first feature convolution unit 132, and a first feature operation unit 133.
A first feature input unit 131, configured to input the first normal dimension feature and the first depth dimension feature to the time sequence accumulation model;
a first feature convolution unit 132, configured to perform convolution processing on the first normal dimension feature and the first depth dimension feature through a first convolution network layer of the time sequence accumulation model, to obtain a first image embedding feature corresponding to the target image;
the first feature convolution unit 132 is further configured to perform convolution processing on the first normal dimension feature and the first depth dimension feature through a second convolution network layer of the time sequence accumulation model to obtain a second image embedding feature corresponding to the target image;
a first feature calculation unit 133, configured to perform pixel multiplication calculation processing on the first image embedding feature and the second image embedding feature to obtain a first mixed embedding feature for the target image.
For specific implementation of the first feature input unit 131, the first feature convolution unit 132, and the first feature operation unit 133, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3, which will not be described herein again.
In one embodiment, the target image association features comprise first normal dimension features and first depth dimension features corresponding to the target image, and the historical image association features comprise second normal dimension features and second depth dimension features corresponding to the historical image;
the feature generation module 13 may include: a feature transformation unit 134, a second feature input unit 135, a second feature convolution unit 136, and a second feature operation unit 137.
The feature transformation unit 134 is configured to perform affine transformation on the second normal dimension feature and the second depth dimension feature respectively to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature;
a second feature input unit 135 configured to input the first normal dimension feature, the first depth dimension feature, the normal transformation feature, and the depth transformation feature to the time series accumulation model;
the second feature convolution unit 136 is configured to perform convolution processing on the first normal dimension feature and the first depth dimension feature through a first convolution network layer of the time sequence accumulation model to obtain a target image embedding feature corresponding to the target image;
the second feature convolution unit 136 is further configured to perform convolution processing on the normal transformation feature and the depth transformation feature through a second convolution network layer of the time sequence accumulation model to obtain a history image embedding feature corresponding to the history image;
and a second feature operation unit 137, configured to perform pixel multiplication operation processing on the target image embedding feature and the history image embedding feature to obtain a second mixed embedding feature for the history image.
For specific implementation of the feature transformation unit 134, the second feature input unit 135, the second feature convolution unit 136, and the second feature operation unit 137, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3, which will not be described herein again.
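For illustration, a minimal sketch of how the two convolutional network layers and the pixel-wise multiplication could be organised is given below, assuming PyTorch, batched inputs of shape (N, 4, H, W) (3 normal channels plus 1 depth channel) and illustrative channel counts and activations; it is not the patent's reference implementation.

```python
import torch.nn as nn

class MixedEmbeddingGenerator(nn.Module):
    # Sketch of the two convolutional network layers of the time sequence accumulation
    # model; each "convolutional network layer" is modeled here as two stacked CNNs.
    def __init__(self, in_channels=4, embed_channels=8):
        super().__init__()
        self.first_layer = nn.Sequential(
            nn.Conv2d(in_channels, embed_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(embed_channels, embed_channels, 3, padding=1))
        self.second_layer = nn.Sequential(
            nn.Conv2d(in_channels, embed_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(embed_channels, embed_channels, 3, padding=1))

    def forward(self, curr_normal_depth, warped_hist_normal_depth):
        target_embedding = self.first_layer(curr_normal_depth)        # first conv layer on current features
        second_embedding = self.second_layer(curr_normal_depth)       # second conv layer on current features
        history_embedding = self.second_layer(warped_hist_normal_depth)  # second conv layer on warped history
        first_mixed = target_embedding * second_embedding             # pixel-wise product
        second_mixed = target_embedding * history_embedding           # pixel-wise product
        return first_mixed, second_mixed
```

In this sketch the second mixed embedding reuses the embedding of the current frame, so the two fusion coefficients derived from the mixed embeddings are computed in a shared embedding space.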
In one embodiment, the feature accumulation module 14 may include: a fusion coefficient determination unit 141 and a feature accumulation unit 142.
A fusion coefficient determining unit 141, configured to obtain a logistic regression function, and determine a first feature fusion coefficient for the target image through the logistic regression function and the first mixed embedded feature;
a fusion coefficient determination unit 141, configured to determine a second feature fusion coefficient for the historical image through a logistic regression function and the second mixed embedded feature;
and a feature accumulation unit 142, configured to perform feature accumulation on the target image and the historical image according to the first feature fusion coefficient and the second feature fusion coefficient, so as to obtain a target accumulation feature corresponding to the target image.
For a specific implementation manner of the fusion coefficient determining unit 141 and the feature accumulating unit 142, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3, which will not be described herein again.
In one embodiment, the feature accumulation unit 142 may include: an image feature acquisition sub-unit 1421, a shadow accumulation sub-unit 1422, a noise accumulation sub-unit 1423, an albedo accumulation sub-unit 1424, and an accumulated feature determination sub-unit 1425.
An image feature obtaining subunit 1421, configured to obtain a target noise image feature, a target shadow feature, and a target albedo feature corresponding to a target image, and obtain a historical accumulated noise image feature, a historical accumulated shadow feature, and a historical accumulated albedo feature corresponding to a historical image;
a shadow accumulation subunit 1422, configured to accumulate the target shadow feature and the historical accumulated shadow feature according to the first feature fusion coefficient and the second feature fusion coefficient, so as to obtain a target accumulated shadow feature corresponding to the target image;
a noise accumulation subunit 1423, configured to accumulate the target noise image feature and the historical accumulated noise image feature according to the first feature fusion coefficient and the second feature fusion coefficient, so as to obtain a target accumulated noise image feature corresponding to the target image;
an albedo accumulating subunit 1424, configured to accumulate the target albedo feature and the historical accumulated albedo feature according to the first feature fusion coefficient and the second feature fusion coefficient, to obtain a target accumulated albedo feature corresponding to the target image;
an accumulated feature determining subunit 1425 is configured to determine the target accumulated shadow feature, the target accumulated noise image feature, and the target accumulated albedo feature as target accumulated features corresponding to the target image.
For a specific implementation manner of the image feature obtaining subunit 1421, the shadow accumulation subunit 1422, the noise accumulation subunit 1423, the albedo accumulation subunit 1424, and the accumulated feature determining subunit 1425, reference may be made to the description of step S103 in the embodiment corresponding to fig. 3, which will not be repeated here.
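A hedged sketch of the accumulation step follows, assuming the mixed embedding features have been reduced to per-pixel maps that broadcast against each feature map and assuming a sigmoid-normalised blending; the exact normalisation is not specified above and is an illustrative choice.

```python
import torch

def accumulate(first_mixed, second_mixed, curr_feats, hist_accumulated_feats, eps=1e-6):
    # Fusion coefficients obtained from the mixed embedding features via a logistic
    # (sigmoid) function, applied to the noise-image, shadow and albedo feature maps.
    alpha_curr = torch.sigmoid(first_mixed)
    alpha_hist = torch.sigmoid(second_mixed)
    accumulated = {}
    for name in ("noise_image", "shadow", "albedo"):
        c, h = curr_feats[name], hist_accumulated_feats[name]
        accumulated[name] = (alpha_curr * c + alpha_hist * h) / (alpha_curr + alpha_hist + eps)
    return accumulated
```

In this reading, a coefficient close to 1 for the historical term keeps more of the temporally accumulated information, while a coefficient close to 1 for the current term favours the newly rendered frame.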
In one embodiment, the image reconstruction module 15 may include: a cascade feature determination unit 151 and a reconstructed image determination unit 152.
A cascade feature determining unit 151, configured to determine a target fusion cascade feature corresponding to the target image according to the target image association feature, the target accumulation feature, and the history image;
and a reconstructed image determining unit 152, configured to input the target fusion cascade feature into a target image reconstruction model, and output a reconstruction result image corresponding to the target image according to the target fusion cascade feature in the target image reconstruction model.
For a specific implementation of the cascade feature determining unit 151 and the reconstructed image determining unit 152, reference may be made to the description of step S104 in the embodiment corresponding to fig. 3, which will not be repeated herein.
In one embodiment, the target image association feature comprises a first normal dimension feature and a first depth dimension feature; the historical image association features comprise second normal dimension features and second depth dimension features corresponding to the historical images;
the cascade characteristic determination unit 151 may include: a feature fusion subunit 1511, a feature transformation processing subunit 1512, and a feature concatenation subunit 1513.
A feature fusion subunit 1511, configured to obtain an auxiliary feature corresponding to the target image; the assistant features comprise first normal dimension features and first depth dimension features;
the feature fusion subunit 1511 is further configured to determine, as a remaining dimension feature, a dimension feature, in the auxiliary features, other than the first normal dimension feature and the first depth dimension feature;
the feature fusion subunit 1511 is further configured to perform feature fusion on the target cumulative feature, the first normal dimension feature, the first depth dimension feature, and the remaining dimension feature to obtain an image fusion feature corresponding to the target image;
the feature transformation processing subunit 1512 is configured to perform affine transformation on the second normal dimension feature and the second depth dimension feature respectively to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature;
and the feature cascade subunit 1513 is configured to cascade the image fusion feature, the normal transformation feature, and the depth transformation feature to obtain a target fusion cascade feature corresponding to the target image.
For a specific implementation manner of the feature fusion subunit 1511, the feature transformation processing subunit 1512, and the feature cascade subunit 1513, reference may be made to the description of step S104 in the embodiment corresponding to fig. 3, which will not be described herein again.
In one embodiment, the reconstructed image determination unit 152 may include: a parameter output subunit 1521 and an image fusion subunit 1522.
A parameter output subunit 1521, configured to determine, through the target image reconstruction model, a predicted high-quality image, a predicted low-quality image, a first image reconstruction parameter, and a second image reconstruction parameter corresponding to the target fusion cascade feature; the predicted high-quality image does not contain noise, and the resolution of the predicted high-quality image is greater than that of the predicted low-quality image;
the image fusion subunit 1522 is configured to perform image fusion on the predicted high-quality image and the predicted low-quality image through the first image reconstruction parameter in the target image reconstruction model to obtain an initial reconstructed image corresponding to the target image;
the image fusion subunit 1522 is further configured to acquire a historical reconstructed image corresponding to the historical image, and perform image fusion on the initial reconstructed image and the historical reconstructed image according to the second image reconstruction parameter to obtain a reconstructed result image corresponding to the target image.
For a specific implementation manner of the parameter output subunit 1521 and the image fusion subunit 1522, reference may be made to the description of step S104 in the embodiment corresponding to fig. 3, which will not be described herein again.
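A minimal sketch of the two fusion steps, assuming a simple linear blending controlled by the two image reconstruction parameters; the blending form and parameter range are assumptions.

```python
def fuse_reconstruction_outputs(pred_high, pred_low, recon_param_1, recon_param_2, hist_recon):
    # 1) blend the predicted high- and low-quality images with the first image
    #    reconstruction parameter to obtain the initial reconstructed image;
    # 2) blend that result with the historical reconstructed image using the
    #    second image reconstruction parameter.
    initial_recon = recon_param_1 * pred_high + (1.0 - recon_param_1) * pred_low
    return recon_param_2 * initial_recon + (1.0 - recon_param_2) * hist_recon
```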
In the embodiment of the application, the mixed embedding characteristics corresponding to the two images can be calculated according to the correlation characteristics of the current image and the previous frame of image, the two images are subjected to characteristic accumulation based on the mixed embedding characteristics, the obtained target accumulation characteristics comprise characteristics obtained by fusing the previous frame and the next frame, the fused characteristics can represent the correlation of the previous frame and the next frame, and the characteristic inclusion dimension is wider; meanwhile, when the reconstruction result image of the target image is determined, the determination is also based on the target image association characteristic of the target image, the target accumulation characteristic and the previous frame image (historical image), when the reconstruction result image of the target image is determined, not only the characteristic of the current frame but also the characteristic of the historical frame which is earlier in time are considered, the time domain information of the current frame can be fully utilized, the obtained reconstruction result image has higher time domain stability, and the image quality is higher. In conclusion, no matter the image is obtained by high pixel sampling number or low pixel sampling number, the method and the device can carry out feature accumulation according to the image association features of two frames of images in front of and behind in the target image sequence, and determine the reconstruction result image of the target image together through the accumulated features and the historical image, and have universality; by calculating the accumulated characteristics of the two frames before and after the image reconstruction method based on the accumulated characteristics, the time domain information of the image can be fully utilized, and therefore the image quality of the reconstructed image can be well improved.
Further, please refer to fig. 7, where fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The image processing apparatus may be a computer program (including program code) running on a computer device, for example the image processing apparatus being an application software; the image processing apparatus may be adapted to perform the method shown in fig. 4. As shown in fig. 7, the image processing apparatus 2 may include: a sample image acquisition module 21, a sample feature acquisition module 22, a sample feature generation module 23, a sample feature accumulation module 24, a sample feature concatenation module 25, a sample reconstructed image output module 26, a label image acquisition module 27, and a model training module 28.
A sample image obtaining module 21, configured to obtain a target sample noise image and a historical sample noise image in a sample image sequence; the sample image sequence is an image sequence obtained by carrying out image acquisition on a sample scene based on the first unit pixel sampling number; the historical sample noise image is the last sample noise image of the target sample noise image in the sample image sequence; the first unit pixel sampling number is less than the unit pixel sampling number threshold;
the sample feature obtaining module 22 is configured to obtain a target sample image correlation feature corresponding to the target sample noise image and a historical sample image correlation feature corresponding to the historical sample noise image;
a sample feature generation module 23, configured to generate a first sample mixed embedding feature for the target sample noise image according to the target sample image association feature;
the sample feature generation module 23 is further configured to generate a second sample mixed embedded feature for the historical sample noise image according to the target sample image associated feature and the historical sample image associated feature;
the sample feature accumulation module 24 is configured to perform feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature, so as to obtain a target sample accumulation feature corresponding to the target sample noise image;
the sample characteristic cascading module 25 is configured to determine a target sample fusion cascading characteristic corresponding to the target sample noise image according to the target sample image association characteristic, the target sample accumulation characteristic, and the historical sample noise image;
the sample reconstructed image output module 26 is configured to input the target sample fusion cascade feature into an image reconstructed model, and output a target sample reconstructed image corresponding to the target sample noise image according to the target sample fusion cascade feature in the image reconstructed model;
a label image obtaining module 27, configured to obtain a target label sampling image corresponding to the target sample noise image; the target label sampling image is an image obtained after image acquisition is carried out on a sample scene based on the second unit pixel sampling number; the first unit pixel sampling number is less than the second unit pixel sampling number;
and the model training module 28 is configured to train the image reconstruction model according to the target label sampling image and the target sample reconstruction image, so as to obtain a target image reconstruction model for performing image reconstruction processing on a target image in the target image sequence.
For a specific implementation manner of the sample image obtaining module 21, the sample feature obtaining module 22, the sample feature generating module 23, the sample feature accumulating module 24, the sample feature cascading module 25, the sample reconstructed image output module 26, the label image obtaining module 27, and the model training module 28, reference may be made to the description of step S301 to step S307 in the embodiment corresponding to fig. 4, which will not be repeated here.
In one embodiment, model training module 28 may include: an image to be operated determining unit 281, a loss value determining unit 282, and a model training unit 283.
The to-be-operated image determining unit 281 is configured to determine a sample reconstructed image corresponding to the residual sample noise image in the sample image sequence as a to-be-operated reconstructed image, and determine a label sampling image corresponding to the residual sample noise image as a to-be-operated label sampling image; the residual sample noise image is a sample noise image except the target sample noise image in the sample image sequence;
a loss value determining unit 282 configured to determine a target loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image, and the target sample reconstructed image;
and the model training unit 283 is used for training the image reconstruction model according to the target loss value to obtain a target image reconstruction model.
For specific implementation of the to-be-operated image determining unit 281, the loss value determining unit 282, and the model training unit 283, reference may be made to the description of step S307 in the embodiment corresponding to fig. 4, which will not be described herein again.
In one embodiment, the loss value determination unit 282 may include: a loss value determining sub-unit 2821, and a target loss value determining sub-unit 2822.
A loss value determining subunit 2821, configured to determine a spatial loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image, and the target sample reconstructed image;
the loss value determining subunit 2821 is further configured to determine a time sequence loss value for the image reconstruction model according to the reconstructed image to be computed, the label sampled image to be computed, the target label sampled image, and the target sample reconstructed image;
the loss value determining subunit 2821 is further configured to determine a relative edge loss value for the image reconstruction model according to the to-be-computed reconstructed image, the to-be-computed label sampled image, the target label sampled image, and the target sample reconstructed image;
and a target loss value determining subunit 2822, configured to determine a target loss value for the image reconstruction model according to the spatial loss value, the temporal loss value, and the relative edge loss value.
For a specific implementation manner of the loss value determining subunit 2821 and the target loss value determining subunit 2822, reference may be made to the description of step S307 in the embodiment corresponding to fig. 4, which will not be described herein again.
In an embodiment, the loss value determining subunit 2821 is further specifically configured to obtain a spatial loss function, and determine a first spatial sub-loss value for the residual sample noise image according to the spatial loss function, the reconstructed image to be computed, and the tag sample image to be computed;
the loss value determining subunit 2821 is further specifically configured to determine, according to the spatial loss function, the target label sampled image, and the target sample reconstructed image, a second spatial sub-loss value for the target sample noise image;
the loss value determining subunit 2821 is further specifically configured to fuse the first spatial sub-loss value and the second spatial sub-loss value to obtain a spatial loss value of the image reconstruction model.
In one embodiment, the loss value determining subunit 2821 is further specifically configured to obtain a first training weight for the residual sample noise image, and a second training weight for the target sample noise image;
the loss value determining subunit 2821 is further specifically configured to perform operation processing on the first training weight and the first spatial sub-loss value, so as to obtain a first operation spatial sub-loss value;
the loss value determining subunit 2821 is further specifically configured to perform operation processing on the second training weight and the second spatial sub-loss value to obtain a second operation spatial sub-loss value;
the loss value determining subunit 2821 is further specifically configured to add the first operation space sub-loss value and the second operation space sub-loss value to obtain a space loss value of the image reconstruction model.
In one embodiment, the target sample noise image is the last sample noise image in the sequence of sample images;
the target loss value determining subunit 2822 is further specifically configured to obtain a history sample reconstructed image and a history label sampling image corresponding to the history sample noise image, and determine an affine transformation loss value for the image reconstruction model according to the history sample reconstructed image, the target sample reconstructed image, the history label sampling image, and the target label sampling image;
the target loss value determining subunit 2822 is further specifically configured to obtain target sample albedo characteristics corresponding to the target sample noise image, obtain target sample accumulated albedo characteristics corresponding to the target sample noise image in the target sample accumulated characteristics, and determine an albedo loss value for the image reconstruction model according to the albedo loss function, the target sample albedo characteristics, and the target sample accumulated albedo characteristics;
the target loss value determining subunit 2822 is further specifically configured to perform loss value fusion on the spatial loss value, the temporal loss value, the relative edge loss value, the affine transformation loss value, and the albedo loss value to obtain a target loss value of the image reconstruction model.
In the embodiment of the application, the mixed embedding characteristics corresponding to the two images can be calculated according to the correlation characteristics of the current image and the previous frame of image, the two images are subjected to characteristic accumulation based on the mixed embedding characteristics, the obtained target accumulation characteristics comprise characteristics obtained by fusing the previous frame and the next frame, the fused characteristics can represent the correlation of the previous frame and the next frame, and the characteristic inclusion dimension is wider; meanwhile, when the reconstruction result image of the target image is determined, the determination is also based on the target image association characteristic of the target image, the target accumulation characteristic and the previous frame image (historical image), when the reconstruction result image of the target image is determined, not only the characteristic of the current frame but also the characteristic of the historical frame which is earlier in time are considered, the time domain information of the current frame can be fully utilized, the obtained reconstruction result image has higher time domain stability, and the image quality is higher. In summary, no matter the image is obtained by high pixel sampling number or low pixel sampling number, the method and the device can perform feature accumulation according to the image association features of two frames of images in front and back in the target image sequence, and determine the reconstruction result image of the target image together through the accumulated features and the historical image, so that the method and the device have universality; by calculating the accumulated characteristics of the two frames before and after the image reconstruction method based on the accumulated characteristics, the time domain information of the image can be fully utilized, and therefore the image quality of the reconstructed image can be well improved.
Further, please refer to fig. 8, where fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the image processing apparatus 1 in the embodiment corresponding to fig. 6 may be applied to the computer device 8000, and the computer device 8000 may include: a processor 8001, a network interface 8004, and a memory 8005; furthermore, the computer device 8000 further includes: a user interface 8003 and at least one communication bus 8002. The communication bus 8002 is used for connection and communication between these components. The user interface 8003 may include a Display (Display) and a Keyboard (Keyboard), and optionally the user interface 8003 may further include a standard wired interface and a wireless interface. The network interface 8004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 8005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 8005 may optionally also be at least one storage device located remotely from the aforementioned processor 8001. As shown in fig. 8, the memory 8005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 8000 of fig. 8, the network interface 8004 may provide a network communication function, the user interface 8003 is mainly used to provide an input interface for the user, and the processor 8001 may be used to invoke the device control application stored in the memory 8005 to implement:
acquiring a target image and a historical image in a target image sequence; the target image sequence is an image sequence obtained after image acquisition is carried out on a target scene based on the sampling number of target unit pixels; the historical image is the last image of the target image in the target image sequence;
acquiring target image correlation characteristics corresponding to a target image and historical image correlation characteristics corresponding to a historical image;
generating a first mixed embedding feature aiming at the target image according to the target image association feature, generating a second mixed embedding feature aiming at the historical image according to the target image association feature and the historical image association feature, and performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image;
and determining a reconstruction result image corresponding to the target image according to the target image correlation characteristic, the target accumulation characteristic and the historical image.
Or realize that:
acquiring a target sample noise image and a historical sample noise image in a sample image sequence; the sample image sequence is an image sequence obtained by carrying out image acquisition on a sample scene based on the first unit pixel sampling number; the historical sample noise image is the last sample noise image of the target sample noise image in the sample image sequence; the first unit pixel sampling number is less than the unit pixel sampling number threshold;
acquiring target sample image correlation characteristics corresponding to the target sample noise images and historical sample image correlation characteristics corresponding to the historical sample noise images;
generating a first sample mixed embedding feature aiming at the target sample noise image according to the target sample image association feature, generating a second sample mixed embedding feature aiming at the historical sample noise image according to the target sample image association feature and the historical sample image association feature, and performing feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature to obtain a target sample accumulated feature corresponding to the target sample noise image;
determining target sample fusion cascade characteristics corresponding to the target sample noise images according to the target sample image correlation characteristics, the target sample accumulation characteristics and the historical sample noise images;
inputting the target sample fusion cascade characteristic into an image reconstruction model, and outputting a target sample reconstruction image corresponding to a target sample noise image according to the target sample fusion cascade characteristic in the image reconstruction model;
acquiring a target label sampling image corresponding to the target sample noise image; the target label sampling image is an image obtained after image acquisition is carried out on a sample scene based on the second unit pixel sampling number; the first unit pixel sampling number is less than the second unit pixel sampling number;
and training the image reconstruction model according to the target label sampling image and the target sample reconstruction image to obtain a target image reconstruction model for performing image reconstruction processing on the target image in the target image sequence.
It should be understood that the computer device 8000 described in the embodiment of the present application may perform the description of the image processing method in the embodiment corresponding to fig. 3 or fig. 4, and may also perform the description of the image processing apparatus 1 in the embodiment corresponding to fig. 6 or the image processing apparatus 2 in the embodiment corresponding to fig. 7, which are not described again here. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, where the computer program executed by the aforementioned computer device 8000 is stored in the computer-readable storage medium, and the computer program includes program instructions; when the processor executes the program instructions, the description of the image processing method in the embodiment corresponding to fig. 3 or fig. 4 can be performed, and details are therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application.
The computer-readable storage medium may be the image processing apparatus provided in any of the foregoing embodiments or an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) card, a flash card (flash card), and the like, provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
In one aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by one aspect of the embodiments of the present application.
The terms "first," "second," and the like in the description and in the claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or apparatus that comprises a list of steps or elements is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such process, method, apparatus, product, or apparatus.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered as going beyond the scope of the present application.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowchart and/or the structural diagram of the method provided by the embodiments of the present application, and each flow and/or block of the flowchart and/or the structural diagram of the method, and the combination of the flow and/or block in the flowchart and/or the block diagram can be specifically implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block or blocks.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims (16)

1. An image processing method, comprising:
acquiring a target image and a historical image in a target image sequence; the target image sequence is an image sequence obtained after image acquisition is carried out on a target scene based on the sampling number of target unit pixels; the historical image is the last image of the target image in the target image sequence;
acquiring target image correlation characteristics corresponding to the target image and historical image correlation characteristics corresponding to the historical image;
generating a first mixed embedding feature aiming at the target image according to the target image association feature, generating a second mixed embedding feature aiming at the historical image according to the target image association feature and the historical image association feature, and performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image;
and determining a reconstruction result image corresponding to the target image according to the target image association characteristic, the target accumulation characteristic and the historical image.
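By way of illustration only and not of limitation, the following Python sketch traces the data flow recited in claim 1 using trivial stand-in callables; none of the function or variable names (reconstruct_frame, mix_embed, and so on) come from the specification, and the dependent claims below describe the individual steps in more detail.

```python
# High-level, non-authoritative sketch of the data flow in claim 1. The helper
# callables are trivial stand-ins so that the example runs end to end.
import torch

def reconstruct_frame(target_img, history_img, history_acc,
                      extract_features, mix_embed, accumulate, reconstruct):
    feat_t = extract_features(target_img)        # target image association features
    feat_h = extract_features(history_img)       # historical image association features
    w_t = mix_embed(feat_t, feat_t)              # first hybrid embedded feature
    w_h = mix_embed(feat_t, feat_h)              # second hybrid embedded feature
    acc = accumulate(target_img, history_acc, w_t, w_h)  # target accumulated feature
    return reconstruct(feat_t, acc, history_img), acc

# Trivial stand-ins (assumptions), just to exercise the flow.
frame_t, frame_h, acc_h = (torch.rand(1, 3, 8, 8) for _ in range(3))
result, acc_t = reconstruct_frame(
    frame_t, frame_h, acc_h,
    extract_features=lambda img: img,
    mix_embed=lambda a, b: torch.sigmoid((a * b).mean(dim=1, keepdim=True)),
    accumulate=lambda cur, hist, wt, wh: (wt * cur + wh * hist) / (wt + wh + 1e-8),
    reconstruct=lambda feat, acc, hist: 0.5 * acc + 0.5 * hist,
)
print(result.shape)
```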
2. The method of claim 1, wherein the target image associated features include a first normal dimension feature and a first depth dimension feature corresponding to the target image;
the generating of the first hybrid embedded feature for the target image according to the target image associated feature comprises:
inputting the first normal dimension feature and the first depth dimension feature into a time sequence accumulation model;
performing convolution processing on the first normal dimension characteristic and the first depth dimension characteristic through a first convolution network layer of the time sequence accumulation model to obtain a first image embedding characteristic corresponding to the target image;
performing convolution processing on the first normal dimension characteristic and the first depth dimension characteristic through a second convolution network layer of the time sequence accumulation model to obtain a second image embedding characteristic corresponding to the target image;
and carrying out pixel multiplication operation processing on the first image embedding feature and the second image embedding feature to obtain a first mixed embedding feature aiming at the target image.
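A minimal, non-authoritative sketch of one possible reading of claim 2, assuming the normal-dimension and depth-dimension features are concatenated along the channel axis before each convolution; the module and tensor names (FirstHybridEmbedding, conv_a, conv_b) are hypothetical.

```python
# Hypothetical sketch of claim 2: two convolution branches over the normal- and
# depth-dimension features of the target image, followed by a pixel-wise product.
import torch
import torch.nn as nn

class FirstHybridEmbedding(nn.Module):
    def __init__(self, in_channels=4, embed_channels=32):
        super().__init__()
        # "first convolution network layer" of the temporal accumulation model
        self.conv_a = nn.Conv2d(in_channels, embed_channels, kernel_size=3, padding=1)
        # "second convolution network layer" of the temporal accumulation model
        self.conv_b = nn.Conv2d(in_channels, embed_channels, kernel_size=3, padding=1)

    def forward(self, normal, depth):
        # Concatenating the first normal-dimension and first depth-dimension
        # features along the channel axis is an assumption, not claim language.
        x = torch.cat([normal, depth], dim=1)
        first_embedding = self.conv_a(x)   # first image embedding feature
        second_embedding = self.conv_b(x)  # second image embedding feature
        # Pixel-wise multiplication yields the first hybrid embedded feature.
        return first_embedding * second_embedding

# Example with a 3-channel normal map and a 1-channel depth map at 64x64.
normal = torch.randn(1, 3, 64, 64)
depth = torch.randn(1, 1, 64, 64)
hybrid = FirstHybridEmbedding(in_channels=4)(normal, depth)
print(hybrid.shape)  # torch.Size([1, 32, 64, 64])
```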
3. The method of claim 1, wherein the target image associated features comprise first normal dimension features and first depth dimension features corresponding to the target image, and wherein the historical image associated features comprise second normal dimension features and second depth dimension features corresponding to the historical image;
the generating a second mixed embedding feature for the historical image according to the target image association feature and the historical image association feature comprises:
performing affine transformation on the second normal dimension feature and the second depth dimension feature respectively to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature;
inputting the first normal dimension feature, the first depth dimension feature, the normal transformation feature, and the depth transformation feature to a temporal accumulation model;
performing convolution processing on the first normal dimension characteristic and the first depth dimension characteristic through a first convolution network layer of the time sequence accumulation model to obtain a target image embedding characteristic corresponding to the target image;
performing convolution processing on the normal transformation feature and the depth transformation feature through a second convolution network layer of the time sequence accumulation model to obtain a historical image embedding feature corresponding to the historical image;
and carrying out pixel multiplication operation processing on the target image embedding feature and the historical image embedding feature to obtain a second mixed embedding feature aiming at the historical image.
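The sketch below illustrates one plausible interpretation of claim 3, in which the affine transformation of the historical features is realised as a spatial warp via torch.nn.functional.affine_grid and grid_sample; the identity matrix used for theta, and all other names, are placeholders rather than the claimed implementation.

```python
# Hypothetical sketch of claim 3: warp the historical normal/depth features with
# an affine transformation, convolve each branch, and multiply pixel-wise.
import torch
import torch.nn as nn
import torch.nn.functional as F

def affine_warp(feature, theta):
    """Apply a 2x3 affine matrix `theta` to a feature map (an assumed reading
    of the claim's 'affine transformation' as a spatial warp)."""
    grid = F.affine_grid(theta, feature.shape, align_corners=False)
    return F.grid_sample(feature, grid, align_corners=False)

conv_target = nn.Conv2d(4, 32, kernel_size=3, padding=1)   # first conv network layer
conv_history = nn.Conv2d(4, 32, kernel_size=3, padding=1)  # second conv network layer

# Current-frame features (first normal/depth dimension features).
normal_t = torch.randn(1, 3, 64, 64)
depth_t = torch.randn(1, 1, 64, 64)
# Previous-frame features (second normal/depth dimension features).
normal_prev = torch.randn(1, 3, 64, 64)
depth_prev = torch.randn(1, 1, 64, 64)

# Identity affine matrix as a stand-in for the real frame-to-frame transform.
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
normal_warp = affine_warp(normal_prev, theta)   # normal transformation feature
depth_warp = affine_warp(depth_prev, theta)     # depth transformation feature

target_embed = conv_target(torch.cat([normal_t, depth_t], dim=1))
history_embed = conv_history(torch.cat([normal_warp, depth_warp], dim=1))
second_hybrid = target_embed * history_embed    # second hybrid embedded feature
print(second_hybrid.shape)
```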
4. The method according to claim 1, wherein the performing feature accumulation on the target image and the historical image based on the first hybrid embedded feature and the second hybrid embedded feature to obtain a target accumulated feature corresponding to the target image comprises:
acquiring a logistic regression function, and determining a first feature fusion coefficient aiming at the target image through the logistic regression function and the first mixed embedded feature;
determining a second feature fusion coefficient for the historical image through the logistic regression function and the second mixed embedded feature;
and performing feature accumulation on the target image and the historical image according to the first feature fusion coefficient and the second feature fusion coefficient to obtain a target accumulated feature corresponding to the target image.
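As an illustrative assumption only, the snippet below reads the logistic regression function of claim 4 as the sigmoid function applied to a channel-reduced form of each hybrid embedded feature; the channel reduction and the normalisation into a convex blend are not specified by the claim.

```python
# Hypothetical sketch of claim 4: the logistic (sigmoid) function turns each
# hybrid embedded feature into a per-pixel feature fusion coefficient.
import torch

def fusion_coefficients(first_hybrid, second_hybrid):
    # Reduce each embedding to a single channel before the logistic function
    # (the reduction is an assumption; the claim only names the function).
    w1 = torch.sigmoid(first_hybrid.mean(dim=1, keepdim=True))
    w2 = torch.sigmoid(second_hybrid.mean(dim=1, keepdim=True))
    # Normalise so the two coefficients describe a convex blend per pixel.
    total = w1 + w2 + 1e-8
    return w1 / total, w2 / total

first_hybrid = torch.randn(1, 32, 64, 64)
second_hybrid = torch.randn(1, 32, 64, 64)
coeff_target, coeff_history = fusion_coefficients(first_hybrid, second_hybrid)
print(coeff_target.shape, float(coeff_target.min()), float(coeff_target.max()))
```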
5. The method according to claim 4, wherein the performing feature accumulation on the target image and the historical image according to the first feature fusion coefficient and the second feature fusion coefficient to obtain a target accumulated feature corresponding to the target image comprises:
acquiring a target noise image characteristic, a target shadow characteristic and a target albedo characteristic corresponding to the target image, and acquiring a historical accumulated noise image characteristic, a historical accumulated shadow characteristic and a historical accumulated albedo characteristic corresponding to the historical image;
accumulating the target shadow features and the historical accumulated shadow features according to the first feature fusion coefficient and the second feature fusion coefficient to obtain target accumulated shadow features corresponding to the target image;
accumulating the target noise image features and the historical accumulated noise image features according to the first feature fusion coefficient and the second feature fusion coefficient to obtain target accumulated noise image features corresponding to the target image;
accumulating the target albedo characteristic and the historical accumulated albedo characteristic according to the first characteristic fusion coefficient and the second characteristic fusion coefficient to obtain a target accumulated albedo characteristic corresponding to the target image;
and determining the target accumulated shadow feature, the target accumulated noise image feature and the target accumulated albedo feature as a target accumulated feature corresponding to the target image.
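A compact, hedged sketch of claim 5: each per-frame component (noise image, shadow, albedo) is blended with its historically accumulated counterpart using the two fusion coefficients of claim 4; the linear-blend form and all tensor shapes are assumptions.

```python
# Hypothetical sketch of claim 5: per-component temporal accumulation.
import torch

def accumulate(current, history_acc, coeff_current, coeff_history):
    # Assumed linear blend driven by the two feature fusion coefficients.
    return coeff_current * current + coeff_history * history_acc

B, H, W = 1, 64, 64
coeff_t = torch.rand(B, 1, H, W)        # first feature fusion coefficient
coeff_h = 1.0 - coeff_t                 # second feature fusion coefficient

noise_t, shadow_t, albedo_t = (torch.rand(B, 3, H, W) for _ in range(3))
noise_acc, shadow_acc, albedo_acc = (torch.rand(B, 3, H, W) for _ in range(3))

target_acc_shadow = accumulate(shadow_t, shadow_acc, coeff_t, coeff_h)
target_acc_noise = accumulate(noise_t, noise_acc, coeff_t, coeff_h)
target_acc_albedo = accumulate(albedo_t, albedo_acc, coeff_t, coeff_h)

# The three accumulated components together form the target accumulated feature.
target_accumulated = (target_acc_shadow, target_acc_noise, target_acc_albedo)
```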
6. The method according to claim 1, wherein the determining a reconstruction result image corresponding to the target image according to the target image associated feature, the target accumulated feature and the historical image comprises:
determining a target fusion cascade characteristic corresponding to the target image according to the target image association characteristic, the target accumulation characteristic and the historical image;
and inputting the target fusion cascade characteristic into a target image reconstruction model, and outputting a reconstruction result image corresponding to the target image according to the target fusion cascade characteristic in the target image reconstruction model.
7. The method of claim 6, wherein the target image associated features include a first normal dimension feature and a first depth dimension feature; the historical image association features comprise second normal dimension features and second depth dimension features corresponding to the historical images;
the determining the target fusion cascade characteristic corresponding to the target image according to the target image association characteristic, the target accumulation characteristic and the historical image comprises:
acquiring auxiliary features corresponding to the target image; the auxiliary features include the first normal dimension feature and the first depth dimension feature;
determining dimension features, except the first normal dimension feature and the first depth dimension feature, in the auxiliary features as residual dimension features;
performing feature fusion on the target accumulated feature, the first normal dimension feature, the first depth dimension feature and the residual dimension feature to obtain an image fusion feature corresponding to the target image;
performing affine transformation on the second normal dimension feature and the second depth dimension feature respectively to obtain a normal transformation feature corresponding to the second normal dimension feature and a depth transformation feature corresponding to the second depth dimension feature;
and carrying out cascade processing on the image fusion characteristic, the normal transformation characteristic and the depth transformation characteristic to obtain a target fusion cascade characteristic corresponding to the target image.
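The following sketch illustrates claim 7 under the assumption that both the feature fusion and the cascade processing are channel-wise concatenations; the channel counts are arbitrary placeholders.

```python
# Hypothetical sketch of claim 7: build the target fusion cascade feature.
import torch

B, H, W = 1, 64, 64
target_accumulated = torch.rand(B, 9, H, W)   # accumulated shadow/noise/albedo (3x3 ch)
normal_t = torch.rand(B, 3, H, W)             # first normal dimension feature
depth_t = torch.rand(B, 1, H, W)              # first depth dimension feature
residual_aux = torch.rand(B, 2, H, W)         # remaining auxiliary dimension features
normal_warp = torch.rand(B, 3, H, W)          # affine-transformed history normal feature
depth_warp = torch.rand(B, 1, H, W)           # affine-transformed history depth feature

# Feature fusion of the accumulated and auxiliary features (assumed to be a
# channel-wise concatenation here).
image_fusion = torch.cat([target_accumulated, normal_t, depth_t, residual_aux], dim=1)
# Cascade with the warped history features to obtain the fusion cascade feature.
fusion_cascade = torch.cat([image_fusion, normal_warp, depth_warp], dim=1)
print(fusion_cascade.shape)  # torch.Size([1, 19, 64, 64])
```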
8. The method according to claim 6, wherein the outputting, in the target image reconstruction model, a reconstruction result image corresponding to the target image according to the target fusion cascade feature comprises:
determining a predicted high-quality image, a predicted low-quality image, a first image reconstruction parameter and a second image reconstruction parameter corresponding to the target fusion cascade feature through the target image reconstruction model; the predicted high-quality image contains no noise, and the resolution of the predicted high-quality image is greater than that of the predicted low-quality image;
in the target image reconstruction model, carrying out image fusion on the predicted high-quality image and the predicted low-quality image through the first image reconstruction parameter to obtain an initial reconstructed image corresponding to the target image;
and acquiring a historical reconstruction image corresponding to the historical image, and performing image fusion on the initial reconstruction image and the historical reconstruction image through the second image reconstruction parameter to obtain a reconstruction result image corresponding to the target image.
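An illustrative, non-limiting reading of claim 8 in which the two image reconstruction parameters act as per-pixel convex blend weights; the claim itself does not fix the blending formula.

```python
# Hypothetical sketch of claim 8: two successive blends driven by the predicted
# image reconstruction parameters. All tensors are placeholders.
import torch

B, H, W = 1, 64, 64
pred_high = torch.rand(B, 3, H, W)      # predicted high-quality image
pred_low = torch.rand(B, 3, H, W)       # predicted low-quality image
k1 = torch.rand(B, 1, H, W)             # first image reconstruction parameter
k2 = torch.rand(B, 1, H, W)             # second image reconstruction parameter
history_recon = torch.rand(B, 3, H, W)  # reconstruction result of the previous frame

# First fusion: combine the two predictions into the initial reconstructed image.
initial_recon = k1 * pred_high + (1.0 - k1) * pred_low
# Second fusion: combine the initial reconstruction with the history reconstruction
# to obtain the reconstruction result image for the target image.
result = k2 * initial_recon + (1.0 - k2) * history_recon
print(result.shape)
```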
9. An image processing method, characterized by comprising:
acquiring a target sample noise image and a historical sample noise image in a sample image sequence; the sample image sequence is an image sequence obtained by carrying out image acquisition on a sample scene based on a first unit pixel sampling number; the historical sample noise image is a previous sample noise image of the target sample noise image in the sample image sequence; the first unit pixel sample number is less than a unit pixel sample number threshold;
acquiring target sample image correlation characteristics corresponding to the target sample noise images and historical sample image correlation characteristics corresponding to the historical sample noise images;
generating a first sample mixed embedding feature aiming at the target sample noise image according to the target sample image association feature, generating a second sample mixed embedding feature aiming at the historical sample noise image according to the target sample image association feature and the historical sample image association feature, and performing feature accumulation on the target sample noise image and the historical sample noise image based on the first sample mixed embedding feature and the second sample mixed embedding feature to obtain a target sample accumulated feature corresponding to the target sample noise image;
determining target sample fusion cascade characteristics corresponding to the target sample noise images according to the target sample image correlation characteristics, the target sample accumulation characteristics and the historical sample noise images;
inputting the target sample fusion cascade characteristic into an image reconstruction model, and outputting a target sample reconstruction image corresponding to the target sample noise image according to the target sample fusion cascade characteristic in the image reconstruction model;
acquiring a target label sampling image corresponding to the target sample noise image; the target label sampling image is an image obtained after image acquisition is carried out on the sample scene based on a second unit pixel sampling number; the first unit pixel sample number is less than the second unit pixel sample number;
and training the image reconstruction model according to the target label sampling image and the target sample reconstruction image to obtain a target image reconstruction model for performing image reconstruction processing on a target image in a target image sequence.
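By way of example only, the training procedure of claim 9 might resemble the following single optimisation step, where the fusion cascade feature of a low-sample-count frame is reconstructed and supervised by the high-sample-count label image; the network, optimiser, and L1 loss below are placeholders, not the architecture or loss terms of the specification (the actual loss terms are elaborated in claims 10 to 12).

```python
# Hypothetical single training step for the image reconstruction model of claim 9.
import torch
import torch.nn as nn

reconstruction_model = nn.Sequential(          # stand-in for the image reconstruction model
    nn.Conv2d(19, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(reconstruction_model.parameters(), lr=1e-4)

sample_fusion_cascade = torch.rand(1, 19, 64, 64)  # target sample fusion cascade feature
label_image = torch.rand(1, 3, 64, 64)             # high-sample-count target label image

sample_recon = reconstruction_model(sample_fusion_cascade)  # target sample reconstruction
loss = nn.functional.l1_loss(sample_recon, label_image)     # placeholder loss term
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```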
10. The method of claim 9, wherein the training the image reconstruction model according to the target label sampling image and the target sample reconstruction image to obtain a target image reconstruction model for performing image reconstruction processing on a target image in a target image sequence comprises:
determining a sample reconstruction image corresponding to a residual sample noise image in the sample image sequence as a reconstruction image to be operated, and determining a label sampling image corresponding to the residual sample noise image as a label sampling image to be operated; the residual sample noise image is a sample noise image except the target sample noise image in the sample image sequence;
determining a target loss value aiming at the image reconstruction model according to the to-be-operated reconstructed image, the to-be-operated label sampling image, the target label sampling image and the target sample reconstructed image;
and training the image reconstruction model according to the target loss value to obtain the target image reconstruction model.
11. The method according to claim 10, wherein the determining a target loss value for the image reconstruction model according to the reconstructed image to be operated on, the label-sampled image to be operated on, the target label-sampled image and the target sample reconstructed image comprises:
determining a space loss value aiming at the image reconstruction model according to the to-be-operated reconstructed image, the to-be-operated label sampling image, the target label sampling image and the target sample reconstructed image;
determining a time sequence loss value aiming at the image reconstruction model according to the to-be-operated reconstructed image, the to-be-operated label sampling image, the target label sampling image and the target sample reconstructed image;
determining a relative edge loss value aiming at the image reconstruction model according to the to-be-operated reconstructed image, the to-be-operated label sampling image, the target label sampling image and the target sample reconstructed image;
determining a target loss value for the image reconstruction model from the spatial loss value, the temporal loss value, and the relative edge loss value.
12. The method of claim 11, wherein the target sample noise image is a last sample noise image in the sequence of sample images;
determining a target loss value for the image reconstruction model from the spatial loss value, the temporal loss value, and the relative edge loss value, comprising:
acquiring a historical sample reconstructed image and a historical label sampling image corresponding to the historical sample noise image, and determining an affine transformation loss value aiming at the image reconstruction model according to the historical sample reconstructed image, the target sample reconstructed image, the historical label sampling image and the target label sampling image;
acquiring a target sample albedo characteristic corresponding to the target sample noise image, acquiring a target sample accumulated albedo characteristic corresponding to the target sample noise image in the target sample accumulated characteristic, and determining an albedo loss value aiming at the image reconstruction model according to an albedo loss function, the target sample albedo characteristic and the target sample accumulated albedo characteristic;
and carrying out loss value fusion on the space loss value, the time sequence loss value, the relative edge loss value, the affine transformation loss value and the albedo loss value to obtain a target loss value of the image reconstruction model.
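As a purely illustrative sketch of claims 11 and 12, the snippet below combines placeholder spatial, temporal, relative edge, affine transformation, and albedo loss terms into a single target loss value; the individual formulas and the fusion weights are assumptions, since the claims name the terms but not their exact forms.

```python
# Hypothetical loss-value fusion for claims 11-12. Every term is a simple
# placeholder (L1 or relative difference), not the specification's formula.
import torch
import torch.nn.functional as F

recon = torch.rand(1, 3, 64, 64, requires_grad=True)  # target sample reconstruction
label = torch.rand(1, 3, 64, 64)                       # target label sampling image
prev_recon = torch.rand(1, 3, 64, 64)                  # previous-frame reconstruction
prev_label = torch.rand(1, 3, 64, 64)                  # previous-frame label image
albedo = torch.rand(1, 3, 64, 64)                      # target sample albedo feature
albedo_acc = torch.rand(1, 3, 64, 64)                  # accumulated albedo feature

spatial_loss = F.l1_loss(recon, label)
# Temporal term: the frame-to-frame change of the output should match the labels'.
temporal_loss = F.l1_loss(recon - prev_recon, label - prev_label)
# Relative edge term: horizontal gradients normalised by local intensity (one possible form).
def grad_x(img): return img[..., :, 1:] - img[..., :, :-1]
relative_edge_loss = F.l1_loss(grad_x(recon) / (recon[..., :, 1:].abs() + 1e-2),
                               grad_x(label) / (label[..., :, 1:].abs() + 1e-2))
# Affine transformation term: warping of the previous frame is omitted for brevity.
affine_loss = F.l1_loss(recon - prev_recon, label - prev_label)
albedo_loss = F.l1_loss(albedo, albedo_acc)

# Loss-value fusion: an (assumed) weighted sum of the five terms.
weights = dict(spatial=1.0, temporal=0.25, edge=0.5, affine=0.25, albedo=0.1)
target_loss = (weights["spatial"] * spatial_loss
               + weights["temporal"] * temporal_loss
               + weights["edge"] * relative_edge_loss
               + weights["affine"] * affine_loss
               + weights["albedo"] * albedo_loss)
target_loss.backward()
print(float(target_loss))
```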
13. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring a target image and a historical image in a target image sequence; the target image sequence is an image sequence obtained after image acquisition is carried out on a target scene based on a target unit pixel sampling number; the historical image is the image preceding the target image in the target image sequence;
the characteristic acquisition module is used for acquiring the target image correlation characteristic corresponding to the target image and the historical image correlation characteristic corresponding to the historical image;
the feature generation module is used for generating a first mixed embedding feature aiming at the target image according to the target image correlation feature;
the feature generation module is further used for generating a second mixed embedding feature aiming at the historical image according to the target image correlation feature and the historical image correlation feature;
the feature accumulation module is used for performing feature accumulation on the target image and the historical image based on the first mixed embedding feature and the second mixed embedding feature to obtain a target accumulated feature corresponding to the target image;
and the image reconstruction module is used for determining a reconstruction result image corresponding to the target image according to the target image correlation characteristic, the target accumulation characteristic and the historical image.
14. A computer device, comprising: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide network communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to cause the computer device to perform the method of any of claims 1-12.
15. A computer-readable storage medium, characterized in that a computer program is stored therein, the computer program being adapted to be loaded and executed by a processor to carry out the method of any one of claims 1 to 12.
16. A computer program product or computer program, characterized in that it comprises computer instructions stored in a computer-readable storage medium, said computer instructions being adapted to be read and executed by a processor, to cause a computer device having said processor to perform the method of any of claims 1-12.
CN202210300650.2A 2022-03-25 2022-03-25 Image processing method, device, equipment and readable storage medium Pending CN114693551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210300650.2A CN114693551A (en) 2022-03-25 2022-03-25 Image processing method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114693551A true CN114693551A (en) 2022-07-01

Family

ID=82139195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210300650.2A Pending CN114693551A (en) 2022-03-25 2022-03-25 Image processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114693551A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140001168A (en) * 2012-06-27 2014-01-06 한국과학기술원 Method and apparatus for extracting and generating feature point and feature descriptor rgb-d image
US20180293710A1 (en) * 2017-04-06 2018-10-11 Pixar De-noising images using machine learning
CN110060314A (en) * 2019-04-22 2019-07-26 深圳安科高技术股份有限公司 A kind of CT iterative approximation accelerated method and system based on artificial intelligence
CN113723317A (en) * 2021-09-01 2021-11-30 京东科技控股股份有限公司 Reconstruction method and device of 3D face, electronic equipment and storage medium
CN113808253A (en) * 2021-08-31 2021-12-17 武汉理工大学 Dynamic object processing method, system, device and medium for scene three-dimensional reconstruction
US11216916B1 (en) * 2020-10-09 2022-01-04 Nvidia Corporation History clamping for denoising dynamic ray-traced scenes using temporal accumulation

Similar Documents

Publication Publication Date Title
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
CN111681177B (en) Video processing method and device, computer readable storage medium and electronic equipment
CN112906721B (en) Image processing method, device, equipment and computer readable storage medium
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
CN110944200B (en) Method for evaluating immersive video transcoding scheme
CN114339409B (en) Video processing method, device, computer equipment and storage medium
CN111901598A (en) Video decoding and encoding method, device, medium and electronic equipment
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN110148088B (en) Image processing method, image rain removing method, device, terminal and medium
CN110969572B (en) Face changing model training method, face exchange device and electronic equipment
WO2021227704A1 (en) Image recognition method, video playback method, related device, and medium
CN112001274A (en) Crowd density determination method, device, storage medium and processor
CN113556582A (en) Video data processing method, device, equipment and storage medium
CN112995652A (en) Video quality evaluation method and device
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN116977457A (en) Data processing method, device and computer readable storage medium
CN113569824B (en) Model processing method, related device, storage medium and computer program product
CN116405724A (en) Image generation method, system, electronic device and storage medium
CN114693551A (en) Image processing method, device, equipment and readable storage medium
CN114792284A (en) Image switching method and device, storage medium and electronic equipment
CN109657589B (en) Human interaction action-based experiencer action generation method
CN116030040B (en) Data processing method, device, equipment and medium
CN113825013B (en) Image display method and device, storage medium and electronic equipment
CN116740540B (en) Data processing method, device, equipment and computer readable storage medium
CN115564803B (en) Animation processing method, device, equipment, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40071447
Country of ref document: HK