CN116977517A - Image processing method, device, equipment and readable storage medium

Info

Publication number
CN116977517A
CN116977517A (application CN202310090445.2A)
Authority
CN
China
Prior art keywords
surface element
pixel
global surface
image
global
Prior art date
Legal status
Pending
Application number
CN202310090445.2A
Other languages
Chinese (zh)
Inventor
高一鸣
曹炎培
单瀛
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310090445.2A
Publication of CN116977517A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/005 - General purpose rendering architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence

Abstract

The application discloses an image processing method, an image processing device, image processing equipment and a readable storage medium. The method comprises the following steps: acquiring a t-th key frame image in a target video, the t-th key frame image being used for presenting a target scene under a first view angle; constructing surface elements respectively corresponding to at least two pixels in a dimension space according to depth information respectively corresponding to the at least two pixels in the t-th key frame image, first camera parameter information corresponding to the first view angle, and image characteristic information corresponding to the t-th key frame image; carrying out surface element fusion on the at least two surface elements and the t-1 th global surface element representation to obtain the t-th global surface element representation associated with t key frame images; and carrying out rasterization sampling on the t-th global surface element representation to obtain a sampled global surface element, and rendering a simulated scene image of the target scene under a second view angle according to the sampled global surface element. By adopting the method and the device, the rendering speed and the rendering effect of new-view-angle images can be improved.

Description

Image processing method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing device, an image processing apparatus, and a readable storage medium.
Background
Scene reconstruction and rendering refers to using information such as graphs or images of a real scene to build a corresponding global surface element representation in a dimension space (usually a three-dimensional space), reconstructing the digitized scene surface geometry according to the global surface element representation, and rendering a simulated scene image corresponding to the real scene under a specific observation viewpoint.
In existing scene reconstruction and rendering, a NeRF (Neural Radiance Fields) method with a certain generalization capability is generally adopted. After learning from a large amount of data, the method can represent a brand-new scene directly from input images. However, when a real scene is reconstructed and rendered from images that are input online, the method needs to retrain the NeRF network with all of the images input up to the current moment in order to construct the corresponding global surface element representation. Because the image content of adjacent images in a continuously input image stream overlaps heavily but is not completely identical, the redundancy of the finally obtained global surface element representation is serious. Furthermore, when the corresponding simulated scene image under a specific observation viewpoint is rendered online, a single pixel needs hundreds of sampling points and therefore hundreds of MLP (Multilayer Perceptron) operations, so the operation is slow and the rendering effect is poor.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, image processing equipment and a readable storage medium, which can reconstruct a real scene in an incremental mode and improve the rendering speed and effect of a new view angle image.
In one aspect, an embodiment of the present application provides a data processing method, including:
acquiring a t-th key frame image in a target video; the t-th key frame image includes at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1;
constructing surface elements corresponding to at least two pixels in a dimension space according to depth information corresponding to the at least two pixels, first camera parameter information corresponding to a first visual angle and image characteristic information corresponding to a t-th key frame image;
according to the first camera parameter information, carrying out surface element fusion processing on surface elements corresponding to at least two pixels in a dimensional space and t-1 th global surface element representations associated with t-1 key frame images respectively to obtain t th global surface element representations associated with t key frame images; the image frame time stamp of the t-1 key frame image is earlier than that of the t key frame image; the t key frame images comprise t-1 key frame images and a t key frame image;
And carrying out rasterization sampling according to second camera parameter information corresponding to the second view angle and the t-th global surface element representation to obtain a sampling global surface element, and rendering a simulated scene image of the target scene under the second view angle according to the sampling global surface element.
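For orientation, the following is a minimal Python sketch of how the four claimed steps could be chained for one incremental update; all helper functions (extract_keyframe, build_surfels, fuse_surfels, rasterize_sample, render_pixels) are hypothetical placeholders for the operations described above, not part of the application.

```python
# Hypothetical pipeline skeleton; every helper stands in for one step of the
# claimed method and would need a concrete implementation.
def process_keyframe(video, t, global_rep_prev, cam_first, cam_second):
    # Step 1: acquire the t-th key frame image (presents the target scene at the first view angle).
    keyframe = extract_keyframe(video, t)

    # Step 2: construct one surface element per pixel from depth information,
    # the first camera parameter information and the image feature information.
    surfels = build_surfels(keyframe.depth, cam_first, keyframe.features)

    # Step 3: fuse the new surface elements into the (t-1)-th global surface
    # element representation to obtain the t-th global representation.
    global_rep_t = fuse_surfels(global_rep_prev, surfels, cam_first)

    # Step 4: rasterization sampling under the second view angle, then render
    # the simulated scene image from the sampled global surface elements.
    sampled = rasterize_sample(global_rep_t, cam_second)
    simulated_image = render_pixels(sampled, cam_second)
    return global_rep_t, simulated_image
```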
In one aspect, an embodiment of the present application provides a data processing apparatus, including:
the acquisition module is used for acquiring a t-th key frame image in the target video; the t-th key frame image includes at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1;
the construction module is used for constructing surface elements corresponding to at least two pixels in a dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first visual angle and the image characteristic information corresponding to the t-th key frame image;
the fusion module is used for carrying out surface element fusion processing on surface elements corresponding to at least two pixels in a dimensional space and t-1 th global surface element representations associated with t-1 key frame images respectively according to the first camera parameter information to obtain t th global surface element representations associated with t key frame images; the image frame time stamp of the t-1 key frame image is earlier than that of the t key frame image; the t key frame images comprise t-1 key frame images and a t key frame image;
And the rendering module is used for carrying out rasterization sampling according to the second camera parameter information corresponding to the second view angle and the t-th global surface element representation to obtain a sampling global surface element, and rendering a simulated scene image of the target scene under the second view angle according to the sampling global surface element.
Wherein, the acquisition module comprises:
the isochronous acquisition unit is used for determining a first image frame time stamp of the t-1 th key frame image in the target video;
the isochronous acquisition unit is further used for adding the first image frame time stamp and the sampling period duration to obtain a second image frame time stamp;
and the isochronous acquisition unit is further used for acquiring a frame image corresponding to the second image frame timestamp from the target video and taking the frame image as the t-th key frame image.
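As an illustration of the equal-interval rule just described, here is a minimal Python sketch; the frame_at() accessor and the 15-second default period are assumptions made for the example, not requirements of the application.

```python
def next_keyframe_timestamp(first_ts: float, sampling_period: float) -> float:
    # Second image frame timestamp = first image frame timestamp + sampling period duration.
    return first_ts + sampling_period

def pick_isochronous_keyframe(video, prev_keyframe_ts: float, sampling_period: float = 15.0):
    ts = next_keyframe_timestamp(prev_keyframe_ts, sampling_period)
    return video.frame_at(ts)  # hypothetical accessor returning the frame at timestamp ts
```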
Wherein, the acquisition module comprises:
the change acquisition unit is used for acquiring historical camera parameter information corresponding to the t-1 key frame image; the historical camera parameter information comprises historical shooting angle information and historical shooting position information;
the change acquisition unit is also used for sequentially acquiring frame images positioned after the t-1 key frame image from the target video to serve as detection frame images;
the change acquisition unit is also used for acquiring the parameter information of the detection camera corresponding to the detection frame image; detecting camera parameter information comprises detecting shooting angle information and detecting shooting position information;
A change acquisition unit for determining a shooting change angle between the historical shooting angle information and the detected shooting angle information, and determining a position change distance between the historical shooting position information and the detected shooting position information;
the change acquisition unit is further configured to determine that the detected frame image is the t-th key frame image if the shooting change angle is greater than or equal to the angle change threshold, or if the position change distance is greater than or equal to the distance change threshold.
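A minimal sketch of this change-based test follows, assuming each camera pose is given as a 3-D position and a unit viewing-direction vector; the threshold values are illustrative only.

```python
import numpy as np

def is_new_keyframe(hist_pos, hist_dir, det_pos, det_dir,
                    angle_threshold_deg: float = 15.0,
                    distance_threshold: float = 0.3) -> bool:
    # Position change distance between the historical and detected shooting positions.
    position_change = np.linalg.norm(np.asarray(det_pos, float) - np.asarray(hist_pos, float))

    # Shooting change angle between the historical and detected viewing directions.
    cos_angle = np.dot(hist_dir, det_dir) / (np.linalg.norm(hist_dir) * np.linalg.norm(det_dir))
    angle_change = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    # The detected frame becomes the t-th key frame if either change reaches its threshold.
    return angle_change >= angle_threshold_deg or position_change >= distance_threshold
```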
Wherein at least two pixels include a pixel M; the first camera parameter information comprises target shooting position information, target shooting orientation information and target shooting focal length information; the target shooting orientation information is used for representing shooting light directions corresponding to the center point pixels in the t-th key frame image; the target shooting position information is used for representing the position information of the camera when the t-th key frame image is shot;
Wherein, the construction module comprises:
the geometric determining unit is used for determining the geometric position information corresponding to the pixel M in the dimensional space according to the depth information, the target shooting position information, the target shooting orientation information and the target shooting focal length information corresponding to the pixel M;
the feature determining unit is used for obtaining pixel image feature information corresponding to the pixel M from the image feature information corresponding to the t-th key frame image, and determining element image feature information corresponding to the pixel M in the dimension space according to the pixel image feature information;
And the element determining unit is used for constructing the surface element corresponding to the pixel M in the dimension space according to the geometric position information and the element image characteristic information corresponding to the pixel M in the dimension space.
Wherein the geometry determining unit comprises:
the information determining subunit is used for determining dimension space position information corresponding to the camera when the t-th key frame image is shot in the dimension space according to the target shooting position information;
the information determining subunit is further configured to determine pixel orientation information corresponding to the pixel M according to the positional relationship between the pixel M and the center point pixel, the target shooting orientation information, and the dimensional spatial position information;
the information determining subunit is further configured to determine, according to the pixel orientation information, the dimensional space position information, and the depth information corresponding to the pixel M, position information corresponding to the pixel M in the dimensional space;
the information determination subunit is further used for performing normal estimation processing on the position information and the pixel orientation information to obtain normal information corresponding to the pixel M in the dimension space;
the information determining subunit is further configured to determine radius information corresponding to the pixel M in the dimensional space according to the normal information, the depth information corresponding to the pixel M, and the target shooting focal length information;
The information determining subunit is further configured to determine weight information corresponding to the pixel M according to the radius information and the weight parameter;
and the information summarizing subunit is used for determining the position information, the normal information, the radius information and the weight information as geometric position information corresponding to the pixel M in the dimension space.
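To make these geometric quantities concrete, the following is a minimal sketch of per-pixel surfel construction under a pinhole-camera assumption; the normal is taken from a precomputed normal map, and the radius and weight formulas are common surfel heuristics rather than the exact formulas of the application.

```python
import numpy as np

def build_surfel(u, v, depth, normal_cam, features, f, cx, cy, R, t, weight_scale=1.0):
    """Build one surfel for pixel (u, v); (R, t) maps camera to world coordinates."""
    # Pixel orientation information: the viewing ray through (u, v) in camera coordinates.
    ray_cam = np.array([(u - cx) / f, (v - cy) / f, 1.0])

    # Position information: back-project with the pixel's depth, then move to world space.
    pos_cam = ray_cam * depth
    position = R @ pos_cam + t
    normal_cam = np.asarray(normal_cam, float)
    normal = R @ normal_cam

    # Radius information: grows with depth / focal length and with viewing obliquity.
    cos_view = max(abs(normal_cam[2]), 1e-3)
    radius = (depth / f) * np.sqrt(2.0) / cos_view

    # Weight information: one common heuristic gives better-observed (smaller) discs more weight.
    weight = weight_scale * np.exp(-radius)

    return {"position": position, "normal": normal, "radius": radius,
            "weight": weight, "features": np.asarray(features, float)}
```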
Wherein the t-1 th global surface element representation comprises at least two first global surface elements;
Wherein, the fusion module comprises:
the first reconstruction unit is used for reconstructing first scene surface geometry corresponding to the target scene in the dimension space according to at least two first global surface elements;
the first projection unit is used for carrying out rasterization projection on the first scene surface geometry according to the first camera parameter information, and determining projectable global surface elements in the at least two first global surface elements and projectable pixels in the at least two pixels; one projectable pixel corresponds to one or more projectable global surface elements;
the updating unit is used for carrying out fusion updating processing on the projectable global surface element according to the surface element corresponding to the projectable pixel to obtain a fusion updated surface element;
the representation determining unit is used for taking the fusion updating surface element, the non-projective global surface element and the surface element corresponding to the non-projective pixel as second global surface elements; the non-projectable global surface element is a first global surface element from which the projectable global surface element is removed from the at least two first global surface elements; the non-projectable pixels are pixels excluding projectable pixels from the at least two pixels;
The representation determining unit is further configured to combine each of the second global surface elements into a t-th global surface element representation.
Wherein the first projection unit includes:
a pixel traversing subunit, configured to traverse at least two pixels and sequentially obtain a kth pixel; k is a positive integer less than or equal to H; h is the total number of pixels corresponding to at least two pixels;
the light projection subunit is used for determining target pixel orientation information corresponding to the kth pixel in the dimension space according to the first camera parameter information;
the light projection subunit is further used for constructing projection light corresponding to the target pixel orientation information in the dimension space;
the ray projection subunit is further configured to determine, if the projected ray intersects the first scene surface geometry in the dimensional space, a first global surface element corresponding to a position where the projected ray intersects the first scene surface geometry as a projectable global surface element corresponding to a kth pixel, and determine the kth pixel as a projectable pixel.
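This projection test can be pictured as a ray/disc intersection. The brute-force sketch below checks every first global surface element, whereas the described rasterized projection would do this far more efficiently on the GPU; surfel fields follow the dictionary layout assumed in the earlier sketch.

```python
import numpy as np

def find_projectable_surfel(ray_origin, ray_dir, surfels):
    """Return the index of the first global surfel hit by the pixel's projection ray, or None."""
    best_idx, best_t = None, np.inf
    for idx, s in enumerate(surfels):
        denom = np.dot(s["normal"], ray_dir)
        if abs(denom) < 1e-6:
            continue  # ray is parallel to the disc plane
        t_hit = np.dot(s["normal"], s["position"] - ray_origin) / denom
        if t_hit <= 0.0 or t_hit >= best_t:
            continue  # behind the camera, or farther than the closest hit so far
        hit_point = ray_origin + t_hit * ray_dir
        if np.linalg.norm(hit_point - s["position"]) <= s["radius"]:
            best_idx, best_t = idx, t_hit  # this pixel is projectable onto surfel idx
    return best_idx
```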
Wherein the updating unit includes:
an element traversing subunit, configured to traverse the projectable global surface element to obtain an ith projectable global surface element; i is a positive integer less than or equal to the total number of projectable global surface elements;
A pixel obtaining subunit, configured to obtain, from the projectable pixels, a projectable pixel corresponding to the i-th projectable global surface element, as a target projection pixel, and use a surface element corresponding to the target projection pixel as a surface element to be fused;
a distance determination subunit for determining a first element depth distance between the i-th projectable global surface element and the surface element to be fused;
a marking subunit configured to mark the target projection pixel if the first element depth distance is less than or equal to the first element depth threshold;
the fusion subunit is used for carrying out fusion treatment on the ith projectable global surface element and the surface element to be fused to obtain a fusion surface element;
an updating subunit, configured to, if the first element depth distance is greater than the first element depth threshold, not mark the target projection pixel, and determine the i-th projectable global surface element as a fused surface element;
and the element processing subunit is used for taking the surface element corresponding to the unlabeled projectable pixel and each fused surface element as a fused updated surface element when the projectable global surface element is traversed.
The fusion subunit is specifically configured to obtain first geometric position information and first element image feature information corresponding to the i-th projectable global surface element; the first geometric position information is determined based on depth information and camera parameter information respectively corresponding to t-1 key frame images, and the first element image characteristic information is determined based on image characteristic information respectively corresponding to t-1 key frame images; acquiring second geometric position information and second element image characteristic information corresponding to the surface elements to be fused; the second geometric position information is determined based on the depth information corresponding to at least two pixels respectively, and the second element image characteristic information is determined based on the image characteristic information corresponding to the t-th key frame image; carrying out weighted fusion processing on the first geometric position information and the second geometric position information to obtain fusion geometric position information; performing feature fusion processing on the first element image feature information and the second element image feature information to obtain fusion element image feature information; and obtaining the fusion surface element according to the fusion geometric position information and the fusion element image characteristic information.
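A minimal sketch of one such fusion step is shown below; simple weight-based averaging of the geometry and features is an assumption, since the application only specifies that a weighted fusion of the geometric position information and a feature fusion of the element image feature information are performed.

```python
import numpy as np

def fuse_surfels(global_s, new_s):
    """Fuse a projectable global surfel with the surfel to be fused (dictionary layout as above)."""
    w1, w2 = global_s["weight"], new_s["weight"]
    w_sum = w1 + w2

    fused = {
        # Weighted fusion of the geometric position information.
        "position": (w1 * global_s["position"] + w2 * new_s["position"]) / w_sum,
        "radius":   (w1 * global_s["radius"]   + w2 * new_s["radius"])   / w_sum,
        "weight":   w_sum,
        # Feature fusion of the element image feature information.
        "features": (w1 * global_s["features"] + w2 * new_s["features"]) / w_sum,
    }
    normal = w1 * global_s["normal"] + w2 * new_s["normal"]
    fused["normal"] = normal / np.linalg.norm(normal)  # keep the fused normal a unit vector
    return fused
```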
Wherein, the rendering module comprises:
the second reconstruction unit is used for reconstructing a second scene surface geometry corresponding to the target scene in the dimension space according to each second global surface element contained in the t-th global surface element representation;
the second projection unit is used for carrying out rasterization projection on the second scene surface geometry according to the second camera parameter information, and determining sampling global surface elements; the sampling global surface elements comprise one or more global surface elements to be processed respectively corresponding to at least two pixels to be rendered in an image to be rendered; each global surface element to be processed belongs to the t-th global surface element representation; the at least two pixels to be rendered comprise a pixel to be rendered Z_a, where a is a positive integer less than or equal to the total number of the at least two pixels to be rendered;
the adjacent merging unit is used for performing adjacent merging processing on the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a to obtain global surface elements to be rendered;
the color simulation unit is used for performing color simulation processing on the global surface elements to be rendered and determining the color to be rendered corresponding to the pixel to be rendered Z_a;
and the rendering unit is used for rendering the image to be rendered according to the colors to be rendered respectively corresponding to the at least two pixels to be rendered, to obtain the simulated scene image of the target scene under the second view angle.
Wherein the adjacent merging unit includes:
the element determining subunit is used for respectively determining the view angle distance, in the dimension space, between each global surface element to be processed corresponding to the pixel to be rendered Z_a and the second view angle;
the element determining subunit is further used for acquiring, from the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a, the global surface element to be processed corresponding to the shortest view angle distance as a first merging global surface element;
the element determining subunit is further configured to, if a global surface element to be processed with a view angle distance greater than the view angle distance corresponding to the first merging global surface element exists in the remaining global surface elements to be processed, obtain the global surface element to be processed corresponding to the shortest view angle distance from the remaining global surface elements to be processed as a second merging global surface element, and determine a second element depth distance between the first merging global surface element and the second merging global surface element; the remaining global surface elements to be processed are the global surface elements to be processed, in the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a, other than the first merging global surface element;
the element merging subunit is configured to perform normalized weighted merging on the first merged global surface element and the second merged global surface element if the second element depth distance is less than or equal to the second element depth threshold value, so as to obtain a new first merged global surface element;
The element merging subunit is further configured to determine the first merged global surface element as a global surface element to be rendered and determine the second merged global surface element as a new first merged global surface element if the second element depth distance is greater than the second element depth threshold;
and the element merging subunit is further configured to determine the first merged global surface element as the global surface element to be rendered if no global surface element to be processed with a view angle distance greater than the view angle distance corresponding to the first merged global surface element exists in the remaining global surface elements to be processed.
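For one pixel to be rendered, the adjacent merging just described amounts to walking the hit surfels from nearest to farthest and merging runs whose depth gap stays within the threshold. The sketch below is an approximation of that loop; it reuses the fuse_surfels() sketch above for the normalized weighted merging, and the threshold value is illustrative.

```python
def merge_adjacent(hit_surfels, view_distances, depth_threshold=0.05):
    """Merge the pixel's to-be-processed surfels into one or more to-be-rendered surfels."""
    if not hit_surfels:
        return []
    order = sorted(range(len(hit_surfels)), key=lambda i: view_distances[i])
    merged = []
    current = dict(hit_surfels[order[0]])      # first merging global surface element
    current_dist = view_distances[order[0]]

    for i in order[1:]:
        if view_distances[i] - current_dist <= depth_threshold:
            current = fuse_surfels(current, hit_surfels[i])   # normalized weighted merging
        else:
            merged.append(current)             # close off a global surface element to be rendered
            current = dict(hit_surfels[i])     # the next element becomes the new first merging element
        current_dist = view_distances[i]
    merged.append(current)
    return merged
```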
Wherein the number of global surface elements to be rendered is one or more; each global surface element to be rendered corresponds to geometric position information to be rendered and element image characteristic information to be rendered; each piece of geometric position information to be rendered is generated based on the geometric position information respectively corresponding to the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a; each piece of element image characteristic information to be rendered is generated based on the element image characteristic information respectively corresponding to the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a; the geometric position information corresponding to the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a is determined based on the depth information and the camera parameter information respectively corresponding to the t key frame images; the element image characteristic information corresponding to the one or more global surface elements to be processed corresponding to the pixel to be rendered Z_a is determined based on the image characteristic information respectively corresponding to the t key frame images;
Wherein the color simulation unit comprises:
the feature prediction subunit is used for carrying out image feature prediction processing on each global surface element to be rendered according to the geometric position information to be rendered and the image feature information of the element to be rendered, which correspond to each global surface element to be rendered respectively, so as to obtain the opacity and the color of the element, which correspond to each global surface element to be rendered respectively;
and the color merging subunit is used for performing color merging processing on the element opacities and element colors respectively corresponding to the global surface elements to be rendered according to the distances between adjacent elements corresponding to the one or more global surface elements to be rendered, so as to obtain the color corresponding to the pixel to be rendered Z_a.
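A minimal sketch of this color merging step follows, where predict_opacity_color() stands for a hypothetical network that maps a to-be-rendered surfel's geometry and features to an element opacity (density) and an element color; the front-to-back compositing weighted by adjacent-element distances is one common way to realize the described merging, not necessarily the exact formula of the application.

```python
import numpy as np

def composite_pixel_color(render_surfels, adjacent_distances, predict_opacity_color):
    """render_surfels are assumed ordered from nearest to farthest from the second view angle."""
    color = np.zeros(3)
    transmittance = 1.0
    for surfel, delta in zip(render_surfels, adjacent_distances):
        sigma, rgb = predict_opacity_color(surfel)     # element opacity and element color
        alpha = 1.0 - np.exp(-sigma * delta)           # opacity modulated by the adjacent-element distance
        color += transmittance * alpha * np.asarray(rgb, float)
        transmittance *= (1.0 - alpha)
    return color                                       # color to be rendered for the pixel
```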
In one aspect, an embodiment of the present application provides a computer device, including: a processor, a memory, a network interface;
The processor is connected to the memory and the network interface, wherein the network interface is used to provide data communication functions, the memory is used to store a computer program, and the processor is used to call the computer program to execute the method in the embodiments of the present application.
In one aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, the computer program being adapted to be loaded by a processor and to perform a method according to embodiments of the present application.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium, the computer instructions being read from the computer-readable storage medium by a processor of a computer device, the computer instructions being executed by the processor, causing the computer device to perform a method according to an embodiment of the present application.
In the embodiment of the application, after the t-th key frame image in a target video is acquired, surface elements respectively corresponding to at least two pixels are constructed in a dimension space according to the depth information respectively corresponding to the at least two pixels, the first camera parameter information corresponding to a first view angle and the image characteristic information corresponding to the t-th key frame image; the t-th key frame image is used for presenting a target scene under the first view angle, the t-th key frame image comprises the at least two pixels, and t is a positive integer greater than 1. Then, according to the first camera parameter information, surface element fusion processing is carried out on the surface elements respectively corresponding to the at least two pixels in the dimension space and the t-1 th global surface element representation associated with t-1 key frame images, so as to obtain the t-th global surface element representation associated with t key frame images; the image frame time stamps of the t-1 key frame images are all earlier than that of the t-th key frame image, and the t key frame images comprise the t-1 key frame images and the t-th key frame image. Finally, rasterization sampling is carried out according to the second camera parameter information corresponding to a second view angle and the t-th global surface element representation to obtain sampling global surface elements, and a simulated scene image of the target scene under the second view angle is rendered according to the sampling global surface elements. With the method provided by the embodiment of the application, after new surface elements are obtained from a new key frame image, they are fused with the global surface element representation associated with the previously obtained key frame images, so that a large amount of geometric redundancy is avoided when the target scene is reconstructed according to the global surface element representation; therefore, key frame images can be acquired incrementally and the target scene can be continuously reconstructed in ever finer detail. In addition, when the target scene under a new view angle is rendered, the rendering can be completed by rasterization sampling with only a small number of sampling global surface elements, which reduces the rendering time and improves the rendering effect.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;
fig. 2a is a schematic view of a target scene shot according to an embodiment of the present application;
FIG. 2b is a schematic view of a scene reconstruction according to an embodiment of the present application;
FIG. 2c is a schematic view of a scene rendered at a new view angle according to an embodiment of the present application;
fig. 3 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 4 is a schematic view of a scene of shooting change determination according to an embodiment of the present application;
fig. 5 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 6 is a schematic structural view of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to identify and measure targets and perform other machine vision tasks, and further performs graphic processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (three-dimensional) technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application. As shown in fig. 1, the network architecture may include a server 100 and a terminal device cluster, where the terminal device cluster may include: terminal device 10a, terminal device 10b, terminal devices 10c, …, terminal device 10n, wherein any terminal device in the terminal device cluster may have a communication connection with server 100, e.g. a communication connection between terminal device 10a and server 100. The communication connection is not limited to a connection manner, and may be directly or indirectly connected through a wired communication manner, may be directly or indirectly connected through a wireless communication manner, or may be other manners, and the present application is not limited herein.
It should be understood that each terminal device in the terminal device cluster shown in fig. 1 may be provided with an application client, and when the application client runs in a terminal device, data interaction may be performed between the application client and the server 100 shown in fig. 1. The application client may be an application client with image data processing and display functions, such as an instant messaging application, a social application, a live broadcast application, a short video application, a music application, a shopping application, a transaction application, a game application, a novel application, a payment application, or a browser. The application client may be an independent client, or may be an embedded sub-client integrated in a client (such as a social client or a game client), which is not limited herein. Taking a transaction application as an example, the server 100 may include a plurality of servers, such as a background server and a data processing server corresponding to the transaction application, so that each terminal device may perform data transmission with the server 100 through the application client corresponding to the transaction application. For example, for a house to be rented, a terminal device may shoot the indoor scene of the house through the application client corresponding to the transaction application, and may continuously send the indoor scene video that has been shot to the server 100 while shooting; the server 100 may perform scene reconstruction and render simulated scene images under new view angles online according to the indoor scene video, so that other terminal devices may request, at any time, a simulated scene image of the house to be rented under a new view angle.
In order to reconstruct a real scene in an incremental manner and improve the rendering speed and effect of an image of a simulated scene under a new view angle, the application provides an image processing method, and particularly provides a real scene reconstructing and rendering method based on surface elements. Specifically, after a t-th key frame image in a target video is acquired, according to depth information corresponding to at least two pixels in the t-th key frame image, first camera parameter information corresponding to a first view angle and image characteristic information corresponding to the t-th key frame image, surface elements corresponding to at least two pixels in a dimensional space are constructed. The t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1; the surface element is a geometrical representation, which is a representation of a disc-like structure consisting of a point cloud and a corresponding radius. Then, in order to avoid geometric redundancy, according to the first camera parameter information, the application can perform surface element fusion processing on the surface elements corresponding to at least two pixels in the dimensional space and the t-1 th global surface element representation associated with the t-1 key frame images respectively to obtain the t-th global surface element representation associated with the t key frame images. Wherein, the image frame time stamp of the t-1 key frame image is earlier than the t key frame image; the t key frame images include t-1 key frame images and a t-th key frame image. And finally, carrying out rasterization sampling according to second camera parameter information corresponding to the second view angle and the t-th global surface element representation to obtain a sampling global surface element, and rendering a simulated scene image of the target scene under the second view angle according to the sampling global surface element.
It can be understood that after the surface element is constructed according to the t-th key frame image, the surface element is fused with the global surface element representations associated with all the previous t-1 key frame images, so that the problem of geometric redundancy generated by a large number of similar surface elements existing in the same global surface element representation is avoided; in addition, when a target scene under a new view angle is rendered, the rendering can be completed by rasterizing sampling and only acquiring a small amount of sampling global surface elements, so that the rendering time is reduced, and the rendering effect is improved. For a specific implementation of construction of surface elements, surface element fusion processing, rasterization sampling, and rendering of simulated scene images, reference may be made to the description in the corresponding embodiment of fig. 3, which follows.
It will be appreciated that the method provided by the embodiments of the present application may be performed by a computer device, including but not limited to a terminal device or a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud database, cloud service, cloud computing, cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platforms. The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a palmtop computer, a mobile internet device (MID), a wearable device (e.g., a smart watch or smart bracelet), a smart computer, a smart vehicle-mounted terminal, or another intelligent terminal that can run an application client with an instant session function. The terminal device and the server may be directly or indirectly connected in a wired or wireless manner, which is not limited in the embodiment of the present application.
Alternatively, it is understood that the computer device (e.g., the server 100, the terminal device 10a, the terminal device 10b, etc.) may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. A Peer-to-Peer (P2P) network may be formed between the nodes, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any form of computer device, such as a server or a terminal device, can become a node in the blockchain system by joining the peer-to-peer network. For ease of understanding, the concept of blockchain is described as follows: a blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms; it is mainly used to organize data in chronological order and encrypt it into a ledger, preventing the ledger from being tampered with or forged, while supporting verification, storage and updating of the data. When the computer device is a blockchain node, the tamper-proof and anti-counterfeiting characteristics of the blockchain give the data in the application (such as the target video, the t key frame images, and the t-th global surface element representation) authenticity and security, so that the results obtained after relevant data processing based on these data are more reliable.
It will be appreciated that in the specific embodiments of the present application, data related to target video, camera parameter information, etc. is required to obtain user approval or consent when the above or below embodiments of the present application are applied to specific products or technologies, and the collection, use and processing of the related data is required to comply with relevant laws and regulations and standards of relevant countries and regions.
It is to be appreciated that embodiments of the present application may be applied to a variety of scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, virtual reality, etc. For example, the method provided by the embodiment of the application can be applied to a target scene reconstruction scene, namely, by sequentially fusing the surface elements corresponding to a plurality of key frame images in a target video shot in the target scene to obtain a global surface element representation associated with each of the plurality of key frame images, and by using the global surface element representation, the scene surface element of the target scene can be restored, so that a simulated scene image corresponding to the target scene under a new view angle can be rendered. The target scene may be an indoor house scene, an cultural relic display scene, an outdoor landmark scene, and the like.
For easy understanding of the application of the method provided by the embodiment of the present application, please refer to fig. 2 a-2 c, wherein, the terminal device 20a and the terminal device 20c shown in fig. 2 a-2 c may be any one of the terminal devices in the terminal device cluster in the embodiment corresponding to fig. 1, for example, the terminal device 20a may be the terminal device 10a, and the terminal device 20c may be the terminal device 10b; the server 20b shown in fig. 2 a-2 c may be the server 100 in the embodiment corresponding to fig. 1.
Referring to fig. 2a, fig. 2a is a schematic view of shooting a target scene according to an embodiment of the present application. As shown in fig. 2a, the object having an association relationship with the terminal device 20a is object 1, and a transaction application is installed on the terminal device 20a. Object 1 wants to rent out the idle house 21 and can perform data interaction with the server 20b through the transaction application on the terminal device 20a. For example, after object 1 opens the transaction application on the terminal device 20a, the terminal device 20a may display a shooting page 22; the shooting page 22 may display an image display frame 221 and a shooting control 222, the image display frame 221 may display the house 21 under the current view angle of the terminal device 20a in real time, and object 1 may shoot the target video 23 of the house 21 by triggering the shooting control 222. It will be appreciated that during the shooting process, object 1 may move the terminal device 20a at any time to shoot the house 21 at different view angles. The terminal device 20a may then send the target video 23 to the server 20b via the transaction application. It should be noted that the terminal device 20a may send the target video 23 to the server 20b after the shooting of the target video 23 is completed; the terminal device 20a may also shoot and transmit simultaneously, that is, continuously send the shot video content to the server 20b during the shooting of the target video 23, and the server 20b may reconstruct the house 21 online in the dimension space according to the continuously received target video 23. The dimension space refers to a digital space, generally a three-dimensional digital space; reconstructing the house 21 refers to building a stereoscopic model corresponding to the house 21 in the dimension space.
Referring to fig. 2b, fig. 2b is a schematic view of a scene reconstruction according to an embodiment of the present application. As shown in fig. 2b, according to a preset key frame image selection rule (e.g., isochronal sampling, sampling of a change in a photographing angle, sampling of a change in a photographing position), the server 20b may sequentially determine a plurality of key frame images, for example, key frame image 1, key frame image 2, … …, key frame image t, from the target video 23, and it is seen that different key frame images may present the house 21 at different positions or viewing angles. Once the server 20b acquires a key frame image, it may perform surface element fusion processing on the basis of the global surface element representation obtained based on the previous key frame image, to obtain a new global surface element representation. For convenience of understanding, taking the example that the server 20b obtains the key frame image t as shown in fig. 2b, the key frame image t may include at least two pixels, and the server 20b may perform surface element construction processing on the key frame image t according to depth information corresponding to the at least two pixels, first camera parameter information corresponding to a first view angle at which the terminal device 20a is located when the key frame image t is captured, and image feature information corresponding to the key frame image t, so as to obtain surface elements corresponding to the at least two pixels in the dimension space 24 respectively. Where the depth information refers to the distance of the pixel from the shooting source (i.e., camera of the terminal device 20 a). The image feature information may be extracted by a pre-trained image feature extraction model, and may generally include information such as a pixel color feature, a texture feature, and a shape feature corresponding to the key frame image t. Wherein the surface element refers to a geometrical representation of a disc-like structure consisting of a point cloud and corresponding radii, e.g. the surface element 26 corresponding to the pixel 25 as shown in fig. 2 b; because the surface elements are representations of disk structures, there is corresponding geometric positional information such as radius, normal, etc. In addition, the server 20b may store, on the surface element corresponding to each pixel, the element image feature information corresponding to the pixel, that is, the image information of the pixel color feature, texture feature, shape feature, and other features corresponding to the pixel.
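Concretely, the surfel described above can be stored as a small record; the field names and the idea of keeping the pixel's feature vector on the disc are assumptions made for illustration.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray   # 3-D position of the disc center in the dimension space
    normal: np.ndarray     # unit normal of the disc
    radius: float          # disc radius
    weight: float          # confidence weight used during surface element fusion
    features: np.ndarray   # element image feature information (color/texture/shape features of the pixel)
```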
Then, as shown in fig. 2b, the server 20b may obtain the previously fused t-1 th global surface element representation, where the t-1 th global surface element representation includes a plurality of first global surface elements; it can be understood that the t-1 th global surface element representation is associated with each of key frame image 1, key frame image 2, ..., and key frame image t-1. The server 20b then performs surface element fusion processing on the at least two surface elements and the t-1 th global surface element representation to obtain the t-th global surface element representation. In the surface element fusion processing, for each surface element whose element depth distance to a first global surface element is smaller than the element depth threshold, its geometric position information and element image feature information are fused with those of that first global surface element, so that more accurate and comprehensive element image feature information is obtained while geometric redundancy is avoided. When the server 20b acquires key frame image t+1, it continues to perform surface element fusion processing on the surface elements corresponding to key frame image t+1 and the t-th global surface element representation, so as to obtain the t+1 th global surface element representation. Note that the global surface elements included in the 1st global surface element representation are the surface elements corresponding to key frame image 1.
It will be appreciated that after the terminal device 20a begins to send the target video 23 to the server 20b, the server 20b can continuously reconstruct the house 21 in the dimension space 24 from the key frame images that have been received, and the server 20b can render a new view angle from the currently determined global surface element representation at any time. Referring to fig. 2c, fig. 2c is a schematic view of a scene rendered at a new view angle according to an embodiment of the present application. As shown in fig. 2c, the object having an association relationship with the terminal device 20c is object 2, and the transaction application is installed on the terminal device 20c. Assuming that object 2 is not satisfied with the presentation of the house 21 in the target video 23 and wants to see how the house 21 looks at a second view angle, object 2 can send a second-view-angle image acquisition request to the server 20b through the transaction application running on the terminal device 20c. After the server 20b obtains the second-view-angle image acquisition request, it may obtain the second camera parameter information corresponding to the second view angle, and then determine the position of the simulated camera 28 corresponding to the second view angle relative to the scene surface geometry 27. The scene surface geometry 27 is built in the dimension space by the server 20b based on the most recently obtained global surface element representation; it is the stereoscopic model corresponding to the house 21 in the dimension space. Then, the server 20b may acquire an image to be rendered, which may be a preset default image including at least two pixels to be rendered, and the server 20b first needs to perform rasterization projection on the scene surface geometry 27 to determine the one or more global surface elements to be processed respectively corresponding to the at least two pixels to be rendered. For ease of understanding, take determining the global surface elements to be processed corresponding to the pixel a to be rendered as an example: as shown in fig. 2c, the server 20b may determine the projection ray 29 corresponding to the pixel a to be rendered according to the second camera parameter information and the position of the simulated camera 28, and then determine the global surface elements corresponding to the positions where the projection ray 29 intersects the scene surface geometry 27 as the global surface elements to be processed corresponding to the pixel a to be rendered, that is, the global surface element to be processed 210 and the global surface element to be processed 211. By analogy, the server 20b may obtain the one or more global surface elements to be processed respectively corresponding to the at least two pixels to be rendered. Then, for the one or more global surface elements to be processed corresponding to a single pixel to be rendered, the server 20b may perform adjacent merging processing on them. For example, for the global surface element to be processed 210 and the global surface element to be processed 211, if the second element depth distance between the two is less than or equal to the second element depth threshold, they are merged in a normalized, weighted manner; if the second element depth distance between the two is greater than the second element depth threshold, they are not merged.
Finally, the server 20b may perform the rendering of the simulated scene image under the second view angle according to the global surface element to be rendered, to obtain a simulated scene image corresponding to the house 21 under the second view angle. The server 20b will return the simulated scene image to the terminal device 20c for the subject 2 to learn about the house 21 at more perspectives.
For the specific implementation of the surface element construction, the surface element fusion processing, the rasterization sampling and the rendering of the simulated scene image in the above scene, reference may be made to the description in the embodiment corresponding to fig. 3.
Further, referring to fig. 3, fig. 3 is a flowchart of an image processing method according to an embodiment of the present application. The image processing method may be performed by a computer device, which may include a terminal device or a server as described in fig. 1, wherein the image processing method may include at least the following steps S101 to S104:
step S101, acquiring a t-th key frame image in a target video; the t-th key frame image comprises at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1.
Specifically, when the computer device has an image shooting function, the target video can be shot directly, and the shooting source is the computer device itself; when the computer device does not have an image shooting function, it may acquire the target video transmitted by another shooting device that has an image shooting function (for example, a mobile phone, a camera, etc.), and the shooting source is that shooting device. The computer device may then extract key frame images from the target video according to a certain rule; for example, the rule may be uniform sampling at equal time intervals, or key frame images may be selected when the change in shooting angle or shooting position is greater than a certain threshold.
Specifically, if a uniform sampling rule with equal time intervals is used to acquire key frame images, a feasible implementation of acquiring the t-th key frame image in the target video may be: determining a first image frame timestamp of the t-1 th key frame image in the target video; adding the sampling period duration to the first image frame timestamp to obtain a second image frame timestamp; and acquiring the frame image corresponding to the second image frame timestamp from the target video as the t-th key frame image. For example, when the sampling period duration is 15 seconds and the first image frame timestamp of the t-1 th key frame image in the target video is 2 minutes 15 seconds, the second image frame timestamp is 2 minutes 30 seconds, and the computer device can acquire the frame image corresponding to 2 minutes 30 seconds from the target video as the t-th key frame image.
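For ease of understanding only, the following sketch illustrates the equal-interval key frame selection described above; the function name, the video duration and the helper-free structure are assumptions of this description and not part of the embodiment.

```python
def keyframe_timestamps(video_duration: float, sample_period: float):
    """Yield key frame timestamps by uniform sampling at equal time intervals."""
    ts = 0.0
    while ts <= video_duration:
        yield ts
        ts += sample_period   # second image frame timestamp = first timestamp + sampling period duration

# Usage with the example above (assumed 5-minute video, 15-second sampling period):
# 2 min 15 s (135 s) is followed by 2 min 30 s (150 s)
stamps = list(keyframe_timestamps(video_duration=300.0, sample_period=15.0))
assert 135.0 in stamps and 150.0 in stamps
```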
Specifically, if the key frame image is selected according to the change of the shooting angle or the shooting position being greater than a certain threshold, a feasible implementation of acquiring the t-th key frame image in the target video may be: acquiring historical camera parameter information corresponding to the t-1 th key frame image, where the historical camera parameter information includes historical shooting angle information and historical shooting position information; sequentially acquiring frame images located after the t-1 th key frame image from the target video as detected frame images; acquiring detected camera parameter information corresponding to a detected frame image, where the detected camera parameter information includes detected shooting angle information and detected shooting position information; determining a shooting change angle between the historical shooting angle information and the detected shooting angle information, and determining a position change distance between the historical shooting position information and the detected shooting position information; and if the shooting change angle is greater than or equal to the angle change threshold, or the position change distance is greater than or equal to the distance change threshold, determining the detected frame image as the t-th key frame image. For ease of understanding, please refer to fig. 4, which is a schematic view of a scene of shooting change determination according to an embodiment of the present application. As shown in fig. 4, according to the historical shooting position information corresponding to the t-1 th key frame image, the computer device may determine the shooting source 41 corresponding to the t-1 th key frame image in the dimension space 40, and then determine the orientation ray 411 corresponding to the shooting source 41 according to the historical shooting angle information; similarly, the computer device may determine, in the dimension space 40, the shooting source 42 corresponding to the detected frame image and the orientation ray 421. As shown in fig. 4, the distance between the shooting sources 41 and 42 is the position change distance. In the dimension space, an auxiliary ray 422 parallel to the orientation ray 421 and intersecting the orientation ray 411 is constructed, and the included angle formed by the orientation ray 411 and the auxiliary ray 422 is the shooting change angle. It will be understood that the detected frame image can be determined as the t-th key frame image as long as one of the position change distance and the shooting change angle between the detected frame image and the t-1 th key frame image reaches its corresponding threshold.
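For ease of understanding only, the shooting change determination above can be sketched as follows; the threshold values and the representation of orientations as unit vectors are assumptions of this description, not values from the embodiment.

```python
import numpy as np

def is_new_keyframe(hist_pos, hist_dir, det_pos, det_dir,
                    angle_threshold_deg=15.0, distance_threshold=0.5):
    """Decide whether a detected frame becomes the t-th key frame.

    hist_pos / hist_dir: historical shooting position and orientation (unit vector)
    det_pos  / det_dir : detected shooting position and orientation (unit vector)
    """
    position_change = np.linalg.norm(np.asarray(det_pos) - np.asarray(hist_pos))
    cos_angle = np.clip(np.dot(hist_dir, det_dir), -1.0, 1.0)
    shooting_change_angle = np.degrees(np.arccos(cos_angle))
    return (shooting_change_angle >= angle_threshold_deg
            or position_change >= distance_threshold)
```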
Step S102, constructing surface elements corresponding to the at least two pixels in the dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first view angle, and the image feature information corresponding to the t-th keyframe image.
Specifically, a pixel is the basic unit or element that makes up the whole image; a pixel corresponds to a small square in the image, where the small square has a definite position and an assigned color value, and the colors and positions of these small squares determine the appearance of the image.
Specifically, the depth information corresponding to the t-th key frame image refers to the distance from a viewpoint to the corresponding object in the image, where the viewpoint generally refers to the shooting source that shot the t-th key frame image. The depth information corresponding to the t-th key frame image may be determined by depth estimation processing; depth estimation, as the name implies, is to estimate the distance of each pixel in the image from the shooting source using one or more RGB images (RGB refers to the red (R), green (G) and blue (B) color channels, which are varied and superimposed on each other to produce the various colors) under one or more view angles. The depth estimation can be performed in a plurality of ways: for example, a depth camera is used when shooting the t-th key frame image, and the depth camera can detect the depth-of-field distance of the shooting space during shooting, thereby determining the depth information corresponding to each of the at least two pixels contained in the t-th key frame image; or a depth sensor is used to collect the depth information corresponding to each of the at least two pixels contained in the t-th key frame image, and a trained single-frame depth estimation network is then used to correct the depth information collected by the sensor to obtain corrected depth information; or a depth estimation network based on the MVS (Multi-View Stereo) technique, i.e., estimating the three-dimensional information of an object from multi-view images, is used to directly estimate depth from the RGB information corresponding to each of the at least two pixels contained in the t-th key frame image, thereby obtaining the depth information corresponding to each of the at least two pixels.
Specifically, the first camera parameter information may include target shooting position information, target shooting orientation information and target shooting focal length information. The target shooting orientation information is used to represent the shooting light direction corresponding to the center point pixel in the t-th key frame image; the target shooting position information is used to represent the position information of the camera when the t-th key frame image is shot; the target shooting focal length information refers to the distance between the focal point and the optical center of the lens. It will be understood that the lens of a camera is a group of lenses; when light rays parallel to the main optical axis pass through the lenses, the light converges to a point, which is called the focal point, and the distance between the focal point and the center (i.e., the optical center) of the lenses is called the focal length.
Specifically, to better understand the implementation process of one possible embodiment of constructing the surface elements corresponding to the at least two pixels in the dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first view angle, and the image feature information corresponding to the t-th keyframe image, the implementation process is illustrated by taking the pixel M included in the at least two pixels as an example: the computer equipment can determine the corresponding geometric position information of the pixel M in the dimension space according to the depth information, the target shooting position information, the target shooting orientation information and the target shooting focal length information corresponding to the pixel M; acquiring pixel image characteristic information corresponding to a pixel M from image characteristic information corresponding to a t-th key frame image, and determining element image characteristic information corresponding to the pixel M in a dimension space according to the pixel image characteristic information; and constructing a surface element corresponding to the pixel M in the dimension space according to the geometric position information and the element image characteristic information corresponding to the pixel M in the dimension space.
One possible embodiment of determining the geometric position information corresponding to the pixel M in the dimensional space according to the depth information, the target shooting position information, the target shooting orientation information, and the target shooting focal length information corresponding to the pixel M may be: according to the target shooting position information, determining dimension space position information corresponding to a camera when shooting a t-th key frame image in a dimension space; determining pixel orientation information corresponding to the pixel M according to the position relation between the pixel M and the center point pixel, the target shooting orientation information and the dimensional space position information; determining the position information corresponding to the pixel M in the dimension space according to the pixel orientation information, the dimension space position information and the depth information corresponding to the pixel M; performing normal estimation processing on the position information and the pixel orientation information to obtain normal information corresponding to the pixel M in a dimension space; determining radius information corresponding to the pixel M in a dimensional space according to the normal information, the depth information corresponding to the pixel M and the target shooting focal length information; according to the radius information and the weight parameters, weight information corresponding to the pixel M is determined; the position information, normal information, radius information, and weight information are determined as geometric position information corresponding to the pixel M in the dimensional space.
The dimension space is a three-dimensional space formed by the three dimensions of length, width and height. A surface element (surfel) is a geometric representation, which may also be referred to as a surface representation method; it is a disc-like structure consisting of a point of a point cloud and a corresponding radius.
The point cloud is a data set; each point in the data set represents a set of three-dimensional geometric coordinates in the dimension space and an intensity value, where the intensity value records the strength of the return signal according to the reflectivity of the object surface. The position of a point in the dimension space can be determined from its geometric coordinates, and when the points are combined together they form a point cloud, i.e., a set of data points representing a 3D shape or object in the dimension space. Thus, a surface element can be understood as a patch, and corresponds to the following element information as its geometric position information: position information, normal information, radius information and weight information. The position information may be geometric coordinate information in the dimension space, used to characterize the patch position of the surface element in the dimension space; the normal information may be normal vector information in the dimension space, used to characterize the patch direction of the surface element in the dimension space; the radius information, i.e., the radius of the patch corresponding to the surface element, is determined by the distance from the surface element to the optical center of the camera, and the larger the distance, the larger the radius; the weight information refers to the weight/confidence corresponding to the surface element, and is used to judge whether the current patch is stable and for subsequent operations such as weighted averaging; it is initialized based on the distance of the surface element from the camera, and the farther the distance, the smaller the weight, i.e., the lower the trust.
Specifically, as can be seen from the above possible embodiment of determining the geometric position information corresponding to the pixel M in the dimension space: according to the depth information and the target shooting position information, the position corresponding to the camera when shooting the t-th key frame image (i.e., the dimension space position information) can be calculated; after the camera position is determined, the camera orientation corresponding to the pixel M (i.e., the pixel orientation information), that is, the shooting light direction corresponding to the pixel M, is found (because the orientation in the camera parameters, i.e., the target shooting orientation information, refers to the orientation of the center point pixel of the t-th key frame image, the orientations of the other pixels can be obtained by a position-angle change operation); the point along the camera orientation corresponding to the pixel M whose distance from the camera equals the depth (the depth information corresponding to the pixel M) is then the position of the surface element corresponding to the pixel M in the dimension space, which can be recorded as p_m ∈ R^3. Then, according to the position of the surface element corresponding to the pixel M in the dimension space and the orientation corresponding to the pixel M, a normal estimation method is adopted to obtain the normal direction of the surface element corresponding to the pixel M in the dimension space, which is recorded as n_m ∈ R^3. Then, the computer device determines the radius information corresponding to the pixel M in the dimension space according to the normal information, the depth information corresponding to the pixel M and the target shooting focal length information, which can be realized by the following formula (1):
where r is the radius information corresponding to the pixel M in the dimension space, which can be recorded as r_m ∈ R^+; d refers to the depth information corresponding to the pixel M; f is the focal length of the camera (i.e., the target shooting focal length information); and n_m^z is the magnitude of the z-axis component of the normal direction n_m corresponding to the pixel M.
Finally, the computing process of the computer device for determining the weight information corresponding to the pixel M according to the radius information and the weight parameter may refer to the following formula (2):
where w refers to the weight corresponding to the pixel M, which can be recorded as w_m ∈ R^+; σ is a weight parameter, which may be 0.6.
The image feature information corresponding to the t-th key frame image may be obtained by the computer device performing feature extraction on the t-th key frame image through a picture feature extractor CNN (Convolutional Neural Network). According to the pixel position of the pixel M in the t-th key frame image, the pixel image feature information corresponding to the pixel M can be determined; the pixel image feature information corresponding to the pixel M is then mapped onto the surface element corresponding to the pixel M in the dimension space to obtain the element image feature information corresponding to the pixel M. The element image feature information corresponding to the pixel M can be recorded as f_m ∈ R^c, where c is the feature dimension.
Specifically, the computer device finally constructs the surface element S_m corresponding to the pixel M in the dimension space according to the geometric position information and the element image feature information corresponding to the pixel M in the dimension space. From the above, it can be seen that the surface element S_m may contain {p_m ∈ R^3, n_m ∈ R^3, r_m ∈ R^+, w_m ∈ R^+, f_m ∈ R^c}.
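For ease of understanding only, the construction of such a surface element can be sketched as follows. Formulas (1) and (2) are given as figures in the original text; the radius and weight expressions used below are commonly used surfel heuristics adopted here as assumptions, and the function and type names are likewise illustrative.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SurfaceElement:
    position: np.ndarray   # p_m in R^3
    normal: np.ndarray     # n_m in R^3
    radius: float          # r_m in R^+
    weight: float          # w_m in R^+
    feature: np.ndarray    # f_m in R^c

def build_surface_element(depth, ray_dir, cam_pos, focal_length,
                          normal, pixel_feature, sigma=0.6):
    """Back-project pixel M into the dimension space and attach its image feature.

    ray_dir is the (unit) pixel orientation derived from the target shooting orientation
    information; `normal` is assumed to come from a separate normal-estimation step.
    """
    position = cam_pos + depth * ray_dir                       # p_m
    n_z = max(abs(normal[2]), 1e-6)
    radius = depth / (focal_length * n_z)                      # assumed stand-in for formula (1)
    weight = float(np.exp(-radius ** 2 / (2 * sigma ** 2)))    # assumed stand-in for formula (2)
    return SurfaceElement(position, normal, radius, weight, pixel_feature)
```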
Step S103, according to the first camera parameter information, carrying out surface element fusion processing on surface elements corresponding to the at least two pixels in a dimension space and t-1 th global surface element representations associated with t-1 key frame images respectively to obtain t th global surface element representations associated with t key frame images; the image frame time stamps of the t-1 key frame images are all earlier than those of the t-th key frame image; the t key frame images include the t-1 key frame image and the t-th key frame image.
Specifically, when the computer device obtains the first key frame image from the target video, the surface element corresponding to each pixel in the first key frame image may be constructed according to the method described in step S101 to step S102, and the computer device may then determine the surface elements corresponding to the first key frame image as the global surface elements contained in the first global surface element representation. Then, when the computer device acquires the second key frame image from the target video, it still constructs the surface elements corresponding to the second key frame image, and then performs surface element fusion processing on the surface elements corresponding to the second key frame image and the first global surface element representation to obtain the second global surface element representation. By analogy, by the time the computer device has constructed the surface elements corresponding to the at least two pixels of the t-th key frame image in the dimension space, the t-1 th global surface element representation has already been obtained. It will be appreciated that from the global surface elements contained in a global surface element representation, the computer device may construct a scene surface geometry in the dimension space, which may be understood as a reconstructed model of the target scene in the dimension space. Thus, the computer device can incrementally and continuously acquire key frame images from the target video and then continuously reconstruct the scene surface geometry corresponding to the target scene, resulting in an increasingly refined global surface element representation.
Specifically, the t-1 th global surface element representation and the surface elements corresponding to the at least two pixels of the t-th key frame image in the dimension space can be fused based on rasterization projection. Assuming that the t-1 th global surface element representation includes at least two first global surface elements, a feasible implementation of performing surface element fusion processing on the surface elements corresponding to the at least two pixels in the dimension space and the t-1 th global surface element representation associated with the t-1 key frame images according to the first camera parameter information, to obtain the t-th global surface element representation associated with the t key frame images, may be: reconstructing a first scene surface geometry corresponding to the target scene in the dimension space according to the at least two first global surface elements; performing rasterization projection on the first scene surface geometry according to the first camera parameter information, and determining projectable global surface elements among the at least two first global surface elements and projectable pixels among the at least two pixels, where one projectable pixel corresponds to one or more projectable global surface elements; performing fusion update processing on the projectable global surface elements according to the surface elements corresponding to the projectable pixels to obtain fusion updated surface elements; and taking the fusion updated surface elements, the non-projectable global surface elements and the surface elements corresponding to the non-projectable pixels as second global surface elements, and combining the second global surface elements into the t-th global surface element representation. The non-projectable global surface elements are the first global surface elements remaining after the projectable global surface elements are removed from the at least two first global surface elements; the non-projectable pixels are the pixels remaining after the projectable pixels are removed from the at least two pixels. The rasterization projection refers to projecting the first scene surface geometry along the projection rays in the directions corresponding to the at least two pixels (which can be determined by the first camera parameter information); for example, for the rasterization projection of the scene surface geometry 27 shown in fig. 2c, the global surface elements corresponding to the intersection positions between the projection ray 29 in the direction of the pixel a to be rendered and the scene surface geometry 27 are the global surface elements that can be projected to the pixel a to be rendered. The computer device then determines a first global surface element that can be projected onto one of the at least two pixels as a projectable global surface element, and determines the pixels onto which projectable global surface elements are projected as projectable pixels.
For example, suppose the first scene surface geometry includes a first global surface element Q1, a first global surface element Q2 and a first global surface element Q3, and the at least two pixels include a pixel S1 and a pixel S2; assuming that, through rasterization projection, the first global surface element Q1 and the first global surface element Q2 can both be projected to the pixel S1, then the projectable global surface elements are the first global surface element Q1 and the first global surface element Q2, the projectable pixel is the pixel S1, the non-projectable global surface element is the first global surface element Q3, and the non-projectable pixel is the pixel S2. Then, the computer device may perform fusion update processing on the first global surface element Q1 and the first global surface element Q2 according to the surface element corresponding to the pixel S1; assume that the fusion updated surface element R1 and the fusion updated surface element R2 are obtained. Finally, the computer device may take the fusion updated surface element R1, the fusion updated surface element R2, the first global surface element Q3 and the surface element corresponding to the pixel S2 as the second global surface elements, and thereby obtain the t-th global surface element representation. Performing fusion update processing on the projectable global surface elements according to the surface elements corresponding to the projectable pixels to obtain the fusion updated surface elements means, according to a fusion update rule, fusing the geometric position information corresponding to the surface element of the projectable pixel with the geometric position information corresponding to the projectable global surface element, and fusing the element image feature information corresponding to the surface element of the projectable pixel with the element image feature information corresponding to the projectable global surface element. The fusion update rule may be determined based on the actual situation, for example, weighted fusion.
Step S104, carrying out rasterization sampling according to second camera parameter information corresponding to a second view angle and the t-th global surface element representation to obtain a sampling global surface element, and rendering a simulated scene image of the target scene under the second view angle according to the sampling global surface element.
Specifically, performing rasterization sampling on the t-th global surface element representation according to the second camera parameter information corresponding to the second view angle yields the sampled global surface elements, i.e., the one or more global surface elements to be processed corresponding to each pixel to be rendered in the image to be rendered. For the process of rasterization sampling, reference may be made to the process of determining the global surface elements to be processed corresponding to the pixel a to be rendered shown in fig. 2c. It will be appreciated that, after rasterization, for each pixel to be rendered, the computer device may obtain a plurality of near-to-far global surface elements to be processed that cover the pixel to be rendered, and the computer device may perform proximity merging processing, that is, merge adjacent surface elements to obtain the global surface elements to be rendered. Then, the computer device may render the global surface elements to be rendered based on a rendering model (a multi-layer MLP network) to obtain the simulated scene image of the target scene at the second view angle. The merging of adjacent surface elements can be performed based on a preset merging rule, and the merging rule can be determined according to the actual situation, for example, weighted merging.
By adopting the method provided by the embodiment of the application, after new surface elements are obtained according to a new key frame image, surface element fusion is performed with the global surface element representation associated with the previously obtained key frame images, so that a large amount of geometric redundancy can be avoided when the target scene is reconstructed according to the global surface element representation; therefore, key frame images can be acquired incrementally, and the target scene can be continuously reconstructed in a more refined manner. In addition, when the target scene is rendered at a new view angle, the rendering can be completed by rasterization sampling with only a small number of sampled global surface elements, which reduces the rendering time and improves the rendering effect.
Further, referring to fig. 5, fig. 5 is a flowchart of an image processing method according to an embodiment of the present application. The image processing method may be performed by a computer device, which may include a terminal device or a server as described in fig. 1, wherein the image processing method may include at least the following steps S201 to S207:
step S201, acquiring a t-th key frame image in a target video; the t-th key frame image comprises at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1.
Step S202, constructing surface elements corresponding to the at least two pixels in the dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first view angle, and the image feature information corresponding to the t-th keyframe image.
Specifically, for the implementation process of step S201 to step S202, reference may be made to the description of step S101 to step S102 in the embodiment corresponding to fig. 3, which is not repeated here.
Step S203, reconstructing a first scene surface geometry corresponding to the target scene in the dimension space according to at least two first global surface elements included in the t-1 th global surface element representation.
Specifically, the first scene surface geometry is a simulation model corresponding to the target scene in the dimension space.
Step S204, performing rasterization projection on the scene surface geometry according to the first camera parameter information, and determining a projectable global surface element of the at least two first global surface elements, and a projectable pixel of the at least two pixels; one projectable pixel corresponds to one or more projectable global surface elements.
Specifically, the computer device may traverse at least two pixels, sequentially obtaining the kth pixel; k is a positive integer less than or equal to H; h is the total number of pixels corresponding to at least two pixels; determining target pixel orientation information corresponding to the kth pixel in the dimension space according to the first camera parameter information; constructing projection rays corresponding to the target pixel orientation information in a dimension space; if the projection ray intersects the first scene surface geometry in the dimension space, determining a first global surface element corresponding to the intersection position of the projection ray and the first scene surface geometry as a projectable global surface element corresponding to a kth pixel, and determining the kth pixel as a projectable pixel. Alternatively, if the projected ray does not intersect the first scene surface geometry in the dimensional space, indicating that the kth pixel does not have a corresponding first global surface element, the kth pixel may be determined directly as a non-projectable pixel.
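For ease of understanding only, the projection-and-classification step described above can be sketched as follows; the ray representation and the `intersect_scene` callable are assumptions of this description, not the embodiment's interfaces.

```python
def classify_by_rasterized_projection(pixel_rays, num_first_global_surfels, intersect_scene):
    """Classify pixels and first global surface elements by casting each pixel's projection ray.

    pixel_rays      : list of (ray_origin, ray_dir) derived from the first camera parameter info
    intersect_scene : assumed callable returning the indices of the first global surface elements
                      hit by a ray against the first scene surface geometry (empty list if no hit)
    """
    projectable = {}                                   # projectable pixel index -> hit surfel indices
    for k, (origin, direction) in enumerate(pixel_rays):
        hits = intersect_scene(origin, direction)
        if hits:
            projectable[k] = hits                      # the k-th pixel is a projectable pixel
    hit_ids = {i for ids in projectable.values() for i in ids}
    non_projectable_surfels = [i for i in range(num_first_global_surfels) if i not in hit_ids]
    non_projectable_pixels = [k for k in range(len(pixel_rays)) if k not in projectable]
    return projectable, non_projectable_surfels, non_projectable_pixels
```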
Step S205, performing fusion update processing on the projectable global surface element according to the surface element corresponding to the projectable pixel, to obtain a fusion updated surface element.
Specifically, the computer device traverses the projectable global surface elements to obtain the ith projectable global surface element, where i is a positive integer less than or equal to the total number of projectable global surface elements; then, the projectable pixel corresponding to the ith projectable global surface element is obtained from the projectable pixels as the target projection pixel, and the surface element corresponding to the target projection pixel is taken as the surface element to be fused; a first element depth distance between the ith projectable global surface element and the surface element to be fused is determined; if the first element depth distance is less than or equal to the first element depth threshold, the target projection pixel is marked, and fusion processing is performed on the ith projectable global surface element and the surface element to be fused to obtain a fused surface element; if the first element depth distance is greater than the first element depth threshold, the target projection pixel is not marked, and the ith projectable global surface element is determined as a fused surface element; when the traversal of the projectable global surface elements is completed, the surface elements corresponding to the unmarked projectable pixels and each fused surface element are taken as the fusion updated surface elements. The first element depth distance refers to the depth distance between the ith projectable global surface element and the surface element to be fused in the dimension space.
It will be appreciated that when the first element depth distance between the ith projectable global surface element and the surface element to be fused exceeds the first element depth threshold, the influence and overlap between the two can be considered low and no fusion processing is needed; in this case, the surface element to be fused is not fused into the ith projectable global surface element, so the computer device can directly determine the ith projectable global surface element as a fused surface element. However, since the surface element corresponding to the target projection pixel may correspond to a plurality of projectable global surface elements, the surface element to be fused cannot be directly determined as a fused surface element at this point; therefore, whether the surface element corresponding to the target projection pixel has participated in fusion can be recorded by marking. When the traversal of the projectable global surface elements is completed, the computer device takes the surface elements corresponding to the unmarked projectable pixels and each fused surface element as the fusion updated surface elements.
Specifically, a feasible implementation of performing fusion processing on the ith projectable global surface element and the surface element to be fused to obtain the fused surface element may be: acquiring the first geometric position information and the first element image feature information corresponding to the ith projectable global surface element, where the first geometric position information is determined based on the depth information and camera parameter information respectively corresponding to the t-1 key frame images, and the first element image feature information is determined based on the image feature information respectively corresponding to the t-1 key frame images; acquiring the second geometric position information and the second element image feature information corresponding to the surface element to be fused, where the second geometric position information is determined based on the depth information respectively corresponding to the at least two pixels, and the second element image feature information is determined based on the image feature information corresponding to the t-th key frame image; performing weighted fusion processing on the first geometric position information and the second geometric position information to obtain the fused geometric position information; performing feature fusion processing on the first element image feature information and the second element image feature information to obtain the fused element image feature information; and obtaining the fused surface element according to the fused geometric position information and the fused element image feature information.
Specifically, as can be seen from the above step S102, the geometric position information corresponding to a surface element includes a position, a normal direction, a radius and a weight; the first geometric position information and the second geometric position information are then weighted and fused to obtain the fused geometric position information, for which the following formulas (3) to (6) may be adopted:
In formulas (3) to (6), the quantities on the left-hand side are the position, normal direction, radius and weight corresponding to the ith fused surface element, i.e., the fused geometric position information corresponding to the ith fused surface element; the quantities being fused are, on one side, the position, normal direction, radius and weight corresponding to the surface element to be fused and, on the other side, the position, normal direction, radius and weight corresponding to the ith projectable global surface element in the t-1 th global surface element representation.
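For ease of understanding only, the weighted fusion of geometric position information can be sketched as follows. The exact formulas (3) to (6) are given as figures in the original; a weight-proportional average is assumed here, and the SurfaceElement structure is the illustrative one sketched earlier.

```python
import numpy as np

def fuse_geometric_info(old, new):
    """Weighted fusion of geometric position information (assumed weight-proportional average).

    old : the ith projectable global surface element from the t-1 th representation
    new : the surface element to be fused, built from the t-th key frame image
    """
    w_sum = old.weight + new.weight
    position = (old.weight * old.position + new.weight * new.position) / w_sum
    normal = old.weight * old.normal + new.weight * new.normal
    normal /= (np.linalg.norm(normal) + 1e-12)     # keep the fused normal a unit vector
    radius = (old.weight * old.radius + new.weight * new.radius) / w_sum
    weight = w_sum                                 # accumulated confidence
    return position, normal, radius, weight
```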
Specifically, a feasible implementation of performing feature fusion processing on the first element image feature information and the second element image feature information to obtain the fused element image feature information may be: performing feature fusion processing on the first element image feature information and the second element image feature information through a feature update model to obtain the fused element image feature information. The feature update model may be obtained by training a GRU (Gated Recurrent Unit) network, and the feature fusion process may be specifically expressed as follows:
In this expression, the output is the fused element image feature information corresponding to the ith fused surface element; its inputs are the second element image feature information corresponding to the surface element to be fused and the first element image feature information corresponding to the ith projectable global surface element in the t-1 th global surface element representation.
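A minimal sketch of such a GRU-based feature update is given below, assuming PyTorch and an arbitrary feature dimension c; it is an illustration of the idea, not the embodiment's trained model.

```python
import torch
import torch.nn as nn

class FeatureUpdateModel(nn.Module):
    """GRU cell that fuses the new element image feature into the stored one."""
    def __init__(self, feature_dim: int):
        super().__init__()
        self.gru = nn.GRUCell(input_size=feature_dim, hidden_size=feature_dim)

    def forward(self, new_feature: torch.Tensor, old_feature: torch.Tensor) -> torch.Tensor:
        # new_feature: second element image feature information (from the t-th key frame image)
        # old_feature: first element image feature information (from the t-1 th representation)
        return self.gru(new_feature, old_feature)

# Usage with an assumed feature dimension c = 32 and a batch of 8 fused elements
c = 32
model = FeatureUpdateModel(c)
fused = model(torch.randn(8, c), torch.randn(8, c))   # fused element image feature information
```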
Step S206, the fusion updated surface element, the non-projectable global surface element and the surface element corresponding to the non-projectable pixel are all used as second global surface elements, and each second global surface element is combined into a t-th global surface element representation.
Specifically, the non-projectable global surface element is a first global surface element from which the projectable global surface element is removed from the at least two first global surface elements; the non-projectable pixel is a pixel from which the projectable pixel is removed from the at least two pixels.
Step S207, according to second camera parameter information corresponding to a second view angle and the t-th global surface element representation, carrying out rasterization sampling to obtain a sampling global surface element, and according to the sampling global surface element, rendering a simulated scene image of the target scene under the second view angle.
Specifically, the computer device reconstructs the second scene surface geometry corresponding to the target scene in the dimension space according to each second global surface element contained in the t-th global surface element representation; performs rasterization projection on the second scene surface geometry according to the second camera parameter information, and determines the sampled global surface elements, where the sampled global surface elements include the one or more global surface elements to be processed corresponding to each of the at least two pixels to be rendered in the image to be rendered, and each global surface element to be processed belongs to the t-th global surface element representation; the at least two pixels to be rendered include the pixel Z_a to be rendered, where a is a positive integer less than or equal to the total number of the at least two pixels to be rendered; performs proximity merging processing on the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered to obtain the global surface elements to be rendered; performs color simulation processing on the global surface elements to be rendered, and determines the color to be rendered corresponding to the pixel Z_a to be rendered; and renders the image to be rendered according to the colors to be rendered corresponding to each of the at least two pixels to be rendered, to obtain the simulated scene image of the target scene at the second view angle. The rendering of the image to be rendered according to the colors to be rendered corresponding to each of the at least two pixels to be rendered may be performed by a rendering model using a differentiable volume rendering method.
Specifically, a feasible implementation of performing proximity merging processing on the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, to obtain the global surface elements to be rendered, may be: determining the view angle distance, in the dimension space, between each global surface element to be processed corresponding to the pixel Z_a to be rendered and the second view angle; obtaining, from the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, the global surface element to be processed corresponding to the shortest view angle distance as the first merged global surface element; if there exists, among the remaining global surface elements to be processed, a global surface element to be processed whose view angle distance is greater than the view angle distance corresponding to the first merged global surface element, obtaining from the remaining global surface elements to be processed the one corresponding to the shortest view angle distance as the second merged global surface element, and determining the second element depth distance between the first merged global surface element and the second merged global surface element, where the remaining global surface elements to be processed are the global surface elements to be processed, among the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, other than the first merged global surface element; if the second element depth distance is less than or equal to the second element depth threshold, performing normalized weighted merging on the first merged global surface element and the second merged global surface element to obtain a new first merged global surface element; if the second element depth distance is greater than the second element depth threshold, determining the first merged global surface element as a global surface element to be rendered, and determining the second merged global surface element as the new first merged global surface element; and if there is no global surface element to be processed, among the remaining global surface elements to be processed, whose view angle distance is greater than the view angle distance corresponding to the first merged global surface element, determining the first merged global surface element as the global surface element to be rendered.
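For ease of understanding only, the near-to-far proximity merging loop described above can be sketched as follows; the depth-threshold test and the `normalized_weighted_merge` helper are assumptions consistent with the description, not the embodiment's code.

```python
def merge_adjacent_surfels(candidates, depth_threshold, normalized_weighted_merge):
    """Proximity-merge the global surface elements to be processed that cover one pixel to be rendered.

    candidates is assumed to be sorted by view angle distance (nearest first); each item
    exposes a .depth attribute. normalized_weighted_merge is an assumed helper that merges
    two elements by normalized weighting, as described above.
    """
    to_render = []
    current = candidates[0]                                     # first merged global surface element
    for nxt in candidates[1:]:                                  # farther elements to be processed
        if abs(nxt.depth - current.depth) <= depth_threshold:   # second element depth distance check
            current = normalized_weighted_merge(current, nxt)
        else:
            to_render.append(current)                           # becomes a global surface element to be rendered
            current = nxt                                       # next one becomes the new first merged element
    to_render.append(current)
    return to_render
```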
Specifically, the number of global surface elements to be rendered is one or more; each global surface element to be rendered corresponds to geometric position information to be rendered and element image feature information to be rendered; each piece of geometric position information to be rendered is generated based on the geometric position information corresponding to each of the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered; each piece of element image feature information to be rendered is generated based on the element image feature information corresponding to each of the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered; the geometric position information corresponding to the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered is determined based on the depth information and camera parameter information respectively corresponding to the t key frame images; and the element image feature information corresponding to the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered is determined based on the image feature information respectively corresponding to the t key frame images.
Specifically, a feasible implementation of performing color simulation processing on the global surface elements to be rendered and determining the color to be rendered corresponding to the pixel Z_a to be rendered may be: performing image feature prediction processing on each global surface element to be rendered according to the geometric position information to be rendered and the element image feature information to be rendered respectively corresponding to each global surface element to be rendered, to obtain the element opacity and the element color respectively corresponding to each global surface element to be rendered; and performing color combination processing on the element opacity and the element color respectively corresponding to each global surface element to be rendered according to the adjacent element distances corresponding to the one or more global surface elements to be rendered, to obtain the color corresponding to the pixel Z_a to be rendered.
A feasible implementation of performing image feature prediction processing on each global surface element to be rendered according to the geometric position information to be rendered and the element image feature information to be rendered respectively corresponding to each global surface element to be rendered, to obtain the element opacity and the element color respectively corresponding to each global surface element to be rendered, may be: inputting the geometric position information to be rendered and the element image feature information to be rendered corresponding to each global surface element to be rendered into an image feature prediction model, and performing image feature prediction processing on each global surface element to be rendered through the image feature prediction model to obtain the element opacity and the element color respectively corresponding to each global surface element to be rendered. The image feature prediction model may adopt a multi-layer MLP network.
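A minimal sketch of such an MLP-based image feature prediction model is given below, assuming PyTorch; the layer sizes and activation choices are illustrative assumptions, not parameters from the embodiment.

```python
import torch
import torch.nn as nn

class ImageFeaturePredictionModel(nn.Module):
    """Maps a surfel's geometric info plus image feature to (element opacity, element color)."""
    def __init__(self, geom_dim: int, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(geom_dim + feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),                   # 1 opacity value + 3 color channels
        )

    def forward(self, geom: torch.Tensor, feat: torch.Tensor):
        out = self.mlp(torch.cat([geom, feat], dim=-1))
        opacity = torch.nn.functional.softplus(out[..., :1])   # element opacity sigma_i >= 0
        color = torch.sigmoid(out[..., 1:])                    # element color c_i in [0, 1]
        return opacity, color
```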
Performing color combination processing on the element opacity and the element color respectively corresponding to each global surface element to be rendered according to the adjacent element distances corresponding to the one or more global surface elements to be rendered, to obtain the color corresponding to the pixel Z_a to be rendered, can be specifically expressed as:
C_p = Σ_{i=1}^{N} τ_i (1 − exp(−σ_i Δ_i)) c_i    formula (8)
where C_p refers to the color corresponding to the pixel Z_a to be rendered; τ_i can be determined by formula (9); N is the number of global surface elements to be rendered corresponding to the pixel Z_a to be rendered; σ_i refers to the element opacity corresponding to the ith global surface element to be rendered; c_i refers to the element color corresponding to the ith global surface element to be rendered; Δ_i represents the distance between the ith group of adjacent global surface elements to be rendered; and Δ_j represents the distance between the jth group of adjacent global surface elements to be rendered.
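For ease of understanding only, the color combination of formula (8) can be sketched as follows. Formula (9) is given as a figure in the original; the accumulated transmittance below therefore uses the standard volume-rendering form as an assumption.

```python
import math

def composite_pixel_color(opacities, colors, deltas):
    """Accumulate the pixel color C_p from per-surfel opacity sigma_i, color c_i and spacing delta_i."""
    c_p = [0.0, 0.0, 0.0]
    transmittance = 1.0                              # assumed form of tau_i, cf. formula (9)
    for sigma, color, delta in zip(opacities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)       # 1 - exp(-sigma_i * delta_i)
        weight = transmittance * alpha               # tau_i * (1 - exp(-sigma_i * delta_i))
        c_p = [acc + weight * ch for acc, ch in zip(c_p, color)]
        transmittance *= math.exp(-sigma * delta)    # accumulate exp(-sum_j sigma_j * delta_j)
    return c_p
```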
Optionally, the model parameters of the image algorithm models applied in the above process (i.e., the picture feature extractor, the image feature prediction model and the rendering model) may be obtained by training on an existing sample library before the computer device acquires the target video. In addition, the computer device can continue to train the image algorithm models and optimize the model parameters during scene reconstruction and new-view-angle image rendering according to the target video, so that all the image algorithm models can be trained end to end. Specifically, in the training process of any image algorithm model, a large number of target videos can be used; key frame images are selected as the network input, and non-key frame images are selected as the training supervision signal. In the process of steps S201 to S207, the computer device can obtain the rendering result at a new view angle (i.e., the simulated scene image) according to camera parameters; during training, the camera parameters for rendering the new view angle can be replaced with the camera parameters of a non-key frame image, and rendering is then performed. Because the computer device can determine the real output corresponding to the non-key frame image, the real output corresponding to the non-key frame image can be used as the correct supervision signal, i.e., the training label. The computer device can calculate the loss distance between the rendering result and the correct non-key frame image as the loss function of the image algorithm models, and then update the gradients by gradient back-propagation, thereby training the image algorithm models.
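A rough sketch of that supervision step is given below, assuming PyTorch; `render_view` stands in for the full pipeline of steps S201 to S207, and the L1 loss is an assumed choice of loss distance rather than the embodiment's.

```python
import torch

def training_step(render_view, non_keyframe_image, non_keyframe_camera, optimizer):
    """Render at the non-key frame's camera parameters and supervise with the real frame."""
    rendered = render_view(non_keyframe_camera)                         # simulated scene image
    loss = torch.nn.functional.l1_loss(rendered, non_keyframe_image)    # assumed loss distance
    optimizer.zero_grad()
    loss.backward()                                                     # gradient back-propagation
    optimizer.step()                                                    # end-to-end update of the image algorithm models
    return loss.item()
```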
By using the method provided by the embodiment of the application, the geometric position information and the element image feature information of a point can both be stored on a surface element; by taking advantage of the surface element representation, the geometric position information and the element image feature information of nearby surface elements can be merged quickly, and by rasterizing the surface elements, the surface elements obtained after rasterization for each pixel can be rendered quickly. In addition, the parameters of the image algorithm models used can be further optimized while rendering a new scene, thereby achieving a better rendering effect.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on a computer device, for example the data processing apparatus is an application software; the device can be used for executing corresponding steps in the data processing method provided by the embodiment of the application. As shown in fig. 6, the data processing apparatus 1 may include: the system comprises an acquisition module 11, a construction module 12, a fusion module 13 and a rendering module 14.
An acquiring module 11, configured to acquire a t-th key frame image in a target video; the t-th key frame image includes at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1;
The construction module 12 is configured to construct surface elements corresponding to at least two pixels in a dimension space according to depth information corresponding to at least two pixels, first camera parameter information corresponding to a first view angle, and image feature information corresponding to a t-th key frame image;
the fusion module 13 is configured to perform surface element fusion processing on surface elements corresponding to at least two pixels in a dimensional space and a t-1 th global surface element representation associated with each of the t-1 key frame images according to the first camera parameter information, so as to obtain a t-th global surface element representation associated with each of the t key frame images; the image frame time stamp of the t-1 key frame image is earlier than that of the t key frame image; the t key frame images comprise t-1 key frame images and a t key frame image;
and the rendering module 14 is configured to perform rasterization sampling according to the second camera parameter information corresponding to the second view angle and the t-th global surface element representation, obtain a sampled global surface element, and render a simulated scene image of the target scene under the second view angle according to the sampled global surface element.
The specific implementation manners of the obtaining module 11, the constructing module 12, the fusing module 13, and the rendering module 14 may be referred to the description in steps S101-S104 in the embodiment corresponding to fig. 3, and will not be described herein.
Wherein, the acquisition module 11 comprises: an isochronous acquisition unit 111.
An isochronous acquisition unit 111 for determining a first image frame timestamp of a t-1 th key frame image in a target video;
the isochronous acquisition unit 111 is further configured to add the first image frame timestamp to the sampling period duration to obtain a second image frame timestamp;
the isochronous acquisition unit 111 is further configured to acquire a frame image corresponding to the second image frame timestamp from the target video, as a t-th key frame image.
For a specific implementation of the isochronous acquisition unit 111, reference may be made to the description in step S101 in the embodiment corresponding to fig. 3, and a detailed description will not be given here.
Wherein, the acquisition module 11 includes: a change acquisition unit 112.
A change acquiring unit 112, configured to acquire historical camera parameter information corresponding to the t-1 th key frame image; the historical camera parameter information comprises historical shooting angle information and historical shooting position information;
a change acquiring unit 112, configured to sequentially acquire frame images located after the t-1 th key frame image from the target video, as detected frame images;
the change obtaining unit 112 is further configured to obtain detection camera parameter information corresponding to the detection frame image; detecting camera parameter information comprises detecting shooting angle information and detecting shooting position information;
A change acquisition unit 112 further configured to determine a shooting change angle between the historical shooting angle information and the detected shooting angle information, and determine a position change distance between the historical shooting position information and the detected shooting position information;
the change obtaining unit 112 is further configured to determine that the detected frame image is the t-th key frame image if the photographing change angle is greater than or equal to the angle change threshold, or if the position change distance is greater than or equal to the distance change threshold.
For a specific implementation manner of the change obtaining unit 112, reference may be made to the description in step S101 in the embodiment corresponding to fig. 3, and a detailed description will not be repeated here.
Wherein at least two pixels include a pixel M; the first camera parameter information comprises target shooting position information, target shooting orientation information and target shooting focal length information; the target shooting orientation information is used for representing shooting light directions corresponding to the center point pixels in the t-th key frame image; the target shooting position information is used for representing the position information of the camera when the t-th key frame image is shot;
build module 12 includes: a geometry determination unit 121, a feature determination unit 122, and an element determination unit 123.
A geometry determining unit 121, configured to determine geometry position information corresponding to the pixel M in the dimensional space according to depth information, target shooting position information, target shooting orientation information, and target shooting focal length information corresponding to the pixel M;
a feature determining unit 122, configured to obtain pixel image feature information corresponding to the pixel M from image feature information corresponding to the t-th key frame image, and determine element image feature information corresponding to the pixel M in the dimension space according to the pixel image feature information;
an element determining unit 123, configured to construct a surface element corresponding to the pixel M in the dimension space according to the geometric position information and the element image feature information corresponding to the pixel M in the dimension space.
The specific implementation manner of the geometry determining unit 121, the feature determining unit 122, and the element determining unit 123 may be referred to the description in step S102 in the embodiment corresponding to fig. 3, and will not be described herein.
Wherein the geometry determining unit 121 includes: information determination subunit 1211 and information summarization subunit 1212.
An information determination subunit 1211, configured to determine, in a dimension space, dimension space position information corresponding to the camera when the t-th key frame image is captured, according to the target capturing position information;
The information determining subunit 1211 is further configured to determine, according to the positional relationship between the pixel M and the center point pixel, the target shooting orientation information, and the dimensional spatial position information, pixel orientation information corresponding to the pixel M;
the information determining subunit 1211 is further configured to determine, according to the pixel orientation information, the dimensional space position information, and the depth information corresponding to the pixel M, position information corresponding to the pixel M in the dimensional space;
the information determining subunit 1211 is further configured to perform normal estimation processing on the position information and the pixel orientation information, so as to obtain normal information corresponding to the pixel M in the dimensional space;
the information determining subunit 1211 is further configured to determine radius information corresponding to the pixel M in the dimensional space according to the normal information, the depth information corresponding to the pixel M, and the target shooting focal length information;
the information determining subunit 1211 is further configured to determine weight information corresponding to the pixel M according to the radius information and the weight parameter;
the information summarizing subunit 1212 is configured to determine the position information, the normal information, the radius information, and the weight information as geometric position information corresponding to the pixel M in the dimension space.
For a specific implementation manner of the information determining subunit 1211 and the information summarizing subunit 1212, reference may be made to the description in step S102 in the embodiment corresponding to fig. 3, which will not be repeated here.
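As a rough illustration of how the information determining subunit 1211 could derive these quantities, the following Python sketch back-projects a pixel under an assumed pinhole camera model; the normal approximation, the radius formula, and the Gaussian weight are stand-ins consistent with the text above, not formulas quoted from the embodiment.

```python
import numpy as np

def pixel_geometry(u, v, depth, cam_pos, R, fx, fy, cx, cy, weight_sigma=0.6):
    """Per-pixel geometric terms under an assumed pinhole camera model."""
    # pixel orientation: ray of pixel (u, v) relative to the centre-point pixel (cx, cy),
    # rotated by R from camera coordinates into the dimensional space
    ray = R @ np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    ray = ray / np.linalg.norm(ray)
    # position: camera position pushed along the pixel ray by the pixel's depth
    position = np.asarray(cam_pos, dtype=float) + depth * ray
    # normal: approximated as facing back along the ray; a real system would
    # estimate it from the positions of neighbouring back-projected pixels
    normal = -ray
    # radius: the pixel footprint grows with depth, shrinks with focal length,
    # and widens for surfaces seen at a grazing angle
    radius = (depth / fx) * np.sqrt(2.0) / max(abs(np.dot(normal, ray)), 1e-3)
    # weight: confidence decays with the radius, controlled by a weight parameter
    weight = float(np.exp(-(radius ** 2) / (2.0 * weight_sigma ** 2)))
    return {"position": position, "normal": normal, "radius": radius, "weight": weight}
```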
Wherein the t-1 th global surface element representation comprises at least two first global surface elements;
a fusion module 13 comprising: a first reconstruction unit 131, a first projection unit 132, an update unit 133, and a representation determination unit 134.
The first reconstruction unit 131 is configured to reconstruct, according to at least two first global surface elements, a first scene surface geometry corresponding to the target scene in the dimensional space;
a first projection unit 132, configured to perform rasterization projection on the first scene surface geometry according to the first camera parameter information, and determine projectable global surface elements in the at least two first global surface elements and projectable pixels in the at least two pixels; one projectable pixel corresponds to one or more projectable global surface elements;
an updating unit 133, configured to perform fusion updating processing on the projectable global surface element according to the surface element corresponding to the projectable pixel, so as to obtain a fusion updated surface element;
a representation determining unit 134, configured to use the fusion updated surface element, the non-projectable global surface element, and the surface element corresponding to the non-projectable pixel as the second global surface element; the non-projectable global surface element is a first global surface element from which the projectable global surface element is removed from the at least two first global surface elements; the non-projectable pixels are pixels excluding projectable pixels from the at least two pixels;
The representation determining unit 134 is further configured to combine each second global surface element into a t-th global surface element representation.
The specific implementation manners of the first reconstruction unit 131, the first projection unit 132, the update unit 133, and the representation determining unit 134 may be referred to the descriptions in steps S203-S206 in the embodiment corresponding to fig. 5, and will not be described herein.
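The assembly of the t-th global surface element representation described by the representation determining unit 134 is essentially set bookkeeping. A minimal Python sketch, assuming the rasterized projection has already produced a mapping from projectable pixels to their projectable global surface elements (all argument names are illustrative):

```python
def combine_second_globals(global_surfels, frame_surfels_by_pixel, proj_map, fused_updated):
    """Assemble the t-th global surface element representation from the pieces
    named in the text: fusion-updated elements, non-projectable global elements,
    and the elements of non-projectable pixels."""
    projectable_ids = {id(s) for surfels in proj_map.values() for s in surfels}
    non_projectable_globals = [s for s in global_surfels if id(s) not in projectable_ids]
    non_projectable_pixel_surfels = [surfel for pix, surfel in frame_surfels_by_pixel.items()
                                     if pix not in proj_map]
    # every second global surface element, gathered into one representation
    return list(fused_updated) + non_projectable_globals + non_projectable_pixel_surfels
```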
Wherein the first projection unit 132 includes: the pixel traversal subunit 1321 and the ray projection subunit 1322.
A pixel traversing subunit 1321, configured to traverse at least two pixels and sequentially obtain a kth pixel; k is a positive integer less than or equal to H; h is the total number of pixels corresponding to at least two pixels;
a ray projection subunit 1322, configured to determine, according to the first camera parameter information, target pixel orientation information corresponding to the k-th pixel in the dimensional space;
the ray projection subunit 1322 is further configured to construct a projection ray corresponding to the target pixel orientation information in the dimensional space;
the ray projection subunit 1322 is further configured to, if the projection ray intersects the first scene surface geometry in the dimensional space, determine the first global surface element at the position where the projection ray intersects the first scene surface geometry as the projectable global surface element corresponding to the k-th pixel, and determine the k-th pixel as a projectable pixel.
The specific implementation manner of the pixel traversal subunit 1321 and the ray projection subunit 1322 may be referred to the description in step S204 in the embodiment corresponding to fig. 5, which will not be repeated here.
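A simple way to realize the per-pixel projection test is to cast the projection ray against the global surface elements treated as oriented disks; the embodiment only states that the ray is intersected with the reconstructed first scene surface geometry, so the disk intersection below is an assumption. The sketch reuses the Surfel record from the earlier sketch.

```python
import numpy as np

def cast_pixel_ray(ray_origin, ray_dir, global_surfels):
    """Return the first global surface element hit by the pixel's projection ray,
    or None if the ray misses the reconstructed surface."""
    best, best_t = None, np.inf
    for s in global_surfels:
        denom = np.dot(s.normal, ray_dir)
        if abs(denom) < 1e-6:
            continue                                  # ray parallel to the disk plane
        t = np.dot(s.normal, s.position - ray_origin) / denom
        if t <= 0.0 or t >= best_t:
            continue                                  # behind the camera or farther than current hit
        hit = ray_origin + t * ray_dir
        if np.linalg.norm(hit - s.position) <= s.radius:
            best, best_t = s, t                       # nearest intersected element so far
    return best
```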
Wherein the updating unit 133 includes: an element traversal subunit 1331, a pixel acquiring subunit 1332, a distance determination subunit 1333, a marking subunit 1334, a fusion subunit 1335, an update subunit 1336, and an element processing subunit 1337.
An element traversal subunit 1331 configured to traverse the projectable global surface element to obtain an i-th projectable global surface element; i is a positive integer less than or equal to the total number of projectable global surface elements;
a pixel acquiring subunit 1332, configured to acquire, from the projectable pixels, a projectable pixel corresponding to the i-th projectable global surface element, as a target projection pixel, and use a surface element corresponding to the target projection pixel as a surface element to be fused;
a distance determination subunit 1333 for determining a first element depth distance between the ith projectable global surface element and the surface element to be fused;
a labeling subunit 1334 configured to label the target projected pixel if the first element depth distance is less than or equal to the first element depth threshold;
A fusion subunit 1335, configured to perform fusion processing on the i-th projective global surface element and the surface element to be fused, so as to obtain a fused surface element;
an update subunit 1336 configured to determine an i-th projectable global surface element as a fused surface element without marking the target projected pixel if the first element depth distance is greater than the first element depth threshold;
an element processing subunit 1337, configured to, when traversing the projectable global surface element, use the surface element corresponding to the untagged projectable pixel and each fused surface element as a fused updated surface element.
The fusion subunit 1335 is specifically configured to obtain first geometric position information and first element image feature information corresponding to the i-th projectable global surface element; the first geometric position information is determined based on the depth information and the camera parameter information respectively corresponding to the t-1 key frame images, and the first element image feature information is determined based on the image feature information respectively corresponding to the t-1 key frame images; obtain second geometric position information and second element image feature information corresponding to the surface element to be fused; the second geometric position information is determined based on the depth information respectively corresponding to the at least two pixels, and the second element image feature information is determined based on the image feature information corresponding to the t-th key frame image; perform weighted fusion processing on the first geometric position information and the second geometric position information to obtain fused geometric position information; perform feature fusion processing on the first element image feature information and the second element image feature information to obtain fused element image feature information; and obtain the fused surface element according to the fused geometric position information and the fused element image feature information.
For the specific implementation manners of the element traversal subunit 1331, the pixel acquiring subunit 1332, the distance determination subunit 1333, the marking subunit 1334, the fusion subunit 1335, the update subunit 1336, and the element processing subunit 1337, reference may be made to the description in step S205 in the embodiment corresponding to fig. 5, which will not be repeated here.
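A possible reading of the weighted fusion and feature fusion performed by the fusion subunit 1335 is a confidence-weighted average, sketched below in Python; the use of the element weights as fusion weights and the radius rule are assumptions, and the Surfel record comes from the earlier sketch.

```python
import numpy as np

def fuse_surfels(g, f):
    """Fuse a projectable global surface element g with the surface element to be
    fused f into one fused surface element."""
    w = g.weight + f.weight
    position = (g.weight * g.position + f.weight * f.position) / w   # weighted fusion
    normal = g.weight * g.normal + f.weight * f.normal
    normal = normal / np.linalg.norm(normal)
    radius = min(g.radius, f.radius)                                  # keep the tighter footprint
    feature = (g.weight * g.feature + f.weight * f.feature) / w       # feature fusion
    return Surfel(position=position, normal=normal, radius=radius,
                  weight=w, feature=feature)
```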
Wherein the rendering module 14 comprises: a second reconstruction unit 141, a second projection unit 142, a neighboring merging unit 143, a color simulation unit 144, and a rendering unit 145.
A second reconstructing unit 141, configured to reconstruct, in the dimension space, a second scene surface geometry corresponding to the target scene according to each second global surface element included in the t-th global surface element representation;
a second projection unit 142, configured to perform rasterization projection on the second scene surface geometry according to the second camera parameter information, and determine sampling global surface elements; the sampling global surface elements comprise one or more global surface elements to be processed respectively corresponding to at least two pixels to be rendered in an image to be rendered; each global surface element to be processed belongs to the t-th global surface element representation; the at least two pixels to be rendered include a pixel Z_a to be rendered; a is a positive integer less than or equal to the total number of the at least two pixels to be rendered;
an adjacent merging unit 143, configured to perform adjacent merging processing on the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, so as to obtain a global surface element to be rendered;
a color simulation unit 144, configured to perform color simulation processing on the global surface element to be rendered, and determine a color to be rendered corresponding to the pixel Z_a to be rendered;
and the rendering unit 145 is configured to perform rendering processing on the image to be rendered according to the colors to be rendered corresponding to the at least two pixels to be rendered, so as to obtain the simulated scene image of the target scene under the second view angle.
The specific implementation manners of the second reconstruction unit 141, the second projection unit 142, the neighboring merging unit 143, the color simulation unit 144, and the rendering unit 145 may be referred to the description in step S207 in the embodiment corresponding to fig. 5, and will not be repeated here.
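The rasterization projection performed by the second projection unit 142 can be pictured as splatting every second global surface element into the image to be rendered and binning it per pixel to be rendered. A minimal Python sketch under assumed intrinsic/extrinsic conventions (K, R, t and all names are illustrative):

```python
import numpy as np
from collections import defaultdict

def sample_global_surfels(global_surfels, K, R, t, width, height):
    """Project every second global surface element into the image to be rendered
    under the second camera parameters; K is a 3x3 intrinsic matrix and (R, t)
    the world-to-camera transform (assumed conventions)."""
    buckets = defaultdict(list)                        # pixel -> global surfels to be processed
    for s in global_surfels:
        p_cam = R @ s.position + t
        if p_cam[2] <= 0.0:
            continue                                   # behind the second view
        uvw = K @ p_cam
        u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
        if 0 <= u < width and 0 <= v < height:
            buckets[(u, v)].append((p_cam[2], s))      # keep the view distance for later merging
    return buckets
```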
Wherein the neighbor merging unit 143 includes: element determination subunit 1431 and element merge subunit 1432.
An element determination subunit 1431, configured to respectively determine view angle distances, in the dimensional space, between each global surface element to be processed corresponding to the pixel Z_a to be rendered and the second view angle;
the element determination subunit 1431 is further configured to obtain, from the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, the global surface element to be processed corresponding to the shortest view angle distance as a first merged global surface element;
the element determination subunit 1431 is further configured to, if a global surface element to be processed whose view angle distance is greater than the view angle distance corresponding to the first merged global surface element exists in the remaining global surface elements to be processed, obtain, from the remaining global surface elements to be processed, the global surface element to be processed corresponding to the shortest view angle distance as a second merged global surface element, and determine a second element depth distance between the first merged global surface element and the second merged global surface element; the remaining global surface elements to be processed are the global surface elements to be processed, among the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, other than the first merged global surface element;
an element merging subunit 1432, configured to perform normalized weighted merging on the first merged global surface element and the second merged global surface element if the second element depth distance is less than or equal to the second element depth threshold, so as to obtain a new first merged global surface element;
The element merging subunit 1432 is further configured to determine the first merged global surface element as a global surface element to be rendered and determine the second merged global surface element as a new first merged global surface element if the second element depth distance is greater than the second element depth threshold;
the element merging subunit 1432 is further configured to determine the first merged global surface element as the global surface element to be rendered if there is no global surface element to be processed with a view angle distance greater than the view angle distance corresponding to the first merged global surface element in the remaining global surface elements to be processed.
The specific implementation of the element determining subunit 1431 and the element merging subunit 1432 may be referred to the description in step 207 in the embodiment corresponding to fig. 5, which will not be described herein.
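The adjacent merging walk described above can be sketched as a front-to-back pass that merges runs of elements whose depth gap stays within the second element depth threshold. In the Python sketch below, reusing the element weights for the normalized weighted merge and the threshold value are assumptions; the Surfel record is the one from the earlier sketch.

```python
import numpy as np

def normalized_weighted_merge(group):
    """Normalized weighted merge of one run of adjacent elements."""
    w = np.array([s.weight for _, s in group], dtype=float)
    w = w / w.sum()
    return Surfel(
        position=sum(wi * s.position for wi, (_, s) in zip(w, group)),
        normal=sum(wi * s.normal for wi, (_, s) in zip(w, group)),
        radius=float(sum(wi * s.radius for wi, (_, s) in zip(w, group))),
        weight=float(sum(s.weight for _, s in group)),
        feature=sum(wi * s.feature for wi, (_, s) in zip(w, group)),
    )

def merge_adjacent(candidates, second_element_depth_threshold=0.05):
    """Adjacent merging for one pixel Z_a to be rendered; candidates are
    (view distance, surfel) pairs produced by the rasterized sampling."""
    if not candidates:
        return []
    ordered = sorted(candidates, key=lambda ds: ds[0])    # shortest view angle distance first
    rendered, group = [], [ordered[0]]
    for dist, surfel in ordered[1:]:
        if dist - group[0][0] <= second_element_depth_threshold:
            group.append((dist, surfel))                  # same surface patch as the first merged element
        else:
            rendered.append(normalized_weighted_merge(group))
            group = [(dist, surfel)]                      # start a new first merged element
    rendered.append(normalized_weighted_merge(group))
    return rendered                                       # global surface elements to be rendered
```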
Wherein the number of global surface elements to be rendered is one or more; each global surface element to be rendered corresponds to geometric position information to be rendered and element image feature information to be rendered; each piece of geometric position information to be rendered is generated based on the geometric position information respectively corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered; each piece of element image feature information to be rendered is generated based on the element image feature information respectively corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered; the geometric position information corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered is determined based on the depth information and the camera parameter information respectively corresponding to the t key frame images; the element image feature information corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered is determined based on the image feature information respectively corresponding to the t key frame images;
the color simulation unit 144 includes: the feature prediction subunit 1441 and the color merging subunit 1442.
The feature prediction subunit 1441 is configured to perform image feature prediction processing on each global surface element to be rendered according to the geometric position information to be rendered and the element image feature information to be rendered respectively corresponding to each global surface element to be rendered, so as to obtain an element opacity and an element color respectively corresponding to each global surface element to be rendered;
a color merging subunit 1442, configured to perform color merging processing on the element opacity and the element color respectively corresponding to each global surface element to be rendered according to the adjacent element distances corresponding to the one or more global surface elements to be rendered, so as to obtain the color to be rendered corresponding to the pixel Z_a to be rendered.
The specific implementation manner of the feature prediction subunit 1441 and the color combining subunit 1442 may be referred to the description in step S207 in the embodiment corresponding to fig. 5, which will not be described herein.
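The color merging step behaves like front-to-back alpha compositing in which the adjacent element distance modulates each element's contribution. The Python sketch below treats the predicted element opacity as a density and applies the standard volume-rendering weight; the exponential form is an assumption in the spirit of the text, and predict_fn stands in for the image feature prediction step.

```python
import numpy as np

def composite_pixel_color(render_surfels, deltas, predict_fn):
    """Merge per-element colors and opacities into one pixel color.
    render_surfels are assumed ordered near to far; deltas are the adjacent
    element distances; predict_fn(surfel) returns (rgb, opacity)."""
    color = np.zeros(3)
    transmittance = 1.0
    for surfel, delta in zip(render_surfels, deltas):
        rgb, sigma = predict_fn(surfel)
        alpha = 1.0 - np.exp(-sigma * delta)        # opacity modulated by adjacent distance
        color += transmittance * alpha * np.asarray(rgb, dtype=float)
        transmittance *= (1.0 - alpha)              # light remaining for farther elements
    return color
```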
Referring to fig. 7, fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the application. As shown in fig. 7, the data processing apparatus 1 in the embodiment corresponding to fig. 6 described above may be applied to a computer device 1000, and the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard; optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 7, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
In the computer device 1000 shown in fig. 7, the network interface 1004 may provide a network communication function; the user interface 1003 is mainly used to provide an input interface for a user; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement:
acquiring a t-th key frame image in a target video; the t-th key frame image includes at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1;
constructing surface elements corresponding to at least two pixels in a dimension space according to depth information corresponding to the at least two pixels, first camera parameter information corresponding to a first visual angle and image characteristic information corresponding to a t-th key frame image;
according to the first camera parameter information, carrying out surface element fusion processing on surface elements corresponding to at least two pixels in a dimensional space and t-1 th global surface element representations associated with t-1 key frame images respectively to obtain t th global surface element representations associated with t key frame images; the image frame time stamp of the t-1 key frame image is earlier than that of the t key frame image; the t key frame images comprise t-1 key frame images and a t key frame image;
And carrying out rasterization sampling according to second camera parameter information corresponding to the second view angle and the t-th global surface element representation to obtain a sampling global surface element, and rendering a simulated scene image of the target scene under the second view angle according to the sampling global surface element.
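Chaining the earlier sketches gives a rough picture of one iteration of the method executed by the processor 1001; the Python sketch below simplifies the embodiment (one projectable element per pixel, no first element depth threshold test), and every helper name comes from those sketches rather than from the application itself.

```python
import numpy as np

def process_key_frame(pixels, depths, features, cam, prev_globals):
    """One simplified pass for the t-th key frame; cam holds 'pos', 'R', 'fx',
    'fy', 'cx', 'cy' in an assumed layout."""
    # 1. construct a surface element for every pixel of the key frame
    frame_surfels = {}
    for (u, v) in pixels:
        geom = pixel_geometry(u, v, depths[(u, v)], cam["pos"], cam["R"],
                              cam["fx"], cam["fy"], cam["cx"], cam["cy"])
        frame_surfels[(u, v)] = build_surfel(geom, features[(u, v)])
    # 2. find, for each pixel, the global surface element its projection ray hits
    cam_pos = np.asarray(cam["pos"], dtype=float)
    proj_map = {}
    for pix, surfel in frame_surfels.items():
        ray_dir = surfel.position - cam_pos
        ray_dir = ray_dir / np.linalg.norm(ray_dir)
        hit = cast_pixel_ray(cam_pos, ray_dir, prev_globals)
        if hit is not None:
            proj_map[pix] = [hit]
    # 3. fuse the hit elements with the frame elements and assemble the t-th representation
    fused = [fuse_surfels(proj_map[pix][0], frame_surfels[pix]) for pix in proj_map]
    return combine_second_globals(prev_globals, frame_surfels, proj_map, fused)
```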
It should be understood that the computer device 1000 described in the embodiments of the present application may perform the data processing method described in any of the foregoing embodiments corresponding to fig. 3 and fig. 5, which will not be repeated here. In addition, the description of the beneficial effects of the same method is also omitted.
Furthermore, it should be noted here that: the embodiment of the present application further provides a computer readable storage medium, in which the computer program executed by the aforementioned data processing apparatus 1 is stored, and the computer program includes program instructions, when executed by the aforementioned processor, can execute the description of the data processing method in any of the foregoing embodiments corresponding to fig. 3 and 5, and therefore, the description will not be repeated here. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the computer device, for example, a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Furthermore, it should be noted here that: embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method provided by any of the corresponding embodiments of fig. 3 and 5 above.
The terms "first", "second" and the like in the description, the claims, and the drawings of the embodiments of the application are used to distinguish different objects, not to describe a particular order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a series of steps or elements is not limited to the listed steps or elements, but may optionally further include other steps or elements not listed or inherent to such process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (16)

1. An image processing method, comprising:
acquiring a t-th key frame image in a target video; the t-th key frame image comprises at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1;
constructing surface elements corresponding to the at least two pixels in a dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first view angle and the image characteristic information corresponding to the t-th key frame image;
according to the first camera parameter information, carrying out surface element fusion processing on surface elements respectively corresponding to the at least two pixels in a dimensional space and t-1 th global surface element representations associated with t-1 key frame images to obtain t th global surface element representations associated with t key frame images; the image frame time stamps of the t-1 key frame images are all earlier than those of the t-th key frame image; the t key frame images comprise the t-1 key frame image and the t key frame image;
And carrying out rasterization sampling according to second camera parameter information corresponding to a second view angle and the t-th global surface element representation to obtain a sampling global surface element, and rendering a simulated scene image of the target scene under the second view angle according to the sampling global surface element.
2. The method of claim 1, wherein the acquiring the t-th keyframe image in the target video comprises:
determining a first image frame timestamp of a t-1 th key frame image in the target video;
adding the first image frame time stamp and the sampling period time length to obtain a second image frame time stamp;
and acquiring a frame image corresponding to the second image frame timestamp from the target video, and taking the frame image as a t-th key frame image.
3. The method of claim 1, wherein the acquiring the t-th keyframe image in the target video comprises:
acquiring historical camera parameter information corresponding to the t-1 key frame image; the historical camera parameter information comprises historical shooting angle information and historical shooting position information;
sequentially acquiring frame images positioned behind the t-1 key frame image from the target video to serve as detection frame images;
Acquiring detection camera parameter information corresponding to the detection frame image; the detecting camera parameter information comprises detecting shooting angle information and detecting shooting position information;
determining a shooting change angle between the historical shooting angle information and the detection shooting angle information, and determining a position change distance between the historical shooting position information and the detection shooting position information;
and if the shooting change angle is greater than or equal to an angle change threshold, or the position change distance is greater than or equal to a distance change threshold, determining that the detection frame image is a t-th key frame image.
4. The method of claim 1, wherein the at least two pixels comprise pixel M; the first camera parameter information comprises target shooting position information, target shooting orientation information and target shooting focal length information; the target shooting orientation information is used for representing shooting light directions corresponding to the center point pixels in the t-th key frame image; the target shooting position information is used for representing the position information of a camera when the t-th key frame image is shot;
the constructing surface elements of the at least two pixels corresponding to each other in the dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first view angle, and the image feature information corresponding to the t-th key frame image, includes:
Determining geometric position information corresponding to the pixel M in a dimensional space according to depth information corresponding to the pixel M, the target shooting position information, the target shooting orientation information and the target shooting focal length information;
acquiring pixel image characteristic information corresponding to the pixel M from the image characteristic information corresponding to the t-th key frame image, and determining element image characteristic information corresponding to the pixel M in the dimensional space according to the pixel image characteristic information;
and constructing a surface element corresponding to the pixel M in the dimension space according to the geometric position information and the element image characteristic information corresponding to the pixel M in the dimension space.
5. The method according to claim 4, wherein determining the geometric position information corresponding to the pixel M in the dimensional space according to the depth information corresponding to the pixel M, the target shooting position information, the target shooting orientation information, and the target shooting focal length information includes:
determining dimension space position information corresponding to a camera when shooting the t-th key frame image in dimension space according to the target shooting position information;
Determining pixel orientation information corresponding to the pixel M according to the position relation between the pixel M and the central point pixel, the target shooting orientation information and the dimensional space position information;
determining the position information corresponding to the pixel M in the dimension space according to the pixel orientation information, the dimension space position information and the depth information corresponding to the pixel M;
performing normal estimation processing on the position information and the pixel orientation information to obtain normal information corresponding to the pixel M in the dimensional space;
determining radius information corresponding to the pixel M in the dimensional space according to the normal information, the depth information corresponding to the pixel M and the target shooting focal length information;
according to the radius information and the weight parameters, weight information corresponding to the pixel M is determined;
and determining the position information, the normal information, the radius information and the weight information as geometric position information corresponding to the pixel M in the dimensional space.
6. The method of claim 1, wherein the t-1 th global surface element representation comprises at least two first global surface elements;
According to the first camera parameter information, performing surface element fusion processing on the surface elements corresponding to the at least two pixels in the dimension space and the t-1 th global surface element representation associated with the t-1 key frame images respectively to obtain the t-th global surface element representation associated with the t key frame images, including:
reconstructing a first scene surface geometry corresponding to the target scene in the dimension space according to the at least two first global surface elements;
rasterizing and projecting the first scene surface geometry according to the first camera parameter information, and determining projectable global surface elements in the at least two first global surface elements and projectable pixels in the at least two pixels; one projectable pixel corresponds to one or more projectable global surface elements;
according to the surface elements corresponding to the projectable pixels, fusion updating processing is carried out on the projectable global surface elements to obtain fusion updating surface elements;
the fusion updating surface element, the non-projective global surface element and the surface element corresponding to the non-projective pixel are all used as second global surface elements; the non-projectable global surface element is a first global surface element of the at least two first global surface elements from which the projectable global surface element is removed; the non-projectable pixels are pixels of the at least two pixels from which the projectable pixels are removed;
Combining each of the second global surface elements into a t-th global surface element representation.
7. The method of claim 6, wherein rasterizing the first scene surface geometry based on the first camera parameter information, determining a projectable global surface element of the at least two first global surface elements, and a projectable pixel of the at least two pixels, comprises:
traversing the at least two pixels, and sequentially acquiring a kth pixel; k is a positive integer less than or equal to H; h is the total number of pixels corresponding to the at least two pixels;
determining target pixel orientation information corresponding to a kth pixel in the dimensional space according to the first camera parameter information;
constructing projection rays corresponding to the target pixel orientation information in the dimensional space;
and if the projection ray intersects the first scene surface geometry in the dimensional space, determining a first global surface element corresponding to the intersection position of the projection ray and the first scene surface geometry as a projectable global surface element corresponding to the kth pixel, and determining the kth pixel as a projectable pixel.
8. The method according to claim 6, wherein the performing fusion update processing on the projectable global surface element according to the surface element corresponding to the projectable pixel to obtain a fusion updated surface element includes:
traversing the projectable global surface element to obtain an ith projectable global surface element; i is a positive integer less than or equal to the total number of projectable global surface elements;
acquiring a projectable pixel corresponding to the ith projectable global surface element from the projectable pixels, taking the projectable pixel as a target projection pixel, and taking a surface element corresponding to the target projection pixel as a surface element to be fused;
determining a first element depth distance between the ith projectable global surface element and the surface element to be fused;
if the first element depth distance is smaller than or equal to a first element depth threshold, marking the target projection pixel, and carrying out fusion treatment on the ith projectable global surface element and the surface element to be fused to obtain a fused surface element;
if the first element depth distance is greater than a first element depth threshold, not marking the target projection pixel, and determining the ith projectable global surface element as a fusion surface element;
When the projectable global surface element is traversed, the surface element corresponding to the unmarked projectable pixel and each fusion surface element are used as fusion updating surface elements.
9. The method according to claim 8, wherein the fusing the i-th projectable global surface element and the surface element to be fused to obtain a fused surface element includes:
acquiring first geometric position information and first element image characteristic information corresponding to the ith projectable global surface element; the first geometric position information is determined based on depth information and camera parameter information respectively corresponding to the t-1 key frame images, and the first element image characteristic information is determined based on image characteristic information respectively corresponding to the t-1 key frame images;
acquiring second geometric position information and second element image characteristic information corresponding to the surface elements to be fused; the second geometric position information is determined based on the depth information corresponding to the at least two pixels respectively, and the second element image characteristic information is determined based on the image characteristic information corresponding to the t-th key frame image;
Carrying out weighted fusion processing on the first geometric position information and the second geometric position information to obtain fusion geometric position information;
performing feature fusion processing on the first element image feature information and the second element image feature information to obtain fusion element image feature information;
and obtaining the fusion surface element according to the fusion geometric position information and the fusion element image characteristic information.
10. The method of claim 6, wherein the rasterizing the sampling according to the second camera parameter information corresponding to the second view angle and the t-th global surface element representation to obtain a sampled global surface element, and rendering the simulated scene image of the target scene at the second view angle according to the sampled global surface element, comprises:
reconstructing a second scene surface geometry corresponding to the target scene in the dimensional space according to each second global surface element contained in the t-th global surface element representation;
according to the second camera parameter information, carrying out rasterization projection on the second scene surface geometry, and determining sampling global surface elements; the sampling global surface elements comprise one or more global surface elements to be processed, wherein the global surface elements to be processed correspond to at least two pixels to be rendered in an image to be rendered respectively; each global surface element to be processed belongs to the t-th global surface element representation; the at least two pixels to be rendered include a pixel Z_a to be rendered; a is a positive integer less than or equal to the total number of the at least two pixels to be rendered;
performing adjacent merging processing on the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, so as to obtain a global surface element to be rendered;
performing color simulation processing on the global surface element to be rendered, and determining a color to be rendered corresponding to the pixel Z_a to be rendered;
and rendering the image to be rendered according to the colors to be rendered respectively corresponding to the at least two pixels to be rendered, so as to obtain the simulated scene image of the target scene under the second view angle.
11. The method of claim 10, wherein the performing adjacent merging processing on the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered to obtain the global surface element to be rendered comprises:
determining view angle distances, in the dimension space, between each global surface element to be processed corresponding to the pixel Z_a to be rendered and the second view angle;
acquiring, from the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, the global surface element to be processed corresponding to the shortest view angle distance as a first merged global surface element;
if a global surface element to be processed whose view angle distance is greater than the view angle distance corresponding to the first merged global surface element exists in the remaining global surface elements to be processed, acquiring, from the remaining global surface elements to be processed, the global surface element to be processed corresponding to the shortest view angle distance as a second merged global surface element, and determining a second element depth distance between the first merged global surface element and the second merged global surface element; the remaining global surface elements to be processed are the global surface elements to be processed, among the one or more global surface elements to be processed corresponding to the pixel Z_a to be rendered, other than the first merged global surface element;
if the second element depth distance is less than or equal to a second element depth threshold, carrying out normalized weighted merging on the first merged global surface element and the second merged global surface element to obtain a new first merged global surface element;
if the second element depth distance is greater than the second element depth threshold, determining the first merged global surface element as a global surface element to be rendered, and determining the second merged global surface element as a new first merged global surface element;
and if no global surface element to be processed whose view angle distance is greater than the view angle distance corresponding to the first merged global surface element exists in the remaining global surface elements to be processed, determining the first merged global surface element as the global surface element to be rendered.
12. The method of claim 10, wherein the number of global surface elements to be rendered is one or more; each global surface element to be rendered corresponds to geometric position information to be rendered and element image characteristic information to be rendered; each piece of geometric position information to be rendered is generated based on the geometric position information respectively corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered; each piece of element image characteristic information to be rendered is generated based on the element image characteristic information respectively corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered; the geometric position information corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered is determined based on the depth information and the camera parameter information respectively corresponding to the t key frame images; the element image characteristic information corresponding to the one or more global surface elements to be processed that correspond to the pixel Z_a to be rendered is determined based on the image characteristic information respectively corresponding to the t key frame images;
the performing color simulation processing on the global surface element to be rendered and determining the color to be rendered corresponding to the pixel Z_a to be rendered comprises:
performing image characteristic prediction processing on each global surface element to be rendered according to the geometric position information to be rendered and the element image characteristic information to be rendered respectively corresponding to each global surface element to be rendered, so as to obtain an element opacity and an element color respectively corresponding to each global surface element to be rendered;
and performing, according to the adjacent element distances corresponding to the one or more global surface elements to be rendered, color merging processing on the element opacity and the element color respectively corresponding to each global surface element to be rendered, so as to obtain the color to be rendered corresponding to the pixel Z_a to be rendered.
13. An image processing apparatus, comprising:
the acquisition module is used for acquiring a t-th key frame image in the target video; the t-th key frame image comprises at least two pixels; the t-th key frame image is used for presenting a target scene under a first viewing angle; t is a positive integer greater than 1;
The construction module is used for constructing surface elements corresponding to the at least two pixels in a dimension space according to the depth information corresponding to the at least two pixels, the first camera parameter information corresponding to the first view angle and the image characteristic information corresponding to the t-th key frame image;
the fusion module is used for carrying out surface element fusion processing on the surface elements corresponding to the at least two pixels in the dimension space and the t-1 th global surface element representations associated with the t-1 key frame images respectively according to the first camera parameter information to obtain the t-th global surface element representations associated with the t key frame images; the image frame time stamps of the t-1 key frame images are all earlier than those of the t-th key frame image; the t key frame images comprise the t-1 key frame image and the t key frame image;
and the rendering module is used for rendering the simulated scene image of the target scene under the second view angle according to the second camera parameter information corresponding to the second view angle and the t-th global surface element representation.
14. A computer device, comprising: a processor, a memory, and a network interface;
The processor is connected to the memory, the network interface for providing data communication functions, the memory for storing program code, the processor for invoking the program code to perform the method of any of claims 1-12.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-12.
16. A computer program product comprising computer programs/instructions which, when executed by a processor, are adapted to carry out the method of any one of claims 1-12.