CN117853678B - Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing - Google Patents

Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing

Info

Publication number
CN117853678B
CN117853678B (application CN202410263684.8A)
Authority
CN
China
Prior art keywords
dimensional scene
dimensional
semantic
feature map
rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410263684.8A
Other languages
Chinese (zh)
Other versions
CN117853678A (en)
Inventor
陈利
张尔严
李芳
谢卫杰
李孟
张晓楠
田迪龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Tirain Technology Co ltd
Original Assignee
Shaanxi Tirain Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Tirain Technology Co ltd filed Critical Shaanxi Tirain Technology Co ltd
Priority to CN202410263684.8A priority Critical patent/CN117853678B/en
Publication of CN117853678A publication Critical patent/CN117853678A/en
Application granted granted Critical
Publication of CN117853678B publication Critical patent/CN117853678B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing, which relates to the technical field of three-dimensional materialization of geospatial data.

Description

Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing
Technical Field
The invention relates to the technical field of three-dimensional materialization of geospatial data, in particular to a method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing.
Background
With the development of remote sensing technology, the acquisition and application of multi-source remote sensing data have become increasingly widespread. Multi-source remote sensing data includes satellite remote sensing, aerial remote sensing, unmanned aerial vehicle remote sensing and the like, and can provide rich geospatial information.
However, multi-source remote sensing data are usually expressed in two-dimensional form and cannot intuitively reflect the three-dimensional form and spatial relationships of geospatial elements. Therefore, how to utilize multi-source remote sensing data to perform three-dimensional materialization transformation on geospatial data, so as to improve the visual effect and analysis capability of the geospatial data, is an important problem currently facing the field of geographic information science.
Disclosure of Invention
The present invention has been made to solve the above-mentioned technical problems. The embodiment of the invention provides a method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing, which combines a natural language processing technology based on deep learning to mine requirement information about the rendering effect from a rendering effect text description input by a user, extracts element semantic features in the three-dimensional scene, and automatically generates a rendered three-dimensional scene that meets the user's requirements based on the cross-modal fusion association features of the rendering requirement information and the element semantic features, so as to improve the visual effect and analysis capability of the geospatial data.
According to one aspect of the present invention, there is provided a method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing, comprising:
extracting geospatial elements and attribute information thereof from multi-source remote sensing data to obtain a set of geospatial elements;
Selecting a three-dimensional model matched with the geospatial element based on the attribute information of the geospatial element to obtain a set of three-dimensional models of the geospatial element;
combining the set of geospatial element three-dimensional models into a three-dimensional scene;
And rendering the three-dimensional scene to obtain a rendered three-dimensional scene.
Compared with the prior art, the method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing provided by the invention combines a natural language processing technology based on deep learning to mine requirement information about the rendering effect from the rendering effect text description input by a user, extracts element semantic features in the three-dimensional scene, and automatically generates a rendered three-dimensional scene that meets the user's requirements based on the cross-modal fusion association features of the rendering requirement information and the element semantic features, so that the visual effect and analysis capability of the geospatial data are improved.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description of embodiments of the present invention with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the invention, are incorporated in and constitute a part of this specification, and together with the embodiments of the invention serve to explain the invention without constituting a limitation of the invention. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a flow chart of a method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing in accordance with an embodiment of the present invention;
FIG. 2 is a system architecture diagram of a method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of sub-step S4 of a method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing in accordance with an embodiment of the present invention;
FIG. 4 is a flowchart of sub-step S44 of a method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing in accordance with an embodiment of the present invention;
Fig. 5 is a flowchart of substep S443 of the method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing according to an embodiment of the present invention.
Detailed Description
Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein.
As used in the specification and in the claims, the terms "a," "an," and/or "the" do not denote a singular form and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
Although the present invention makes various references to certain modules in a system according to embodiments of the present invention, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
A flowchart is used in the present invention to describe the operations performed by a system according to embodiments of the present invention. It should be understood that the preceding or following operations are not necessarily performed in order precisely. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to or removed from these processes.
In the technical scheme of the invention, a method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing is provided. Fig. 1 is a flow chart of a method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing in accordance with an embodiment of the present invention. As shown in fig. 1, a method for performing three-dimensional materialization transformation on geospatial data based on multi-source remote sensing according to an embodiment of the invention comprises the following steps: s1, extracting geographic space elements and attribute information thereof from multi-source remote sensing data to obtain a set of geographic space elements; s2, selecting a three-dimensional model matched with the geographic space element based on the attribute information of the geographic space element to obtain a set of three-dimensional models of the geographic space element; s3, combining the set of the three-dimensional models of the geographic space elements into a three-dimensional scene; and S4, rendering the three-dimensional scene to obtain a rendered three-dimensional scene.
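For orientation, the four steps S1 to S4 can be summarized in the following minimal sketch. The helper function names (extract_geospatial_elements, match_3d_model, combine_into_scene, render_scene) are hypothetical placeholders introduced here for illustration only and are not part of the disclosed method.

```python
# Minimal sketch of the S1-S4 pipeline; all helper functions are hypothetical.
def build_rendered_scene(remote_sensing_sources, effect_description):
    # S1: extract geospatial elements and their attribute information
    elements = extract_geospatial_elements(remote_sensing_sources)
    # S2: select a matching three-dimensional model for each element
    models = [match_3d_model(element) for element in elements]
    # S3: combine the set of element models into one three-dimensional scene
    scene = combine_into_scene(models)
    # S4: render the scene, guided by the rendering effect text description
    return render_scene(scene, effect_description)
```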
In particular, the S1 extracts geospatial elements and their attribute information from multi-source remote sensing data to obtain a set of geospatial elements. Multi-source remote sensing data refers to a data set in which remote sensing data from different sensors, platforms or sources are combined for analysis and application. Such a data set can provide more comprehensive, multi-angle information.
In particular, the S2, based on the attribute information of the geospatial element, selects a three-dimensional model that matches the geospatial element to obtain a set of three-dimensional models of geospatial elements. In one particular example, if the attribute information indicates that a region is a forest, a tree model may be selected as a three-dimensional representation of the region. Combining the selected three-dimensional models into a three-dimensional model set of geospatial elements. From the attribute information of the geospatial element, the position, rotation, and scale of each model in space can be determined.
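As an illustration of this attribute-driven matching, the following sketch selects a model from a small library and derives placement from element attributes. The library contents and the attribute field names (land_cover, centroid, orientation, footprint_scale) are assumptions made for the example, not values prescribed by the method.

```python
# Illustrative attribute-driven model selection (S2); keys and fields are assumed.
MODEL_LIBRARY = {
    "forest": "tree.obj",
    "building": "building.obj",
    "road": "road_segment.obj",
    "water": "water_plane.obj",
}

def match_3d_model(element):
    attributes = element["attributes"]
    model_path = MODEL_LIBRARY.get(attributes["land_cover"], "generic_block.obj")
    # Position, rotation and scale are determined from the attribute information
    return {
        "model": model_path,
        "position": attributes["centroid"],              # (x, y, z) scene coordinates
        "rotation": attributes.get("orientation", 0.0),  # degrees about the vertical axis
        "scale": attributes.get("footprint_scale", 1.0),
    }
```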
In particular, the S3 combines the set of geospatial element three-dimensional models into a three-dimensional scene. It should be appreciated that combining the set of geospatial element three-dimensional models into a three-dimensional scene may provide an intuitive, realistic visualization effect that helps users better understand the distribution, relationships, and characteristics of geospatial elements.
In particular, the S4 renders the three-dimensional scene to obtain a rendered three-dimensional scene. It should be appreciated that rendering the three-dimensional scene may improve the realism and visual effect of the three-dimensional scene, making it closer to a real-world environment. However, prior art rendering methods typically require manual adjustment. This approach requires a lot of time and effort. The rendering process may require multiple attempts and adjustments to achieve the desired effect. In particular, in one specific example of the present invention, fig. 2 is a system architecture diagram of a method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing according to an embodiment of the present invention. Fig. 3 is a flowchart of sub-step S4 of a method for three-dimensional materialization reconstruction of geospatial data based on multi-source remote sensing in accordance with an embodiment of the present invention. As shown in fig. 2 and 3, the S4 includes: s41, obtaining a rendering effect text description; s42, extracting scene semantic features of the three-dimensional scene to obtain a three-dimensional scene element semantic feature map; s43, carrying out semantic coding on the rendering effect text description to obtain a rendering effect semantic coding feature vector; s44, generating the rendered three-dimensional scene based on the cross-modal fusion features of the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector.
Specifically, in S41, a rendering effect text description is acquired. In the technical scheme of the invention, the rendering effect text description expresses the rendering effect in a natural language form and is used as input for subsequent processing and analysis. Specifically, the rendering effect text description includes descriptions of rendering effects of color, lighting, material, shading, transparency, and the like. In the actual application scene of the invention, the text description of the rendering effect can be obtained through manual input. For example, a text description of the rendering effect is written manually by a designer, artist, or rendering expert. They can describe the desired rendering effect using natural language, for example: "buildings in the scene should appear gray in appearance, have a soft lighting effect, while casting a proper amount of shadows".
Specifically, the step S42 extracts the scene semantic features of the three-dimensional scene to obtain a three-dimensional scene element semantic feature map. In particular, in the technical scheme of the invention, the three-dimensional scene is passed through a scene three-dimensional feature extractor based on a three-dimensional convolutional neural network model to obtain the three-dimensional scene element semantic feature map. Here, the scene three-dimensional feature extractor based on the three-dimensional convolutional neural network model is used to extract key feature information from the three-dimensional scene, so that the subsequent rendering process can better understand and process the semantic information of the scene. In particular, by using this extractor, high-level semantic feature representations can be extracted from the raw three-dimensional scene data. These features contain richer semantic information, such as the class, shape, orientation and position of objects, as well as the overall structure and composition of the scene. In this way, the key semantic information of the three-dimensional scene is extracted, providing a scene foundation for the subsequent rendering process. More specifically, passing the three-dimensional scene through the scene three-dimensional feature extractor based on the three-dimensional convolutional neural network model to obtain the three-dimensional scene element semantic feature map comprises: using each layer of the extractor to perform, on the input data in the forward pass of that layer, respectively: convolution processing of the input data to obtain a convolution feature map; pooling of the convolution feature map based on a local feature matrix to obtain a pooled feature map; and nonlinear activation of the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the scene three-dimensional feature extractor is the three-dimensional scene element semantic feature map, and the input of the first layer of the scene three-dimensional feature extractor is the three-dimensional scene.
Notably, a three-dimensional convolutional neural network model is typically composed of multiple convolutional layers, pooling layers, fully connected layers and the like, in order to efficiently process three-dimensional data. The following is an example of the structure of a typical three-dimensional convolutional neural network model: Input layer: receives three-dimensional data as input, typically a tensor of shape [Batch Size, Channels, Depth, Height, Width]; Convolution layer: the three-dimensional convolution layer extracts spatial features by applying a three-dimensional convolution kernel, and each convolution layer typically includes a convolution operation, an activation function, and possibly a regularization operation; Pooling layer: the three-dimensional pooling layer is used to reduce the size of the feature map, reduce the amount of computation and retain key information, and common pooling operations include max pooling and average pooling; Batch normalization layer: adding a batch normalization layer between the convolution layer and the activation function helps accelerate the training process and improve the generalization capability of the model; Activation function: an activation function, such as a ReLU, is typically applied after each convolutional layer to introduce nonlinearity; Fully connected layer: the output of the convolution layers is flattened and passed to the fully connected layer for classification or regression tasks; Dropout layer: to prevent overfitting, Dropout layers can be added between fully connected layers to randomly discard some neurons; Output layer: the last layer is the output layer, and different activation functions can be adopted according to the task, such as softmax for classification tasks and a linear activation function for regression tasks; Loss function and optimizer: an appropriate loss function is defined to measure the difference between the model predictions and the real labels, and an appropriate optimizer is selected to minimize the loss function; Model training and evaluation: the model is trained using the training data and evaluated using the validation set and the test set.
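A minimal PyTorch-style sketch of such a scene three-dimensional feature extractor is given below, assuming voxelized scene input; the channel counts, kernel sizes and number of layers are illustrative assumptions rather than values specified by the invention.

```python
import torch
import torch.nn as nn

class Scene3DFeatureExtractor(nn.Module):
    """Minimal sketch of a 3D-CNN scene feature extractor: each stage applies
    convolution, batch normalization, nonlinear activation and pooling, as in
    the layer-wise description above."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: [batch, channels, depth, height, width]
        return self.layers(voxels)  # three-dimensional scene element semantic feature map
```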
Specifically, the step S43 performs semantic coding on the rendering effect text description to obtain a rendering effect semantic coding feature vector. That is, in the technical scheme of the invention, the rendering effect text description is subjected to semantic coding to obtain the rendering effect semantic coding feature vector. It should be understood that, by means of semantic coding, the rendering effect text description can be converted into a vector representation that a computer can understand, while the semantic meanings and textual connotations contained in the text description are extracted, which helps the model understand the rendering requirements and provides more accurate guidance for the subsequent rendering process. More specifically, performing semantic coding on the rendering effect text description to obtain the rendering effect semantic coding feature vector includes: performing word segmentation processing on the rendering effect text description to convert the rendering effect text description into a word sequence composed of a plurality of words; mapping each word in the word sequence into a word embedding vector by using an embedding layer of a semantic encoder comprising the embedding layer to obtain a sequence of word embedding vectors; performing global context semantic coding on the sequence of word embedding vectors based on the transformer concept by using a transformer of the semantic encoder comprising the embedding layer to obtain a plurality of global context semantic feature vectors; and cascading the plurality of global context semantic feature vectors to obtain the rendering effect semantic coding feature vector. In a specific example, using the transformer of the semantic encoder comprising the embedding layer to perform global context semantic coding on the sequence of word embedding vectors based on the transformer concept to obtain the plurality of global context semantic feature vectors includes: performing one-dimensional arrangement of the sequence of word embedding vectors to obtain a global feature vector; calculating the product between the global feature vector and the transpose vector of each word embedding vector in the sequence of word embedding vectors to obtain a plurality of self-attention association matrices; respectively performing normalization processing on each self-attention association matrix in the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each normalized self-attention association matrix in the plurality of normalized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; weighting each word embedding vector in the sequence of word embedding vectors by taking each probability value in the plurality of probability values as a weight to obtain the plurality of context semantic feature vectors; and cascading the plurality of context semantic feature vectors to obtain the plurality of global context semantic feature vectors. It should be noted that the Softmax classification function is a commonly used classification function, often used in multi-class classification problems. The Softmax function converts each element of the input vector to a value between 0 and 1, which can be regarded as a predictive probability for each class.
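The following sketch illustrates the embedding-plus-attention encoding described above. For simplicity it uses standard scaled dot-product self-attention over the word embedding sequence instead of the exact matrix construction in the text, and the vocabulary size and embedding dimension are assumptions.

```python
import torch
import torch.nn as nn

class RenderEffectTextEncoder(nn.Module):
    """Sketch: embedding layer + one self-attention pass, then concatenation
    of the per-word context vectors into a single encoding feature vector."""

    def __init__(self, vocab_size: int = 10000, dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.scale = dim ** 0.5

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: [batch, seq_len], produced by word segmentation of the text
        x = self.embedding(token_ids)                                    # [B, L, D]
        attn = torch.softmax(x @ x.transpose(1, 2) / self.scale, dim=-1)
        context = attn @ x                                               # global context vectors
        return context.flatten(start_dim=1)                             # [B, L*D] coding vector
```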
Specifically, the S44 generates the rendered three-dimensional scene based on the cross-modal fusion feature of the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector. In particular, in one specific example of the present invention, as shown in fig. 4, the S44 includes: s441, performing cross-modal fusion on the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector to obtain a three-dimensional scene element semantic feature map containing rendering information; s442, the three-dimensional scene element semantic feature map containing rendering information passes through an adaptive enhancer based on an adaptive attention layer to obtain a three-dimensional scene element semantic feature map containing rendering information in an adaptive manner; s443, generating the rendered three-dimensional scene based on the self-adaptive three-dimensional scene element semantic feature map containing rendering information.
More specifically, in S441, cross-modal fusion is performed on the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector to obtain a three-dimensional scene element semantic feature map containing rendering information. That is, the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector are processed by using a cross-modal feature fusion device based on a meta-network to obtain the three-dimensional scene element semantic feature map containing rendering information. In this way, the scene semantic information expressed by the three-dimensional scene element semantic feature map and the rendering requirement semantic information expressed by the rendering effect semantic coding feature vector are fully fused and interact, so as to obtain a comprehensive feature representation with richer information. The meta-network-based cross-modal feature fusion device enables the one-dimensional feature vector, namely the rendering effect semantic coding feature vector, to interact with the high-dimensional feature map, namely the three-dimensional scene element semantic feature map, directly controls the relevant characteristics of each feature channel, and helps the network concentrate on a specific part of each feature channel, so that the relevance and importance among different feature channels are considered in the feature fusion process. This helps to better fuse the semantic information of the three-dimensional scene elements with the semantic information of the rendering effect, thus describing the features of the scene and the rendering requirements more fully. In a specific example, processing the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector by using the cross-modal feature fusion device based on the meta-network to obtain the three-dimensional scene element semantic feature map containing rendering information includes: passing the rendering effect semantic coding feature vector through a point convolution layer to obtain a first convolution feature vector; passing the first convolution feature vector through a modified linear unit based on a ReLU function to obtain a first modified convolution feature vector; passing the first modified convolution feature vector through a point convolution layer to obtain a second convolution feature vector; passing the second convolution feature vector through a modified linear unit based on a Sigmoid function to obtain a second modified convolution feature vector; and fusing the second modified convolution feature vector and the three-dimensional scene element semantic feature map to obtain the three-dimensional scene element semantic feature map containing rendering information. It is worth mentioning that the point convolution layer is a special type of convolution layer in a convolutional neural network, also referred to as a 1x1 convolution layer. Unlike conventional convolution layers, the convolution kernel size of a point convolution layer is 1x1, that is, the features at each location are linearly combined along the depth direction.
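A minimal sketch of this meta-network style fusion is shown below: two point (1x1) convolutions with ReLU and Sigmoid corrections turn the rendering effect semantic coding feature vector into per-channel gates that modulate the scene feature map. The layer widths and the channel-wise multiplicative fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MetaCrossModalFuser(nn.Module):
    """Sketch of the meta-network cross-modal fuser: the rendering effect
    semantic coding feature vector passes through two point convolutions with
    ReLU and Sigmoid corrections, and the result gates the channels of the
    three-dimensional scene element semantic feature map."""

    def __init__(self, text_dim: int, feat_channels: int):
        super().__init__()
        self.pconv1 = nn.Conv1d(text_dim, feat_channels, kernel_size=1)
        self.pconv2 = nn.Conv1d(feat_channels, feat_channels, kernel_size=1)

    def forward(self, text_vec: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        # text_vec: [batch, text_dim]; scene_feat: [batch, C, depth, height, width]
        v = text_vec.unsqueeze(-1)              # [B, text_dim, 1]
        v = torch.relu(self.pconv1(v))          # first point convolution + ReLU correction
        v = torch.sigmoid(self.pconv2(v))       # second point convolution + Sigmoid correction
        gates = v.view(v.size(0), -1, 1, 1, 1)  # per-channel weights
        return scene_feat * gates               # feature map containing rendering information
```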
More specifically, the step S442 passes the three-dimensional scene element semantic feature map containing rendering information through an adaptive enhancer based on an adaptive attention layer to obtain an adaptive three-dimensional scene element semantic feature map containing rendering information. In other words, in the technical scheme of the invention, the adaptive three-dimensional scene element semantic feature map containing rendering information is obtained by passing the three-dimensional scene element semantic feature map containing rendering information through the adaptive enhancer based on the adaptive attention layer. Here, the adaptive enhancer based on the adaptive attention layer is capable of adaptively learning the weights of, and the degree of association between, features according to the input three-dimensional scene element semantic feature map containing rendering information. Specifically, it can adaptively adjust the weights of different channels in the three-dimensional scene element semantic feature map containing rendering information, so that the features related to the rendering effect become more prominent and important, the features related to the rendering effect and the scene information are better captured, and more accurate guiding information is provided for the subsequent rendering process. In a specific example, passing the three-dimensional scene element semantic feature map containing rendering information through the adaptive enhancer based on the adaptive attention layer to obtain the adaptive three-dimensional scene element semantic feature map containing rendering information includes: processing the three-dimensional scene element semantic feature map containing rendering information with the following adaptive attention formula to obtain the adaptive three-dimensional scene element semantic feature map containing rendering information; wherein the adaptive attention formula is:
vc=pool(F)
A=σ(Wa*vc+ba)
F′=A′⊙F
wherein F is the three-dimensional scene element semantic feature map containing rendering information, pool is the pooling process, vc is the pooling vector, Wa is the weight matrix, ba is the bias vector, σ is the activation process, A is the initial meta-weight feature vector, Ai is the feature value of the i-th position in the initial meta-weight feature vector, A′ is the corrected meta-weight feature vector, F′ is the adaptive three-dimensional scene element semantic feature map containing rendering information, and ⊙ indicates that each feature value in the corrected meta-weight feature vector is used as a weight to multiply, along the channel dimension, each feature matrix of the three-dimensional scene element semantic feature map containing rendering information.
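The adaptive enhancer can be sketched as the squeeze-and-excitation style module below, which follows the formula above (vc = pool(F), A = σ(Wa*vc+ba), F′ = A′ ⊙ F). The step from the initial meta-weights A to the corrected meta-weights A′ is not spelled out in the text, so the softmax normalisation used here is an assumption.

```python
import torch
import torch.nn as nn

class AdaptiveAttentionEnhancer(nn.Module):
    """Squeeze-and-excitation style sketch of the adaptive enhancer based on
    the adaptive attention layer described above."""

    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)      # global pooling -> vc
        self.fc = nn.Linear(channels, channels)  # Wa and ba

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: [batch, channels, depth, height, width] = F
        v_c = self.pool(feat).flatten(1)                       # [B, C]
        a = torch.sigmoid(self.fc(v_c))                        # initial meta-weights A
        a_corr = torch.softmax(a, dim=1)                       # corrected weights A' (assumed form)
        return feat * a_corr.view(-1, feat.size(1), 1, 1, 1)   # F' = A' (.) F
```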
More specifically, the S443 generates the rendered three-dimensional scene based on the adaptive three-dimensional scene element semantic feature map containing rendering information. In particular, in one specific example of the present invention, as shown in fig. 5, the S443 includes: S4431, performing feature distribution optimization on the adaptive three-dimensional scene element semantic feature map containing rendering information to obtain an optimized adaptive three-dimensional scene element semantic feature map containing rendering information; and S4432, passing the optimized adaptive three-dimensional scene element semantic feature map containing rendering information through a rendering effect generator of an AIGC model to obtain the rendered three-dimensional scene.
In S4431, feature distribution optimization is performed on the adaptive three-dimensional scene element semantic feature map containing rendering information to obtain the optimized adaptive three-dimensional scene element semantic feature map containing rendering information. In particular, in the above technical solution, the three-dimensional scene element semantic feature map containing rendering information expresses the feature representation obtained after the image semantic features of the three-dimensional scene are channel-constrained by the text semantic features of the rendering effect text description. Therefore, after the three-dimensional scene element semantic feature map containing rendering information passes through the adaptive enhancer based on the adaptive attention layer, local channel distribution enhancement can be performed in units of feature matrices; however, this also causes the feature representation of the adaptive three-dimensional scene element semantic feature map containing rendering information to deviate from the image semantic feature representation within the feature matrices, and from the text semantic feature representation between the feature matrices, of the three-dimensional scene element semantic feature map containing rendering information.
Thus, in order to promote the semantic feature expression effect of the adaptive three-dimensional scene element semantic feature map containing rendering information, it can be optimized by further fusing it with the three-dimensional scene element semantic feature map containing rendering information.
Here, in order to promote the consistency of the distribution representations during fusion, the invention performs fusion optimization on the adaptive three-dimensional scene element semantic feature map containing rendering information and the three-dimensional scene element semantic feature map containing rendering information, specifically expressed as follows: the two feature maps are fused and optimized by using the following fusion optimization formula to obtain the optimized adaptive three-dimensional scene element semantic feature map containing rendering information; the fusion optimization formula is as follows:
wherein F′ is the adaptive three-dimensional scene element semantic feature map containing rendering information, F is the three-dimensional scene element semantic feature map containing rendering information, μ1 and σ1 are respectively the mean and standard deviation of the feature set corresponding to F′, μ2 and σ2 are respectively the mean and standard deviation of the feature set corresponding to F, √ denotes the position-wise square root of the feature map, log denotes the logarithm base 2, ⊕ denotes position-wise addition, ⊗ denotes position-wise multiplication, and F″ is the optimized adaptive three-dimensional scene element semantic feature map containing rendering information.
Here, in order to promote the consistency of the distribution representations of the three-dimensional scene element semantic feature map containing rendering information and the adaptive three-dimensional scene element semantic feature map containing rendering information in the feature fusion scenario, and considering that the traditional weighted fusion manner has limitations in inferring the semantic-space evolution and diffusion mode based on feature superposition, the feature fusion of the two feature maps in the same high-dimensional feature space is realized by combining low-order and high-order spatial superposition fusion modes and by simulating the evolution center and the evolution track through the interaction relation of the feature statistical characteristics, so as to reconstruct the semantic-space evolution and diffusion in the fusion scenario based on asynchronous evolution under the action of different evolution-diffusion velocity fields. In this way, the image semantic feature expression effect within the feature matrices and the text semantic feature expression effect between the feature matrices of the optimized adaptive three-dimensional scene element semantic feature map containing rendering information are promoted, thereby improving the three-dimensional image quality of the rendered three-dimensional scene obtained by passing the optimized adaptive three-dimensional scene element semantic feature map containing rendering information through the rendering effect generator of the AIGC model.
In S4432, the optimized adaptive three-dimensional scene element semantic feature map containing rendering information is passed through a rendering effect generator of an AIGC model to obtain the rendered three-dimensional scene. That is, the rendering effect generator is constructed by utilizing an AIGC model to convert the adaptive three-dimensional scene element semantic feature map containing rendering information into the rendered three-dimensional scene. Here, by means of the rendering effect generator of the AIGC model, the powerful expression capability and learning capability of the neural network can be used to convert the rendering information in the adaptive three-dimensional scene element semantic feature map containing rendering information into a specific rendering effect. The rendering effect generator can learn the association between the feature distribution in the adaptive three-dimensional scene element semantic feature map containing rendering information and the rendering effect, and generate a corresponding rendered scene. Specifically, the rendering effect generator gradually restores the details and structures of the rendered scene by performing operations such as convolution and deconvolution on the adaptive three-dimensional scene element semantic feature map containing rendering information. In this process, the rendering effect generator can generate a rendered scene conforming to the rendering requirements by using the rendering information, such as illumination, materials and shadows, contained in the adaptive three-dimensional scene element semantic feature map containing rendering information.
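As a rough illustration of the convolution/deconvolution decoding described above, the sketch below upsamples the optimized feature map back to a dense volume with per-voxel colour values. It is a generic decoder stand-in under assumed channel counts, not the specific AIGC rendering effect generator of the invention.

```python
import torch
import torch.nn as nn

class RenderEffectGenerator(nn.Module):
    """Generic decoder stand-in for the rendering effect generator: stacked 3D
    transposed convolutions progressively restore spatial detail from the
    optimized semantic feature map; the final Sigmoid yields values in [0, 1]."""

    def __init__(self, in_channels: int = 32, out_channels: int = 3):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(in_channels, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(16, 8, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(8, out_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: optimized adaptive feature map [batch, C, depth, height, width]
        return self.decoder(feat)
```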
It should be noted that, in other specific examples of the present invention, the rendered three-dimensional scene may be generated by other manners based on the three-dimensional scene element semantic feature map adaptively including rendering information, for example: inputting a semantic feature map containing rendering information, typically a three-dimensional tensor, representing semantic information and features of different elements in the scene; a decoder network is designed for converting the semantic feature map into a rendered three-dimensional scene. Decoder networks are typically composed of multiple layers, each layer having specific functions; firstly, the decoder network decodes the semantic feature map, and maps the high-level semantic features back to the three-dimensional space to reconstruct the structural information of the scene; in the decoding process, the feature map is mapped into three-dimensional space and converted into a visualized image using rendering techniques. This may involve the addition of rendering effects of lighting, materials, shadows, etc.; depth information is typically restored during rendering to ensure the realism of the scene. The depth information helps to determine the distance and positional relationship between different objects; texture mapping may also be required to apply texture information to different parts of the scene to increase the detail and realism of the scene; adding proper illumination effect and shadow treatment to make the scene look more lifelike and stereoscopic; after the rendered three-dimensional scene is generated, some post-processing operations, such as denoising, sharpening, color correction, etc., may be required to improve image quality.
It should be noted that, in other specific examples of the present invention, the rendered three-dimensional scene may also be generated by other manners based on the cross-modal fusion feature of the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector, for example: inputting a semantic feature map of a three-dimensional scene element and a semantic coding feature vector of a rendering effect, wherein the two features come from different modes, and respectively capture the structural information and the rendering effect information of the scene; fusing the semantic feature images of the three-dimensional scene elements and the semantic coding feature vectors of the rendering effects; designing a decoder network, receiving the fused features and converting the fused features into a rendered three-dimensional scene; the decoder network maps the fused features back to the three-dimensional space to reconstruct the structural information and rendering effect of the scene; in the decoding process, mapping the feature map into a three-dimensional space, and converting the feature map into a visualized image by using a rendering technology; in the rendering process, restoring the depth information to ensure the reality of the scene; after the rendered three-dimensional scene is generated, some post-processing operations, such as denoising, sharpening, color correction, etc., may be required to improve the image quality; the final output is the generated rendered three-dimensional scene image.
In summary, the method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing according to the embodiment of the invention has been explained, which combines a natural language processing technology based on deep learning to mine requirement information about the rendering effect from the rendering effect text description input by a user, extracts element semantic features in the three-dimensional scene, and automatically generates a rendered three-dimensional scene meeting the user's expectations based on the cross-modal fusion association features of the rendering requirement information and the element semantic features.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (7)

1. A method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing, comprising the steps of:
extracting geospatial elements and attribute information thereof from multi-source remote sensing data to obtain a set of geospatial elements;
Selecting a three-dimensional model matched with the geospatial element based on the attribute information of the geospatial element to obtain a set of three-dimensional models of the geospatial element;
combining the set of geospatial element three-dimensional models into a three-dimensional scene;
rendering the three-dimensional scene to obtain a rendered three-dimensional scene, comprising:
Acquiring a rendering effect text description;
Extracting scene semantic features of the three-dimensional scene to obtain a three-dimensional scene element semantic feature map;
Carrying out semantic coding on the rendering effect text description to obtain a rendering effect semantic coding feature vector;
generating the rendered three-dimensional scene based on the cross-modal fusion features of the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector;
Wherein generating the rendered three-dimensional scene based on the cross-modal fusion features of the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector comprises:
Performing cross-modal fusion on the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector to obtain a three-dimensional scene element semantic feature map containing rendering information;
The three-dimensional scene element semantic feature map containing rendering information passes through an adaptive enhancer based on an adaptive attention layer to obtain a three-dimensional scene element semantic feature map containing rendering information in an adaptive manner;
And generating the rendered three-dimensional scene based on the self-adaptive three-dimensional scene element semantic feature map containing rendering information.
2. The method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing according to claim 1, wherein extracting scene semantic features of the three-dimensional scene to obtain a three-dimensional scene element semantic feature map comprises:
And the three-dimensional scene passes through a scene three-dimensional feature extractor based on a three-dimensional convolutional neural network model to obtain the three-dimensional scene element semantic feature map.
3. The method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing according to claim 2, wherein semantically encoding the render effect text description to obtain a render effect semantically encoded feature vector comprises:
word segmentation processing is carried out on the rendering effect text description so as to convert the rendering effect text description into a word sequence composed of a plurality of words;
Mapping each word in the word sequence into a word embedding vector by using an embedding layer of a semantic encoder comprising the embedding layer to obtain a sequence of word embedding vectors;
Performing global context semantic coding on the sequence of word embedding vectors based on the transformer concept by using a transformer of the semantic encoder comprising the embedding layer to obtain a plurality of global context semantic feature vectors;
and cascading the plurality of global context semantic feature vectors to obtain the rendering effect semantic coding feature vector.
4. The method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing of claim 3 wherein cross-modal fusion of the three-dimensional scene element semantic feature map and the render effect semantic coding feature vector to obtain a three-dimensional scene element semantic feature map containing rendering information comprises:
and processing the three-dimensional scene element semantic feature map and the rendering effect semantic coding feature vector by using a cross-modal feature fusion device based on a meta-network to obtain the three-dimensional scene element semantic feature map containing rendering information.
5. The method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing of claim 4 wherein processing the three-dimensional scene element semantic feature map and the render effect semantic coding feature vector using a meta-network based cross-modal feature fusion engine to obtain the three-dimensional scene element semantic feature map containing rendering information comprises:
passing the rendering effect semantic coding feature vector through a point convolution layer to obtain a first convolution feature vector;
Passing the first convolution feature vector through a modified linear unit based on a ReLU function to obtain a first modified convolution feature vector;
passing the first modified convolution feature vector through a point convolution layer to obtain a second convolution feature vector;
passing the second convolution feature vector through a correction linear unit based on a Sigmoid function to obtain a second correction convolution feature vector;
And fusing the second modified convolution feature vector and the three-dimensional scene element semantic feature map to obtain the three-dimensional scene element semantic feature map containing rendering information.
6. The method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing of claim 5, wherein passing the three-dimensional scene element semantic feature map containing rendering information through an adaptive enhancer based on an adaptive attention layer to obtain an adaptive three-dimensional scene element semantic feature map containing rendering information comprises:
processing the three-dimensional scene element semantic feature map containing rendering information in the following adaptive attention formula to obtain the three-dimensional scene element semantic feature map containing rendering information in an adaptive manner; wherein, the self-adaptive attention formula is:
vc=pool(F)
A=σ(Wa*vc+ba)
F′=A′⊙F
Wherein F is the three-dimensional scene element semantic feature map containing rendering information, pool is the pooling process, vc is the pooling vector, Wa is the weight matrix, ba is the bias vector, σ is the activation process, A is the initial meta-weight feature vector, Ai is the feature value of the i-th position in the initial meta-weight feature vector, A′ is the corrected meta-weight feature vector, F′ is the adaptive three-dimensional scene element semantic feature map containing rendering information, and ⊙ indicates that each feature value in the corrected meta-weight feature vector is used as a weight to multiply, along the channel dimension, each feature matrix of the three-dimensional scene element semantic feature map containing rendering information.
7. The method for three-dimensional materialization transformation of geospatial data based on multi-source remote sensing of claim 6 wherein generating the rendered three-dimensional scene based on the adaptive three-dimensional scene element semantic feature map containing rendering information comprises:
performing feature distribution optimization on the three-dimensional scene element semantic feature map which is self-adaptive and contains rendering information to obtain an optimized three-dimensional scene element semantic feature map which is self-adaptive and contains rendering information;
and enabling the optimized self-adaptive three-dimensional scene element semantic feature map containing rendering information to pass through a rendering effect generator of an AIGC model so as to obtain the rendered three-dimensional scene.
CN202410263684.8A 2024-03-08 2024-03-08 Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing Active CN117853678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410263684.8A CN117853678B (en) 2024-03-08 2024-03-08 Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410263684.8A CN117853678B (en) 2024-03-08 2024-03-08 Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing

Publications (2)

Publication Number Publication Date
CN117853678A (en) 2024-04-09
CN117853678B (en) 2024-05-17

Family

ID=90536931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410263684.8A Active CN117853678B (en) 2024-03-08 2024-03-08 Method for carrying out three-dimensional materialization transformation on geospatial data based on multi-source remote sensing

Country Status (1)

Country Link
CN (1) CN117853678B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021031455A1 (en) * 2019-08-21 2021-02-25 佳都新太科技股份有限公司 System, method and device for realizing three-dimensional augmented reality of multi-channel video fusion
EP3859390A1 (en) * 2020-01-29 2021-08-04 Visteon Global Technologies, Inc. Method and system for rendering a representation of an evinronment of a vehicle
DE102020206940A1 (en) * 2020-06-03 2021-12-09 Robert Bosch Gesellschaft mit beschränkter Haftung Method of rendering a view of a three-dimensional scene
CN114170393A (en) * 2021-11-30 2022-03-11 上海埃威航空电子有限公司 Three-dimensional map scene construction method based on multiple data
CN114863007A (en) * 2022-05-20 2022-08-05 中国电信股份有限公司 Image rendering method and device for three-dimensional object and electronic equipment
CN115240080A (en) * 2022-08-23 2022-10-25 北京理工大学 Intelligent interpretation and classification method for multi-source remote sensing satellite data
CN116188671A (en) * 2022-09-05 2023-05-30 浙江水利水电学院 River course and land integrated three-dimensional real scene modeling method
CN115994990A (en) * 2022-12-02 2023-04-21 天津大学 Three-dimensional model automatic modeling method based on text information guidance
CN116051729A (en) * 2022-12-15 2023-05-02 北京百度网讯科技有限公司 Three-dimensional content generation method and device and electronic equipment
CN116109753A (en) * 2023-04-12 2023-05-12 深圳原世界科技有限公司 Three-dimensional cloud rendering engine platform and data processing method
CN116797742A (en) * 2023-07-26 2023-09-22 重庆大学 Three-dimensional reconstruction method and system for indoor scene
CN116958957A (en) * 2023-07-27 2023-10-27 网易(杭州)网络有限公司 Training method of multi-mode feature extraction network and three-dimensional feature representation method
CN117576292A (en) * 2023-12-08 2024-02-20 三星(中国)半导体有限公司 Three-dimensional scene rendering method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deng Shiyuan; Yang Yuting. 关于三维数字图像可视化重建仿真 (On visualization reconstruction simulation of three-dimensional digital images). 计算机仿真 (Computer Simulation), 2017-12-15, No. 12, full text. *

Also Published As

Publication number Publication date
CN117853678A (en) 2024-04-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant