CN116309022A - Ancient architecture image self-adaptive style migration method based on visual encoder - Google Patents

Ancient architecture image self-adaptive style migration method based on visual encoder Download PDF

Info

Publication number
CN116309022A
CN116309022A CN202310216506.5A CN202310216506A CN116309022A CN 116309022 A CN116309022 A CN 116309022A CN 202310216506 A CN202310216506 A CN 202310216506A CN 116309022 A CN116309022 A CN 116309022A
Authority
CN
China
Prior art keywords
style
image
feature
features
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310216506.5A
Other languages
Chinese (zh)
Inventor
王耀南
曾凯
王蔚
毛建旭
张辉
钟杭
吴昊天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202310216506.5A priority Critical patent/CN116309022A/en
Publication of CN116309022A publication Critical patent/CN116309022A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G06T3/04
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an ancient building image self-adaptive style migration method based on a visual encoder, which is based on a visual encoding-decoding style migration neural network, and comprises the steps of obtaining multi-stage relative position encoding characteristics of building images in a characteristic encoder stage, designing a self-adaptive color and structural characteristic migration fusion module, carrying out multi-stage multi-scale fusion on extracted building content characteristics and building style characteristics, and carrying out characteristic migration fusion by the self-adaptive color and structural characteristic migration fusion module by adopting a plurality of style characteristic migration methods; the building content features and building style features are then further migrated and fused at the feature decoder stage. The model predicts the historic building style migration test set, greatly improves the style migration estimation precision, solves the problems of building scene drawing refinement, morphological standardization, texture color standardization and the like, and realizes the reproduction of the building heritage space narrative.

Description

Ancient architecture image self-adaptive style migration method based on visual encoder
Technical Field
The invention relates to the technical field of image processing, in particular to an ancient architecture image self-adaptive style migration method based on a visual encoder.
Background
The ancient architecture is an important component of cultural heritage, and has important ancient value, scientific value and artistic value.
The digital depiction, the drawing narrative, the drawing recreating and the digital packaging of the ancient architecture are important means for transferring the ancient architecture culture. In the existing large-scale dense architecture heritage digital depiction and style narrative, the manual experience is still seriously relied on, the degree of automation is low, and the problems of messy creation scenes, irregular forms, non-uniform local textures and colors, time and labor waste and the like exist. These problems place high demands on the skill and experience of the practitioner.
Disclosure of Invention
The invention provides a visual encoder-based self-adaptive style migration method for ancient architecture images, which aims to solve the problems that in the prior art, the digital depiction of the ancient architecture depends on manual experience and the degree of automation is low.
In order to achieve the above purpose, the technical scheme of the invention specifically comprises:
a visual encoder-based self-adaptive style migration method for ancient architecture images is characterized in that: comprises the following steps:
s1, acquiring a plurality of ancient architecture images, style images, color reference images, structure reference images and style transfer learning real images, and constructing a data set by utilizing the five types of images; dividing the data set to obtain a training set and a testing set;
s2, constructing a visual coding-decoding style migration neural network; the visual coding-decoding style migration neural network comprises an encoder and a decoder;
s3, randomly extracting a group of samples in a training set, encoding the ancient architecture image and the style image in the samples by using an encoder, respectively extracting the characteristics of the ancient architecture image and the style image at different layers, outputting the architecture content characteristics and the architecture style characteristics, and fusing the architecture content characteristics and the architecture style characteristics to obtain fusion characteristics;
s4, inputting the fusion characteristics and the style image in the step S3 into a decoder for reconstruction in the process of extracting the coding characteristics to obtain a predicted style migrated picture;
s5, performing iterative training on the visual coding-decoding style migration neural network according to the picture subjected to style migration and the corresponding style migration learning real image in the data set to obtain a trained visual coding-decoding style migration neural network;
and S6, testing the trained visual coding-decoding style migration neural network by using the test set.
Preferably, the step S1 specifically includes the following steps:
step S11, collecting a plurality of ancient architecture images, style images, color reference images, structure reference images and style transfer learning real images, wherein the five types of images are equal in number and respectively correspond to each other, and are formed into data sets in a one-to-one correspondence mode, each group contains five types of different images, and the data sets can be expressed as follows:
Figure SMS_1
wherein D represents a dataset, I C Representing ancient architecture image, I S Representing a style image, θ C Representing a color reference image, θ cs Representing structural reference images, I G Representing a style transfer learning real image;
and step S12, randomly extracting images according to groups in the data set, and constructing a training set and a testing set according to the proportion of 8:2.
Preferably, the encoder in the step S2 is specifically a visual transducer encoder, and the visual transducer encoder includes two coding feature extraction sub-networks sharing weights, which are respectively an adaptive color adjustment network and an adaptive structure adjustment network;
the self-adaptive color adjustment network comprises i self-adaptive color adjustment modules which are sequentially connected and based on color reference images, and the i self-adaptive color adjustment modules sequentially perform color adjustment on the characteristics of different dimensions from low dimension to high dimension;
the self-adaptive structure adjustment network comprises i self-adaptive structure adjustment modules which are sequentially connected and based on the structure reference images, and the i self-adaptive structure adjustment modules sequentially perform structure adjustment on the characteristics of different dimensions from low dimension to high dimension.
Preferably, the decoder in the step S2 is specifically a visual transducer decoder, and the visual transducer decoder includes i feature decoding modules sequentially connected, where the i feature decoding modules sequentially decode features in different dimensions from high dimension to low dimension.
Preferably, the step S3 specifically includes the following steps:
step S31, randomly extracting a group of samples in a training set, and respectively inputting the extracted samples into a self-adaptive color adjustment network and a self-adaptive structure adjustment network;
s32, in the self-adaptive color adjustment network, i self-adaptive color adjustment modules are adopted, wherein i is more than or equal to 1 and less than or equal to 4, and the image characteristics of the ancient architecture are extracted
Figure SMS_2
And (3) the image features of style>
Figure SMS_3
Feature dimension is->
Figure SMS_4
B is the batch of extracted features, < > and->
Figure SMS_5
For extracting the channel number of the characteristic, W and H are the width and the height of the input image;
embedding codes at appointed positions of a visual coding-decoding style migration neural network to characterize ancient architecture images
Figure SMS_8
And style image feature->
Figure SMS_11
Input into the ith self-adaptive color adjustment module, and output the inquiry feature of the ancient architecture image by using embedded codes>
Figure SMS_13
Queried feature of ancient architecture image->
Figure SMS_7
And content features of the ancient building image->
Figure SMS_9
Query feature of style image +.>
Figure SMS_10
Queried features of style images->
Figure SMS_12
And content features of the stylistic image features +.>
Figure SMS_6
Fusion is carried out by utilizing a feature migration fusion AdaIN function, and the fusion process is defined as follows:
Figure SMS_14
Figure SMS_15
Figure SMS_16
Figure SMS_17
wherein μ (·) and σ (·) are the mean and variance functions, respectively;
Figure SMS_18
and->
Figure SMS_19
The query characteristics of the fused ancient building image and the queried characteristics of the fused ancient building image are respectively; />
Figure SMS_20
And->
Figure SMS_21
Query features of the fused style image and queried features of the fused style image are respectively;
next, the characteristic migration fusion MixAdaIN function is utilized to perform characteristic analysis on the content of the ancient building image
Figure SMS_22
And content features of the stylistic image features +.>
Figure SMS_23
Feature fusion is performed, and the fusion is defined as follows:
Figure SMS_24
Figure SMS_25
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_26
and->
Figure SMS_27
The content characteristics of the fused ancient building image and the content characteristics of the fused style image are respectively; gamma ray C1 、γ S1 And beta C1 、β S1 Parameters of the affine transformation of the self-adaptive color adjustment network are defined as follows:
Figure SMS_28
Figure SMS_29
Figure SMS_30
Figure SMS_31
wherein μ (·) and σ (·) are the mean and variance functions, respectively; lambda (lambda) 1 Is constant and has a value range of [0,1 ]];
Figure SMS_32
Is a color reference image; />
Figure SMS_33
Representing cross-attention feature extraction;
creating a first self-attention module, and outputting a reconstructed feature by the fused feature through the first self-attention module, wherein the definition is as follows:
Figure SMS_34
Figure SMS_35
wherein the function softmax (·) is the feature normalization function, d k For the feature dimension, T represents the feature matrix transpose,
Figure SMS_36
and->
Figure SMS_37
The reconstruction characteristics of the ancient building image output by the ith self-adaptive color adjustment and the reconstruction characteristics of the style image output by the ith self-adaptive color adjustment are respectively [ B, C, W/2 ] i ,H/2 i ];
Step S33, in the self-adaptive structure adjustment network, the characteristics of the ancient building image are extracted after i self-adaptive structure adjustment modules are adopted
Figure SMS_38
And (3) the image features of style>
Figure SMS_39
Feature dimension is->
Figure SMS_40
Image features of ancient architecture
Figure SMS_43
And (3) the image features of style>
Figure SMS_45
Respectively inputting the two images into an ith self-adaptive structure adjustment module, and respectively outputting query characteristics of the ancient building images by utilizing embedded codes>
Figure SMS_47
Query feature of style image->
Figure SMS_42
Queried feature of ancient architecture image->
Figure SMS_44
And style imageQueried feature->
Figure SMS_46
Content features of ancient building images
Figure SMS_48
And content characteristics of style image->
Figure SMS_41
Fusion is carried out through feature migration fusion AdaIN functions, and the fusion process is defined as follows:
Figure SMS_49
Figure SMS_50
Figure SMS_51
Figure SMS_52
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_53
and->
Figure SMS_54
The method comprises the steps of respectively inquiring characteristics of the fused ancient building image, inquiring characteristics of the fused style image, inquired characteristics of the fused ancient building image and inquired characteristics of the fused style image; sigma (·) and μ (·) are variance and mean functions of the feature, respectively;
next, the self-adaptive structural feature migration fusion MixAdaIN function is utilized to fuse the content features of the ancient building image
Figure SMS_55
And content characteristics of style image->
Figure SMS_56
Feature fusion is performed as defined below:
Figure SMS_57
Figure SMS_58
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_59
and->
Figure SMS_60
The content characteristics of the fused ancient building image and the content characteristics of the fused style image are respectively; gamma ray C2 、γ S2 And beta C2 、β S2 Parameters of affine transformation are integrated for feature cross attention migration, and are defined as follows:
Figure SMS_61
Figure SMS_62
Figure SMS_63
Figure SMS_64
here the number of the elements is the number,
Figure SMS_65
is a structural reference image; mu (·) and sigma (·) are mean and variance functions; lambda (lambda) 2 Is constant and has a value range of [0,1 ]];
Figure SMS_66
Representing cross-attention feature extraction;
creating a second self-attention module, and outputting a reconstructed feature by the fused feature through the second self-attention module, wherein the definition is as follows:
Figure SMS_67
Figure SMS_68
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_69
and->
Figure SMS_70
The reconstruction fusion characteristics of the ancient building image reconstructed and output by the ith self-adaptive structure adjusting module and the reconstruction fusion characteristics of the style image reconstructed and output by the ith self-adaptive structure adjusting module are respectively, and the dimensions of the reconstruction characteristics are respectively
Figure SMS_71
Figure SMS_72
Step S34, utilizing a pair of visual transducer encoders
Figure SMS_73
And->
Figure SMS_74
Fusion was performed and the pair +.>
Figure SMS_75
And->
Figure SMS_76
Fusion is carried out, and the fusion process is defined as follows:
Figure SMS_77
Figure SMS_78
wherein Conv (·) is the convolutional layer; function [. Cndot. ]]Representing feature stitching operations;
Figure SMS_79
and->
Figure SMS_80
The output historic building fusion characteristics and style fusion characteristics are respectively provided for the visual transducer encoder.
Preferably, the step S4 specifically includes the following steps:
s41, extracting the decoding characteristics of the ancient building image through the i-1 th characteristic decoding module
Figure SMS_83
Decoding features of picture with style->
Figure SMS_87
And input to the ith feature decoding module to output query features of the ancient building image by embedded coding>
Figure SMS_88
Query feature of style image->
Figure SMS_82
Queried feature of ancient architecture image->
Figure SMS_84
Queried features of style images->
Figure SMS_85
Content characteristics of ancient architecture image->
Figure SMS_86
Content features of style images/>
Figure SMS_81
Creating a third self-attention module by which the reconstructed feature is output, defined as follows:
Figure SMS_89
Figure SMS_90
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_91
and->
Figure SMS_92
The reconstruction features are the reconstruction features of the decoding features of the ancient building image and the reconstruction features of the decoding features of the style image, and the dimension of the reconstruction features is +.>
Figure SMS_93
Step S42: for a pair of
Figure SMS_94
And->
Figure SMS_95
And performing feature migration fusion, wherein the definition of the feature migration fusion process is as follows:
Figure SMS_96
Figure SMS_97
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_98
ancient architecture image output by ith characteristic decoding moduleIs a migration fusion feature of->
Figure SMS_99
Representing migration fusion characteristics of the style image output by the ith characteristic decoding module;
step S43: using visual transducer decoder pairs
Figure SMS_100
And processing and outputting the picture with the predicted style transferred, wherein the grid transfer estimation output module is defined as follows:
Figure SMS_101
wherein Conv (·) is the convolutional layer;
Figure SMS_102
for predicting the picture after style migration, the dimension is consistent with the input image.
Preferably, the step S5 specifically includes the following steps:
step S51, respectively calculating content characteristic loss functions L according to the generated pictures after the prediction style migration C Style characteristic loss function L S Semantic feature loss function L I And style reconstruction loss function L G
Step S52, respectively determining content characteristic loss functions L C Style characteristic loss function L S Semantic feature loss function L I And style reconstruction loss function L G Is a supervisory training weight lambda 1 、λ 2 、λ 3 And lambda (lambda) 4
Step S53, establishing a total loss function L ALL Repeating steps S3-S5, and iterating training to minimize the total loss function until training 50epoch or the loss value is less than 10 -3
Preferably, the content feature loss function L in the step S51 C Style characteristic loss function L S Semantic feature loss function L I And style reconstruction loss function L G The method comprises the following steps:
wherein the content characteristic loss function L C The definition is as follows:
Figure SMS_103
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_104
and->
Figure SMS_105
The building content image features and style features extracted by the first layer encoder feature extraction module are obtained; l is more than or equal to 1 and less than or equal to 4; />
Figure SMS_106
Reconstructing an image for style migration; the function II is L2 norm; function->
Figure SMS_107
Normalizing the mean variance channel; n (N) e Representing the number of layers of the feature encoder module, set to 4 in this example;
style characteristic loss function L S The definition is as follows:
Figure SMS_108
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_109
and->
Figure SMS_110
The image features and style features of the building content extracted for the first layer encoder are represented by the mean and variance functions, N d Representing the number of layers of the feature decoder module, set to 4 in this example;
semantic feature loss function L I The definition is as follows:
Figure SMS_111
style reconstruction loss function L G The definition is as follows:
Figure SMS_112
preferably, lambda in the step S52 1 、λ 2 、λ 3 And lambda (lambda) 4 The following formula is satisfied:
λ 1 + 2 + 3 + 4 =1, and λ 1 =0.1,λ 2 =0.2,λ 3 =0.2,λ 4 =0.5。
Preferably, the total loss function in the step S53 is:
L ALL1 L C + 2 L S + 3 L I + 4 L G
the beneficial effects are that:
compared with the prior art, the invention has the advantages that the following aspects are mainly realized:
(1) The invention provides a visual coding-decoding style migration neural network, which can fully learn the relative position texture and structural feature information of a building image, greatly improve the feature representation capability of colors and structures during style migration, and solve the problems of irregular local structure form, non-uniform texture and color and the like during style migration of the building image.
(2) The visual coding-decoding style migration neural network adopts multi-stage extraction of shallow layer features and deep layer features of an image, gradual global context fusion of building images and style image features, and gradual coding reconstruction of the fused features.
(3) The invention designs a self-adaptive color adjustment module which is integrated into a visual coding-decoding style migration neural network and can fuse style characteristics and color reference characteristics in a self-adaptive global supervision migration mode.
(4) The invention designs a self-adaptive structure adjusting module which is integrated into a visual coding-decoding style migration neural network and can be used for fusing building structure characteristics and structure reference characteristics in a self-adaptive global supervision mode.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a portion of a sample of an ancient architecture style migration simulation learning dataset according to an embodiment of the present invention, with FIG. (a) being an architecture image and FIG. (b) being a style image;
FIG. 3 is a block diagram of a visual coding-decoding style migrating neural network employed in the present invention;
fig. 4 is a block diagram of a visual coding-decoding style migration neural network, wherein fig. (a) to (c) correspond to an adaptive color adjustment network, an adaptive structure adjustment network, and a visual transducer decoder, respectively.
Detailed Description
The invention will be further described with reference to the drawings and examples.
Referring to fig. 1, a flow chart of the method of the present invention is shown, and the method for adaptive style migration of ancient architecture images based on a visual encoder comprises the following steps:
s1, acquiring a plurality of ancient architecture images, style images, color reference images, structure reference images and style transfer learning real images, and constructing a data set by utilizing the five types of images; dividing the data set to obtain a training set and a testing set;
s2, constructing a visual coding-decoding style migration neural network; the visual coding-decoding style migration neural network comprises an encoder and a decoder;
s3, randomly extracting a group of samples in a training set, encoding the ancient architecture image and the style image in the samples by using an encoder, respectively extracting the characteristics of the ancient architecture image and the style image at different layers, outputting the architecture content characteristics and the architecture style characteristics, and fusing the architecture content characteristics and the architecture style characteristics to obtain fusion characteristics;
s4, inputting the fusion characteristics and the style image in the step S3 into a decoder for reconstruction in the process of extracting the coding characteristics to obtain a predicted style migrated picture;
s5, performing iterative training on the visual coding-decoding style migration neural network according to the predicted picture transferred by the style and the corresponding style transfer learning real image in the data set to obtain a trained visual coding-decoding style migration neural network;
and S6, testing the trained visual coding-decoding style migration neural network by using the test set.
In this embodiment, the step S1 specifically includes the following steps:
step S11, collecting a plurality of ancient architecture images, style images, color reference images, structure reference images and style transfer learning real images, wherein the five types of images are equal in number and respectively correspond to each other, and are formed into data sets in a one-to-one correspondence mode, each group contains five types of different images, and the data sets can be expressed as follows:
Figure SMS_113
wherein D represents a dataset, I C Representing ancient architecture image, I S Representing a style image, θ C Representing a color reference image, θ cs Representing structural reference images, I G Representing a style transfer learning real image;
specifically, the ancient building images are taken from different angles in the plurality of ancient building groups, and it is to be noted that the example uses the ancient buildings such as the tai-level street and the copper official kiln as the shooting scenes, but the ancient building groups used in the data set constructed in the example are not limited to this. Fig. 2 is a shooting scene diagram. And acquiring the building style image to be learned, the color reference image and the structure reference image, and inviting professionals to draw style transfer learning real images based on the images.
And step S12, randomly extracting images according to groups in the data set, and constructing a training set and a testing set according to the proportion of 8:2.
In this embodiment, the encoder in step S2 is specifically a visual transducer encoder, where the visual transducer encoder includes two coding feature extraction sub-networks with shared weights, which are an adaptive color adjustment network and an adaptive structure adjustment network respectively;
the self-adaptive color adjustment network comprises i self-adaptive color adjustment modules which are sequentially connected and based on color reference images, and the i self-adaptive color adjustment modules sequentially perform color adjustment on the characteristics of different dimensions from low dimension to high dimension;
the self-adaptive structure adjustment network comprises i self-adaptive structure adjustment modules which are sequentially connected and based on the structure reference images, and the i self-adaptive structure adjustment modules sequentially perform structure adjustment on the characteristics of different dimensions from low dimension to high dimension.
Specifically, the adaptive color adjustment network and the adaptive structure adjustment network are as shown in fig. 4 (a) and (b). The adaptive color adjustment module feature migration fusion is as follows: the output characteristic of the i-1 self-adaptive color adjustment module is characterized by the ancient architecture image
Figure SMS_118
And (3) the image features of style>
Figure SMS_117
Respectively inputting the two images into an ith self-adaptive color adjustment module, and respectively outputting query features of the ancient building images by utilizing a feature local position embedded coding layer>
Figure SMS_123
Query feature of style image->
Figure SMS_116
Queried feature of ancient architecture image->
Figure SMS_128
And queried features of style images +.>
Figure SMS_121
Content characteristics of the ancient architecture image +.>
Figure SMS_129
And content characteristics of style image->
Figure SMS_120
Performing attention migration fusion on the feature migration fusion AdaIN function and the MixAdaIN function to obtain the feature +.>
Figure SMS_133
And (3) the image features of style>
Figure SMS_114
Likewise, the adaptive structure adjustment module feature migration fusion is as follows: the output characteristic of the i-1 self-adaptive structure adjusting module is the ancient architecture image characteristic +.>
Figure SMS_122
And style image features
Figure SMS_119
Respectively inputting the two images into an ith self-adaptive structure adjusting module, and respectively outputting query features of the ancient building images by utilizing the feature local position embedded coding layer>
Figure SMS_124
Query feature of style image->
Figure SMS_126
Queried feature of ancient architecture image->
Figure SMS_130
And queried features of style images +.>
Figure SMS_125
Content characteristics of the ancient architecture image +.>
Figure SMS_131
And content characteristics of style image->
Figure SMS_127
Performing attention migration fusion on the AdaIN function and the MixAdaIN function through feature migration fusion to obtain the ancient timesBuilding image feature->
Figure SMS_132
And (3) the image features of style>
Figure SMS_115
In this embodiment, the decoder in step S2 is specifically a visual transducer decoder, where the visual transducer decoder includes i feature decoding modules sequentially connected to each other, and the i feature decoding modules sequentially decode features in different dimensions from high dimension to low dimension.
In this embodiment, the step S3 specifically includes the following steps:
step S31, randomly extracting a group of samples in a training set, and respectively inputting the extracted samples into a self-adaptive color adjustment network and a self-adaptive structure adjustment network;
selecting an ancient building image needing style migration in a training set, a style image serving as a style source, a color reference image and a structure reference image serving as inputs of a self-adaptive color adjustment network and a self-adaptive structure adjustment network, wherein the sizes of all input images are W.H, W is the width of the image, and H is the height of the image;
s32, in the self-adaptive color adjustment network, i self-adaptive color adjustment modules are adopted, wherein i is more than or equal to 1 and less than or equal to 4, and the image characteristics of the ancient architecture are extracted
Figure SMS_134
And (3) the image features of style>
Figure SMS_135
Feature dimension is->
Figure SMS_136
B is the batch of extracted features, < > and->
Figure SMS_137
For extracting the channel number of the characteristic, W and H are the width and the height of the input image;
in visual encoding-decodingEmbedding codes in appointed positions of style migration neural network to characterize ancient architecture images
Figure SMS_139
And style image feature->
Figure SMS_142
Input into the ith self-adaptive color adjustment module, and output the inquiry feature of the ancient architecture image by using embedded codes>
Figure SMS_144
Queried feature of ancient architecture image->
Figure SMS_140
And content features of the ancient building image->
Figure SMS_141
Query feature of style image +.>
Figure SMS_143
Queried features of style images->
Figure SMS_145
And content features of the stylistic image features +.>
Figure SMS_138
Fusion is carried out by utilizing a feature migration fusion AdaIN function, and the fusion process is defined as follows:
Figure SMS_146
Figure SMS_147
Figure SMS_148
Figure SMS_149
wherein μ (·) and σ (·) are the mean and variance functions, respectively;
Figure SMS_150
and->
Figure SMS_151
The query characteristics of the fused ancient building image and the queried characteristics of the fused ancient building image are respectively; />
Figure SMS_152
And->
Figure SMS_153
Query features of the fused style image and queried features of the fused style image are respectively;
next, the characteristic migration fusion MixAdaIN function is utilized to perform characteristic analysis on the content of the ancient building image
Figure SMS_154
And content features of the stylistic image features +.>
Figure SMS_155
Feature fusion is performed, and the fusion is defined as follows:
Figure SMS_156
Figure SMS_157
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_158
and->
Figure SMS_159
The content characteristics of the fused ancient building image and the content characteristics of the fused style image are respectively; gamma ray C1 、γ S1 And beta C1 、β S1 Parameters of the affine transformation of the self-adaptive color adjustment network are defined as follows:
Figure SMS_160
Figure SMS_161
Figure SMS_162
Figure SMS_163
wherein μ (·) and σ (·) are the mean and variance functions, respectively; lambda (lambda) 1 Is constant and has a value range of [0,1 ]];
Figure SMS_164
Is a color reference image; />
Figure SMS_165
Representing cross-attention feature extraction;
creating a first self-attention module, and outputting a reconstructed feature by the fused feature through the first self-attention module, wherein the definition is as follows:
Figure SMS_166
/>
Figure SMS_167
wherein the function softmax (·) is the feature normalization function, d k For the feature dimension, T represents the feature matrix transpose,
Figure SMS_168
and->
Figure SMS_169
The reconstruction characteristics of the ancient building image output by the ith self-adaptive color adjustment and the reconstruction characteristics of the style image output by the ith self-adaptive color adjustment are respectively [ B, C, W/2 ] i ,H/2 i ];
Step S33, in the self-adaptive structure adjustment network, the characteristics of the ancient building image are extracted after i self-adaptive structure adjustment modules are adopted
Figure SMS_170
And (3) the image features of style>
Figure SMS_171
Feature dimension is->
Figure SMS_172
Image features of ancient architecture
Figure SMS_174
And (3) the image features of style>
Figure SMS_176
Respectively inputting the two images into an ith self-adaptive structure adjustment module, and respectively outputting query characteristics of the ancient building images by utilizing embedded codes>
Figure SMS_178
Query feature of style image->
Figure SMS_175
Queried feature of ancient architecture image->
Figure SMS_177
And queried features of style images +.>
Figure SMS_179
Content features of ancient building images
Figure SMS_180
And content characteristics of style image->
Figure SMS_173
Fusion is carried out through feature migration fusion AdaIN functions, and the fusion process is defined as follows:
Figure SMS_181
Figure SMS_182
Figure SMS_183
Figure SMS_184
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_185
and->
Figure SMS_186
The method comprises the steps of respectively inquiring characteristics of the fused ancient building image, inquiring characteristics of the fused style image, inquired characteristics of the fused ancient building image and inquired characteristics of the fused style image; sigma (·) and μ (·) are variance and mean functions of the feature, respectively;
next, the self-adaptive structural feature migration fusion MixAdaIN function is utilized to fuse the content features of the ancient building image
Figure SMS_187
And content characteristics of style image->
Figure SMS_188
Feature fusion is performed as defined below:
Figure SMS_189
/>
Figure SMS_190
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_191
and->
Figure SMS_192
The content characteristics of the fused ancient building image and the content characteristics of the fused style image are respectively; gamma ray C2 、γ S2 And beta C2 、β S2 Parameters of affine transformation are integrated for feature cross attention migration, and are defined as follows:
Figure SMS_193
Figure SMS_194
Figure SMS_195
Figure SMS_196
here the number of the elements is the number,
Figure SMS_197
is a structural reference image; mu (·) and sigma (·) are mean and variance functions; lambda (lambda) 2 Is constant and has a value range of [0,1 ]];
Figure SMS_198
Representing cross-attention feature extraction;
creating a second self-attention module, and outputting a reconstructed feature by the fused feature through the second self-attention module, wherein the definition is as follows:
Figure SMS_199
Figure SMS_200
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_201
and->
Figure SMS_202
The reconstruction fusion characteristics of the ancient building image reconstructed and output by the ith self-adaptive structure adjusting module and the reconstruction fusion characteristics of the style image reconstructed and output by the ith self-adaptive structure adjusting module are respectively, and the dimensions of the reconstruction characteristics are respectively
Figure SMS_203
Figure SMS_204
Step S34, utilizing a pair of visual transducer encoders
Figure SMS_205
And->
Figure SMS_206
Fusion was performed and the pair +.>
Figure SMS_207
And->
Figure SMS_208
Fusion is carried out, and the fusion process is defined as follows:
Figure SMS_209
Figure SMS_210
wherein Conv (·) is the convolutional layer; function [. Cndot. ]]Representing feature stitching operations;
Figure SMS_211
and->
Figure SMS_212
The output historic building fusion characteristics and style fusion characteristics are respectively provided for the visual transducer encoder.
In this embodiment, the step S4 specifically includes the following steps:
s41, extracting the decoding characteristics of the ancient building image through the i-1 th characteristic decoding module
Figure SMS_214
Decoding features of picture with style->
Figure SMS_219
And input to the ith feature decoding module to output query features of the ancient building image by embedded coding>
Figure SMS_220
Query feature of style image->
Figure SMS_215
Queried feature of ancient architecture image->
Figure SMS_216
Queried features of style images->
Figure SMS_217
Content characteristics of ancient architecture image->
Figure SMS_218
Content characteristics of style image->
Figure SMS_213
Creating a third self-attention module by which the reconstructed feature is output, defined as follows:
Figure SMS_221
Figure SMS_222
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_223
and->
Figure SMS_224
The reconstruction features are the reconstruction features of the decoding features of the ancient building image and the reconstruction features of the decoding features of the style image, and the dimension of the reconstruction features is +.>
Figure SMS_225
Step S42: for a pair of
Figure SMS_226
And->
Figure SMS_227
And performing feature migration fusion, wherein the definition of the feature migration fusion process is as follows:
Figure SMS_228
Figure SMS_229
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_230
representing migration fusion characteristics of the ancient architecture image output by the ith characteristic decoding module, and (I)>
Figure SMS_231
Representing migration fusion characteristics of the style image output by the ith characteristic decoding module;
step S43: using visionPerceptual transform decoder pair
Figure SMS_232
And processing and outputting the picture with the predicted style transferred, wherein the grid transfer estimation output module is defined as follows:
Figure SMS_233
wherein Conv (·) is the convolutional layer;
Figure SMS_234
for predicting the picture after style migration, the dimension is consistent with the input image. See fig. 4 (c) for a visual transducer decoder;
preferably, the step S5 specifically includes the following steps:
step S51, respectively calculating content characteristic loss functions L according to the generated pictures after the prediction style migration C Style characteristic loss function L S Semantic feature loss function L I And style reconstruction loss function L G
Step S52, respectively determining content characteristic loss functions L C Style characteristic loss function L S Semantic feature loss function L I And style reconstruction loss function L G Is a supervisory training weight lambda 1 、λ 2 、λ 3 And lambda (lambda) 4
Step S53, establishing a total loss function L ALL Repeating steps S3-S5, and iterating training to minimize the total loss function until training 50epoch or the loss value is less than 10 -3
Preferably, the content feature loss function L in the step S51 C Style characteristic loss function L S Semantic feature loss function L i And style reconstruction loss function L G The method comprises the following steps:
wherein the content characteristic loss function L C The definition is as follows:
Figure SMS_235
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_236
and->
Figure SMS_237
The building content image features and style features extracted by the first layer encoder feature extraction module are obtained; l is more than or equal to 1 and less than or equal to 4; />
Figure SMS_238
Reconstructing an image for style migration; the function II is L2 norm; function->
Figure SMS_239
Normalizing the mean variance channel; n (N) e Representing the number of layers of the feature encoder module, set to 4 in this example;
style characteristic loss function L S The definition is as follows:
Figure SMS_240
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure SMS_241
and->
Figure SMS_242
The image features and style features of the building content extracted for the first layer encoder are represented by the mean and variance functions, N d Representing the number of layers of the feature decoder module, set to 4 in this example;
semantic feature loss function L I The definition is as follows:
Figure SMS_243
style reconstruction loss function L G The definition is as follows:
Figure SMS_244
preferably, lambda in the step S52 1 、λ 2 、λ 3 And lambda (lambda) 4 The following formula is satisfied:
λ 1234 =1, and λ 1 =0.1,λ 2 =0.2,λ 3 =0.2,λ 4 =0.5。
Preferably, the total loss function in the step S53 is:
L ALL =λ 1 L C2 L S3 L I4 L G
specifically, the network training employs a back propagation algorithm, and the training optimizer employs an Adam algorithm, wherein the optimizer parameter β 1 =0.9,β 2 =0.999. The initial learning rate of the network training is set to 0.001, and the learning rate is reduced to 1/2 of the original learning rate when the training is performed to 20, 30 and 40 epochs. The network model trains the software platform to be PyTorch.

Claims (10)

1. The utility model provides a ancient building image self-adaptation style migration method based on visual encoder which is characterized in that the method comprises the following steps:
s1, acquiring a plurality of ancient architecture images, style images, color reference images, structure reference images and style transfer learning real images, and constructing a data set by utilizing the five types of images; dividing the data set to obtain a training set and a testing set;
s2, constructing a visual coding-decoding style migration neural network; the visual coding-decoding style migration neural network comprises an encoder and a decoder;
s3, randomly extracting a group of samples in a training set, encoding the ancient architecture image and the style image in the samples by using an encoder, respectively extracting the characteristics of the ancient architecture image and the style image at different layers, outputting the architecture content characteristics and the architecture style characteristics, and fusing the architecture content characteristics and the architecture style characteristics to obtain fusion characteristics;
s4, inputting the fusion characteristics and the style images in the step S3 into a decoder for reconstruction in the process of extracting the characteristics of the ancient architecture images and the style images in different layers by the encoder, and obtaining a predicted style transferred picture;
s5, performing iterative training on the visual coding-decoding style migration neural network according to the predicted picture transferred by the style and the corresponding style transfer learning real image in the data set to obtain a trained visual coding-decoding style migration neural network;
and S6, testing the trained visual coding-decoding style migration neural network by using the test set.
2. The method for adaptive style migration of ancient architecture image according to claim 1, wherein the step S1 specifically comprises the steps of:
step S11, collecting a plurality of ancient architecture images, style images, color reference images, structure reference images and style transfer learning real images, wherein the five types of images are equal in number and respectively correspond to each other, and are formed into data sets in a one-to-one correspondence mode, each group contains five types of different images, and the data sets can be expressed as follows:
Figure FDA0004115188850000011
wherein D represents a dataset, I C Representing ancient architecture image, I s Representing a style image, θ C Representing a color reference image, θ cs Representing structural reference images, I G Representing a style transfer learning real image;
and step S12, randomly extracting images according to groups in the data set, and constructing a training set and a testing set according to the proportion of 8:2.
3. The method for adaptive style migration of ancient architecture image according to claim 2, wherein the encoder in step S2 is specifically a visual transducer encoder, and the visual transducer encoder includes two coding feature extraction sub-networks sharing weights, which are an adaptive color adjustment network and an adaptive structure adjustment network, respectively;
the self-adaptive color adjustment network comprises i self-adaptive color adjustment modules which are sequentially connected and based on color reference images, and the i self-adaptive color adjustment modules sequentially perform color adjustment on the characteristics of different dimensions from low dimension to high dimension;
the self-adaptive structure adjustment network comprises i self-adaptive structure adjustment modules which are sequentially connected and based on the structure reference images, and the i self-adaptive structure adjustment modules sequentially perform structure adjustment on the characteristics of different dimensions from low dimension to high dimension.
4. The method for adaptive style migration of ancient architecture image according to claim 3, wherein the visual transducer decoder in step S2 comprises i feature decoding modules connected in sequence, and the i feature decoding modules decode features of different dimensions from high dimension to low dimension in sequence.
5. The method for adaptive style migration of ancient architecture image according to claim 4, wherein the step S3 specifically comprises the steps of:
step S31, randomly extracting a group of samples in a training set, and respectively inputting the extracted samples into a self-adaptive color adjustment network and a self-adaptive structure adjustment network;
s32, in the self-adaptive color adjustment network, i self-adaptive color adjustment modules are adopted, wherein i is more than or equal to 1 and less than or equal to 4, and the image characteristics of the ancient architecture are extracted
Figure FDA0004115188850000021
And (3) the image features of style>
Figure FDA0004115188850000022
Feature dimension is->
Figure FDA0004115188850000023
B is the batch of extracted features, < > and->
Figure FDA0004115188850000024
For extracting the channel number of the characteristic, W and H are the width and the height of the input image;
specifically, ancient architecture image features
Figure FDA0004115188850000025
And style image feature->
Figure FDA0004115188850000026
Inputting the query feature of the ancient building image output by the i self-adaptive color adjustment module and utilizing the feature local position embedded coding layer>
Figure FDA0004115188850000027
Queried feature of ancient architecture image->
Figure FDA0004115188850000028
And content features of the ancient building image->
Figure FDA0004115188850000029
Query feature of style image +.>
Figure FDA00041151888500000210
Queried features of style images->
Figure FDA00041151888500000211
And content features of the stylistic image features +.>
Figure FDA00041151888500000212
Fusion is carried out by utilizing a feature migration fusion AdaIN function, and the fusion process is defined as follows:
Figure FDA00041151888500000213
Figure FDA00041151888500000214
Figure FDA00041151888500000215
Figure FDA00041151888500000216
wherein μ (·) and σ (·) are the mean and variance functions, respectively;
Figure FDA00041151888500000217
and->
Figure FDA00041151888500000218
The query characteristics of the fused ancient building image and the queried characteristics of the fused ancient building image are respectively; />
Figure FDA00041151888500000219
And->
Figure FDA00041151888500000220
Query features of the fused style image and queried features of the fused style image are respectively;
next, the characteristic migration fusion MixAdaIN function is utilized to perform characteristic analysis on the content of the ancient building image
Figure FDA00041151888500000221
And content features of the stylistic image features +.>
Figure FDA0004115188850000031
Feature fusion is carried outThe definition is as follows:
Figure FDA0004115188850000032
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure FDA0004115188850000033
and->
Figure FDA0004115188850000034
The content characteristics of the fused ancient building image and the content characteristics of the fused style image are respectively; gamma ray C1 、γ S1 And beta C1 、β S1 Parameters of the affine transformation of the self-adaptive color adjustment network are defined as follows:
Figure FDA0004115188850000035
Figure FDA0004115188850000036
Figure FDA0004115188850000037
Figure FDA0004115188850000038
wherein μ (·) and σ (·) are the mean and variance functions, respectively; lambda (lambda) 1 Is constant and has a value range of [0,1 ]];
Figure FDA0004115188850000039
Is a color reference image; />
Figure FDA00041151888500000310
Representing cross-attention feature extraction;
creating a first self-attention module, and outputting a reconstructed feature by the fused feature through the first self-attention module, wherein the definition is as follows:
Figure FDA00041151888500000311
Figure FDA00041151888500000312
wherein the function softmax (·) is the feature normalization function, d k For the feature dimension, T represents the feature matrix transpose,
Figure FDA00041151888500000313
and
Figure FDA00041151888500000314
the reconstruction characteristics of the ancient building image output by the ith self-adaptive color adjustment and the reconstruction characteristics of the style image output by the ith self-adaptive color adjustment are respectively [ B, C, W/2 ] i ,H/2 i ];
Step S33, in the self-adaptive structure adjustment network, the characteristics of the ancient building image are extracted after i self-adaptive structure adjustment modules are adopted
Figure FDA00041151888500000315
And (3) the image features of style>
Figure FDA00041151888500000316
Feature dimension is->
Figure FDA00041151888500000317
Image features of ancient architecture
Figure FDA00041151888500000318
And (3) the image features of style>
Figure FDA00041151888500000319
Respectively inputting the two images into an ith self-adaptive structure adjusting module, and respectively outputting query features of the ancient building images by utilizing the feature local position embedded coding layer>
Figure FDA00041151888500000320
Query feature of style image->
Figure FDA0004115188850000041
Queried feature of ancient architecture image->
Figure FDA0004115188850000042
And queried features of style images +.>
Figure FDA0004115188850000043
Content characteristics of the ancient architecture image +.>
Figure FDA0004115188850000044
And content characteristics of style image->
Figure FDA0004115188850000045
Fusion is carried out through feature migration fusion AdaIN functions, and the fusion process is defined as follows:
Figure FDA0004115188850000046
Figure FDA0004115188850000047
Figure FDA0004115188850000048
Figure FDA0004115188850000049
wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure FDA00041151888500000410
and->
Figure FDA00041151888500000411
The method comprises the steps of respectively inquiring characteristics of the fused ancient building image, inquiring characteristics of the fused style image, inquired characteristics of the fused ancient building image and inquired characteristics of the fused style image; sigma (·) and μ (·) are variance and mean functions of the feature, respectively;
next, the adaptive structural feature migration fusion MixAdaIN function is used to fuse the content features of the ancient building image $V_C^i$ and the content features of the style image $V_S^i$, the feature fusion being defined as follows:

$$\bar{V}_C^i=\gamma_{C2}\,\frac{V_C^i-\mu(V_C^i)}{\sigma(V_C^i)}+\beta_{C2},\qquad \bar{V}_S^i=\gamma_{S2}\,\frac{V_S^i-\mu(V_S^i)}{\sigma(V_S^i)}+\beta_{S2}$$

wherein $\bar{V}_C^i$ and $\bar{V}_S^i$ are the fused content features of the ancient building image and the fused content features of the style image, respectively; $\gamma_{C2}$, $\gamma_{S2}$ and $\beta_{C2}$, $\beta_{S2}$ are the affine transformation parameters of the feature cross-attention migration fusion, defined as follows:

$$\gamma_{C2}=\lambda_2\,\sigma\!\left(V_C^i\right)+(1-\lambda_2)\,\sigma\!\left(\mathrm{CA}(F_T)\right)$$
$$\gamma_{S2}=\lambda_2\,\sigma\!\left(V_S^i\right)+(1-\lambda_2)\,\sigma\!\left(\mathrm{CA}(F_T)\right)$$
$$\beta_{C2}=\lambda_2\,\mu\!\left(V_C^i\right)+(1-\lambda_2)\,\mu\!\left(\mathrm{CA}(F_T)\right)$$
$$\beta_{S2}=\lambda_2\,\mu\!\left(V_S^i\right)+(1-\lambda_2)\,\mu\!\left(\mathrm{CA}(F_T)\right)$$

here, $F_T$ is the structural reference image; $\mu(\cdot)$ and $\sigma(\cdot)$ are the mean and variance functions; $\lambda_2$ is a constant with value range $[0,1]$; $\mathrm{CA}(\cdot)$ denotes cross-attention feature extraction;
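The cross-attention feature extraction $\mathrm{CA}(\cdot)$ used for the affine parameters can be sketched as follows. The single-head form, the learned linear projections, and the class name CrossAttentionExtract are illustrative assumptions, not the patent's specified architecture:

```python
import math
import torch
import torch.nn as nn

class CrossAttentionExtract(nn.Module):
    # Single-head cross-attention: queries come from the branch features,
    # keys/values from the reference image features (an assumed form of CA).
    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)

    def forward(self, feat, ref):
        b, c, h, w = feat.shape
        fq = self.q(feat.flatten(2).transpose(1, 2))  # [B, HW, C]
        rk = self.k(ref.flatten(2).transpose(1, 2))
        rv = self.v(ref.flatten(2).transpose(1, 2))
        attn = torch.softmax(fq @ rk.transpose(1, 2) / math.sqrt(c), dim=-1)
        out = attn @ rv                               # [B, HW, C]
        return out.transpose(1, 2).reshape(b, c, h, w)

ca = CrossAttentionExtract(64)
extracted = ca(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
```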
creating a second self-attention module, the fused features being passed through the second self-attention module to output reconstructed features, defined as follows:

$$\tilde{F}_C^i=\mathrm{softmax}\!\left(\frac{\bar{Q}_C^i\,(\bar{K}_C^i)^T}{\sqrt{d_k}}\right)\bar{V}_C^i,\qquad \tilde{F}_S^i=\mathrm{softmax}\!\left(\frac{\bar{Q}_S^i\,(\bar{K}_S^i)^T}{\sqrt{d_k}}\right)\bar{V}_S^i$$

wherein $\tilde{F}_C^i$ and $\tilde{F}_S^i$ are, respectively, the reconstruction fusion features of the ancient building image and of the style image output by the $i$-th adaptive structure adjustment module, each with dimensions $[B, C, W/2^i, H/2^i]$;
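Both self-attention modules apply the same scaled dot-product form $\mathrm{softmax}(QK^T/\sqrt{d_k})V$. A minimal sketch over flattened spatial positions, with the [B, C, H, W] layout assumed:

```python
import math
import torch

def self_attention_reconstruct(q, k, v):
    # Scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V over the
    # flattened spatial positions of [B, C, H, W] feature maps.
    b, c, h, w = q.shape
    q2 = q.flatten(2).transpose(1, 2)  # [B, HW, C]
    k2 = k.flatten(2).transpose(1, 2)
    v2 = v.flatten(2).transpose(1, 2)
    attn = torch.softmax(q2 @ k2.transpose(1, 2) / math.sqrt(c), dim=-1)
    out = attn @ v2                    # [B, HW, C]
    return out.transpose(1, 2).reshape(b, c, h, w)

q = k = v = torch.randn(2, 64, 16, 16)
recon = self_attention_reconstruct(q, k, v)  # [2, 64, 16, 16]
```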
Step S34: the visual Transformer encoder is used to fuse $\hat{F}_C^i$ with $\tilde{F}_C^i$, and to fuse $\hat{F}_S^i$ with $\tilde{F}_S^i$, the fusion process being defined as follows:

$$F_C^{enc}=\mathrm{Conv}\!\left(\left[\hat{F}_C^i,\tilde{F}_C^i\right]\right),\qquad F_S^{enc}=\mathrm{Conv}\!\left(\left[\hat{F}_S^i,\tilde{F}_S^i\right]\right)$$

wherein $\mathrm{Conv}(\cdot)$ is the convolutional layer; the function $[\cdot]$ denotes the feature splicing operation; $F_C^{enc}$ and $F_S^{enc}$ are, respectively, the ancient building fusion features and the style fusion features output by the visual Transformer encoder.
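A minimal sketch of this splice-then-convolve fusion, assuming channel-wise concatenation and a 3×3 convolution that restores the original channel count (both are assumptions; the patent only names a convolutional layer and a splicing operation):

```python
import torch
import torch.nn as nn

class ConcatConvFuse(nn.Module):
    # Fuse two same-shaped feature maps by channel concatenation followed
    # by a convolution, as in F_out = Conv([F_a, F_b]).
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_a, feat_b):
        return self.conv(torch.cat([feat_a, feat_b], dim=1))

fuse = ConcatConvFuse(64)
out = fuse(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))  # [2, 64, 16, 16]
```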
6. The method for adaptive style migration of ancient architecture images according to claim 5, wherein step S4 specifically comprises the following steps:
Step S41: the decoding features of the ancient building image $D_C^{i-1}$ and the decoding features of the style image $D_S^{i-1}$ extracted by the $(i-1)$-th feature decoding module are input into the $i$-th feature decoding module, which outputs, through embedding coding, the query features of the ancient building image $Q_C^i$ and of the style image $Q_S^i$, the queried features of the ancient building image $K_C^i$ and of the style image $K_S^i$, and the content features of the ancient building image $V_C^i$ and of the style image $V_S^i$; a third self-attention module is created, through which the reconstructed features are output, defined as follows:

$$\hat{D}_C^i=\mathrm{softmax}\!\left(\frac{Q_C^i\,(K_C^i)^T}{\sqrt{d_k}}\right)V_C^i,\qquad \hat{D}_S^i=\mathrm{softmax}\!\left(\frac{Q_S^i\,(K_S^i)^T}{\sqrt{d_k}}\right)V_S^i$$

wherein $\hat{D}_C^i$ and $\hat{D}_S^i$ are, respectively, the reconstructed features of the decoding features of the ancient building image and of the style image, the dimensions of the reconstructed features matching those of the input decoding features;
Step S42: feature migration fusion is performed on $\hat{D}_C^i$ and $\hat{D}_S^i$, the feature migration fusion process being defined as follows:

$$M_C^i=\mathrm{AdaIN}\!\left(\hat{D}_C^i,\hat{D}_S^i\right),\qquad M_S^i=\mathrm{AdaIN}\!\left(\hat{D}_S^i,\hat{D}_C^i\right)$$

wherein $M_C^i$ denotes the migration fusion features of the ancient building image output by the $i$-th feature decoding module, and $M_S^i$ denotes the migration fusion features of the style image output by the $i$-th feature decoding module;
Step S43: the visual Transformer decoder processes $M_C^i$ and outputs the predicted picture after style migration, the style migration estimation output module being defined as follows:

$$I_{CS}=\mathrm{Conv}\!\left(M_C^{N_d}\right)$$

wherein $\mathrm{Conv}(\cdot)$ is the convolutional layer; $I_{CS}$ is the predicted picture after style migration, whose dimensions are consistent with those of the input image.
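A minimal sketch of such an output head, assuming the final migration features are upsampled and projected to a 3-channel picture by a single convolution; the layer sizes, the bilinear upsampling, and the sigmoid output range are illustrative assumptions:

```python
import torch
import torch.nn as nn

class StyleMigrationHead(nn.Module):
    # Project the final migration-fusion features to an RGB prediction whose
    # spatial size matches the input image (upsampling factor assumed).
    def __init__(self, channels: int, scale: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, feat):
        return torch.sigmoid(self.conv(self.up(feat)))  # pixel values in [0, 1]

head = StyleMigrationHead(channels=64, scale=4)
pred = head(torch.randn(2, 64, 64, 64))  # -> [2, 3, 256, 256]
```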
7. The method for adaptive style migration of ancient architecture images according to claim 6, wherein step S5 specifically comprises the following steps:

Step S51: from the generated picture after predicted style migration, respectively calculate the content feature loss function $L_C$, the style feature loss function $L_S$, the semantic feature loss function $L_I$ and the style reconstruction loss function $L_G$;

Step S52: respectively determine the supervision training weights $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ of the content feature loss function $L_C$, the style feature loss function $L_S$, the semantic feature loss function $L_I$ and the style reconstruction loss function $L_G$;

Step S53: establish the total loss function $L_{ALL}$, repeat steps S3 to S5, and iterate training to minimize the total loss function until 50 epochs have been trained or the loss value is less than $10^{-3}$.
8. The method for adaptive style migration of ancient architecture images according to claim 7, wherein the content feature loss function $L_C$, the style feature loss function $L_S$, the semantic feature loss function $L_I$ and the style reconstruction loss function $L_G$ in step S51 are as follows:

the content feature loss function $L_C$ is defined as:

$$L_C=\sum_{l=1}^{N_e}\left\|\overline{\phi_l(I_{CS})}-\overline{\phi_l(I_C)}\right\|_2$$

wherein $\phi_l(I_C)$ and $\phi_l(I_S)$ are the building content image features and the style features extracted by the $l$-th layer encoder feature extraction module, $1\le l\le 4$; $I_{CS}$ is the style migration reconstructed image; $\|\cdot\|_2$ is the L2 norm; the function $\overline{(\cdot)}$ denotes mean-variance channel normalization; $N_e$ denotes the number of layers of the feature encoder module, set to 4 in this example;

the style feature loss function $L_S$ is defined as:

$$L_S=\sum_{l=1}^{N_d}\left(\left\|\mu\!\left(\phi_l(I_{CS})\right)-\mu\!\left(\phi_l(I_S)\right)\right\|_2+\left\|\sigma\!\left(\phi_l(I_{CS})\right)-\sigma\!\left(\phi_l(I_S)\right)\right\|_2\right)$$

wherein $\phi_l(I_C)$ and $\phi_l(I_S)$ are the building content image features and the style features extracted by the $l$-th layer encoder; $\mu(\cdot)$ and $\sigma(\cdot)$ are the mean and variance functions; $N_d$ denotes the number of layers of the feature decoder module, set to 4 in this example;
the semantic feature loss function $L_I$ is defined as:

$$L_I=\left\|\phi\!\left(I_{CS}\right)-\phi\!\left(I_C\right)\right\|_2$$

the style reconstruction loss function $L_G$ is defined as:

$$L_G=\left\|I_{CC}-I_C\right\|_2+\left\|I_{SS}-I_S\right\|_2$$

wherein $\phi(\cdot)$ denotes the semantic feature extractor, and $I_{CC}$ and $I_{SS}$ denote the images reconstructed when the ancient building image and the style image are respectively used as both the content input and the style input.
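Under the above reading of $L_C$ and $L_S$, a minimal PyTorch sketch is given below; the squared-error form and the per-layer feature lists are illustrative assumptions (the patent's exact norm and feature taps are not specified here):

```python
import torch
import torch.nn.functional as F

def norm_mv(feat, eps: float = 1e-5):
    # Mean-variance channel normalization of a [B, C, H, W] feature map.
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sig = feat.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    return (feat - mu) / sig

def content_loss(feats_cs, feats_c):
    # L_C: distance between mean-variance-normalized per-layer encoder
    # features (squared-error form used here for simplicity).
    return sum(F.mse_loss(norm_mv(a), norm_mv(b)) for a, b in zip(feats_cs, feats_c))

def style_loss(feats_cs, feats_s):
    # L_S: match per-layer channel means and stds of generated vs. style features.
    loss = 0.0
    for a, b in zip(feats_cs, feats_s):
        loss = loss + F.mse_loss(a.mean(dim=(2, 3)), b.mean(dim=(2, 3)))
        loss = loss + F.mse_loss(a.var(dim=(2, 3)).add(1e-5).sqrt(),
                                 b.var(dim=(2, 3)).add(1e-5).sqrt())
    return loss

# Usage with dummy 4-layer feature lists (N_e = N_d = 4 in this example):
fa = [torch.randn(2, 64, 16, 16) for _ in range(4)]
fb = [torch.randn(2, 64, 16, 16) for _ in range(4)]
print(content_loss(fa, fb), style_loss(fa, fb))
```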
9. The method for adaptive style migration of ancient architecture images according to claim 8, wherein $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ in step S52 satisfy the following formula: $\lambda_1+\lambda_2+\lambda_3+\lambda_4=1$, with $\lambda_1=0.1$, $\lambda_2=0.2$, $\lambda_3=0.2$, $\lambda_4=0.5$.
10. The method for adaptive style migration of ancient architecture images according to claim 8 or 9, wherein the total loss function in step S53 is: $L_{ALL}=\lambda_1 L_C+\lambda_2 L_S+\lambda_3 L_I+\lambda_4 L_G$.
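Combining the weights of claim 9 with the total loss of claim 10 and the stopping rule of step S53, a small self-contained check might look like this; the function name total_loss is illustrative:

```python
import torch

# Supervision weights from claim 9: lambda1..lambda4.
LAMBDAS = (0.1, 0.2, 0.2, 0.5)

def total_loss(l_c, l_s, l_i, l_g):
    # L_ALL = lambda1*L_C + lambda2*L_S + lambda3*L_I + lambda4*L_G (claim 10).
    return (LAMBDAS[0] * l_c + LAMBDAS[1] * l_s
            + LAMBDAS[2] * l_i + LAMBDAS[3] * l_g)

assert abs(sum(LAMBDAS) - 1.0) < 1e-9  # the weights sum to 1

# With all four partial losses equal to 1, L_ALL is also 1:
ones = [torch.tensor(1.0)] * 4
print(total_loss(*ones))  # approximately tensor(1.)

# Training (step S53) repeats S3-S5, stopping after 50 epochs or once
# L_ALL falls below 1e-3.
```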
CN202310216506.5A 2023-03-08 2023-03-08 Ancient architecture image self-adaptive style migration method based on visual encoder Pending CN116309022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310216506.5A CN116309022A (en) 2023-03-08 2023-03-08 Ancient architecture image self-adaptive style migration method based on visual encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310216506.5A CN116309022A (en) 2023-03-08 2023-03-08 Ancient architecture image self-adaptive style migration method based on visual encoder

Publications (1)

Publication Number Publication Date
CN116309022A 2023-06-23

Family

ID=86795581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310216506.5A Pending CN116309022A (en) 2023-03-08 2023-03-08 Ancient architecture image self-adaptive style migration method based on visual encoder

Country Status (1)

Country Link
CN (1) CN116309022A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152752A (en) * 2023-10-30 2023-12-01 之江实验室 Visual depth feature reconstruction method and device with self-adaptive weight
CN117152752B (en) * 2023-10-30 2024-02-20 之江实验室 Visual depth feature reconstruction method and device with self-adaptive weight

Similar Documents

Publication Publication Date Title
CN111145116B (en) Sea surface rainy day image sample augmentation method based on generation of countermeasure network
CN112634292B (en) Asphalt pavement crack image segmentation method based on deep convolutional neural network
CN112183637B (en) Single-light-source scene illumination re-rendering method and system based on neural network
CN113850824B (en) Remote sensing image road network extraction method based on multi-scale feature fusion
CN110647991B (en) Three-dimensional human body posture estimation method based on unsupervised field self-adaption
CN110717953B (en) Coloring method and system for black-and-white pictures based on CNN-LSTM (computer-aided three-dimensional network-link) combination model
CN114092832A (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN112819876B (en) Monocular vision depth estimation method based on deep learning
CN109685743A (en) Image mixed noise removing method based on noise learning neural network model
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN111127360B (en) Gray image transfer learning method based on automatic encoder
WO2020093724A1 (en) Method and device for generating information
CN114549574A (en) Interactive video matting system based on mask propagation network
CN116309022A (en) Ancient architecture image self-adaptive style migration method based on visual encoder
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN112926485A (en) Few-sample sluice image classification method
CN111027508A (en) Remote sensing image coverage change detection method based on deep neural network
CN110580289B (en) Scientific and technological paper classification method based on stacking automatic encoder and citation network
CN115937374B (en) Digital human modeling method, device, equipment and medium
CN116433980A (en) Image classification method, device, equipment and medium of impulse neural network structure
CN115908600A (en) Massive image reconstruction method based on prior regularization
CN115330930A (en) Three-dimensional reconstruction method and system based on sparse to dense feature matching network
CN116823627A (en) Image complexity evaluation-based oversized image rapid denoising method
CN109840888B (en) Image super-resolution reconstruction method based on joint constraint
CN114359229A (en) Defect detection method based on DSC-UNET model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination