CN114757820A - Semantic-guided content feature transfer style migration method and system - Google Patents
- Publication number
- CN114757820A CN114757820A CN202210405069.7A CN202210405069A CN114757820A CN 114757820 A CN114757820 A CN 114757820A CN 202210405069 A CN202210405069 A CN 202210405069A CN 114757820 A CN114757820 A CN 114757820A
- Authority
- CN
- China
- Prior art keywords
- content
- feature vector
- feature
- style
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a semantic-guided content feature transfer style migration method and system, belonging to the field of deep-learning style migration. To achieve style migration with complete, consistent content features, the invention provides a content calibration module comprising a feature optimization unit and an attribute reasoning unit. The feature optimization unit keeps multi-channel content features complete by exploiting the network's deep extraction capability; the attribute reasoning unit ignores spatial position and reclassifies the content semantics through attention-based group interaction, helping to find a suitable content expression. The content attributes extracted from the original content features are then re-assigned to the deep content features, calibrating the content feature mapping deviation, reducing content feature noise, and achieving content consistency. The invention is applicable to fields such as automatic driving and security monitoring.
Description
Technical Field
The invention relates to the technical field of deep learning style migration, in particular to a semantic-guided content feature transfer style migration method and system.
Background
With the rapid development of automatic driving and of industrial and service robots, style migration, a technology essential to the perception systems used for automatic driving and path planning, has become a current research hotspot. On the hardware side, most automatic driving systems rely on devices such as radar and infrared cameras to improve perception of the surrounding environment while driving, but the cost is high, and small targets and high-speed moving targets are not located or predicted accurately. On the software side, existing style migration methods improve performance mostly by deepening and widening the network or improving the loss function, but content mapping deviation easily arises during training, complete and consistent transfer of content features is difficult to achieve, and the driving safety of an automatic driving system is affected. Existing style migration methods fall into two categories: those based on convolutional neural networks and those based on generative adversarial networks.
the style migration method based on the convolutional neural network generally converts a content image into a given style image by optimizing or training an image migration neural network by means of a classification neural network. Specifically, the invention discloses an image style migration method integrating depth learning and depth perception, and the invention patent with the publication number of CN107705242B discloses that an integrated depth perception network adds depth loss to an object loss function to estimate the depth of field of an original image and a generated style image. In the style migration process, the generated image not only fuses corresponding styles and contents, but also keeps the far and near structure information of the original image. The invention patent application with the publication number of CN13837926A constructs a feature space to store feature information of different filters, thereby better obtaining multi-scale and stable features, without training real data, and flexibly performing style transformation. The invention discloses a style migration method, a device and related components based on feature fusion, and in the invention patent application with the publication number of CN113808011A, features of content and style images are extracted through a pre-trained content and style encoder, and then the trained content and style decoder is used for fusing and outputting the content and style features to obtain a target style migration image. The style migration method based on the convolutional neural network focuses on extracting content and style characteristics in an image by deepening or widening a network through the VGG and other functional layers, and the generated style migration effect is crossed in the aspects of detail texture and color filling, so that the method cannot be well applied to style migration of real scenes such as automatic driving and mobile robots.
Style migration methods based on generative adversarial networks have accelerated progress in the field. They are generally built on an encoding-decoding structure: an encoder extracts content features and style features synchronously, the two features are fed directly into a decoder, and related loss functions designed for color, content, smoothness, and other aspects supervise the network to obtain stylized results. One invention discloses a cross-domain variational adversarial auto-encoding method, in which an encoder decouples the content coding and style coding of cross-domain input images, adversarial and variational operations fit the content and style codings respectively, and the content and style codings of different domains are crossed to realize one-to-many transformation of cross-domain images. The invention patent application with publication number CN11099225A discloses an image multi-format conversion method based on latent-variable feature generation: on the basis of a multi-modal unsupervised image conversion network, a style-code generator is designed to fit the style code of an image, skip connections are introduced between the content coding and the style coding, an attention mechanism is introduced into the style coding, and the quality and diversity of multi-format image conversion are improved. The invention patent application with publication number CN113284042A discloses a multi-path parallel image content feature optimization style migration method, which separates the image content features of a single feature channel and of multiple feature channels in a multi-path parallel manner, improving the separation and extraction of small and blurred targets and the migration of image detail texture information.
However, most style migration methods reason from network performance in a closed environment combined with context information, and the training process can hardly avoid the influence of confusion factors such as differing target attributes, so the actual output of the network deviates from its theoretical output. How to use the depth features extracted from images effectively, ensure that image content is consistent before and after style migration, and apply these features well to traffic scenes has therefore become an urgent problem.
Disclosure of Invention
The invention aims to provide a semantic-guided content feature transfer style migration method and system. Noisy attributes produced by the content features are divided into several groups by channel attention; random information exchange is performed between different groups and within the same group to weaken noise; the remaining content features are fused and corrected channel by channel; and correct content attribute labels are assigned to the content features during transfer to guide their generation, reducing the mapping deviation in feature transfer and effectively realizing style migration with consistent image content.
In order to achieve this purpose, the technical scheme of the invention is as follows. A semantic-guided content feature transfer style migration method comprises the following steps:
preparing a data set for training the style migration network;
obtaining a source domain input image with feature channel count c and a target domain input image, and performing a two-fold down-sampling operation on each, the operation comprising a convolution operation and nonlinear activation function processing;
processing the down-sampling result of the target domain input image with global average pooling and a fully connected function to obtain a style feature vector;
processing the down-sampling result of the source domain input image with multi-layer residual units to obtain a four-dimensional feature vector;
processing that four-dimensional feature vector sequentially with global max pooling and a fully connected function, then with a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector; at the same time, deepening the original four-dimensional feature vector with multi-layer residual units to obtain another four-dimensional feature vector;
multiplying the two resulting four-dimensional feature vectors to generate a four-dimensional content feature vector, reallocating the target attributes within the content features and correcting the feature transfer deviation;
adding and fusing the style feature vector and the four-dimensional content feature vector to obtain a four-dimensional feature vector, then outputting the style migration result Y^{c×2h×2w} by up-sampling.
Further, the two-fold down-sampling operation on the source domain input image and the target domain input image is specifically:
using a convolution kernel M^{c×3×3} to extract the content features of the source domain input image and the style features of the target domain input image, with the formula:
processing the output feature vectors with a nonlinear activation function: when the feature value is less than or equal to 0, the activation function outputs 0, as in formula (3); otherwise the activation function output equals the input value, as in formula (4):
where A(·) is the activation function.
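Formulas (3)/(4) describe a ReLU-style activation, and the claim describes the down-sampling as a stride-2, k = 3 convolution. A minimal single-channel NumPy sketch of this step (the kernel values, sizes, and function names are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def relu(x):
    # Formulas (3)/(4): output 0 for values <= 0, otherwise pass the input through.
    return np.maximum(x, 0.0)

def downsample_conv(x, kernel):
    """Stride-2, 3x3 single-channel convolution with zero padding: (h, w) -> (h//2, w//2)."""
    h, w = x.shape
    xp = np.pad(x, 1)                       # zero-pad so the output is exactly h//2 x w//2
    out = np.zeros((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            patch = xp[2 * i:2 * i + 3, 2 * j:2 * j + 3]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))           # toy 2h x 2w = 8 x 8 single-channel "image"
feat = relu(downsample_conv(img, rng.standard_normal((3, 3))))
```

A c-channel input would simply apply one such kernel per channel, matching the M^{c×3×3} notation.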
Further, the down-sampling result of the target domain input image is processed with a global average pooling function and a fully connected function, specifically:
averaging the features of each unit with global average pooling to obtain a feature vector, with the formula:
where P_average(·) is the global average pooling function and M^{c×2×2}, the convolution kernel with filter k = 2, operates pixel by pixel, selecting and outputting the average value;
processing the feature vector channel by channel with a fully connected function and outputting a feature vector, with the formula:
where C_fully(·) is the fully connected function, operating with M^{c×1×1}, i.e., the convolution kernel with filter k = 1;
performing style cosine normalization on the feature vector to obtain a four-dimensional style feature vector, with the formula:
where cosIN(·) is the style cosine normalization process function, μ(x) and μ(y) are respectively the means over the length and width dimensions of the feature vector, and σ(x) and σ(y) are respectively the standard deviations over the length and width dimensions of the four-dimensional feature vector.
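The patent does not give the closed form of cosIN(·), but since it is built from per-channel means and standard deviations over the length/width dimensions, one plausible reading resembles instance normalization. A hedged NumPy sketch under that assumption:

```python
import numpy as np

def cos_instance_norm(y, eps=1e-5):
    """Normalize a (c, h, w) style feature map per channel, using the mean and
    standard deviation over the length/width dimensions (the mu and sigma of the
    text). This is an assumed, instance-normalization-like form of cosIN."""
    mu = y.mean(axis=(1, 2), keepdims=True)
    sigma = y.std(axis=(1, 2), keepdims=True)
    return (y - mu) / (sigma + eps)

rng = np.random.default_rng(1)
style = rng.standard_normal((4, 6, 6)) * 3.0 + 2.0   # toy style features, offset and scaled
normed = cos_instance_norm(style)
```

After this step each channel has near-zero mean and near-unit variance, which is what "suppressing feature information irrelevant to style" suggests.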
Further, the down-sampling result of the source domain input image is processed with multi-layer residual units, with the formula:
where F(·) is the single-layer residual unit process function and ω3 is a weight matrix.
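The text describes stacking single-layer residual units F(·) with a weight matrix ω3 (the description later suggests 8 layers). A minimal sketch, assuming an identity shortcut plus a per-channel weighted transform followed by the same ReLU activation; the specific transform is an illustrative choice, not the patent's:

```python
import numpy as np

def residual_unit(x, w):
    """One residual unit on a (c, h, w) tensor: identity shortcut plus a weighted
    transform, then ReLU. w plays the role of the weight matrix omega_3; a
    per-channel scalar keeps the sketch simple and shape-preserving."""
    return np.maximum(x + w[:, None, None] * x, 0.0)

def multi_layer_residual(x, weights):
    for w in weights:            # e.g. 8 stacked units, as the description suggests
        x = residual_unit(x, w)
    return x

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 5, 5))
out = multi_layer_residual(x, [rng.standard_normal(4) * 0.1 for _ in range(8)])
```

Because each unit preserves shape, the stack can be made arbitrarily deep without changing the feature dimensions, which is what lets the feature optimization unit "deepen" the content features.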
Further, the four-dimensional feature vector is processed sequentially with global max pooling, a fully connected function, a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector, specifically:
processing the four-dimensional feature vector with global max pooling to obtain a feature vector, with the formula:
where P_max(·) is the global max pooling function and M^{c×2×2}, the convolution kernel with filter k = 2, operates pixel by pixel, selecting and outputting the maximum value;
processing the feature vector with a fully connected function to obtain a feature vector, with the formula:
where C_fully(·) is the fully connected function, operating with M^{c×1×1}, i.e., the convolution kernel with filter k = 1;
using a deep convolutional neural network to divide the feature channels of the feature vector uniformly into p branches (p ≤ c), obtaining the feature component of each feature channel, with the formula:
where F_deep(·) is the deep convolutional neural network process function;
randomly exchanging features in q groups on each branch, disturbing the inherent order of information between different channels, and reclassifying and recombining the feature information to obtain feature components, with the formula:
where Shuffle(·) is the information exchange function;
merging the feature components with a point convolutional neural network to obtain a four-dimensional feature vector; the point convolutional neural network randomly deletes a portion of the neurons during merging, with the formula:
where D_ran is the random deletion function and m is the proportion of randomly deleted neurons;
where F_poi(·) is the point convolutional neural network process function, performing a point convolution on the feature vector with an M^{c×1×1} convolution kernel.
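The branch-split, group shuffle, and pointwise merge with random neuron deletion can be sketched as follows. This is a plausible reading of the steps above, not the patent's exact implementation: the split sizes, the dropout-style D_ran, and all names are assumptions.

```python
import numpy as np

def branch_shuffle(f, p, q, rng):
    """Split the c channels of a (c, h, w) tensor into p equal branches; within
    each branch, divide the channels into q groups and randomly permute the
    groups (the Shuffle(.) information-exchange step)."""
    c = f.shape[0]
    assert c % p == 0 and (c // p) % q == 0
    out = []
    for branch in np.split(f, p, axis=0):
        groups = np.split(branch, q, axis=0)
        order = rng.permutation(q)
        out.append(np.concatenate([groups[g] for g in order], axis=0))
    return np.concatenate(out, axis=0)

def point_conv_with_deletion(f, weight, m, rng):
    """1x1 (pointwise) convolution mixing channels, with a proportion m of the
    output neurons randomly deleted (a dropout-style D_ran)."""
    mixed = np.tensordot(weight, f, axes=([1], [0]))    # (c_out, c) x (c, h, w)
    keep = (rng.random(mixed.shape[0]) >= m).astype(float)
    return mixed * keep[:, None, None]

rng = np.random.default_rng(3)
f = rng.standard_normal((8, 4, 4))                      # toy features, c = 8
shuffled = branch_shuffle(f, p=2, q=2, rng=rng)
merged = point_conv_with_deletion(shuffled, rng.standard_normal((8, 8)), m=0.25, rng=rng)
```

Note that shuffling only reorders channels, so the set of channel features is preserved; it is the pointwise convolution afterwards that actually mixes information across the re-ordered channels.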
Further, the feature vector is processed with multi-layer residual units to obtain a four-dimensional feature vector, specifically:
Further, the feature vector and the four-dimensional feature vector are multiplied to generate the four-dimensional content feature vector, specifically:
where the two scaling coefficients are weight matrices and × represents feature matrix multiplication.
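One common way to realize such a product, and a plausible reading of this calibration step, is a channel-attention-style broadcast: the attribute-inference result acts as one weight per channel and rescales the deepened content features. The fixed weights and the per-channel form are assumptions for illustration:

```python
import numpy as np

def calibrate(content_deep, attn, w1=1.0, w2=1.0):
    """Multiply the deepened content features (c, h, w) by the per-channel
    attribute-inference vector (c,), each rescaled by a fixed weight (playing
    the role of the weight matrices); the broadcast product is one plausible
    reading of the 'x' in the formula."""
    return (w1 * content_deep) * (w2 * attn)[:, None, None]

rng = np.random.default_rng(4)
deep = rng.standard_normal((4, 3, 3))
attn = rng.random(4)
content = calibrate(deep, attn, w1=0.9, w2=1.1)
```

Under this reading, channels whose inferred attribute weight is small are suppressed, which is how re-assigned attribute labels would "correct" the content features channel by channel.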
The invention also provides a semantic-guided content feature transfer style migration system comprising an encoding module, a content calibration module, and a decoding module.
The encoding module comprises a content encoding module and a style encoding module. The content encoding module takes the source domain input image as input and performs the two-fold down-sampling operation to output a four-dimensional feature vector. The style encoding module takes the target domain input image as input and sequentially applies two-fold down-sampling, global average pooling, a fully connected function, and style cosine normalization to output a four-dimensional feature vector.
The content calibration module comprises a feature optimization unit and an attribute reasoning unit. The feature optimization unit processes the down-sampling result of the source domain input image with multi-layer residual units to obtain a four-dimensional feature vector. The attribute reasoning unit processes that four-dimensional feature vector sequentially with global max pooling and a fully connected function, then with a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector; at the same time, multi-layer residual units deepen the feature vector to obtain another four-dimensional feature vector. The two four-dimensional feature vectors, scaled by the fixed ratios ω1 and ω2, are multiplied to output a four-dimensional feature vector.
The decoding module adds and fuses the four-dimensional feature vectors to obtain a four-dimensional feature vector, then outputs the style migration result Y^{c×2h×2w} by up-sampling.
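As a rough sketch of the decoding step, assuming the style feature vector reduces to one value per channel and the up-sampling is nearest-neighbour (both assumptions; the patent leaves the decoder's internals unspecified):

```python
import numpy as np

def decode(content, style_vec):
    """Add the per-channel style vector (c,) to the content features (c, h, w),
    then up-sample by 2 via nearest-neighbour repetition, yielding c x 2h x 2w
    as in the output notation Y^{c x 2h x 2w}."""
    fused = content + style_vec[:, None, None]
    return np.repeat(np.repeat(fused, 2, axis=1), 2, axis=2)

rng = np.random.default_rng(5)
content = rng.standard_normal((4, 3, 3))
style_vec = rng.standard_normal(4)
y = decode(content, style_vec)          # shape (4, 6, 6), i.e. c x 2h x 2w
```

The factor-of-2 up-sampling exactly undoes the two-fold down-sampling of the encoder, so the stylized output matches the 2h × 2w input resolution.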
Further, the content calibration module is expressed as:
where F_opt(x) is the learned process function of the feature optimization unit and F_re(x) is the learned process function of the attribute reasoning unit.
With this technical scheme, the invention obtains the following beneficial effects. The method can be applied to real scenes such as automatic driving and industrial and service robots, can realize style transformation for any weather or environmental scene, and helps to identify small and blurred targets accurately. The beneficial effects are introduced point by point below:
(1) Suitable for small-target feature cases
The attribute reasoning unit can separate the latent instance target attributes within the image content features, fully mine the depth feature information, and accurately and clearly identify and extract content feature information from images containing small targets under unsupervised conditions.
(2) Suitable for high-speed moving target features
The invention uses the feature optimization unit and the attribute reasoning unit to extract the content features and target attributes of the input image respectively, and corrects the feature transfer deviation in the feature optimization unit according to the extracted target attributes, effectively reducing the blur produced by high-speed target motion and extracting high-speed moving targets.
(3) Suitable for public security monitoring
The invention can meet the requirements of effective multi-scale feature extraction and style transformation in all-weather, arbitrarily complex scenes (such as underground parking lots, fire-fighting passageways, and traffic scenes) captured by security monitoring cameras and sky-eye systems, provides favorable conditions for subsequent detection and identification, improves the efficiency of public systems, and helps improve production and living efficiency and maintain public safety.
(4) Suitable for automatic driving technology
The invention is a computer-vision environment perception technology suitable for the automatic driving field: it can extract the target features and positions of pedestrians, vehicles, buildings, traffic signs, and the like around the driving environment, provide comprehensive feature information for the style migration model, and strongly support driving safety.
(5) Suitable for visually unclear situations
The method suits style migration in different complex scenes: the features of visually unclear targets captured by camera lenses of different exposures and sharpness, under infrared and visible-light conditions, are recovered, and style migration is performed after the image clarity is improved.
Drawings
FIG. 1 is a flow diagram of a semantically guided content feature delivery style migration method;
FIG. 2 is a schematic diagram of a content calibration module;
FIG. 3 is a schematic diagram of a migration situation of a security monitoring style in embodiment 1;
FIG. 4 is a schematic view of the autonomous driving style transition in embodiment 2;
fig. 5 is a schematic diagram of the visual blur scene style transition in embodiment 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the application, i.e., the embodiments described are only a subset of, and not all embodiments of the application. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
The invention provides a semantic-guided content feature transfer style migration method and system. First, deepening the network enhances the extraction of depth features, ensuring the integrity of the image content features during style migration and reducing feature redundancy. Second, the content features are compressed into a one-dimensional semantic expression and information is exchanged in batches channel by channel, strengthening the association between different channel features, assigning each content feature a correct content label, alleviating the mismatch between target attributes and content feature expression during feature transfer, and ensuring the transfer consistency of the content features. As shown in fig. 1, the migration method includes the following steps:
Step 1: prepare a data set for training the style migration network; the image size may be 2h × 2w;
Step 2: obtain a source domain input image with feature channel count c and a target domain input image, and perform a two-fold down-sampling operation on each, comprising a convolution operation and nonlinear activation function processing, specifically:
(1) use a convolution kernel M^{c×3×3} with stride s = 2 and filter k = 3 to extract the content features of the source domain input image and the style features of the target domain input image, with the formula:
(2) process the output feature vectors with a nonlinear activation function: when the feature value is less than or equal to 0, the activation function outputs 0, as in formula (3); otherwise the activation function output equals the input value, as in formula (4):
where A(·) is the activation function. Nonlinear processing of the feature vectors with an activation function improves their effectiveness, reduces feature redundancy, and helps realize style migration with consistent image content features.
Step 3: to reduce the influence of feature position on style classification, process the feature vector extracted in step 2 with global average pooling and a fully connected function and output a feature vector, specifically:
(1) average the features of each unit with global average pooling to obtain a feature vector, with the formula:
where P_average(·) is the global average pooling function and M^{c×2×2}, the convolution kernel with k = 2, operates pixel by pixel, selecting and outputting the average value;
(2) process the feature vector channel by channel with a fully connected function, reducing the influence of pixels and feature positions on feature classification, and output a feature vector, with the formula:
where C_fully(·) is the fully connected function, operating with M^{c×1×1}, i.e., the convolution kernel with k = 1.
(3) To change the style feature data distribution and realize accurate style feature transfer, the invention performs style cosine normalization on the feature vector, suppressing feature information irrelevant to the style, and outputs a four-dimensional feature vector in preparation for fusion with the content features, with the formula:
where cosIN(·) is the style cosine normalization process function, μ(x) and μ(y) are respectively the means over the length and width dimensions of the feature vector, and σ(x) and σ(y) are respectively the standard deviations over the length and width dimensions of the four-dimensional feature vector.
Step 4: the feature optimization unit takes the down-sampling result of the source domain input image as input and processes the down-sampled feature vector with multi-layer (preferably 8-layer) residual units, reducing feature redundancy and keeping the image content complete during style migration. The formula is:
where F(·) is the single-layer residual unit process function and ω3 is a weight matrix;
Step 5: the attribute reasoning unit takes a four-dimensional feature vector as input and outputs a four-dimensional feature vector after global max pooling, fully connected function processing, a deep convolutional neural network, information exchange, and a point convolutional neural network in sequence, intervening in the network's inference during training and re-assigning correct content attribute labels to the content features. At the same time, multi-layer residual units deepen the four-dimensional feature vector and output a four-dimensional feature vector, realizing consistent transfer of the spatial structure information in the features. Specifically:
(1) To extract the hidden target attributes from the content features, the invention processes the four-dimensional feature vector with global max pooling, eliminating the influence of target position on attribute classification. The formula is:
where Pmax(·) is the global max pooling function; the Mc×2×2 convolution kernel with k = 2 operates pixel by pixel, selecting and outputting the maximum value;
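The position-invariance claimed for global max pooling in (1) can be seen in a short sketch (a global reduction is shown; the Mc×2×2 kernel variant in the text is not reproduced):

```python
import numpy as np

def global_max_pool(x):
    """Global max pooling over the spatial dimensions of a (c, h, w)
    feature map, discarding target-position information so that
    attribute classification no longer depends on where the target sits."""
    return x.max(axis=(1, 2))  # -> shape (c,)

x = np.zeros((3, 4, 4))
x[0, 2, 1] = 5.0  # each channel's response peaks at a different position...
x[1, 0, 3] = 7.0
x[2, 3, 0] = 2.0
pooled = global_max_pool(x)
# ...but the pooled vector is position-independent: [5., 7., 2.]
```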
(2) To enhance the network's ability to learn hidden content attributes, a fully connected function strengthens the correlation between channels and outputs a feature vector. The formula is:
where Cfully(·) is a fully connected function, implemented with an Mc×1×1 convolution kernel, i.e., a convolution kernel with k = 1;
(3) A deep convolutional neural network uniformly divides the feature vector into p branches along the feature channels (p ≤ c), obtaining the feature component of each feature channel. The formula is:
where Fdeep(·) is the deep convolutional neural network process function;
(4) On each branch, features are randomly exchanged in q groups, disturbing the inherent order of information between different channels; the feature information is then reclassified and recombined, and the feature components are output. The formula is:
where Shuffle(·) is the information exchange function: the channel features on each branch are divided into q groups, and the order is randomly disturbed both within each group and across groups, seeking new matches between content and attributes.
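Steps (3) and (4) — branch splitting followed by grouped channel shuffling — can be sketched as follows. The patent does not pin down the exact permutation scheme ("within each group and across groups"), so a single random permutation of a branch's channels is used here as a simplifying assumption; `split_branches` and `shuffle_channels` are illustrative names:

```python
import numpy as np

def split_branches(x, p):
    """Uniformly divide the c feature channels of a (c, h, w) map into
    p branches (p <= c, c divisible by p)."""
    return np.split(x, p, axis=0)

def shuffle_channels(branch, q, rng):
    """Randomly disturb the channel order of one branch (nominally in
    q groups; a full random permutation is assumed here), so that
    feature information is reclassified and recombined."""
    c = branch.shape[0]
    assert c % q == 0
    perm = rng.permutation(c)  # disturb the inherent channel order
    return branch[perm]

rng = np.random.default_rng(0)
x = np.arange(16, dtype=float).reshape(16, 1, 1)
branches = split_branches(x, p=4)                 # 4 branches of 4 channels
shuffled = [shuffle_channels(b, q=2, rng=rng) for b in branches]
```

Note that shuffling only reorders information: the multiset of feature values is preserved, which is what allows the network to "seek a new content/attribute matching relationship" without losing features.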
(5) A point convolutional neural network merges the exchanged feature vectors and outputs the content inference unit result. Recombining and fusing features across different feature channels provides more possibilities for accurate transmission of the content features. The point convolutional neural network randomly deletes part of the neurons during processing; the formula is:
where Dran is a random deletion function and m is the proportion of randomly deleted neurons; this operation prevents the network from overfitting;
where Fpoi(·) is the point convolutional neural network process function, applying a point convolution to the feature vector with an Mc×1×1 point convolution kernel.
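Step (5) combines two standard operations: a 1×1 (point) convolution, which is a matrix product over the channel axis, and random neuron deletion (dropout) with proportion m. A sketch under those assumptions (per-output-channel deletion is one possible reading of "randomly delete part of the neurons"):

```python
import numpy as np

def point_conv_with_dropout(x, w, m, rng):
    """Merge feature components with a 1x1 (point) convolution, then
    randomly delete a proportion m of the output neurons (dropout).
    w has shape (c_out, c_in); x has shape (c_in, h, w)."""
    out = np.tensordot(w, x, axes=([1], [0]))  # point convolution Fpoi
    keep = rng.random(out.shape[0]) >= m       # Dran: drop ~m of channels
    return out * keep[:, None, None]

rng = np.random.default_rng(42)
x = np.random.rand(8, 4, 4)
w = np.random.randn(8, 8) * 0.1
y = point_conv_with_dropout(x, w, m=0.25, rng=rng)
```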
(6) A multi-layer residual unit processes the four-dimensional feature vector and outputs a four-dimensional feature vector, ensuring consistency of the spatial structure and semantics as the feature vector is transmitted and laying the foundation for effective interaction between the feature components of different feature channels.
Step 6: the four-dimensional feature vector output by step 5 is multiplied by the four-dimensional feature vector from the feature optimization unit to generate a four-dimensional feature vector, realizing the redistribution of target attributes within the content features and correcting the feature transmission deviation. The formula is:
where ω1 and ω2 are weight matrices, and × represents feature matrix multiplication.
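Step 6 is a channel-attention-style recalibration: the deep content features are multiplied by the attribute-inference output, each scaled by a weight. The patent names weight matrices ω1 and ω2; scalar ratios are used in this sketch as a simplifying assumption, and `recalibrate` is an illustrative name:

```python
import numpy as np

def recalibrate(content, attention, w1=1.0, w2=1.0):
    """Multiply the deep content features (c, h, w) by the attribute
    inference output (c, 1, 1), each scaled by a weight, redistributing
    target attributes across the content features."""
    return (w1 * content) * (w2 * attention)  # broadcasts over h, w

content = np.ones((4, 2, 2))
attention = np.full((4, 1, 1), 0.5)  # per-channel attribute weights
y = recalibrate(content, attention)
```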
Step 7: the style features output by step 3 and the content features output by step 6 are added and fused to obtain a four-dimensional feature vector, which the decoder up-samples to output the style migration result Yc×2h×2w.
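The decoder step above — additive fusion followed by a 2× up-sampling from (c, h, w) to (c, 2h, 2w) — can be sketched as follows. Nearest-neighbour up-sampling is an assumption; the patent does not specify the decoder's up-sampling operator:

```python
import numpy as np

def fuse_and_upsample(style, content):
    """Add the style and content feature vectors, then up-sample the
    fused (c, h, w) map by a factor of two in each spatial dimension
    (nearest-neighbour), giving the (c, 2h, 2w) migration result."""
    fused = style + content                            # additive fusion
    return fused.repeat(2, axis=1).repeat(2, axis=2)   # (c, 2h, 2w)

style = np.random.rand(3, 4, 4)
content = np.random.rand(3, 4, 4)
y = fuse_and_upsample(style, content)
```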
The present embodiment further provides a system for implementing the method, comprising an encoding module, a content calibration module, and a decoding module; each part is described in detail below:
The encoding module comprises a content encoding module and a style encoding module. The content encoding module takes the source-domain input image as input, performs a double down-sampling operation on it, and outputs a four-dimensional feature vector. The style encoding module takes the target-domain input image as input and outputs a four-dimensional feature vector by sequentially applying global average pooling, a fully connected function, and style cosine normalization, reducing the interference of the spatial structure with the color semantic information.
The content calibration module comprises a feature optimization unit and an attribute reasoning unit; as shown in fig. 2, the four-dimensional feature vector of the multi-feature channel is input into the feature optimization unit, and more content structure and detail texture information are learned while the image content features of the multi-feature channel are completely transferred; the attribute reasoning unit extracts hidden attributes from the four-dimensional feature vectors output by the residual error unit, and gives correct feature expression to each content feature again through multi-channel and batch training, so that the accurate correspondence of one-dimensional semantics and two-dimensional pixels is realized. The outputs of different units are multiplied in a channel attention mode, the content feature transfer deviation is corrected, the accurate classification of a content structure is realized, and the image content feature transfer consistency is implemented; the feature optimization unit and the attribute reasoning unit in the content calibration module are explained in detail as follows: the input of the attribute reasoning unit is a four-dimensional feature vector which comprises n feature channels and has the size of h multiplied by w after being processed by a multilayer residual error unit in the feature optimization unitWherein the learning process function of the feature optimization unit is Fopt(x) (ii) a The attribute reasoning unit learning process function is Fre(x) (ii) a The four-dimensional feature vector which comprises n feature channels and has the size of h multiplied by w is input to the next stage after the two unit output features are fusedThe content calibration module is expressed as:
The feature optimization unit extracts depth features with n feature channels using a multi-layer residual unit, reducing feature redundancy while keeping the texture details and contour feature information of the source-domain input image, and passes the result as input to the attribute inference unit and to the next stage.
The attribute inference unit applies global average pooling for dimension reduction and regularization, outputting a four-dimensional feature vector. On this basis, fully connected layer processing weakens the influence of spatial information such as target position on content attribute classification, so that the network concentrates on the associations between different channel features, enhancing semantic feature expression and the ability to extract depth features. To further strengthen the relevance between different feature channels and fully extract the hidden attributes of the source-domain input image, a deep convolution with a 3 × 3 kernel divides the features into p branches; the content feature information on each feature channel is extracted separately, and the branches learn from and supervise one another, achieving cross-feature-channel reference. Meanwhile, within each branch the feature channels are divided into q groups, and the channel order is disturbed both within and across groups, increasing the randomness of the hidden content attributes, improving the network's generalization ability, redistributing the classification attributes of the target to each content feature, and reducing the feature transfer deviation. To obtain accurate content attributes, the features of each branch are filtered, and a 1 × 1 point convolution completes the integration of the p branches' feature information, outputting a four-dimensional feature vector. Each content feature is thus given an enhanced feature expression that guides the accurate generation of the content features.
While the feature vectors are processed within a single feature channel, a multi-layer residual unit in a parallel branch processes them at the multi-feature-channel level and outputs a four-dimensional feature vector, ensuring the transmission integrity of the feature vector. The two outputs are multiplied at the fixed ratios ω1 and ω2, and a four-dimensional feature vector is output. The hidden attributes and the inherent content features are screened against each other, solving the problem of wrongly assigned content attributes, reducing the content feature transmission deviation, and helping to realize consistent and accurate style migration of the content.
The decoding module performs the source-domain and target-domain feature vector operations: the two feature vectors are added and fused into a four-dimensional feature vector, which is then up-sampled to output the style migration result Yc×2h×2w.
The feature parameter constraint condition in this embodiment may be:
(1) The RGB three-channel image of size 256 × 256 is down-sampled, reducing the input image size to 128 × 128, with feature channels n ∈ {4, 8, 16, 64, 256, 512}; any of {1, 128, 128, 4}, {1, 128, 128, 8}, {1, 128, 128, 16}, {1, 128, 128, 64}, {1, 128, 128, 256}, and {1, 128, 128, 512} can be output as a four-dimensional feature vector containing the image content features.
(2) The content calibration module selects four-dimensional feature vectors of different feature channels as input according to the input image content: when the input image contains a small or blurred target, the four-dimensional feature vector with feature channel n = 256 is selected as the input of the content calibration module; when there is no small or blurred target in the input image, the four-dimensional feature vector with feature channel n = 8 is selected as the input of the content calibration module.
(3) The feature optimization unit transmits four-dimensional feature vectors with feature channels n ∈ {4, 8, 16, 64, 256, 512}.
(4) The attribute inference unit transmits a four-dimensional feature vector with feature channel n = 1.
Structural unit constraint conditions:
(1) The feature optimization unit extracts the depth content features using a 4-layer residual unit.
(2) The attribute inference unit comprises p branches, p ∈ {0, 1, 2, 3, 4}. When p = 0, the content calibration module contains only the feature optimization unit.
(3) Each branch in the attribute inference unit contains q groups, where q ∈ {q | 10 ≤ q ≤ 512, q ∈ Z+}.
(4) The attribute inference unit selects a different number of groups according to the complexity of the input image content: when the input image contains a small or blurred target, q ∈ {q | 128 ≤ q ≤ 512, q ∈ Z+} groups are selected; when there is no small or blurred target in the input image, q ∈ {q | 10 ≤ q ≤ 128, q ∈ Z+} groups are selected.
Example 1: security monitoring style migration situation
This embodiment targets unattended surveillance and accident-prone locations such as schools and intersections. Applied to outdoor security monitoring, the method effectively improves target recognition under complex illumination. The security monitoring image style migration situation is shown in fig. 3.
Example 2: autonomous driving style migration scenario
This example targets style migration for autonomous driving systems. Applied to a vehicle-mounted camera, the invention senses the environment around the vehicle, provides an auxiliary means for the driver, reduces the traffic accident rate, and improves safe driving capability; the autonomous driving style migration situation is shown in fig. 4.
Example 3: visual-blurred scene style migration scenarios
The method improves the image quality of style migration for visually blurred scenes caused by uneven illumination or natural weather, preparing for subsequent target detection or image segmentation; the visually blurred scene style migration situation is shown in fig. 5.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, which is not limited to the specifically recited embodiments and examples. All possible equivalents and modifications are deemed to fall within the scope of the invention as defined in the claims.
Claims (9)
1. A semantic-guided content feature transfer style migration method is characterized by comprising the following steps:
preparing a data set of a training style migration network;
acquiring a source-domain input image with feature channel c and a target-domain input image, and respectively performing a double down-sampling operation on each, the double down-sampling operation comprising a convolution operation and nonlinear activation function processing;
processing the down-sampling result of the target-domain input image with global average pooling and a fully connected function to obtain a style feature vector;
processing the down-sampling result of the source-domain input image with a multi-layer residual unit to obtain a four-dimensional feature vector;
sequentially processing the four-dimensional feature vector with global max pooling, a fully connected function, a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector, while deepening the four-dimensional feature vector with a multi-layer residual unit to obtain a four-dimensional feature vector;
multiplying the two four-dimensional feature vectors to generate the four-dimensional content feature vector Y1c×h×w, reallocating the target attributes within the content features and correcting the feature transmission deviation;
2. The semantic-guided content feature transfer style migration method of claim 1, wherein the double down-sampling operation performed on the source-domain input image and the target-domain input image is specifically:
extracting the content features of the source-domain input image and the style features of the target-domain input image using the convolution kernel Mc×3×3, with the formula:
processing the output feature vectors with a nonlinear activation function: when the feature value is less than or equal to 0, the activation function outputs 0, as shown in formula (3); otherwise, the activation function outputs the input value unchanged, as shown in formula (4):
where A(·) is the activation function.
3. The semantic-guided content feature transfer style migration method of claim 1, wherein processing the down-sampling result of the target-domain input image with global average pooling and a fully connected function is specifically:
averaging the features of each unit using global average pooling to obtain a feature vector, with the formula:
where Paverage(·) is the global average pooling function; the Mc×2×2 convolution kernel with k = 2 operates pixel by pixel, selecting and outputting the average value;
processing the feature vector channel by channel with a fully connected function and outputting a feature vector, with the formula:
where Cfully(·) is a fully connected function, implemented with an Mc×1×1 convolution kernel, i.e., a convolution kernel with k = 1;
performing style cosine normalization on the feature vector to obtain a four-dimensional style feature vector, with the formula:
where cosIN(·) is the style cosine normalization process function, μ(x) and μ(y) are respectively the means along the length and width dimensions of the feature vector, and σ(x) and σ(y) are respectively the standard deviations along the length and width dimensions of the four-dimensional feature vector.
4. The semantic-guided content feature transfer style migration method of claim 1, wherein the down-sampling result of the source-domain input image is processed with a multi-layer residual unit, with the formula:
where F(·) is a single-layer residual unit process function and ω3 is a weight matrix.
5. The semantic-guided content feature transfer style migration method of claim 1, wherein sequentially processing the four-dimensional feature vector with global max pooling, a fully connected function, a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector is specifically:
processing the four-dimensional feature vector with global max pooling to obtain a feature vector, with the formula:
where Pmax(·) is the global max pooling function; the Mc×2×2 convolution kernel with k = 2 operates pixel by pixel, selecting and outputting the maximum value;
processing the feature vector with a fully connected function to obtain a feature vector, with the formula:
where Cfully(·) is a fully connected function, implemented with an Mc×1×1 convolution kernel, i.e., a convolution kernel with k = 1;
uniformly dividing the feature vector into p branches along the feature channels (p ≤ c) using a deep convolutional neural network, obtaining the feature component of each feature channel, with the formula:
where Fdeep(·) is the deep convolutional neural network process function;
randomly exchanging features by dividing each branch into q groups, disturbing the order of information between different channels, and reclassifying and recombining the feature information to obtain feature components, with the formula:
where Shuffle(·) is the information exchange function;
merging the feature components with a point convolutional neural network to obtain a four-dimensional feature vector, the point convolutional neural network randomly deleting part of the neurons during merging, with the formula:
where Dran is a random deletion function and m is the proportion of randomly deleted neurons;
where Fpoi(·) is the point convolutional neural network process function, applying a point convolution to the feature vector with an Mc×1×1 point convolution kernel.
7. The semantic-guided content feature transfer style migration method of claim 1, wherein multiplying the feature vector and the four-dimensional feature vector to generate the four-dimensional content feature vector Y1c×h×w is specifically:
8. A semantic-guided content feature transfer style migration system is characterized by comprising an encoding module, a content calibration module and a decoding module;
the encoding module comprises a content encoding module and a style encoding module; the content encoding module takes the source-domain input image as input, performs a double down-sampling operation on it, and outputs a four-dimensional feature vector; the style encoding module takes the target-domain input image as input and outputs a four-dimensional feature vector by sequentially applying global average pooling, a fully connected function, and style cosine normalization;
The content calibration module comprises a feature optimization unit and an attribute reasoning unit; the characteristic optimization unit is used for down-sampling the source domain input imageObtaining four-dimensional characteristic vector by adopting multi-layer residual error unit processingThe attribute reasoning unit pairs four-dimensional feature vectorsSequentially carrying out global maximum pooling, full-connection function processing, deep convolutional neural network, information exchange and point convolutional neural network processing to obtain four-dimensional feature vectorsSimultaneous use of multilayer residuesFeature vector deepening processing of difference unitObtaining four-dimensional feature vectorsCombining four-dimensional feature vectorsAnd four-dimensional feature vectorIn a fixed ratio omega1And omega2Multiplying and outputting four-dimensional feature vector Y1 c×h×w;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210405069.7A CN114757820A (en) | 2022-04-18 | 2022-04-18 | Semantic-guided content feature transfer style migration method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114757820A true CN114757820A (en) | 2022-07-15 |
Family
ID=82330680
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114757820A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||