CN114757820A - Semantic-guided content feature transfer style migration method and system - Google Patents
- Publication number
- CN114757820A CN114757820A CN202210405069.7A CN202210405069A CN114757820A CN 114757820 A CN114757820 A CN 114757820A CN 202210405069 A CN202210405069 A CN 202210405069A CN 114757820 A CN114757820 A CN 114757820A
- Authority
- CN
- China
- Prior art keywords
- content
- feature vector
- feature
- style
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a semantic-guided content feature transfer style migration method and system, belonging to the field of deep-learning style migration. To achieve style migration with complete, consistent content features, the invention provides a content calibration module comprising a feature optimization unit and an attribute reasoning unit. The feature optimization unit keeps multi-channel content features complete by exploiting the network's deep extraction capability; the attribute reasoning unit ignores spatial position and reclassifies the content semantics through attention-based group interaction, helping to find a suitable content expression. The content attributes extracted from the original content features are then re-assigned to the deep content features, calibrating the content feature mapping deviation, reducing content feature noise, and achieving content consistency. The invention is applicable to fields such as automatic driving and security monitoring.
Description
Technical Field
The invention relates to the technical field of deep learning style migration, in particular to a semantic-guided content feature transfer style migration method and system.
Background
With the rapid development of automatic driving and of industrial and service robots, style migration, a technology essential to the perception systems used for automatic driving and path planning, has become a current research hotspot. On the hardware side, most automatic driving systems rely on devices such as radar and infrared cameras to improve perception of the surrounding environment while driving, but the cost is high, and small targets and high-speed moving targets are not located or predicted accurately. On the software side, existing style migration methods improve performance mostly by deepening and widening the network or improving the loss function, but content mapping deviation easily arises during training, complete and consistent transfer of content features is difficult to achieve, and the driving safety of an automatic driving system is affected. Existing style migration methods fall into two categories: those based on convolutional neural networks and those based on generative adversarial networks.
the style migration method based on the convolutional neural network generally converts a content image into a given style image by optimizing or training an image migration neural network by means of a classification neural network. Specifically, the invention discloses an image style migration method integrating depth learning and depth perception, and the invention patent with the publication number of CN107705242B discloses that an integrated depth perception network adds depth loss to an object loss function to estimate the depth of field of an original image and a generated style image. In the style migration process, the generated image not only fuses corresponding styles and contents, but also keeps the far and near structure information of the original image. The invention patent application with the publication number of CN13837926A constructs a feature space to store feature information of different filters, thereby better obtaining multi-scale and stable features, without training real data, and flexibly performing style transformation. The invention discloses a style migration method, a device and related components based on feature fusion, and in the invention patent application with the publication number of CN113808011A, features of content and style images are extracted through a pre-trained content and style encoder, and then the trained content and style decoder is used for fusing and outputting the content and style features to obtain a target style migration image. The style migration method based on the convolutional neural network focuses on extracting content and style characteristics in an image by deepening or widening a network through the VGG and other functional layers, and the generated style migration effect is crossed in the aspects of detail texture and color filling, so that the method cannot be well applied to style migration of real scenes such as automatic driving and mobile robots.
Style migration methods based on generative adversarial networks have accelerated progress in the field. They are generally built on an encoding-decoding structure: an encoder extracts content features and style features synchronously, the two features are fed directly into a decoder, and related loss functions designed for color, content, smoothness, and other aspects supervise the network to obtain stylized results. One invention discloses a cross-domain variational adversarial auto-encoding method, in which an encoder decouples the content coding and style coding of cross-domain input images, adversarial and variational operations fit the content and style codings respectively, and the content and style codings of different domains are crossed to realize one-to-many transformation of cross-domain images. The invention patent application with publication number CN11099225A discloses an image multi-format conversion method based on latent-variable feature generation: on the basis of a multi-modal unsupervised image conversion network, a style-code generator is designed to fit the style code of an image, skip connections are introduced between the content coding and the style coding, an attention mechanism is introduced into the style coding, and the quality and diversity of multi-format image conversion are improved. The invention patent application with publication number CN113284042A discloses a multi-path parallel image content feature optimization style migration method, which separates the image content features of a single feature channel and of multiple feature channels in a multi-path parallel manner, improving the separation and extraction of small and blurred targets and the migration of image detail texture information.
However, most style migration methods reason from network performance in a closed environment combined with context information, and the training process can hardly avoid the influence of confusion factors such as differing target attributes, so the actual output of the network deviates from its theoretical output. How to use the depth features extracted from images effectively, ensure that image content is consistent before and after style migration, and apply these features well to traffic scenes has therefore become an urgent problem.
Disclosure of Invention
The invention aims to provide a semantic-guided content feature transfer style migration method and system. Noisy attributes produced by the content features are divided into several groups by channel attention; random information exchange is performed between different groups and within the same group to weaken noise; the remaining content features are fused and corrected channel by channel; and correct content attribute labels are assigned to the content features during transfer to guide their generation, reducing the mapping deviation in feature transfer and effectively realizing style migration with consistent image content.
In order to achieve this purpose, the technical scheme of the invention is as follows. A semantic-guided content feature transfer style migration method comprises the following steps:
preparing a data set for training the style migration network;
obtaining a source domain input image with feature channel count c and a target domain input image, and performing a two-fold down-sampling operation on each, the operation comprising a convolution operation and nonlinear activation function processing;
processing the down-sampling result of the target domain input image with global average pooling and a fully connected function to obtain a style feature vector;
processing the down-sampling result of the source domain input image with multi-layer residual units to obtain a four-dimensional feature vector;
processing that four-dimensional feature vector sequentially with global max pooling and a fully connected function, then with a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector; at the same time, deepening the original four-dimensional feature vector with multi-layer residual units to obtain another four-dimensional feature vector;
multiplying the two resulting four-dimensional feature vectors to generate a four-dimensional content feature vector, reallocating the target attributes within the content features and correcting the feature transfer deviation;
adding and fusing the style feature vector and the four-dimensional content feature vector to obtain a four-dimensional feature vector, then outputting the style migration result Y^{c×2h×2w} by up-sampling.
Further, the two-fold down-sampling operation on the source domain input image and the target domain input image is specifically:
using a convolution kernel M^{c×3×3} to extract the content features of the source domain input image and the style features of the target domain input image, with the formula:
processing the output feature vectors with a nonlinear activation function: when the feature value is less than or equal to 0, the activation function outputs 0, as in formula (3); otherwise the activation function output equals the input value, as in formula (4):
where A(·) is the activation function.
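Formulas (3)/(4) describe a ReLU-style activation, and the claim describes the down-sampling as a stride-2, k = 3 convolution. A minimal single-channel NumPy sketch of this step (the kernel values, sizes, and function names are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def relu(x):
    # Formulas (3)/(4): output 0 for values <= 0, otherwise pass the input through.
    return np.maximum(x, 0.0)

def downsample_conv(x, kernel):
    """Stride-2, 3x3 single-channel convolution with zero padding: (h, w) -> (h//2, w//2)."""
    h, w = x.shape
    xp = np.pad(x, 1)                       # zero-pad so the output is exactly h//2 x w//2
    out = np.zeros((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            patch = xp[2 * i:2 * i + 3, 2 * j:2 * j + 3]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))           # toy 2h x 2w = 8 x 8 single-channel "image"
feat = relu(downsample_conv(img, rng.standard_normal((3, 3))))
```

A c-channel input would simply apply one such kernel per channel, matching the M^{c×3×3} notation.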
Further, the down-sampling result of the target domain input image is processed with a global average pooling function and a fully connected function, specifically:
averaging the features of each unit with global average pooling to obtain a feature vector, with the formula:
where P_average(·) is the global average pooling function and M^{c×2×2}, the convolution kernel with filter k = 2, operates pixel by pixel, selecting and outputting the average value;
processing the feature vector channel by channel with a fully connected function and outputting a feature vector, with the formula:
where C_fully(·) is the fully connected function, operating with M^{c×1×1}, i.e., the convolution kernel with filter k = 1;
performing style cosine normalization on the feature vector to obtain a four-dimensional style feature vector, with the formula:
where cosIN(·) is the style cosine normalization process function, μ(x) and μ(y) are respectively the means over the length and width dimensions of the feature vector, and σ(x) and σ(y) are respectively the standard deviations over the length and width dimensions of the four-dimensional feature vector.
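The patent does not give the closed form of cosIN(·), but since it is built from per-channel means and standard deviations over the length/width dimensions, one plausible reading resembles instance normalization. A hedged NumPy sketch under that assumption:

```python
import numpy as np

def cos_instance_norm(y, eps=1e-5):
    """Normalize a (c, h, w) style feature map per channel, using the mean and
    standard deviation over the length/width dimensions (the mu and sigma of the
    text). This is an assumed, instance-normalization-like form of cosIN."""
    mu = y.mean(axis=(1, 2), keepdims=True)
    sigma = y.std(axis=(1, 2), keepdims=True)
    return (y - mu) / (sigma + eps)

rng = np.random.default_rng(1)
style = rng.standard_normal((4, 6, 6)) * 3.0 + 2.0   # toy style features, offset and scaled
normed = cos_instance_norm(style)
```

After this step each channel has near-zero mean and near-unit variance, which is what "suppressing feature information irrelevant to style" suggests.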
Further, the down-sampling result of the source domain input image is processed with multi-layer residual units, with the formula:
where F(·) is the single-layer residual unit process function and ω3 is a weight matrix.
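The text describes stacking single-layer residual units F(·) with a weight matrix ω3 (the description later suggests 8 layers). A minimal sketch, assuming an identity shortcut plus a per-channel weighted transform followed by the same ReLU activation; the specific transform is an illustrative choice, not the patent's:

```python
import numpy as np

def residual_unit(x, w):
    """One residual unit on a (c, h, w) tensor: identity shortcut plus a weighted
    transform, then ReLU. w plays the role of the weight matrix omega_3; a
    per-channel scalar keeps the sketch simple and shape-preserving."""
    return np.maximum(x + w[:, None, None] * x, 0.0)

def multi_layer_residual(x, weights):
    for w in weights:            # e.g. 8 stacked units, as the description suggests
        x = residual_unit(x, w)
    return x

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 5, 5))
out = multi_layer_residual(x, [rng.standard_normal(4) * 0.1 for _ in range(8)])
```

Because each unit preserves shape, the stack can be made arbitrarily deep without changing the feature dimensions, which is what lets the feature optimization unit "deepen" the content features.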
Further, the four-dimensional feature vector is processed sequentially with global max pooling, a fully connected function, a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector, specifically:
processing the four-dimensional feature vector with global max pooling to obtain a feature vector, with the formula:
where P_max(·) is the global max pooling function and M^{c×2×2}, the convolution kernel with filter k = 2, operates pixel by pixel, selecting and outputting the maximum value;
processing the feature vector with a fully connected function to obtain a feature vector, with the formula:
where C_fully(·) is the fully connected function, operating with M^{c×1×1}, i.e., the convolution kernel with filter k = 1;
using a deep convolutional neural network to divide the feature channels of the feature vector uniformly into p branches (p ≤ c), obtaining the feature component of each feature channel, with the formula:
where F_deep(·) is the deep convolutional neural network process function;
randomly exchanging features in q groups on each branch, disturbing the inherent order of information between different channels, and reclassifying and recombining the feature information to obtain feature components, with the formula:
where Shuffle(·) is the information exchange function;
merging the feature components with a point convolutional neural network to obtain a four-dimensional feature vector; the point convolutional neural network randomly deletes a portion of the neurons during merging, with the formula:
where D_ran is the random deletion function and m is the proportion of randomly deleted neurons;
where F_poi(·) is the point convolutional neural network process function, performing a point convolution on the feature vector with an M^{c×1×1} convolution kernel.
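The branch-split, group shuffle, and pointwise merge with random neuron deletion can be sketched as follows. This is a plausible reading of the steps above, not the patent's exact implementation: the split sizes, the dropout-style D_ran, and all names are assumptions.

```python
import numpy as np

def branch_shuffle(f, p, q, rng):
    """Split the c channels of a (c, h, w) tensor into p equal branches; within
    each branch, divide the channels into q groups and randomly permute the
    groups (the Shuffle(.) information-exchange step)."""
    c = f.shape[0]
    assert c % p == 0 and (c // p) % q == 0
    out = []
    for branch in np.split(f, p, axis=0):
        groups = np.split(branch, q, axis=0)
        order = rng.permutation(q)
        out.append(np.concatenate([groups[g] for g in order], axis=0))
    return np.concatenate(out, axis=0)

def point_conv_with_deletion(f, weight, m, rng):
    """1x1 (pointwise) convolution mixing channels, with a proportion m of the
    output neurons randomly deleted (a dropout-style D_ran)."""
    mixed = np.tensordot(weight, f, axes=([1], [0]))    # (c_out, c) x (c, h, w)
    keep = (rng.random(mixed.shape[0]) >= m).astype(float)
    return mixed * keep[:, None, None]

rng = np.random.default_rng(3)
f = rng.standard_normal((8, 4, 4))                      # toy features, c = 8
shuffled = branch_shuffle(f, p=2, q=2, rng=rng)
merged = point_conv_with_deletion(shuffled, rng.standard_normal((8, 8)), m=0.25, rng=rng)
```

Note that shuffling only reorders channels, so the set of channel features is preserved; it is the pointwise convolution afterwards that actually mixes information across the re-ordered channels.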
Further, the feature vector is processed with multi-layer residual units to obtain a four-dimensional feature vector, specifically:
Further, the feature vector and the four-dimensional feature vector are multiplied to generate the four-dimensional content feature vector, specifically:
where the two scaling coefficients are weight matrices and × represents feature matrix multiplication.
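One common way to realize such a product, and a plausible reading of this calibration step, is a channel-attention-style broadcast: the attribute-inference result acts as one weight per channel and rescales the deepened content features. The fixed weights and the per-channel form are assumptions for illustration:

```python
import numpy as np

def calibrate(content_deep, attn, w1=1.0, w2=1.0):
    """Multiply the deepened content features (c, h, w) by the per-channel
    attribute-inference vector (c,), each rescaled by a fixed weight (playing
    the role of the weight matrices); the broadcast product is one plausible
    reading of the 'x' in the formula."""
    return (w1 * content_deep) * (w2 * attn)[:, None, None]

rng = np.random.default_rng(4)
deep = rng.standard_normal((4, 3, 3))
attn = rng.random(4)
content = calibrate(deep, attn, w1=0.9, w2=1.1)
```

Under this reading, channels whose inferred attribute weight is small are suppressed, which is how re-assigned attribute labels would "correct" the content features channel by channel.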
The invention also provides a semantic-guided content feature transfer style migration system comprising an encoding module, a content calibration module, and a decoding module.
The encoding module comprises a content encoding module and a style encoding module. The content encoding module takes the source domain input image as input and performs the two-fold down-sampling operation to output a four-dimensional feature vector. The style encoding module takes the target domain input image as input and sequentially applies two-fold down-sampling, global average pooling, a fully connected function, and style cosine normalization to output a four-dimensional feature vector.
The content calibration module comprises a feature optimization unit and an attribute reasoning unit. The feature optimization unit processes the down-sampling result of the source domain input image with multi-layer residual units to obtain a four-dimensional feature vector. The attribute reasoning unit processes that four-dimensional feature vector sequentially with global max pooling and a fully connected function, then with a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector; at the same time, multi-layer residual units deepen the feature vector to obtain another four-dimensional feature vector. The two four-dimensional feature vectors, scaled by the fixed ratios ω1 and ω2, are multiplied to output a four-dimensional feature vector.
The decoding module adds and fuses the four-dimensional feature vectors to obtain a four-dimensional feature vector, then outputs the style migration result Y^{c×2h×2w} by up-sampling.
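As a rough sketch of the decoding step, assuming the style feature vector reduces to one value per channel and the up-sampling is nearest-neighbour (both assumptions; the patent leaves the decoder's internals unspecified):

```python
import numpy as np

def decode(content, style_vec):
    """Add the per-channel style vector (c,) to the content features (c, h, w),
    then up-sample by 2 via nearest-neighbour repetition, yielding c x 2h x 2w
    as in the output notation Y^{c x 2h x 2w}."""
    fused = content + style_vec[:, None, None]
    return np.repeat(np.repeat(fused, 2, axis=1), 2, axis=2)

rng = np.random.default_rng(5)
content = rng.standard_normal((4, 3, 3))
style_vec = rng.standard_normal(4)
y = decode(content, style_vec)          # shape (4, 6, 6), i.e. c x 2h x 2w
```

The factor-of-2 up-sampling exactly undoes the two-fold down-sampling of the encoder, so the stylized output matches the 2h × 2w input resolution.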
Further, the content calibration module is expressed as:
where F_opt(x) is the learned process function of the feature optimization unit and F_re(x) is the learned process function of the attribute reasoning unit.
With this technical scheme, the invention obtains the following beneficial effects. The method can be applied to real scenes such as automatic driving and industrial and service robots, can realize style transformation for any weather or environmental scene, and helps to identify small and blurred targets accurately. The beneficial effects are introduced point by point below:
(1) Suitable for small-target feature cases
The attribute reasoning unit can separate the latent instance target attributes within the image content features, fully mine the depth feature information, and accurately and clearly identify and extract content feature information from images containing small targets under unsupervised conditions.
(2) Suitable for high-speed moving target features
The invention uses the feature optimization unit and the attribute reasoning unit to extract the content features and target attributes of the input image respectively, and corrects the feature transfer deviation in the feature optimization unit according to the extracted target attributes, effectively reducing the blur produced by high-speed target motion and extracting high-speed moving targets.
(3) Suitable for public security monitoring
The invention can meet the requirements of effective multi-scale feature extraction and style transformation in all-weather, arbitrarily complex scenes (such as underground parking lots, fire-fighting passageways, and traffic scenes) captured by security monitoring cameras and sky-eye systems, provides favorable conditions for subsequent detection and identification, improves the efficiency of public systems, and helps improve production and living efficiency and maintain public safety.
(4) Suitable for automatic driving technology
The invention is a computer-vision environment perception technology suitable for the automatic driving field: it can extract the target features and positions of pedestrians, vehicles, buildings, traffic signs, and the like around the driving environment, provide comprehensive feature information for the style migration model, and strongly support driving safety.
(5) Suitable for visually unclear situations
The method suits style migration in different complex scenes: the features of visually unclear targets captured by camera lenses of different exposures and sharpness, under infrared and visible-light conditions, are recovered, and style migration is performed after the image clarity is improved.
Drawings
FIG. 1 is a flow diagram of a semantically guided content feature delivery style migration method;
FIG. 2 is a schematic diagram of a content calibration module;
FIG. 3 is a schematic diagram of a migration situation of a security monitoring style in embodiment 1;
FIG. 4 is a schematic view of the autonomous driving style transition in embodiment 2;
fig. 5 is a schematic diagram of the visual blur scene style transition in embodiment 3.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the application, i.e., the embodiments described are only a subset of, and not all embodiments of the application. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
The invention provides a semantic-guided content feature transfer style migration method and system. First, deepening the network enhances the extraction of depth features, ensuring the integrity of the image content features during style migration and reducing feature redundancy. Second, the content features are compressed into a one-dimensional semantic expression and information is exchanged in batches channel by channel, strengthening the association between different channel features, assigning each content feature a correct content label, alleviating the mismatch between target attributes and content feature expression during feature transfer, and ensuring the transfer consistency of the content features. As shown in fig. 1, the migration method includes the following steps:
Step 1: prepare a data set for training the style migration network; the image size may be 2h × 2w;
Step 2: obtain a source domain input image with feature channel count c and a target domain input image, and perform a two-fold down-sampling operation on each, comprising a convolution operation and nonlinear activation function processing, specifically:
(1) use a convolution kernel M^{c×3×3} with stride s = 2 and filter k = 3 to extract the content features of the source domain input image and the style features of the target domain input image, with the formula:
(2) process the output feature vectors with a nonlinear activation function: when the feature value is less than or equal to 0, the activation function outputs 0, as in formula (3); otherwise the activation function output equals the input value, as in formula (4):
where A(·) is the activation function. Nonlinear processing of the feature vectors with an activation function improves their effectiveness, reduces feature redundancy, and helps realize style migration with consistent image content features.
Step 3: to reduce the influence of feature position on style classification, process the feature vector extracted in step 2 with global average pooling and a fully connected function and output a feature vector, specifically:
(1) average the features of each unit with global average pooling to obtain a feature vector, with the formula:
where P_average(·) is the global average pooling function and M^{c×2×2}, the convolution kernel with k = 2, operates pixel by pixel, selecting and outputting the average value;
(2) process the feature vector channel by channel with a fully connected function, reducing the influence of pixels and feature positions on feature classification, and output a feature vector, with the formula:
where C_fully(·) is the fully connected function, operating with M^{c×1×1}, i.e., the convolution kernel with k = 1.
(3) To change the style feature data distribution and realize accurate style feature transfer, the invention performs style cosine normalization on the feature vector, suppressing feature information irrelevant to the style, and outputs a four-dimensional feature vector in preparation for fusion with the content features, with the formula:
where cosIN(·) is the style cosine normalization process function, μ(x) and μ(y) are respectively the means over the length and width dimensions of the feature vector, and σ(x) and σ(y) are respectively the standard deviations over the length and width dimensions of the four-dimensional feature vector.
Step 4: the feature optimization unit takes the down-sampling result of the source domain input image as input and processes the down-sampled feature vector with multi-layer (preferably 8-layer) residual units, reducing feature redundancy and keeping the image content complete during style migration. The formula is:
where F(·) is the single-layer residual unit process function and ω3 is a weight matrix;
Step 5: the attribute reasoning unit takes a four-dimensional feature vector as input and outputs a four-dimensional feature vector after global max pooling, fully connected function processing, a deep convolutional neural network, information exchange, and a point convolutional neural network in sequence, intervening in the network's inference during training and re-assigning correct content attribute labels to the content features. At the same time, multi-layer residual units deepen the four-dimensional feature vector and output a four-dimensional feature vector, realizing consistent transfer of the spatial structure information in the features. Specifically:
(1) To extract the hidden target attributes from the content features, the invention processes the four-dimensional feature vector with global max pooling, eliminating the influence of target position on attribute classification. The formula is:
where Pmax(·) is the global max pooling function; the Mc×2×2 convolution kernel with k = 2 operates pixel by pixel, selecting and outputting the maximum value;
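The position-invariance claimed for global max pooling in (1) can be seen in a short sketch (a global reduction is shown; the Mc×2×2 kernel variant in the text is not reproduced):

```python
import numpy as np

def global_max_pool(x):
    """Global max pooling over the spatial dimensions of a (c, h, w)
    feature map, discarding target-position information so that
    attribute classification no longer depends on where the target sits."""
    return x.max(axis=(1, 2))  # -> shape (c,)

x = np.zeros((3, 4, 4))
x[0, 2, 1] = 5.0  # each channel's response peaks at a different position...
x[1, 0, 3] = 7.0
x[2, 3, 0] = 2.0
pooled = global_max_pool(x)
# ...but the pooled vector is position-independent: [5., 7., 2.]
```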
(2) To enhance the network's ability to learn hidden content attributes, a fully connected function strengthens the correlation between channels and outputs a feature vector. The formula is:
where Cfully(·) is a fully connected function, implemented with an Mc×1×1 convolution kernel, i.e., a convolution kernel with k = 1;
(3) A deep convolutional neural network uniformly divides the feature vector into p branches along the feature channels (p ≤ c), obtaining the feature component of each feature channel. The formula is:
where Fdeep(·) is the deep convolutional neural network process function;
(4) On each branch, features are randomly exchanged in q groups, disturbing the inherent order of information between different channels; the feature information is then reclassified and recombined, and the feature components are output. The formula is:
where Shuffle(·) is the information exchange function: the channel features on each branch are divided into q groups, and the order is randomly disturbed both within each group and across groups, seeking new matches between content and attributes.
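Steps (3) and (4) — branch splitting followed by grouped channel shuffling — can be sketched as follows. The patent does not pin down the exact permutation scheme ("within each group and across groups"), so a single random permutation of a branch's channels is used here as a simplifying assumption; `split_branches` and `shuffle_channels` are illustrative names:

```python
import numpy as np

def split_branches(x, p):
    """Uniformly divide the c feature channels of a (c, h, w) map into
    p branches (p <= c, c divisible by p)."""
    return np.split(x, p, axis=0)

def shuffle_channels(branch, q, rng):
    """Randomly disturb the channel order of one branch (nominally in
    q groups; a full random permutation is assumed here), so that
    feature information is reclassified and recombined."""
    c = branch.shape[0]
    assert c % q == 0
    perm = rng.permutation(c)  # disturb the inherent channel order
    return branch[perm]

rng = np.random.default_rng(0)
x = np.arange(16, dtype=float).reshape(16, 1, 1)
branches = split_branches(x, p=4)                 # 4 branches of 4 channels
shuffled = [shuffle_channels(b, q=2, rng=rng) for b in branches]
```

Note that shuffling only reorders information: the multiset of feature values is preserved, which is what allows the network to "seek a new content/attribute matching relationship" without losing features.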
(5) A point convolutional neural network merges the exchanged feature vectors and outputs the content inference unit result. Recombining and fusing features across different feature channels provides more possibilities for accurate transmission of the content features. The point convolutional neural network randomly deletes part of the neurons during processing; the formula is:
where Dran is a random deletion function and m is the proportion of randomly deleted neurons; this operation prevents the network from overfitting;
where Fpoi(·) is the point convolutional neural network process function, applying a point convolution to the feature vector with an Mc×1×1 point convolution kernel.
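Step (5) combines two standard operations: a 1×1 (point) convolution, which is a matrix product over the channel axis, and random neuron deletion (dropout) with proportion m. A sketch under those assumptions (per-output-channel deletion is one possible reading of "randomly delete part of the neurons"):

```python
import numpy as np

def point_conv_with_dropout(x, w, m, rng):
    """Merge feature components with a 1x1 (point) convolution, then
    randomly delete a proportion m of the output neurons (dropout).
    w has shape (c_out, c_in); x has shape (c_in, h, w)."""
    out = np.tensordot(w, x, axes=([1], [0]))  # point convolution Fpoi
    keep = rng.random(out.shape[0]) >= m       # Dran: drop ~m of channels
    return out * keep[:, None, None]

rng = np.random.default_rng(42)
x = np.random.rand(8, 4, 4)
w = np.random.randn(8, 8) * 0.1
y = point_conv_with_dropout(x, w, m=0.25, rng=rng)
```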
(6) A multi-layer residual unit processes the four-dimensional feature vector and outputs a four-dimensional feature vector, ensuring consistency of the spatial structure and semantics as the feature vector is transmitted and laying the foundation for effective interaction between the feature components of different feature channels.
Step 6: the four-dimensional feature vector output by step 5 is multiplied by the four-dimensional feature vector from the feature optimization unit to generate a four-dimensional feature vector, realizing the redistribution of target attributes within the content features and correcting the feature transmission deviation. The formula is:
where ω1 and ω2 are weight matrices, and × represents feature matrix multiplication.
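Step 6 is a channel-attention-style recalibration: the deep content features are multiplied by the attribute-inference output, each scaled by a weight. The patent names weight matrices ω1 and ω2; scalar ratios are used in this sketch as a simplifying assumption, and `recalibrate` is an illustrative name:

```python
import numpy as np

def recalibrate(content, attention, w1=1.0, w2=1.0):
    """Multiply the deep content features (c, h, w) by the attribute
    inference output (c, 1, 1), each scaled by a weight, redistributing
    target attributes across the content features."""
    return (w1 * content) * (w2 * attention)  # broadcasts over h, w

content = np.ones((4, 2, 2))
attention = np.full((4, 1, 1), 0.5)  # per-channel attribute weights
y = recalibrate(content, attention)
```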
Step 7: the style features output by step 3 and the content features output by step 6 are added and fused to obtain a four-dimensional feature vector, which the decoder up-samples to output the style migration result Yc×2h×2w.
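The decoder step above — additive fusion followed by a 2× up-sampling from (c, h, w) to (c, 2h, 2w) — can be sketched as follows. Nearest-neighbour up-sampling is an assumption; the patent does not specify the decoder's up-sampling operator:

```python
import numpy as np

def fuse_and_upsample(style, content):
    """Add the style and content feature vectors, then up-sample the
    fused (c, h, w) map by a factor of two in each spatial dimension
    (nearest-neighbour), giving the (c, 2h, 2w) migration result."""
    fused = style + content                            # additive fusion
    return fused.repeat(2, axis=1).repeat(2, axis=2)   # (c, 2h, 2w)

style = np.random.rand(3, 4, 4)
content = np.random.rand(3, 4, 4)
y = fuse_and_upsample(style, content)
```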
The present embodiment further provides a system for implementing the method, comprising an encoding module, a content calibration module, and a decoding module; each part is described in detail below:
The encoding module comprises a content encoding module and a style encoding module. The content encoding module takes the source-domain input image as input, performs a double down-sampling operation on it, and outputs a four-dimensional feature vector. The style encoding module takes the target-domain input image as input and outputs a four-dimensional feature vector by sequentially applying global average pooling, a fully connected function, and style cosine normalization, reducing the interference of the spatial structure with the color semantic information.
The content calibration module comprises a feature optimization unit and an attribute reasoning unit; as shown in fig. 2, the four-dimensional feature vector of the multi-feature channel is input into the feature optimization unit, and more content structure and detail texture information are learned while the image content features of the multi-feature channel are completely transferred; the attribute reasoning unit extracts hidden attributes from the four-dimensional feature vectors output by the residual error unit, and gives correct feature expression to each content feature again through multi-channel and batch training, so that the accurate correspondence of one-dimensional semantics and two-dimensional pixels is realized. The outputs of different units are multiplied in a channel attention mode, the content feature transfer deviation is corrected, the accurate classification of a content structure is realized, and the image content feature transfer consistency is implemented; the feature optimization unit and the attribute reasoning unit in the content calibration module are explained in detail as follows: the input of the attribute reasoning unit is a four-dimensional feature vector which comprises n feature channels and has the size of h multiplied by w after being processed by a multilayer residual error unit in the feature optimization unitWherein the learning process function of the feature optimization unit is Fopt(x) (ii) a The attribute reasoning unit learning process function is Fre(x) (ii) a The four-dimensional feature vector which comprises n feature channels and has the size of h multiplied by w is input to the next stage after the two unit output features are fusedThe content calibration module is expressed as:
The feature optimization unit extracts depth features with n feature channels using a multi-layer residual unit, reducing feature redundancy while keeping the texture details and contour feature information of the source-domain input image, and passes the result as input to the attribute inference unit and to the next stage.
The attribute inference unit applies global average pooling for dimension reduction and regularization, outputting a four-dimensional feature vector. On this basis, fully connected layer processing weakens the influence of spatial information such as target position on content attribute classification, so that the network concentrates on the associations between different channel features, enhancing semantic feature expression and the ability to extract depth features. To further strengthen the relevance between different feature channels and fully extract the hidden attributes of the source-domain input image, a deep convolution with a 3 × 3 kernel divides the features into p branches; the content feature information on each feature channel is extracted separately, and the branches learn from and supervise one another, achieving cross-feature-channel reference. Meanwhile, within each branch the feature channels are divided into q groups, and the channel order is disturbed both within and across groups, increasing the randomness of the hidden content attributes, improving the network's generalization ability, redistributing the classification attributes of the target to each content feature, and reducing the feature transfer deviation. To obtain accurate content attributes, the features of each branch are filtered, and a 1 × 1 point convolution completes the integration of the p branches' feature information, outputting a four-dimensional feature vector. Each content feature is thus given an enhanced feature expression that guides the accurate generation of the content features.
While the feature vectors are processed within a single feature channel, a multi-layer residual unit in a parallel branch processes them at the multi-feature-channel level and outputs a four-dimensional feature vector, ensuring the transmission integrity of the feature vector. The two outputs are multiplied at the fixed ratios ω1 and ω2, and a four-dimensional feature vector is output. The hidden attributes and the inherent content features are screened against each other, solving the problem of wrongly assigned content attributes, reducing the content feature transmission deviation, and helping to realize consistent and accurate style migration of the content.
The decoding module performs the source-domain and target-domain feature vector operations: the two feature vectors are added and fused into a four-dimensional feature vector, which is then up-sampled to output the style migration result Yc×2h×2w.
The feature parameter constraint condition in this embodiment may be:
(1) The RGB three-channel image of size 256 × 256 is down-sampled, reducing the input image size to 128 × 128, with feature channels n ∈ {4, 8, 16, 64, 256, 512}; any of {1, 128, 128, 4}, {1, 128, 128, 8}, {1, 128, 128, 16}, {1, 128, 128, 64}, {1, 128, 128, 256}, and {1, 128, 128, 512} can be output as a four-dimensional feature vector containing the image content features.
(2) The content calibration module selects four-dimensional feature vectors of different feature channels as input according to the input image content: when the input image contains a small or blurred target, the four-dimensional feature vector with feature channel n = 256 is selected as the input of the content calibration module; when there is no small or blurred target in the input image, the four-dimensional feature vector with feature channel n = 8 is selected as the input of the content calibration module.
(3) The feature optimization unit transmits four-dimensional feature vectors with feature channels n ∈ {4, 8, 16, 64, 256, 512}.
(4) The attribute inference unit transmits a four-dimensional feature vector with feature channel n = 1.
Structural unit constraint conditions:
(1) The feature optimization unit extracts the depth content features using a 4-layer residual unit.
(2) The attribute inference unit comprises p branches, p ∈ {0, 1, 2, 3, 4}. When p = 0, the content calibration module contains only the feature optimization unit.
(3) Each branch in the attribute inference unit contains q groups, where q ∈ {q | 10 ≤ q ≤ 512, q ∈ Z+}.
(4) The attribute inference unit selects a different number of groups according to the complexity of the input image content: when the input image contains a small or blurred target, q ∈ {q | 128 ≤ q ≤ 512, q ∈ Z+} groups are selected; when there is no small or blurred target in the input image, q ∈ {q | 10 ≤ q ≤ 128, q ∈ Z+} groups are selected.
Example 1: security monitoring style migration situation
This embodiment targets unattended surveillance and accident-prone locations such as schools and intersections. Applied to outdoor security monitoring, the method effectively improves target recognition under complex illumination. The security monitoring image style migration situation is shown in fig. 3.
Example 2: autonomous driving style migration scenario
This example targets style migration for autonomous driving systems. Applied to a vehicle-mounted camera, the invention senses the environment around the vehicle, provides an auxiliary means for the driver, reduces the traffic accident rate, and improves safe driving capability; the autonomous driving style migration situation is shown in fig. 4.
Example 3: visual-blurred scene style migration scenarios
The method improves the image quality of style migration for visually blurred scenes caused by uneven illumination or natural weather, preparing for subsequent target detection or image segmentation; the visually blurred scene style migration situation is shown in fig. 5.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, which is not limited to the specifically recited embodiments and examples. All possible equivalents and modifications are deemed to fall within the scope of the invention as defined in the claims.
Claims (9)
1. A semantic-guided content feature transfer style migration method is characterized by comprising the following steps:
preparing a data set of a training style migration network;
acquiring a source-domain input image with feature channel c and a target-domain input image, and respectively performing a double down-sampling operation on each, the double down-sampling operation comprising a convolution operation and nonlinear activation function processing;
processing the down-sampling result of the target-domain input image with global average pooling and a fully connected function to obtain a style feature vector;
processing the down-sampling result of the source-domain input image with a multi-layer residual unit to obtain a four-dimensional feature vector;
sequentially processing the four-dimensional feature vector with global max pooling, a fully connected function, a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector, while deepening the four-dimensional feature vector with a multi-layer residual unit to obtain a four-dimensional feature vector;
multiplying the two four-dimensional feature vectors to generate the four-dimensional content feature vector Y1c×h×w, reallocating the target attributes within the content features and correcting the feature transmission deviation;
2. The semantic-guided content feature transfer style migration method of claim 1, wherein the double down-sampling operation performed on the source-domain input image and the target-domain input image is specifically:
extracting the content features of the source-domain input image and the style features of the target-domain input image using the convolution kernel Mc×3×3, with the formula:
processing the output feature vectors with a nonlinear activation function: when the feature value is less than or equal to 0, the activation function outputs 0, as shown in formula (3); otherwise, the activation function outputs the input value unchanged, as shown in formula (4):
where A(·) is the activation function.
3. The semantic-guided content feature transfer style migration method of claim 1, wherein processing the down-sampling result of the target-domain input image with global average pooling and a fully connected function is specifically:
averaging the features of each unit using global average pooling to obtain a feature vector, with the formula:
where Paverage(·) is the global average pooling function; the Mc×2×2 convolution kernel with k = 2 operates pixel by pixel, selecting and outputting the average value;
processing the feature vector channel by channel with a fully connected function and outputting a feature vector, with the formula:
where Cfully(·) is a fully connected function, implemented with an Mc×1×1 convolution kernel, i.e., a convolution kernel with k = 1;
performing style cosine normalization on the feature vector to obtain a four-dimensional style feature vector, with the formula:
where cosIN(·) is the style cosine normalization process function, μ(x) and μ(y) are respectively the means along the length and width dimensions of the feature vector, and σ(x) and σ(y) are respectively the standard deviations along the length and width dimensions of the four-dimensional feature vector.
4. The semantic-guided content feature transfer style migration method of claim 1, wherein the down-sampling result of the source-domain input image is processed with a multi-layer residual unit, with the formula:
where F(·) is a single-layer residual unit process function and ω3 is a weight matrix.
5. The semantic-guided content feature transfer style migration method of claim 1, wherein sequentially processing the four-dimensional feature vector with global max pooling, a fully connected function, a deep convolutional neural network, information exchange, and a point convolutional neural network to obtain a four-dimensional feature vector is specifically:
processing the four-dimensional feature vector with global max pooling to obtain a feature vector, with the formula:
where Pmax(·) is the global max pooling function; the Mc×2×2 convolution kernel with k = 2 operates pixel by pixel, selecting and outputting the maximum value;
processing the feature vector with a fully connected function to obtain a feature vector, with the formula:
where Cfully(·) is a fully connected function, implemented with an Mc×1×1 convolution kernel, i.e., a convolution kernel with k = 1;
uniformly dividing the feature vector into p branches along the feature channels (p ≤ c) using a deep convolutional neural network, obtaining the feature component of each feature channel, with the formula:
where Fdeep(·) is the deep convolutional neural network process function;
randomly exchanging features by dividing each branch into q groups, disturbing the order of information between different channels, and reclassifying and recombining the feature information to obtain feature components, with the formula:
where Shuffle(·) is the information exchange function;
merging the feature components with a point convolutional neural network to obtain a four-dimensional feature vector, the point convolutional neural network randomly deleting part of the neurons during merging, with the formula:
where Dran is a random deletion function and m is the proportion of randomly deleted neurons;
where Fpoi(·) is the point convolutional neural network process function, applying a point convolution to the feature vector with an Mc×1×1 point convolution kernel.
7. The semantic-guided content feature transfer style migration method of claim 1, wherein multiplying the feature vector and the four-dimensional feature vector to generate the four-dimensional content feature vector Y1c×h×w is specifically:
8. A semantic-guided content feature transfer style migration system is characterized by comprising an encoding module, a content calibration module and a decoding module;
the encoding module comprises a content encoding module and a style encoding module; the content encoding module takes the source-domain input image as input, performs a double down-sampling operation on it, and outputs a four-dimensional feature vector; the style encoding module takes the target-domain input image as input and outputs a four-dimensional feature vector by sequentially applying global average pooling, a fully connected function, and style cosine normalization;
The content calibration module comprises a feature optimization unit and an attribute reasoning unit; the characteristic optimization unit is used for down-sampling the source domain input imageObtaining four-dimensional characteristic vector by adopting multi-layer residual error unit processingThe attribute reasoning unit pairs four-dimensional feature vectorsSequentially carrying out global maximum pooling, full-connection function processing, deep convolutional neural network, information exchange and point convolutional neural network processing to obtain four-dimensional feature vectorsSimultaneous use of multilayer residuesFeature vector deepening processing of difference unitObtaining four-dimensional feature vectorsCombining four-dimensional feature vectorsAnd four-dimensional feature vectorIn a fixed ratio omega1And omega2Multiplying and outputting four-dimensional feature vector Y1 c×h×w;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210405069.7A CN114757820A (en) | 2022-04-18 | 2022-04-18 | Semantic-guided content feature transfer style migration method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114757820A true CN114757820A (en) | 2022-07-15 |
Family
ID=82330680
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114757820A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||