CN113610938A - Truck picture color editing method based on memory augmentation network

Truck picture color editing method based on memory augmentation network

Info

Publication number
CN113610938A
Authority
CN
China
Prior art keywords
color
network
picture
memory augmentation
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110770164.2A
Other languages
Chinese (zh)
Inventor
夏立志
吕强
吕建春
周平
王雪雁
郑刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zenmorn Hefei Technology Co ltd
Original Assignee
Zenmorn Hefei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zenmorn Hefei Technology Co ltd filed Critical Zenmorn Hefei Technology Co ltd
Priority to CN202110770164.2A priority Critical patent/CN113610938A/en
Publication of CN113610938A publication Critical patent/CN113610938A/en
Priority to CN202111560553.9A priority patent/CN115601456A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/001 - Texturing; Colouring; Generation of texture or colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention discloses a truck picture color editing method based on a memory augmentation network. The method comprises the following steps: obtaining truck pictures of real scenes and dividing them proportionally into a training set and a test set; preprocessing and standardizing the picture data of the training set and extracting the picture color features; constructing a convolutional neural network model based on a memory augmentation network; setting loss functions and training and updating the network model with the training set; and testing with the test set and the trained network model, filtering background color artifacts with a fusion module to obtain the final color editing result. Through picture preprocessing, the construction of the memory augmentation network and a deep convolutional neural network, feature extraction and the like, the method achieves high-quality editing of the color attributes of truck pictures and solves the problem that ordinary image generation methods alter texture details such as the vehicle appearance and the license plate.

Description

Truck picture color editing method based on memory augmentation network
Technical Field
The invention relates to the technical field of computer vision image generation, in particular to a truck picture color editing method based on a memory augmentation network.
Background
With the development of society, image generation, the key technology underlying truck picture color editing, has received more and more attention. Its range of applications has gradually expanded and has split into many sub-fields, such as face attribute editing, image colorization, image defogging and denoising, and image stylization. In daily life, the beauty cameras on mobile phones, AI restoration of old photos and the like all carry traces of image generation technology. Since deep learning was applied to the image generation field, the quality of generated images has improved greatly and generation has become much more controllable, and two mainstream methods have gradually emerged: the autoencoder and the generative adversarial network.
Although both mainstream methods have developed rapidly, each has shortcomings, and although many variants combine the two, the following problems remain: first, truck colors cannot be edited on a limited data set; second, it is difficult to change only the color of a truck image while leaving all other texture untouched; third, most methods for color editing of specific truck parts require a large number of masks during training.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art; to this end, a truck picture color editing method based on a memory augmentation network is adopted to solve the problems identified in the background art.
A truck picture color editing method based on a memory augmentation network comprises the following specific steps:
acquiring a truck picture of a real scene, and dividing the truck picture into a training set and a test set in proportion;
preprocessing and standardizing picture data of a training set, and extracting picture color features;
constructing a convolutional neural network model based on a memory augmentation network;
setting a loss function, and utilizing a training set to train and update a network model;
and testing according to the test set and the trained network model, and filtering background color artifacts by using a fusion module to obtain the final color editing result.
As a further aspect of the invention: the pictures of the training set are provided with corresponding color labels $L = \{L_1, L_2, \dots, L_i, \dots, L_n\}$, where $L_i$ denotes the label of the i-th training picture;
the image standardization formula of the training set is $P_x = (P_x - P_{mean}) / P_{std}$, where $P_{mean}$ is the pixel mean and $P_{std}$ is the pixel standard deviation.
As a further aspect of the invention: the specific steps of constructing the convolutional neural network model based on the memory augmentation network, testing the picture results and introducing the fusion module comprise:
step one, constructing a feature extractor to extract the spatial features of the truck picture;
step two, constructing the memory augmentation network model with one-to-many space-color feature pairs;
step three, constructing a coloring network with a generator and a discriminator, and introducing an adaptive normalization layer;
step four, finally, introducing a feature fusion module to filter out unnecessary color changes.
As a further aspect of the invention: the first step of constructing the feature extractor to extract the spatial features of the truck picture comprises the following specific steps:
constructing a residual network ResNet18, introducing pre-training parameters that are fixed and do not participate in the update process, and adjusting the output of the last layer to a 512-dimensional feature vector, i.e. the output feature is $X_{res} \in R^{512}$;
attaching after the residual network ResNet18 a fully-connected layer with input dimension 512, whose output dimension is the spatial feature dimension S, i.e. $q \in R^S$.
As a further aspect of the invention: the second step of constructing the memory augmentation network model with one-to-many space-color feature pairs comprises the following specific steps:
setting the spatial feature information S, the color feature information V and the time information T, and determining their correspondence, wherein, if the color feature information is viewed as a two-dimensional array, each column of color features shares the same color attribute and each row contains all the color feature information corresponding to one spatial feature; the numbers of spatial feature entries, rows of color features and time entries are all the same, namely the size of the memory augmentation network;
setting the correspondence between the spatial features and the color features to one-to-many, and the time information in one-to-one correspondence with the spatial information, wherein, if the current spatial feature in S is being accessed, the value of the corresponding entry of T is reset to 0, and otherwise the value is incremented by 1.
As a further aspect of the invention: the third step of constructing a coloring network with a generator and a discriminator and introducing an adaptive normalization layer comprises the following specific steps:
constructing a generator: a full convolution network is adopted with 256 × 256 network input and output images, wherein the encoder of the full convolution network has a plurality of encoding modules, each comprising a convolution layer and an adaptive normalization layer, and the decoder has a plurality of decoding modules, each comprising a transposed convolution layer and an adaptive normalization layer;
sending the color features retrieved from the memory augmentation network into the generator through the adaptive normalization layers for matched coloring to obtain diversified coloring results;
constructing a discriminator: stacks of Conv2d convolutions and LeakyReLU activations are adopted, with the number of channels doubling and the feature map size halving at each step, and the final discrimination result is obtained through a fully-connected layer composed of BatchNorm1d, Linear and Sigmoid.
As a further aspect of the invention: the specific step of introducing the feature fusion module at the end of step four to filter out unnecessary color changes comprises:
acquiring the diversified color pictures $\hat{y}_i$ generated by the generator, and introducing a mask $m_x$ into the fusion module to obtain the diversified coloring result $y'_i$ as:
$$y'_i = m_x \odot \hat{y}_i + (1 - m_x) \odot x$$
where x is the input picture.
As a further aspect of the invention: the specific steps of setting the loss functions and training and updating the network model with the training set comprise:
setting the loss functions: updating the feature extractor with a triplet loss with a threshold;
updating the memory augmentation network through the spatial, color and time memories;
updating the generator and the discriminator with an adversarial loss, the generator additionally using a smooth L1 loss;
the main loss of the generator and the discriminator is the adversarial loss, and the smooth L1 loss is:
$$L_{\delta}(y, \hat{y}) = \begin{cases} 0.5\,(y - \hat{y})^2/\delta, & |y - \hat{y}| < \delta \\ |y - \hat{y}| - 0.5\,\delta, & \text{otherwise} \end{cases}$$
where y is the ground truth, i.e. the real RGB image corresponding to the input picture x, $\hat{y}$ is the output of the generator, and δ is a set threshold.
Compared with the prior art, the invention has the following technical effects:
by adopting the technical scheme, the technical means of carrying out picture preprocessing, memory augmentation network and deep convolution neural network building, feature extraction and the like on the truck picture are utilized. The method realizes high-quality color attribute editing of the truck picture, and solves the problem of change of texture details such as appearance and license plate and the like brought by a common image generation method. The memory augmentation network is used as a storage network of color characteristics, so that rare samples can not be covered, and the problem of sample imbalance is solved. Meanwhile, the method can only change the color of the truck without changing textures of a license plate and the like, and is applied to the fields of target detection and the like.
Drawings
The following detailed description of embodiments of the invention refers to the accompanying drawings in which:
FIG. 1 is a schematic illustration of the steps of the truck picture color editing method according to some embodiments disclosed herein;
FIG. 2 is a schematic algorithmic flow diagram of some embodiments disclosed herein;
FIG. 3 is a schematic diagram of a feature extractor and memory augmentation network architecture according to some embodiments of the present disclosure;
FIG. 4 is a schematic diagram of a generator and discriminator configuration of some embodiments disclosed herein;
FIG. 5 is a schematic diagram of the use of a fusion module in testing of some embodiments disclosed herein.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The principle steps of the invention are as follows: truck images of real scenes are acquired; the pictures of the training set are processed by converting the input pictures from RGB space to LAB space, cropping and scaling them to the same size, and standardizing them; the color features of the pictures are extracted, completing the preprocessing of the input pictures. The memory augmentation network, the generator and the discriminator are then constructed in preparation for training. The input image is sent to the feature extractor, which extracts its spatial features; the spatial features are sent to the memory augmentation network, and the memory information and parameters are updated according to the update rules of the feature extractor and the memory network; the grayscale map of the input image and the real color features obtained in the preprocessing stage are sent to the generator and the discriminator to complete the adversarial training and obtain a colored image. After model training is finished, a test picture is input, the memory network retrieves the series of color features corresponding to the nearest spatial feature, the color features are sent to the generator to obtain multiple color editing results, and these are sent, as required, to the fusion module to filter out unwanted regional color changes, thereby obtaining the final diversified color editing results.
Referring to fig. 1 and fig. 2, in an embodiment of the present invention, a truck picture color editing method based on a memory augmentation network includes:
s1, acquiring truck pictures of real scenes, dividing the truck pictures into a training set and a test set in proportion,
s2, preprocessing and standardizing the picture data of the training set, and extracting picture color features;
the method comprises the following specific steps:
obtaining the training set of truck pictures, which contains M training pictures $X = \{X_1, X_2, \dots, X_i, \dots, X_M\}$, where $X_i$ denotes the i-th training picture; setting the corresponding color labels at the same time, the pictures of the training set being provided with color labels $L = \{L_1, L_2, \dots, L_i, \dots, L_n\}$, where $L_i$ denotes the label of the i-th training picture; and setting N images $Y = \{Y_1, Y_2, \dots, Y_j, \dots, Y_N\}$ as the test set, where $Y_j$ denotes the j-th image in the test set;
then converting the input pictures from RGB space to LAB space, cropping and scaling the pictures, and standardizing the input pictures, the image standardization formula of the training set being $P_x = (P_x - P_{mean}) / P_{std}$, where $P_{mean}$ is the pixel mean and $P_{std}$ is the pixel standard deviation;
finally, the color features of the input pictures are extracted with Color Thief.
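As an illustration of this preprocessing stage, a minimal Python sketch follows, assuming OpenCV for the RGB-to-LAB conversion and the colorthief package for palette extraction; the 256 × 256 size, the per-image statistics and the five-color palette are illustrative assumptions, not values fixed by the invention:

```python
# Minimal sketch of the preprocessing stage described above.
import cv2
import numpy as np
from colorthief import ColorThief

def preprocess(path, size=256):
    img = cv2.imread(path)                        # loads as BGR, uint8
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)    # convert to LAB space
    lab = cv2.resize(lab, (size, size)).astype(np.float32)
    # standardization: Px = (Px - Pmean) / Pstd (per-image statistics are
    # an assumption; dataset-level statistics would be used the same way)
    return (lab - lab.mean()) / (lab.std() + 1e-8)

def color_features(path, n_colors=5):
    # dominant colors extracted with Color Thief, as the text describes
    return ColorThief(path).get_palette(color_count=n_colors)
```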
S3, constructing a convolutional neural network model based on a memory augmentation network; the method comprises the following specific steps:
s31, constructing a feature extractor to extract the spatial features of the truck picture:
as shown in FIG. 3, ResNet18 is used as the backbone; the pre-training parameters are introduced and fixed so that they do not participate in the update process, and the output of the last layer is adjusted to 512 dimensions, i.e. the output feature is $X_{res} \in R^{512}$; a fully-connected layer is attached after ResNet18 with input dimension 512 and output dimension S, i.e. the shape of its weight is [512, S], and the output feature is $q \in R^S$; in this experiment, S = 313.
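A minimal PyTorch sketch of this feature extractor follows; the torchvision weight-loading call and the placement of the L2 normalization (taken from the update rule in S4 below) are assumptions of this sketch:

```python
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class FeatureExtractor(nn.Module):
    # frozen, pretrained ResNet18 backbone plus one trainable FC layer
    def __init__(self, s_dim=313):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()        # expose the 512-d feature X_res
        for p in backbone.parameters():
            p.requires_grad = False        # pre-training parameters are fixed
        self.backbone = backbone
        self.fc = nn.Linear(512, s_dim)    # output dimension S (here 313)

    def forward(self, x):
        x_res = self.backbone(x)           # X_res in R^512
        q = self.fc(x_res)                 # q = W X_res + b
        return F.normalize(q, dim=-1)      # enforce ||q||_2 = 1
```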
S32, constructing a memory augmentation network model by adopting one-to-many space color feature pairs:
setting the spatial feature information $S = \{S_1, S_2, \dots, S_i, \dots, S_m\}$, the color feature information $V = \{v_{i,c} \mid i = 1, \dots, m;\; c = 1, \dots, C\}$, and the time information $T = \{T_1, T_2, \dots, T_i, \dots, T_m\}$. If the color feature information is viewed as a two-dimensional array, each column of color features shares the same color attribute, and each row contains all the color feature information corresponding to one spatial feature. Here C is the number of color attributes and m is the size of the memory augmentation network, which is also the number of spatial feature entries; the number of color feature entries is m × C, and the number of time entries is also m. The correspondence between spatial and color features is one-to-many: $S_i$ corresponds to $\{v_{i,1}, \dots, v_{i,C}\}$. The time information corresponds one-to-one to the spatial information, and its value records how long a spatial feature has gone unaccessed: if the current spatial feature $S_i$ is being accessed, the corresponding $T_i$ is reset to 0; otherwise its value is incremented by 1.
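The memory layout described above can be sketched as follows; the slot count, feature dimensions and initialization are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

class MemoryBank:
    # m spatial slots S, an m x C table of colour features V (row i holds
    # the C colour features of spatial slot i), and m time counters T
    def __init__(self, m=512, s_dim=313, n_colors=10, v_dim=64):
        self.S = F.normalize(torch.randn(m, s_dim), dim=1)
        self.V = torch.rand(m, n_colors, v_dim)   # one-to-many slot -> colours
        self.T = torch.zeros(m, dtype=torch.long) # ages of the spatial slots

    def tick(self, accessed):
        # the accessed slot's counter resets to 0, every other slot ages by 1
        self.T += 1
        self.T[accessed] = 0
```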
S33, constructing a coloring network by using a generator and a discriminator, and introducing an adaptive normalization layer; the method comprises the following specific steps:
as shown in fig. 4, the generator is constructed: a full convolution network is adopted with 256 × 256 network input and output images, wherein the encoder of the full convolution network has 7 encoding modules, each comprising a convolution layer and an adaptive normalization layer, and the decoder has 8 decoding modules, each comprising a transposed convolution layer and an adaptive normalization layer;
the color features retrieved from the memory augmentation network are fed into the generator through the adaptive normalization layers for matched coloring to obtain diversified coloring results;
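A compact PyTorch sketch of such a generator follows. The AdaIN-style formulation of the adaptive normalization layer (per-channel scale and shift predicted from the retrieved color feature) and the channel widths are assumptions of this sketch; the text fixes only the 7-encoder/8-decoder structure and the 256 × 256 input/output size:

```python
import torch
import torch.nn as nn

class AdaNorm(nn.Module):
    # adaptive normalization: the colour feature v predicts scale and shift
    def __init__(self, ch, v_dim=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.affine = nn.Linear(v_dim, 2 * ch)

    def forward(self, x, v):
        gamma, beta = self.affine(v).chunk(2, dim=1)
        return (self.norm(x) * (1 + gamma[..., None, None])
                + beta[..., None, None])

class Generator(nn.Module):
    # fully convolutional 256x256 colouriser: 7 encoding modules
    # (conv + AdaNorm) and 8 decoding modules (7 transposed convs +
    # AdaNorm, plus a final stride-1 output conv as the 8th module)
    def __init__(self, v_dim=64):
        super().__init__()
        ech = [1, 64, 128, 256, 512, 512, 512, 512]   # grayscale input
        self.enc = nn.ModuleList(
            nn.Conv2d(ech[i], ech[i + 1], 4, 2, 1) for i in range(7))
        self.enorm = nn.ModuleList(AdaNorm(ech[i + 1], v_dim) for i in range(7))
        dch = [512, 512, 512, 512, 256, 128, 64, 64]
        self.dec = nn.ModuleList(
            nn.ConvTranspose2d(dch[i], dch[i + 1], 4, 2, 1) for i in range(7))
        self.dnorm = nn.ModuleList(AdaNorm(dch[i + 1], v_dim) for i in range(7))
        self.out = nn.Conv2d(64, 3, 3, 1, 1)

    def forward(self, gray, v):
        h = gray
        for conv, norm in zip(self.enc, self.enorm):
            h = torch.relu(norm(conv(h), v))
        for deconv, norm in zip(self.dec, self.dnorm):
            h = torch.relu(norm(deconv(h), v))
        return torch.tanh(self.out(h))
```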
constructing a discriminator: stacks of Conv2d convolutions and LeakyReLU activations are adopted, with the number of channels doubling and the feature map size halving at each step, and the final discrimination result is obtained through a fully-connected layer composed of BatchNorm1d, Linear and Sigmoid.
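A matching sketch of the discriminator, with the number of Conv2d/LeakyReLU steps and the base channel width as assumptions:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    # channels double and the feature map halves at each step; the head is
    # BatchNorm1d -> Linear -> Sigmoid, as the text specifies
    def __init__(self, in_ch=3, base=64, steps=5, img=256):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(steps):
            nxt = base * (2 ** i)
            layers += [nn.Conv2d(ch, nxt, 4, 2, 1), nn.LeakyReLU(0.2)]
            ch = nxt
        self.features = nn.Sequential(*layers)
        flat = ch * (img // 2 ** steps) ** 2
        self.head = nn.Sequential(
            nn.Flatten(), nn.BatchNorm1d(flat), nn.Linear(flat, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x))
```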
S34, finally, a feature fusion module is introduced to filter out unnecessary color changes.
The diversified color pictures $\hat{y}_i$ generated by the generator are acquired, and a mask $m_x$ is introduced into the fusion module to obtain the diversified coloring result $y'_i$ as:
$$y'_i = m_x \odot \hat{y}_i + (1 - m_x) \odot x$$
where x is the input picture.
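In code, the fusion module reduces to one blending line (a sketch; mask values are assumed to lie in [0, 1]):

```python
import torch

def fuse(x, y_hat, mask):
    # y' = m_x * y_hat + (1 - m_x) * x: generated colour is kept inside the
    # mask, the original pixels (background) are kept outside it
    return mask * y_hat + (1 - mask) * x
```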
S4, setting a loss function, and carrying out network model training and updating by using a training set;
setting the loss functions: the feature extractor is updated with a triplet loss with a threshold;
the memory augmentation network is updated through the spatial, color and time memories;
the generator and the discriminator are updated with an adversarial loss, the generator additionally using a smooth L1 loss.
the method comprises the following specific steps:
updating the feature extractor: the input picture first passes through ResNet18 to obtain $X_{res}$, and then through the fully-connected layer to obtain the spatial feature q of dimension S, expressed as $q = W X_{res} + b$; q is then normalized so that $\|q\|_2 = 1$. Next, q is sent to the memory augmentation network, the cosine similarity $d_i = q \cdot S[i]$ between q and every spatial feature in the memory is computed, and the k most similar spatial features are selected:
$$(n_1, \dots, n_k) = \operatorname{argmax}^k_i \; q \cdot S[i]$$
where $(n_1, \dots, n_k)$ are the indices in S of the selected k spatial features. The query q has color feature v and color attribute c. The color features corresponding to the k spatial features are selected according to the color attribute c as $(l_1, \dots, l_k) = (n_1 \cdot m + c, \dots, n_k \cdot m + c)$, where m is the size of the memory augmentation network and $(l_1, \dots, l_k)$ are the indices of these color features in the memory network V.
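A sketch of this retrieval step, reusing the MemoryBank above; storing V as a two-dimensional table lets V[n, c] play the role of the flattened index n·m + c in the text:

```python
import torch

def retrieve(q, mem, c, k=8):
    # cosine similarity d_i = q . S[i] (both q and the slots in S are
    # L2-normalised), then the k most similar spatial slots are kept
    d = mem.S @ q                        # (m,) similarities
    n = torch.topk(d, k).indices         # indices (n_1, ..., n_k)
    return n, mem.V[n, c]                # colour features for attribute c
```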
The feature extractor is then updated with the triplet loss.
Specifically, denote the indices of the positive samples among the k nearest-neighbor spatial features in the memory network S as $(p_1, \dots, p_a)$, and the indices of the negative samples as $(g_1, \dots, g_b)$; the corresponding indices of the positive samples in the color memory V are $(l^+_1, \dots, l^+_a)$, and those of the negative samples are $(l^-_1, \dots, l^-_b)$. The correspondence between the positive and negative sample indices is:
$$l^+_j = p_j \cdot m + c, \qquad l^-_j = g_j \cdot m + c$$
The KL divergence between the color feature v of the query q and each selected color feature is calculated, and a threshold ε is set to distinguish positive from negative samples; specifically:
$$KL\big(v \,\|\, V[l_i]\big) < \varepsilon$$
is the case where the color feature is judged a positive sample, and
$$KL\big(v \,\|\, V[l_i]\big) \ge \varepsilon$$
is the case where the color feature is judged a negative sample. The overall triplet loss sets a margin β, which guarantees that the cosine similarity between q and a positive sample exceeds that between q and a negative sample by at least β; it is specifically expressed as:
$$L_{triplet} = \max\big(0,\; q \cdot S[g_j] - q \cdot S[p_j] + \beta\big)$$
Minimizing this loss maximizes the cosine similarity between the positive samples and the query q and minimizes the cosine similarity between the negative samples and q; the feature extractor is thus updated accurately.
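A sketch of this triplet loss over the memory slots; the margin value β = 0.3 is an illustrative assumption:

```python
import torch

def triplet_loss(q, mem, pos_idx, neg_idx, beta=0.3):
    # margin-based triplet loss over all (positive, negative) slot pairs:
    # max(0, q.S[g] - q.S[p] + beta), keeping positives at least beta closer
    sim_pos = mem.S[pos_idx] @ q                           # (a,)
    sim_neg = mem.S[neg_idx] @ q                           # (b,)
    margins = sim_neg[None, :] - sim_pos[:, None] + beta   # (a, b) pairs
    return torch.clamp(margins, min=0).mean()
```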
Updating the memory augmentation network: the query q is input to the memory augmentation network; first the cosine similarity between q and all spatial features in S is calculated, and the most similar spatial feature is selected:
$$n_1 = \operatorname{argmax}_i \; q \cdot S[i]$$
Then, according to the color attribute c of q, the color feature $V[n_1 \cdot m + c]$ corresponding to $S[n_1]$ is retrieved. Finally, the KL divergence between $V[n_1 \cdot m + c]$ and the color feature v of the query q is calculated, and according to the KL divergence result the memory of the memory augmentation network is updated in two cases. The method specifically comprises the following steps:
When the value of the KL divergence is smaller than ε, the mean of the query q and the spatial feature indexed $n_1$ is first calculated and normalized, and the normalized result replaces the value of the spatial feature indexed $n_1$ in S; at the same time, the value indexed $n_1$ in the time information is set to 0. Specifically, when $KL(v \,\|\, V[n_1 \cdot m + c]) < \varepsilon$, the update is:
$$S[n_1] \leftarrow \frac{q + S[n_1]}{\big\| q + S[n_1] \big\|_2}, \qquad T[n_1] \leftarrow 0$$
When the value of the KL divergence is not smaller than ε, it means that no color feature close to the query q exists in the memory network at this time, so the spatial feature and the color feature of the query q should be written into the memory augmentation network. Based on the time information memory T, the block o with the largest value is selected, whose index in T is:
$$n_o = \operatorname{argmax}_i \; T[i]$$
The block o at this time is the block that has not been accessed for the longest time; its corresponding spatial feature $S[n_o]$ and the color feature $V[n_o \cdot m + c]$ corresponding to color attribute c are replaced by the values of q and v respectively. Specifically, when $KL(v \,\|\, V[n_1 \cdot m + c]) \ge \varepsilon$, the update is:
$$S[n_o] \leftarrow q, \qquad V[n_o \cdot m + c] \leftarrow v, \qquad T[n_o] \leftarrow 0$$
Note that $S[n_o]$ still corresponds to C color features, i.e. the spatial and color features remain in a one-to-many relationship, while the spatial and time information remain in a one-to-one relationship.
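Both update cases can be sketched as follows, reusing the MemoryBank above; treating the color features as normalized histograms for the KL divergence, and the threshold value ε, are assumptions:

```python
import torch
import torch.nn.functional as F

def update_memory(mem, q, v, c, eps=0.1):
    # KL divergence between the query colour v and the retrieved colour
    # feature (colour features treated as normalised histograms here)
    n1 = int(torch.argmax(mem.S @ q))            # most similar spatial slot
    v_mem = mem.V[n1, c]
    kl = torch.sum(v * (torch.log(v + 1e-8) - torch.log(v_mem + 1e-8)))
    if kl < eps:
        # case 1: a close colour exists -> average the spatial features,
        # renormalise, and reset the age of slot n1
        mem.S[n1] = F.normalize(q + mem.S[n1], dim=0)
        mem.tick(n1)                             # T[n1] <- 0, others age
    else:
        # case 2: nothing close in memory -> overwrite the oldest slot o
        o = int(torch.argmax(mem.T))
        mem.S[o], mem.V[o, c] = q, v
        mem.tick(o)
```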
Update of the generator and the discriminator: the discriminator distinguishes, as well as it can, real images from the color images generated with the grayscale map and the color feature as conditions, while the generator fools the discriminator by generating, from the input grayscale map and color feature, color images that are as realistic as possible. The difference between the generated image $\hat{y}$ and the ground-truth image y is measured with the smooth L1 loss:
$$L_{\delta}(y, \hat{y}) = \begin{cases} 0.5\,(y - \hat{y})^2/\delta, & |y - \hat{y}| < \delta \\ |y - \hat{y}| - 0.5\,\delta, & \text{otherwise} \end{cases}$$
where y is the ground truth, i.e. the real RGB image corresponding to the input picture x, $\hat{y}$ is the output of the generator, and δ is a set threshold.
The main loss of both the discriminator and the generator is the adversarial loss; only the generator has the additional smooth L1 loss.
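The smooth L1 term can be written directly (a sketch); it matches torch.nn.functional.smooth_l1_loss with beta set to δ:

```python
import torch
import torch.nn.functional as F

def smooth_l1(y_hat, y, delta=1.0):
    # 0.5*(y - y_hat)^2 / delta where |y - y_hat| < delta,
    # |y - y_hat| - 0.5*delta elsewhere
    diff = torch.abs(y - y_hat)
    return torch.where(diff < delta,
                       0.5 * diff ** 2 / delta,
                       diff - 0.5 * delta).mean()

# equivalent built-in: F.smooth_l1_loss(y_hat, y, beta=delta)
```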
And S5, testing according to the test set and the trained network model, and filtering background color artifacts by using the fusion module to obtain the final color editing result. The method specifically comprises the following steps:
as shown in fig. 5, the test picture is preprocessed to obtain its color feature and then sent to the feature extractor to extract the spatial feature, i.e. the query q; q is then sent to the memory augmentation network, the nearest spatial feature is retrieved, and the corresponding series of color features is returned; the color features and the grayscale map of the input picture are sent to the generator to obtain diversified color editing results $\hat{y}_i$; a mask $m_x$ can then be introduced as required; the input picture x, the pictures $\hat{y}_i$ generated by the generator and the mask $m_x$ are sent to the fusion module to filter out unnecessary background color changes, and the more refined color editing results are obtained as:
$$y'_i = m_x \odot \hat{y}_i + (1 - m_x) \odot x$$
where $m_x$ is the mask and x is the input picture.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents, which should be construed as being within the scope of the invention.

Claims (8)

1. A truck picture color editing method based on a memory augmentation network is characterized by comprising the following steps:
acquiring a truck picture of a real scene, and dividing the truck picture into a training set and a test set in proportion;
preprocessing and standardizing picture data of a training set, and extracting picture color features;
constructing a convolutional neural network model based on a memory augmentation network;
setting a loss function, and utilizing a training set to train and update a network model;
and testing according to the test set and the trained network model, and filtering background color artifacts by using a fusion module to obtain the final color editing result.
2. The truck picture color editing method based on the memory augmentation network as claimed in claim 1, wherein the pictures of the training set are provided with corresponding color labels $L = \{L_1, L_2, \dots, L_i, \dots, L_n\}$, where $L_i$ denotes the label of the i-th training picture;
the image standardization formula of the training set is $P_x = (P_x - P_{mean}) / P_{std}$, where $P_{mean}$ is the pixel mean and $P_{std}$ is the pixel standard deviation.
3. The truck picture color editing method based on the memory augmentation network according to claim 1, wherein the specific steps of constructing a convolutional neural network model based on the memory augmentation network, testing picture results, and introducing a fusion module comprise:
step one, constructing a feature extractor to extract the spatial features of the truck picture;
step two, constructing the memory augmentation network model with one-to-many space-color feature pairs;
step three, constructing a coloring network with a generator and a discriminator, and introducing an adaptive normalization layer;
step four, finally, introducing a feature fusion module to filter out unnecessary color changes.
4. The truck picture color editing method based on the memory augmentation network as claimed in claim 3, wherein the first step of constructing the feature extractor to extract the spatial features of the truck picture comprises the following specific steps:
constructing a residual network ResNet18, introducing pre-training parameters that are fixed and do not participate in the update process, and adjusting the output of the last layer to a 512-dimensional feature vector, i.e. the output feature is $X_{res} \in R^{512}$;
attaching after the residual network ResNet18 a fully-connected layer with input dimension 512, whose output dimension is the spatial feature dimension S, i.e. $q \in R^S$.
5. The truck picture color editing method based on the memory augmentation network of claim 3, wherein the specific step of constructing the memory augmentation network model by using one-to-many space color feature pairs in the second step comprises:
setting the spatial feature information S, the color feature information V and the time information T, and determining their correspondence, wherein, if the color feature information is viewed as a two-dimensional array, each column of color features shares the same color attribute and each row contains all the color feature information corresponding to one spatial feature; the numbers of spatial feature entries, rows of color features and time entries are all the same, namely the size of the memory augmentation network;
setting the correspondence between the spatial features and the color features to one-to-many, and the time information in one-to-one correspondence with the spatial information, wherein, if the current spatial feature in S is being accessed, the value of the corresponding entry of T is reset to 0, and otherwise the value is incremented by 1.
6. The truck picture color editing method based on the memory augmentation network as claimed in claim 3, wherein the third step of constructing the coloring network by using the generator and the discriminator and introducing the adaptive normalization layer comprises the following specific steps:
constructing a generator: a full convolution network is adopted with 256 × 256 network input and output images, wherein the encoder of the full convolution network has a plurality of encoding modules, each comprising a convolution layer and an adaptive normalization layer, and the decoder has a plurality of decoding modules, each comprising a transposed convolution layer and an adaptive normalization layer;
sending the color features retrieved from the memory augmentation network into the generator through the adaptive normalization layers for matched coloring to obtain diversified coloring results;
constructing a discriminator: stacks of Conv2d convolutions and LeakyReLU activations are adopted, with the number of channels doubling and the feature map size halving at each step, and the final discrimination result is obtained through a fully-connected layer composed of BatchNorm1d, Linear and Sigmoid.
7. The truck picture color editing method based on the memory augmentation network of claim 3, wherein the specific steps of introducing the feature fusion module to filter out unnecessary color changes in the last step of the fourth step include:
acquiring the diversified color pictures $\hat{y}_i$ generated by the generator, and introducing a mask $m_x$ into the fusion module to obtain the diversified coloring result $y'_i$ as:
$$y'_i = m_x \odot \hat{y}_i + (1 - m_x) \odot x$$
where x is the input picture.
8. The truck picture color editing method based on the memory augmentation network as claimed in claim 1, wherein the specific steps of setting the loss function and using the training set to perform the network model training and updating include:
setting the loss functions: updating the feature extractor with a triplet loss with a threshold;
updating the memory augmentation network through the spatial, color and time memories;
updating the generator and the discriminator with an adversarial loss, the generator additionally using a smooth L1 loss;
the main loss of the generator and the discriminator is the adversarial loss, and the smooth L1 loss is:
$$L_{\delta}(y, \hat{y}) = \begin{cases} 0.5\,(y - \hat{y})^2/\delta, & |y - \hat{y}| < \delta \\ |y - \hat{y}| - 0.5\,\delta, & \text{otherwise} \end{cases}$$
where y is the ground truth, i.e. the real RGB image corresponding to the input picture x, $\hat{y}$ is the output of the generator, and δ is a set threshold.
CN202110770164.2A 2021-07-07 2021-07-07 Truck picture color editing method based on memory augmentation network Withdrawn CN113610938A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110770164.2A CN113610938A (en) 2021-07-07 2021-07-07 Truck picture color editing method based on memory augmentation network
CN202111560553.9A CN115601456A (en) 2021-07-07 2021-12-20 Truck picture editing method based on neural network, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110770164.2A CN113610938A (en) 2021-07-07 2021-07-07 Truck picture color editing method based on memory augmentation network

Publications (1)

Publication Number Publication Date
CN113610938A true CN113610938A (en) 2021-11-05

Family

ID=78304195

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110770164.2A Withdrawn CN113610938A (en) 2021-07-07 2021-07-07 Truck picture color editing method based on memory augmentation network
CN202111560553.9A Pending CN115601456A (en) 2021-07-07 2021-12-20 Truck picture editing method based on neural network, storage medium and equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202111560553.9A Pending CN115601456A (en) 2021-07-07 2021-12-20 Truck picture editing method based on neural network, storage medium and equipment

Country Status (1)

Country Link
CN (2) CN113610938A (en)

Also Published As

Publication number Publication date
CN115601456A (en) 2023-01-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211105