CN115641253A - Material neural style migration method for improving content aesthetic quality - Google Patents

Material neural style migration method for improving content aesthetic quality

Info

Publication number
CN115641253A
CN115641253A (application CN202211182280.3A)
Authority
CN
China
Prior art keywords
style
pictures
data set
image
picture
Prior art date
Legal status: Granted
Application number
CN202211182280.3A
Other languages
Chinese (zh)
Other versions
CN115641253B (en)
Inventor
陈森霖
沈玉龙
袁博
胡凯
Current Assignee
Nanjing Baituo Vision Technology Co ltd
Original Assignee
Nanjing Baituo Vision Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Baituo Vision Technology Co ltd
Priority to CN202211182280.3A
Publication of CN115641253A
Application granted
Publication of CN115641253B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Abstract

The invention discloses a material neural style migration method for improving content aesthetic quality, which comprises the following steps. Step 1: screen out, from a style material data set, beautiful pictures in which the material part occupies a large proportion of the total picture area, and add a selected-times tag to each picture. Step 2: use the style material pictures with higher aesthetic index and higher selected-times values to perform neural style migration on the original picture I_org, generating multiple migrated pictures I_gen. Step 3: segment the region RegionTarget of the material to be transferred from the original picture I_org. Step 4: synthesize the content of each migrated picture I_gen corresponding to the RegionTarget region with the background part of the original picture, forming multiple synthesized images. Step 5: select the most beautiful of the synthesized images, and add 1 to the selected-times tag value of the style material picture it corresponds to. The method can thus generate the most aesthetically pleasing picture while retaining the related experience.

Description

Material neural style migration method for improving content aesthetic quality
Technical Field
The invention belongs to the field of image processing, and in particular relates to a neural style migration algorithm for image style conversion, mainly applied in the field of virtual reality/augmented reality.
Background
The material neural style migration algorithm converts a given original material in an original image into another specified material. For example, the stone material content in an original image A is converted into the content of a specified metal material, whose style is derived from an image B in a data set; a house made of stone thus becomes a house made of metal, while everything else remains unchanged, yielding a synthesized image C. The algorithm can be widely applied to image synthesis in the field of virtual reality and has good application value.
At present, the quality of the result is judged only by the style similarity between the generated picture and the specified material part. Beyond this similarity, however, the form of the generated picture can vary widely: the synthesized metal house is indeed metallic in style, but many different metal-house renderings are possible. How to select the most beautiful image from these varied results is a problem for which no related research scheme currently exists.
Disclosure of Invention
The technical problem to be solved by the invention is how to select the most beautiful image from the many possible image effects after style migration. If the raw-material image B is beautiful, and the image C generated after migration is also beautiful, the related experience is fed back to the raw-material image, so that this verified knowledge is learned for the next task. The invention therefore provides a material neural style migration method for improving content aesthetic quality which, building on the classical material neural style migration algorithm, generates the most beautiful picture while achieving style migration of the generated image. First, this realizes the virtual reality effect more beautifully and increases the comfort of human-computer interaction; second, the related experience is retained and learned in the next task, which reduces computing resources and yields a better engineering effect.
The method is based on deep learning, aesthetic indexes and related techniques, addresses the aesthetic quality of generated pictures, and can generate objectively beautiful pictures while achieving style migration, realizing the virtual reality effect more beautifully and increasing the comfort of human-computer interaction.
The invention provides a material neural style migration method for improving content aesthetic quality. The details are as follows.
Step 1, preprocess the style material data set Dataset1: screen out style material pictures in which the material part occupies a proportion of the total picture area larger than a threshold T1, forming a data set Dataset3;
further screen style material pictures with an aesthetic index score larger than a threshold T2 from the data set Dataset3, forming a data set Dataset4;
add a selected-times label Pic4_selected to all style material pictures in Dataset4;
the point of this preprocessing is that, for the samples in Dataset3, the larger the ratio of the material part to the total picture area the better, and the higher the aesthetic index score of that part the better.
Step 2, select the style material pictures with higher aesthetic index and those with higher selected-times Pic4_selected, and perform overall neural style migration on the original picture I_org to generate multiple migrated pictures I_gen.
Step 3, finely segment the material of the original picture I_org to obtain the region RegionTarget of the material to be converted; define a matrix I_mask of exactly the same size as the image I_org with all values 0, and set to 1 all positions of I_mask corresponding to the RegionTarget region of the material to be converted.
Step 4, synthesize the images; the output picture I_out is expressed as follows:
I_out = I_gen ⊙ I_mask + I_org ⊙ (1 - I_mask)
where ⊙ denotes element-wise multiplication and I_gen is a migrated picture; each synthesized image I_out corresponds to one migrated picture I_gen.
Step 5, calculate the NIMA value of all synthesized images I_out; the synthesized image I_out with the maximum NIMA value is the final output image, and 1 is added to the Pic4_selected tag value of the style material picture corresponding to it.
Further, in step 1, style material pictures in which the material part occupies a proportion of the total picture area larger than the threshold T1 are screened out, forming the data set Dataset3; the method specifically comprises the following steps:
Step 1.1, acquire a primary material area label data set SegDateset2;
randomly extract from all image-level label picture samples in FMD and EFMD to obtain a data set SegDateset1;
using a weakly supervised scheme that learns pixel-level semantic affinity (PSA) from the class activation maps (CAM) of a multi-label CNN network, the primary material region label data set SegDateset2 is obtained from the image-level label data set SegDateset1.
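As background for this weakly supervised step, here is a minimal sketch of computing a class activation map from a CNN's final convolutional features and its post-pooling classifier weights, in the spirit of the cited CAM technique; the tensor shapes and function name are illustrative assumptions, not the patent's implementation.

    import numpy as np

    def class_activation_map(features, fc_weights, class_idx):
        # features: (C, H, W) final convolutional activations of the multi-label CNN
        # fc_weights: (num_classes, C) classifier weights after global average pooling
        cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
        cam = np.maximum(cam, 0)            # keep only positive evidence for the class
        return cam / (cam.max() + 1e-8)     # normalize to [0, 1]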
Step 1.2, train a harmonic dense connection network HarDNet, named HarDNet1, with the primary material area label data set SegDateset2, and fine-tune HarDNet1 with another pixel-level annotation data set SegDateset3 to obtain the trained harmonic dense connection network HarDNet_seg;
the data set SegDateset3 is randomly drawn from all pixel-level label picture samples in FMD and EFMD.
Step 1.3, use the trained HarDNet1 harmonic dense connection network to segment the pixel-level semantics of all images in the data set Dataset1; the labeled result is denoted Dataset2;
Step 1.4, count the total number of pixels of each semantic content in the Dataset2 images, calculate its proportion of the whole image, sort from high to low, and output for each image in Dataset2 the area ratio Ph2(i) of the top-ranked semantic class, where i is an index from 1 to the total number itotal of images in Dataset2.
Define a threshold T1 (between 0 and 1), delete pictures with Ph2(i) <= T1 from Dataset2, and output the remaining pictures as Dataset3, a pixel-level semantic data set in which the area of a single material occupies more than T1 of the whole picture.
Further, in step 2, the style material pictures with higher aesthetic index and those with higher selected-times Pic4_selected are selected and used to perform overall neural style migration on the original picture I_org, generating multiple migrated pictures I_gen; the method comprises the following specific steps:
Step 2.1, sort each class of pictures in the data set Dataset4 from high to low by the aesthetic NIMA index; the style material category to be migrated has K pictures in total. Use the first N1 style pictures to perform overall neural style migration on the original picture I_org, obtaining N1 migrated output pictures I_gen(n1), where n1 ranges from 1 to N1;
Step 2.2, in the style material category to be migrated: when the Pic4_selected values of all pictures in the class are 0, let N3 = 0 and go to step 2.4;
otherwise, sort the pictures in the style material category to be migrated from high to low by their Pic4_selected values, select the style material pictures whose Pic4_selected value is not 0, and take the first N2 style material pictures;
Step 2.3, remove from the first N2 style material pictures those duplicating the first N1 style material pictures of step 2.1, obtaining N3 style material pictures;
Step 2.4, use the N3 style material pictures to perform overall neural style migration on the original picture I_org, obtaining N3 migrated pictures I_gen(n3), where n3 ranges from 1 to N3;
Step 2.5, finally obtain a set of N1 + N3 migrated pictures I_gen.
Further, in step 2, a classical neural style migration method is adopted for material migration, with the modified loss function:
L_total = α · L_content / NIMA(k, clas) + β · L_style
where α and β are weight coefficients, L_content denotes the content loss index, L_style denotes the style loss index, and NIMA(k, clas) denotes the NIMA value of the k-th style material picture in class clas.
further, step 3 adopts the harmonic dense connection network HarDNet _ seg trained in step 1 to correct the original image I org The material is finely divided.
Beneficial effects: by this method, first, on the basis of the classical material neural style migration algorithm, the generated image achieves style migration and is the most beautiful picture, so the virtual reality effect can be realized more beautifully and the comfort of human-computer interaction is increased; second, the related experience is retained and learned in the next task, which reduces computing resources and yields a better engineering effect.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
Detailed Description of the Embodiments
The invention relates to a material neural style migration method for improving content aesthetic quality, which comprises the following steps:
Step 1, preprocess the style material data set Dataset1: screen out pictures in which the material part occupies a proportion of the total picture area larger than a threshold T1, forming the data set Dataset3; further screen pictures with an aesthetic index score larger than a threshold T2 from Dataset3, forming the data set Dataset4; add a selected-times label Pic4_selected to all style material pictures in Dataset4, setting the initial value of each picture's Pic4_selected to 0;
the method comprises the following specific steps:
Step 1.1, screen out pictures in which the material part occupies more than the threshold T1 of the total picture area, forming the data set Dataset3;
generally speaking, existing data sets contain a large number of image-level semantic samples (for example, a picture dominated by a large area of stone is manually labeled, as a whole, with the semantic 'stone') and only a small number of pixel-level semantic samples (where every pixel of the stone material is labeled 'stone' and every pixel of other materials is labeled with its own semantic); this step requires refined pixel-level semantic identification.
Step 1.1.1, acquire a primary material area label data set SegDateset2;
the image samples in the Flickr Material Database (FMD) and the Extended FMD (EFMD) include image-level label image samples and pixel-level label image samples, in this embodiment, the data set SegDateset1 is obtained by randomly extracting 90% of all image-level label image samples in the FMD and the EFMD and then summarizing, and the model setting in the process of this step of the scheme is consistent with the scheme proposed by Ahn and Kwak.
The approach proposed by Ahn and Kwak (Ahn, J.; kwak, S.Learing pixel-level sensing with image-level super Vision for the super Vision of the procedure of the IEEE Conference Vision and Pattern recognition. IEEE,2018, pp.4981-4990.) is used herein to obtain from a Class Activation Map (CAM) of a multi-tag CNN network (Zhou, B.; khosla, A.; lapedriza, A.A.; oliva, A.toralba, A.left semantic facility for the semantic registration, A.A. Torllba, A.Learing devices for the high semantic registration of the image data set of the IEEE tag of the IEEE trade, the high semantic registration of the primary tag of the IEEE trade data set 2921, the low semantic data set of the IEEE tag of the View of the IEEE trade data set 2921, the high semantic data set of the primary tag of the high semantic registration of the image of the IEEE trade.
Step 1.1.2, a harmonic dense connection network (HarDNet) named HarDNet1 was trained using a primary material area tag dataset SegDateset2 (Chao, P.; kao, C.Y.; ruan, Y.S.; huang, C.H.; lin, Y.L.Hardnet: all memory transmission network in Proceedings of the IEEE International Conference on Computer Vision.IEEE,2019, pp.3552-3561), and the network HarDNet1 was trimmed using another set of pixel level annotation datasets SegDateteset 3.
In this embodiment, the data set SegDateset3 is obtained by randomly extracting 90% of FMD number and all pixel level label picture samples in the EFMD and summarizing the samples. The network structure of HarDNet1 is consistent with that in the paper of Chao, P, and the fine tuning process is consistent. Obtaining a trained harmonic dense connection network (HarDNet _ seg) for the original image I in the later step org And refining and dividing the material part.
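A minimal sketch of such a segmentation fine-tuning loop in PyTorch follows; the model hardnet1 (producing per-pixel class logits) and the data loader are assumed to exist, and the optimizer and learning rate are illustrative choices, not the patent's exact settings.

    import torch
    import torch.nn as nn

    def finetune_segmentation(hardnet1, loader, epochs=10, lr=1e-4):
        criterion = nn.CrossEntropyLoss()   # pixel-wise cross-entropy over material classes
        optimizer = torch.optim.Adam(hardnet1.parameters(), lr=lr)
        hardnet1.train()
        for _ in range(epochs):
            for images, pixel_labels in loader:   # pixel_labels: (B, H, W) class ids
                logits = hardnet1(images)         # (B, num_classes, H, W)
                loss = criterion(logits, pixel_labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return hardnet1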
Step 1.1.3, use the trained HarDNet1 harmonic dense connection network to segment the pixel-level semantics of all images in the data set Dataset1; the labeled result is denoted Dataset2;
Step 1.1.4, count the total number of pixels of each semantic content in the Dataset2 images, calculate its proportion of the whole image, sort from high to low, and output for each image in Dataset2 the area ratio Ph2(i) of the top-ranked semantic class, where i is an index from 1 to the total number itotal of images in Dataset2.
Define a threshold T1 (between 0 and 1), delete pictures with Ph2(i) <= T1 from Dataset2, and output the remaining pictures as Dataset3, a pixel-level semantic data set in which the area of a single material occupies more than T1 of the whole picture.
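A minimal sketch of this area-ratio screening, assuming each segmented image is a 2-D NumPy array of per-pixel class ids; the function names are illustrative.

    import numpy as np

    def dominant_class_ratio(seg_map):
        # seg_map: (H, W) array of per-pixel semantic class ids
        _, counts = np.unique(seg_map, return_counts=True)
        return counts.max() / seg_map.size   # Ph2(i): share of the top-ranked class

    def filter_by_area(seg_maps, T1):
        # keep only images whose dominant material covers more than T1 of the picture
        return [i for i, m in enumerate(seg_maps) if dominant_class_ratio(m) > T1]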
Step 1.2, further screen out from the data set Dataset3 the pictures whose aesthetic NIMA index score is larger than the threshold T2, and sort each class of pictures in Dataset3 from high to low by the aesthetic NIMA index, forming the data set Dataset4;
the NIMA index comes from the paper 'NIMA: Neural Image Assessment', in which a Google research team proposes a deep CNN that predicts the distribution of human opinion scores for an image, from both a technical point of view (direct look and feel) and an aesthetic point of view (attractiveness). In Google's approach, the NIMA model does not simply classify images into high or low scores, nor does it regress to the average score; instead it produces a score distribution for any image: over the range 1 to 10, NIMA assigns a probability to each of the 10 scores, with quality ranked from high to low by score.
Step 1.2.1, calculate NIMA for all images in Dataset3 using the method in the paper 'NIMA: Neural Image Assessment', obtaining the NIMA index result NIMA(j) of picture j, where j ranges from 1 to the total number jtotal of pictures in Dataset3.
Step 1.2.2, define a threshold T2 (NIMA scores range from 1 to 10), delete all pictures with NIMA(j) <= T2 from Dataset3, and sort the remaining pictures of each style material class from high to low by the aesthetic NIMA index, forming the data set Dataset4; each picture in Dataset4 carries its aesthetic index value NIMA and its selected-times count Pic4_selected;
in this embodiment, T2 is set to 3, which essentially removes pictures that are unclear, poorly saturated, aesthetically poor, and the like.
The pictures in Dataset4 are denoted Pic4(k, clas), where k is the number of the picture within class clas, k ranges from 1 to K, and K denotes the total number of pictures in each class; clas is the class number (for example, with 10 material classes in the data set, if stone is the 1st material class then clas = 1). Each picture in Dataset4 carries its NIMA(k, clas) value, which is used in later retrieval, and Pic4_selected(k, clas) denotes the number of times the k-th picture in class clas has been selected.
Thus a preprocessed and screened data set is obtained: the main material in every picture occupies a large share of the image area, and the aesthetic index of every picture is high. The pictures are tabulated separately by material class and sorted by aesthetic index, the aesthetic index value is retained, and a selected-times variable Pic4_selected(k, clas) is set for each picture Pic4(k, clas); all Pic4_selected(k, clas) variables are 0 at the beginning.
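A minimal sketch of these per-class tables, using a hypothetical record type; the field names mirror the patent's NIMA(k, clas) and Pic4_selected(k, clas) but are otherwise illustrative.

    from dataclasses import dataclass
    from collections import defaultdict

    @dataclass
    class StylePic:
        path: str
        clas: int          # material class number
        nima: float        # NIMA(k, clas)
        selected: int = 0  # Pic4_selected(k, clas), initially 0

    def build_dataset4(records):
        # group by material class and sort each class by NIMA, high to low
        tables = defaultdict(list)
        for r in records:
            tables[r.clas].append(r)
        for clas in tables:
            tables[clas].sort(key=lambda r: r.nima, reverse=True)
        return tables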
Step 2, select the style material pictures with higher aesthetic index and those with higher selected-times Pic4_selected, and perform overall neural style migration on the original picture I_org to generate multiple migrated pictures I_gen.
Material transformations are performed using the classical neural style migration method of Gatys (Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, pp. 2414-2423), which uses a pre-trained VGG19 network to extract content and style features. Relative to the loss function defined by the classical method, the loss here combines a new content loss term with the classical style loss term: the minimized feature distance (the content loss index L_content) is divided by the image aesthetic index NIMA(k, clas), while the Gram-matrix term (the style loss index L_style) is kept unchanged, to optimize the migrated images.
When the material image B is retrieved from the data set for conversion, in addition to the content loss index L_content and the style loss index L_style of the classical neural style migration method, the aesthetic index NIMA of the style material picture in Dataset4 is also used. Since a larger NIMA is better while a smaller loss function is better, its reciprocal is taken, giving the new loss function:
L_total = α · L_content / NIMA(k, clas) + β · L_style
since the target class is known at the time of the task, the clas is a known number, k of the search target is the number of the graph in the clas class, k is also known, and NIMA (k, clas) is known. The calculation of the classical neural style migration method is not affected as a whole. In the present embodiment, both α and β are set to 0.5.
The specific steps of the original picture migration are as follows:
Step 2.1, in the data set Dataset4, the style material category to be migrated has K pictures in total; use the first N1 style pictures to perform overall neural style migration on the original picture I_org, obtaining N1 migrated output pictures I_gen(n1), where n1 ranges from 1 to N1;
Step 2.2, in the style material category to be migrated: when the Pic4_selected values of all pictures in the class are 0, let N3 = 0 and go to step 2.4;
otherwise, sort the pictures from high to low by their Pic4_selected values, select the pictures whose Pic4_selected value is not 0, and take the first N2 of them;
Step 2.3, remove from the first N2 pictures those duplicating the first N1 pictures of step 2.1, obtaining N3 pictures;
Step 2.4, use the N3 pictures to perform overall neural style migration on the original picture I_org, obtaining N3 migrated pictures I_gen(n3), where n3 ranges from 1 to N3;
Step 2.5, obtain a set of N1 + N3 migrated pictures I_gen;
step 3, using the trained harmonic dense connection network (HarDNet _ seg) to finely divide the material of the original drawing to obtain an original drawing I org Defining a middle to-be-converted material part RegionTarget and an image I org The sizes are completely consistentMatrix I with values of all 0 mask And is combined with mask All the parts corresponding to the positions of the regions of the material to be converted, namely the regions of the material to be converted, are set to be 1.
Step 4, transferring the image I gen And matrix I mask Multiplying to obtain a migrated image I gen The content corresponding to the region of the RegionTarget, namely the material region of the image after the migration;
the original image I org And matrix I mask Multiplying to obtain an original image I org The background area of (1);
the material area of the image after the migration and the original image I org Is synthesized to the final output (I) out ) Wherein it is defined as follows:
I out =I gen I mask +I org (1-I mask )
here, I gen Is a synthesized image, I mask E {0,1} is the region mask obtained by HarDNet, I org Is the image of the content with the original object. Therefore, due to the migrated image I gen N1+ N3, so a group of pictures with corresponding quantity can be obtained;
step 5, evaluating and optimizing
In this step, the invention is to solve the problem that the generated image C is beautiful; and secondly, if the generated image C is beautiful, the related experience is fed back to the image in the field of raw materials, so that the experience knowledge can be learned in the next task.
The specific operation comprises the following steps:
step 5.1, for this group I out Calculating an Image by using a method in a paper of Neural Image Assessment to obtain an NIMA index, and selecting a picture with the maximum NIMA value as the final output of the whole algorithm from high to low;
and 5.2, increasing 1 for the Pic4_ selected (k, clas) variable of the original image of the selected picture in the Dataset4, increasing the probability that the style image is selected again in the step 2 in the next task, and realizing the learning of the empirical knowledge.
Therefore, through the implementation of the method, first, on the basis of the classical material neural style migration algorithm, the generated image achieves style migration and is the most beautiful picture, so the virtual reality effect can be realized more beautifully and the comfort of human-computer interaction is increased; second, the related experience is retained and learned in the next task, which reduces computing resources and yields a better engineering effect.
Suppose the original picture I_org contains a bowl made of wood and a spoon made of stone, where the wooden bowl is the material part whose style is to be replaced. The material style migration task converts the wooden bowl into the target style material, metal, while everything else, i.e. the stone spoon, remains unchanged. Step 2 selects several metal style material pictures with high NIMA index and high selected-times count from the metal class, and migrates the overall style of all content in I_org, converting both the wooden bowl and the stone spoon into metal and obtaining multiple pictures I_gen. Step 3 segments the position RegionTarget of the wooden-bowl part in I_org and forms the mask matrix I_mask. Step 4 uses the mask matrix I_mask to replace the wooden-bowl part of the original picture I_org with the metal-bowl part of each I_gen, obtaining multiple outputs I_out. Step 5 selects the most beautiful I_out among them and adds 1 to the selected-times count of the corresponding style material picture, increasing the probability that it is selected next time.

Claims (5)

1. A material neural style migration method for improving content aesthetic quality, characterized by comprising the following steps:
step 1, preprocessing a style material data set Dataset1: screening out style material pictures in which the material part occupies a proportion of the total picture area larger than a threshold T1, forming a data set Dataset3;
further screening style material pictures with an aesthetic index score larger than a threshold T2 from the data set Dataset3, forming a data set Dataset4;
adding a selected-times label Pic4_selected to all style material pictures in Dataset4;
step 2, selecting the style material pictures with higher aesthetic index and those with higher selected-times Pic4_selected, and performing overall neural style migration on the original picture I_org to generate multiple migrated pictures I_gen;
step 3, finely segmenting the material of the original picture I_org to obtain the region RegionTarget of the material to be converted; defining a matrix I_mask of exactly the same size as the image I_org with all values 0, and setting to 1 all positions of I_mask corresponding to the RegionTarget region of the material to be converted;
step 4, synthesizing the images; the output picture I_out is expressed as follows:
I_out = I_gen ⊙ I_mask + I_org ⊙ (1 - I_mask)
where ⊙ denotes element-wise multiplication and I_gen is a migrated picture, each synthesized image I_out corresponding to one migrated picture I_gen;
step 5, calculating the NIMA value of all synthesized images I_out; the synthesized image I_out with the maximum NIMA value is the final output image, and 1 is added to the Pic4_selected tag value of the style material picture corresponding to it.
2. The material neural style migration method for improving content aesthetic quality according to claim 1, characterized in that in step 1 style material pictures in which the material part occupies a proportion of the total picture area larger than the threshold T1 are screened out, forming the data set Dataset3;
the method specifically comprises the following steps:
step 1.1, acquiring a primary material area label data set SegDateset2;
randomly extracting from all image-level label picture samples in FMD and EFMD to obtain a data set SegDateset1;
using a weakly supervised scheme, learning pixel-level semantic affinity PSA from the class activation maps CAM of a multi-label CNN network, and obtaining the primary material region label data set SegDateset2 from the image-level label data set SegDateset1;
step 1.2, training a harmonic dense connection network HarDNet, named HarDNet1, with the primary material area label data set SegDateset2, and fine-tuning HarDNet1 with another pixel-level annotation data set SegDateset3 to obtain the trained harmonic dense connection network HarDNet_seg;
the data set SegDateset3 being obtained by randomly extracting from all pixel-level label picture samples in FMD and EFMD;
step 1.3, using the trained HarDNet1 harmonic dense connection network to segment the pixel-level semantics of all images in the data set Dataset1, the labeled result being denoted Dataset2;
step 1.4, counting the total number of pixels of each semantic content in the Dataset2 images, calculating its proportion of the whole image, sorting from high to low, and outputting for each image in Dataset2 the area ratio Ph2(i) of the top-ranked semantic class, where i is an index from 1 to the total number itotal of images in Dataset2;
defining a threshold T1 (between 0 and 1), deleting pictures with Ph2(i) <= T1 from Dataset2, and outputting the remaining pictures as Dataset3, a pixel-level semantic data set in which the area of a single material occupies more than T1 of the whole picture.
3. The material neural style migration method for improving content aesthetic quality according to claim 1, characterized in that in step 2 the style material pictures with higher aesthetic index and those with higher selected-times Pic4_selected are selected and used to perform overall neural style migration on the original picture I_org, generating multiple migrated pictures I_gen; the specific steps are:
step 2.1, sorting each class of pictures in the data set Dataset4 from high to low by the aesthetic NIMA index, the style material category to be migrated having K pictures in total, and using the first N1 style pictures to perform overall neural style migration on the original picture I_org, obtaining N1 migrated output pictures I_gen(n1), where n1 ranges from 1 to N1;
step 2.2, in the style material category to be migrated: when the Pic4_selected values of all pictures in the class are 0, letting N3 = 0 and going to step 2.4;
otherwise, sorting the pictures in the style material category to be migrated from high to low by their Pic4_selected values, selecting the style material pictures whose Pic4_selected value is not 0, and taking the first N2 style material pictures;
step 2.3, removing from the first N2 style material pictures those duplicating the first N1 style material pictures of step 2.1, obtaining N3 style material pictures;
step 2.4, using the N3 style material pictures to perform overall neural style migration on the original picture I_org, obtaining N3 migrated pictures I_gen(n3), where n3 ranges from 1 to N3;
step 2.5, finally obtaining a set of N1 + N3 migrated pictures I_gen.
4. The material neural style migration method for improving content aesthetic quality according to claim 1, characterized in that a classical neural style migration method is adopted for material migration in step 2, with the modified loss function:
L_total = α · L_content / NIMA(k, clas) + β · L_style
where α and β are weight coefficients, L_content denotes the content loss index, L_style denotes the style loss index, and NIMA(k, clas) denotes the NIMA value of the k-th style material picture in class clas.
5. The material neural style migration method for improving content aesthetic quality according to claim 2, characterized in that in step 3 the harmonic dense connection network HarDNet_seg trained in step 1 is used to finely segment the material of the original picture I_org.
CN202211182280.3A 2022-09-27 2022-09-27 Material neural style migration method for improving aesthetic quality of content Active CN115641253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211182280.3A CN115641253B (en) 2022-09-27 2022-09-27 Material neural style migration method for improving aesthetic quality of content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211182280.3A CN115641253B (en) 2022-09-27 2022-09-27 Material neural style migration method for improving aesthetic quality of content

Publications (2)

Publication Number Publication Date
CN115641253A (en) 2023-01-24
CN115641253B (en) 2024-02-20

Family

ID=84941974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211182280.3A Active CN115641253B (en) 2022-09-27 2022-09-27 Material neural style migration method for improving aesthetic quality of content

Country Status (1)

Country Link
CN (1) CN115641253B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458750A (en) * 2019-05-31 2019-11-15 北京理工大学 A kind of unsupervised image Style Transfer method based on paired-associate learning
CN111242841A (en) * 2020-01-15 2020-06-05 杭州电子科技大学 Image background style migration method based on semantic segmentation and deep learning
CN111950655A (en) * 2020-08-25 2020-11-17 福州大学 Image aesthetic quality evaluation method based on multi-domain knowledge driving
WO2022090483A1 (en) * 2020-11-02 2022-05-05 Tomtom Global Content B.V. Neural network models for semantic image segmentation
CN114581356A (en) * 2022-05-09 2022-06-03 南京理工大学 Image enhancement model generalization method based on style migration data augmentation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HOSSEIN TALEBI: "NIMA: Neural Image Assessment", 《ARXIV》, pages 1 - 15 *
孙冬: "局部图像风格迁移绘制算法", 《安徽大学学报(自然科学版)》, pages 72 - 78 *

Also Published As

Publication number Publication date
CN115641253B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
CN107977671B (en) Tongue picture classification method based on multitask convolutional neural network
CN107256246B (en) printed fabric image retrieval method based on convolutional neural network
CN110837836B (en) Semi-supervised semantic segmentation method based on maximized confidence
CN107622104B (en) Character image identification and marking method and system
WO2017113232A1 (en) Product classification method and apparatus based on deep learning
CN108734719A (en) Background automatic division method before a kind of lepidopterous insects image based on full convolutional neural networks
CN108121781B (en) Related feedback image retrieval method based on efficient sample selection and parameter optimization
CN108629783B (en) Image segmentation method, system and medium based on image feature density peak search
CN109740686A (en) A kind of deep learning image multiple labeling classification method based on pool area and Fusion Features
CN108846444A (en) The multistage depth migration learning method excavated towards multi-source data
CN105469080B (en) A kind of facial expression recognizing method
CN102314614A (en) Image semantics classification method based on class-shared multiple kernel learning (MKL)
CN110866896A (en) Image saliency target detection method based on k-means and level set super-pixel segmentation
CN109272011A (en) Multitask depth representing learning method towards image of clothing classification
Akhand et al. Convolutional Neural Network based Handwritten Bengali and Bengali-English Mixed Numeral Recognition.
CN112507800A (en) Pedestrian multi-attribute cooperative identification method based on channel attention mechanism and light convolutional neural network
Islam et al. A CNN based approach for garments texture design classification
Qi et al. Personalized sketch-based image retrieval by convolutional neural network and deep transfer learning
Zhao et al. Semi-supervised learning-based live fish identification in aquaculture using modified deep convolutional generative adversarial networks
CN111783543B (en) Facial activity unit detection method based on multitask learning
Cogswell et al. Combining the best of graphical models and convnets for semantic segmentation
CN110991554B (en) Improved PCA (principal component analysis) -based deep network image classification method
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN112990340B (en) Self-learning migration method based on feature sharing
Al-Hmouz et al. Enhanced numeral recognition for handwritten multi-language numerals using fuzzy set-based decision mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant