CN115641253B - Material neural style transfer method for improving the aesthetic quality of content - Google Patents

Material neural style transfer method for improving the aesthetic quality of content

Info

Publication number: CN115641253B (application CN202211182280.3A)
Authority: CN (China)
Prior art keywords: style, picture, image, transferred
Other versions: CN115641253A (Chinese)
Inventors: 陈森霖 (Chen Senlin), 沈玉龙 (Shen Yulong), 袁博 (Yuan Bo), 胡凯 (Hu Kai)
Assignee: Nanjing Baituo Vision Technology Co., Ltd.
Filing date: 2022-09-27
Legal status: Active (granted)

Classifications

    • Y02P 90/30: Computing systems specially adapted for manufacturing (Y02P: climate change mitigation technologies in the production or processing of goods)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a material neural style transfer method for improving the aesthetic quality of content, comprising: step 1, screening from the style material dataset the pictures whose material part occupies a larger share of the total picture area and looks more beautiful, and adding a selected-times label to each picture; step 2, using the style material pictures with higher aesthetic index and more selections to perform neural style transfer on the original picture I_org, generating several transferred pictures I_gen; step 3, segmenting from the original picture I_org the region target of the material to be transferred; step 4, compositing the content of each transferred picture I_gen inside the region target with the background part of the original picture to form several composite images; and step 5, selecting the most beautiful of the composite images and incrementing the selected-times label value of the corresponding style material picture by 1. In this way the most beautiful picture is generated and the relevant experience is preserved.

Description

Material neural style transfer method for improving the aesthetic quality of content
Technical Field
The invention belongs to the field of image processing, and in particular to a neural style transfer algorithm based on image style conversion, mainly applied in the field of virtual reality/augmented reality.
Background
A material neural style transfer algorithm converts a given original material in an original image into another specified material: for example, the stone content of original image A is converted into specified metal content, where the style of the metal material comes from image B in the dataset; a stone house becomes a metal house while everything else stays unchanged, yielding the composite image C. Such an algorithm applies widely to image synthesis in the field of virtual reality and has good application value.
The quality of the result is determined by the style similarity between the generated picture and the specified material part, but beyond that similarity the appearance of the generated picture can still vary. Among these varying results, how does one select the most beautiful picture? The synthesized house is indeed made of metal, but a metal house can look many different ways; how to further pick the most beautiful image result is a question with no existing research scheme at present.
Disclosure of Invention
The technical problem the invention solves is how to select the most beautiful image among the various image effects produced after style transfer. The raw-material image B is beautiful; if the image C generated after transfer is also beautiful, the relevant experience is fed back to the raw-material image so that this experience knowledge is learned in the next task. The invention therefore provides a material neural style transfer method for improving the aesthetic quality of content. Building on the classical material neural style transfer algorithm, it generates the most beautiful picture on top of achieving style transfer on the generated image, so the virtual-reality effect is rendered more beautifully and the comfort of human-computer interaction increases; secondly, the relevant experience is retained and learned in the next task, which reduces computing resources and gives a better engineering result.
Founded on deep learning, aesthetic indices and the like, and aimed at the aesthetics of generated pictures, the invention can generate an objectively most beautiful picture on top of achieving style transfer on the generated image, rendering the virtual-reality effect more beautifully and increasing the comfort of human-computer interaction.
The invention provides a material neural style transfer method for improving the aesthetic quality of content, comprising dataset preprocessing, whole-picture neural style transfer of the original picture, segmentation of the original picture, synthesis, and evaluation feedback. The method is as follows.
Step 1. Preprocess the style material dataset Dataset1: screen out the style material pictures whose material part occupies more than a threshold T1 of the total picture area, forming dataset Dataset3;
further screen from Dataset3 the style material pictures whose aesthetic index score is greater than a threshold T2, forming dataset Dataset4;
add a selected-times label Pic4_selected to all style material pictures in Dataset4.
The essence of this step is to require that the material part of a Dataset3 sample occupy as large a share of the total picture area as possible and that the aesthetic index score of that part be as high as possible.
Step 2. Select the style material pictures with higher aesthetic index and larger selected-times Pic4_selected, perform whole-picture neural style transfer on the original picture I_org, and generate several transferred pictures I_gen.
Step 3. Finely segment the material of the original image I_org to obtain the region target of the material to be converted; define a matrix I_mask of exactly the same size as the image I_org with all values 0, and set the part of I_mask corresponding to the position of the region target to 1.
Step 4. Composite the images into the output picture I_out, expressed as:
I_out = I_gen * I_mask + I_org * (1 - I_mask)
where I_gen is a transferred picture; each composite image I_out corresponds to one transferred picture I_gen.
step 5, calculating all the synthesized images I out Obtaining a synthetic image I with the maximum NIMA value out I.e. the final image is obtained and the corresponding style is adoptedThe Pic4_selected tag value of the texture picture is incremented by 1.
Further, in step 1, the style material pictures whose material part occupies more than threshold T1 of the total picture area are screened out to form dataset Dataset3; specifically:
Step 1.1. Acquire the primary material region label dataset SegDateset2:
randomly extract from all image-level label picture samples in FMD and EFMD to obtain dataset SegDateset1;
using a weakly supervised scheme, learn pixel-level semantic affinity (PSA) from the class activation map (CAM) of a multi-label CNN network, obtaining the primary material region label dataset SegDateset2 from the image-level label dataset SegDateset1.
Step 1.2. Train a harmonic densely connected network (HarDNet), named HarDNet1, with the primary material region label dataset SegDateset2, then fine-tune the network HarDNet1 with another pixel-level annotation dataset, SegDateset3, obtaining the trained harmonic densely connected network HarDNet_seg;
the dataset SegDateset3 is randomly extracted from all pixel-level label picture samples in FMD and EFMD.
Step 1.3. Segment pixel-level semantics of all images in dataset Dataset1 with the trained HarDNet1 network; label the result Dataset2.
Step 1.4. Count the total number of pixels of each semantic class in every Dataset2 image and compute its proportion of the whole image; sort from high to low and, for each image in Dataset2, output only the area ratio Ph2(i) of the top-ranked semantic class, where i is the image number, running from 1 to itotal, the total number of images in Dataset2.
Define a threshold T1 in (0, 1); delete every picture in Dataset2 with Ph2(i) <= T1 and output the remaining pictures as Dataset3, a pixel-level semantic dataset in which the area of a single material exceeds T1 of the whole picture.
Further, in step 2, the style material pictures with higher aesthetic index and larger selected-times Pic4_selected are selected to perform whole-picture neural style transfer on the original picture I_org, generating several transferred pictures I_gen; the specific steps are:
Step 2.1. Sort each class of pictures in dataset Dataset4 from high to low by the aesthetic NIMA index. The style material class to be transferred contains K pictures in total; use the top N1 style pictures to perform whole-picture neural style transfer on the original picture I_org, obtaining N1 transferred output pictures I_gen(n1), where n1 ranges from 1 to N1.
Step 2.2. Within the style material class to be transferred: when the Pic4_selected values of all pictures in the class are 0, set N3 = 0 and go to step 2.4;
otherwise, sort the pictures of the class from high to low by Pic4_selected, select the style material pictures whose Pic4_selected is not 0, and take the top N2 of them.
Step 2.3. Remove from these N2 style material pictures any duplicates of the N1 style pictures of step 2.1, obtaining N3 style material pictures.
Step 2.4. Use the N3 style material pictures to perform whole-picture neural style transfer on the original image I_org, obtaining N3 transferred pictures I_gen(n3), where n3 ranges from 1 to N3.
Step 2.5. Finally obtain a collection of N1 + N3 transferred pictures I_gen.
Further, in step 2, the material is transferred with the classical neural style transfer method, with the loss function modified as follows:
L_total = alpha * (L_content / NIMA(k, clas)) + beta * L_style
where alpha and beta are weight coefficients, L_content denotes the content loss index, L_style denotes the style loss index, and NIMA(k, clas) represents the NIMA value of the k-th style material picture in class clas.
further, the step 3 adopts the harmonic dense connection network HarDNet_seg pair original image I trained in the step 1 org Is subjected to fine segmentation.
Beneficial effects: by implementing the invention, first, on the basis of the classical material neural style transfer algorithm, the most beautiful picture can be generated on top of achieving style transfer on the generated image, the virtual-reality effect can be rendered more beautifully, and the comfort of human-computer interaction increases; second, the relevant experience is retained and learned in the next task, which reduces computing resources and gives a better engineering result.
Drawings
FIG. 1 is an overall flow chart of the present invention;
Detailed Description of Embodiments
The invention discloses a material neural style transfer method for improving the aesthetic quality of content, comprising the following steps:
Step 1. Preprocess the style material dataset Dataset1: screen out the pictures whose material part occupies more than a threshold T1 of the total picture area, forming dataset Dataset3; further screen from Dataset3 the pictures whose aesthetic index score is greater than a threshold T2, forming dataset Dataset4; add a selected-times label Pic4_selected to all style material pictures in Dataset4, with the initial Pic4_selected value of every style material picture set to 0.
the method comprises the following specific steps:
step 1.1, screening out pictures with the material part accounting for the total area ratio of the pictures being greater than a threshold T1 to form a Dataset3;
in general, in the existing dataset, the number of semantic samples marked with image level is large, for example, a large area is stone, the whole image is manually marked with the stone, while the number of semantic samples marked with pixel level is small, for example, a large area is stone, part of each pixel of stone material is marked with the stone, part of each pixel of other materials is respectively marked with other semantics, and the step needs fine pixel level semantic recognition.
Step 1.1.1. Acquire the primary material region label dataset SegDateset2.
The picture samples in the Flickr Material Database (FMD) and the Extended FMD (EFMD) include image-level label picture samples and pixel-level label picture samples. In this embodiment, dataset SegDateset1 is obtained by randomly extracting 90% of all image-level label picture samples in FMD and EFMD; the model settings in this step follow the scheme proposed by Ahn and Kwak.
Using the scheme of Ahn and Kwak (Ahn, J.; Kwak, S. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2018, pp. 4981-4990), a weakly supervised method learns pixel-level semantic affinity from the class activation map (CAM) of a multi-label CNN (Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016, pp. 2921-2929), and the primary material region label dataset SegDateset2 is obtained from SegDateset1, a low-cost, content-rich dataset of image-level (semantic) labels.
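As a rough illustration of the CAM side of this pipeline only (the PSA refinement of Ahn and Kwak is omitted, and the helper below is a hypothetical Python sketch, not code from the patent), a class activation map for one material class can be computed from a trained multi-label classifier's final convolutional features and its fully connected weights:

```python
import torch

def class_activation_map(conv_features: torch.Tensor,
                         fc_weight: torch.Tensor,
                         class_idx: int) -> torch.Tensor:
    """CAM of Zhou et al.: weight the final conv feature maps
    (1, C, H, W) by the fully connected weights (num_classes, C)
    of one class, sum over channels, and normalise to [0, 1]."""
    w = fc_weight[class_idx].view(1, -1, 1, 1)        # (1, C, 1, 1)
    cam = torch.relu((w * conv_features).sum(dim=1))  # (1, H, W)
    return cam / (cam.max() + 1e-8)
```

High-activation pixels of the CAM seed the material regions that PSA then propagates into the primary labels of SegDateset2.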
Step 1.1.2. Train a harmonic densely connected network (HarDNet; Chao, P.; Kao, C.Y.; Ruan, Y.S.; Huang, C.H.; Lin, Y.L. HarDNet: A low memory traffic network. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2019, pp. 3552-3561), named HarDNet1, with the primary material region label dataset SegDateset2, and fine-tune the network HarDNet1 with another pixel-level annotation dataset, SegDateset3.
In this embodiment, dataset SegDateset3 is obtained by randomly extracting 90% of all pixel-level label picture samples in FMD and EFMD. The network structure of HarDNet1 and the fine-tuning process both follow Chao, P. This yields a trained harmonic densely connected network (HarDNet_seg), used in a later step to finely segment the material part of the original image I_org.
Step 1.1.3. Segment pixel-level semantics of all images in dataset Dataset1 with the trained HarDNet1 network; label the result Dataset2.
Step 1.1.4. Count the total number of pixels of each semantic class in every Dataset2 image and compute its proportion of the whole image; sort from high to low and, for each image in Dataset2, output only the area ratio Ph2(i) of the top-ranked semantic class, where i is the image number, running from 1 to itotal, the total number of images in Dataset2.
Define a threshold T1 in (0, 1); delete every picture in Dataset2 with Ph2(i) <= T1 and output the remaining pictures as Dataset3, a pixel-level semantic dataset in which the area of a single material exceeds T1 of the whole picture.
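A minimal sketch of step 1.1.4 and the T1 filter, assuming Python with NumPy; the function names and the default t1 = 0.5 are illustrative assumptions, since the patent leaves the value of T1 open:

```python
import numpy as np

def dominant_material_ratio(semantic_mask: np.ndarray) -> float:
    """Area ratio Ph2 of the top-ranked semantic class in a
    pixel-level semantic mask (an H x W array of class ids)."""
    _, counts = np.unique(semantic_mask, return_counts=True)
    return counts.max() / semantic_mask.size

def filter_by_area_ratio(dataset2, t1: float = 0.5):
    """dataset2: iterable of (image, semantic_mask) pairs. Keep only
    samples whose dominant material covers more than T1 of the image;
    the survivors form Dataset3."""
    return [(img, m) for img, m in dataset2
            if dominant_material_ratio(m) > t1]
```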
Step 1.2. Further screen from dataset Dataset3 the pictures whose aesthetic NIMA index score is greater than a threshold T2, and sort each class of pictures in Dataset3 from high to low by the aesthetic NIMA index, forming dataset Dataset4.
The NIMA index comes from Neural Image Assessment, in which a Google research team proposed a deep CNN that predicts the distribution of human opinion in image evaluation from both direct look and feel (the technical perspective) and attractiveness (the aesthetic perspective). Rather than simply scoring an image as high or low quality, or regressing to the mean score, the NIMA model produces a score distribution for any image: over the range 1 to 10, NIMA assigns a likelihood to each of the 10 scores, with quality ranked from high score to low.
Step 1.2.1. Compute NIMA for all images in Dataset3 with the method of the paper Neural Image Assessment, obtaining the NIMA index result NIMA(j) for picture j, where j runs from 1 to jtotal, the total number of pictures in Dataset3.
Step 1.2.2. Define a threshold T2; delete from Dataset3 every picture with NIMA(j) <= T2; among the remaining pictures, sort each material class from high to low by the aesthetic NIMA index to form Dataset4, in which every picture carries its aesthetic index value NIMA and its selected-times label Pic4_selected.
In this embodiment T2 is set to 3, which essentially deletes the pictures whose content is unclear, poorly saturated, or otherwise lacking in aesthetic appeal.
The pictures in Dataset4 are denoted Pic4(k, clas): clas is the class number (for example, with 10 material classes in the dataset, if stone is the 1st material class its clas is 1), and k is the number of the picture within class clas, ranging from 1 to K, where K is the total number of pictures in class clas. Dataset4 stores the NIMA(k, clas) value of every picture for later retrieval. The number of times the k-th picture of class clas has been selected is denoted Pic4_selected(k, clas).
This yields a preprocessed, filtered dataset in which every subject material area occupies a large share of its image and every image's aesthetic index is high. The images are listed separately by material class, sorted by aesthetic index from high to low, with the aesthetic index values retained; each sub-picture Pic4(k, clas) carries a selected-times variable Pic4_selected(k, clas), all initially 0.
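The bookkeeping of Dataset4 can be pictured with a small record per style picture. The sketch below is a hypothetical Python layout (the names StylePicture and build_dataset4 are not from the patent), with T2 = 3 as set in this embodiment:

```python
from dataclasses import dataclass

@dataclass
class StylePicture:
    image_path: str
    clas: int          # material class number, e.g. 1 for stone
    k: int             # rank of the picture within its class
    nima: float        # aesthetic score NIMA(k, clas)
    selected: int = 0  # Pic4_selected(k, clas), initially 0

def build_dataset4(dataset3_scored, t2: float = 3.0):
    """dataset3_scored: iterable of (path, clas, nima) triples.
    Drop pictures with NIMA <= T2, then sort each class by NIMA
    from high to low, numbering the survivors k = 1..K."""
    kept = [x for x in dataset3_scored if x[2] > t2]
    dataset4 = {}
    for clas in sorted({c for _, c, _ in kept}):
        ranked = sorted((x for x in kept if x[1] == clas),
                        key=lambda x: x[2], reverse=True)
        dataset4[clas] = [StylePicture(p, c, k + 1, n)
                          for k, (p, c, n) in enumerate(ranked)]
    return dataset4
```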
Step 2. Select the style material pictures with higher aesthetic index and larger selected-times Pic4_selected, perform whole-picture neural style transfer on the original picture I_org, and generate several transferred pictures I_gen.
Material conversion is performed with the classical neural style transfer method of Gatys (Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2016, pp. 2414-2423), which uses a pre-trained VGG19 network to extract content and style features. Relative to the loss defined by the classical method, the loss here combines a new content loss function with the classical style loss function: the transferred image is optimized by minimizing the feature distance divided by the image aesthetic index (content loss index L_content over NIMA(k, clas)) together with the Gram-matrix distance (style loss index L_style).
When searching the dataset for the converted material image B, in addition to the content loss index L_content and the style loss index L_style of the classical neural style transfer method, the aesthetic index NIMA of the style material picture in Dataset4 is also taken into account. A larger NIMA is better while a smaller loss is better, so NIMA is placed in the denominator, giving the new loss function:
L_total = alpha * (L_content / NIMA(k, clas)) + beta * L_style
Since the class of the target object is known at task time, clas is a known number; k, the index of the searched picture within class clas, is also known, so NIMA(k, clas) is known, and the computation of the classical neural style transfer method is unaffected as a whole. In this embodiment, alpha and beta are both set to 0.5.
The specific steps of transferring the original image are:
Step 2.1. In dataset Dataset4, the style material class to be transferred contains K pictures in total; use the top N1 style pictures to perform whole-picture neural style transfer on the original picture I_org, obtaining N1 transferred output pictures I_gen(n1), where n1 ranges from 1 to N1.
Step 2.2. Within the style material class to be transferred: when the Pic4_selected values of all pictures in the class are 0, set N3 = 0 and go to step 2.4;
otherwise, sort the pictures from high to low by Pic4_selected, select the pictures whose Pic4_selected is not 0, and take the top N2 of them.
Step 2.3. Remove from these N2 pictures any duplicates of the N1 pictures of step 2.1, obtaining N3 pictures.
Step 2.4. Use the N3 pictures to perform whole-picture neural style transfer on the original image I_org, obtaining N3 transferred pictures I_gen(n3), where n3 ranges from 1 to N3.
Step 2.5. Obtain a collection of N1 + N3 transferred pictures I_gen.
step 3, using a trained harmonic dense connection network (HarDNet_seg) to carry out fine segmentation on the material of the original picture, and segmenting the original picture I org Defining a piece of image I and a part of region target of a material to be converted org Matrix I with completely identical size and value of 0 mask And I is combined with mask The part corresponding to the region target position of the material to be converted is set to be 1.
Step 4, migrating the image I gen And matrix I mask Multiplying to obtain a migrated image I gen Content corresponding to the region target region, namely a material region of the migrated image;
will original image I org And matrix I mask Multiplying to obtain an original image I org Background area of (2);
material area of the migrated image and original image I org Is synthesized to the final output (I out ) In (b), it is defined as follows:
I out =I gen I mask +I org (1-I mask )
here, I gen Is a synthesized image, I mask E {0,1} is through HarDNet gets the region mask, I org Is a content image with the original target. Therefore, due to the migrated image I gen There are N1+ N3, so a corresponding number of a group of pictures is also obtained;
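The compositing of step 4 is a single masked blend. A sketch assuming NumPy arrays, with i_mask binary and broadcast over the colour channels (composite is an illustrative name, not the patent's code):

```python
import numpy as np

def composite(i_gen: np.ndarray, i_org: np.ndarray,
              i_mask: np.ndarray) -> np.ndarray:
    """I_out = I_gen * I_mask + I_org * (1 - I_mask): the material
    region comes from the transferred picture, the background from
    the original picture."""
    return i_gen * i_mask + i_org * (1 - i_mask)

# One output per transferred picture, N1 + N3 in total:
# outputs = [composite(g, i_org, i_mask) for g in transferred_pictures]
```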
step 5, evaluating and optimizing
In this step, the first thing the present invention addresses is that the generated image C is beautiful; secondly, the generated image C is beautiful, and relevant experience is fed back to the image in the raw material field, so that the experience knowledge is learned in the next task.
The specific operation comprises the following steps:
step 5.1, for this group I out The method in the paper Neural Image Assessment is used for calculating an image to obtain NIMA indexes, and a picture with the maximum NIMA value is selected from high to low to be used as the final output of the whole algorithm;
and 5.2, increasing the Pic4_selected (k, clas) variable of the original picture of the selected picture in the Dataset4 by 1, increasing the probability of the picture of the style being selected again in the step 2 when the next task is performed, and realizing the learning of experience knowledge.
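Steps 5.1 and 5.2 together form the evaluate-and-feedback loop. A sketch under the assumption that outputs[i] was produced with style_pics[i] and that nima_score maps an image to its NIMA value (both names are hypothetical):

```python
def evaluate_and_feedback(outputs, style_pics, nima_score):
    """Select the composite image I_out with the largest NIMA value
    and increment Pic4_selected for the style picture that produced
    it, so that picture is more likely to be chosen in the next task."""
    best = max(range(len(outputs)), key=lambda i: nima_score(outputs[i]))
    style_pics[best].selected += 1  # experience feedback to Dataset4
    return outputs[best]
```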
Thus, by implementing the invention, first, on the basis of the classical material neural style transfer algorithm, the most beautiful picture can be generated on top of achieving style transfer on the generated image, the virtual-reality effect can be rendered more beautifully, and the comfort of human-computer interaction increases; second, the relevant experience is retained and learned in the next task, which reduces computing resources and gives a better engineering result.
Suppose the original picture I_org shows a wooden bowl holding a stone spoon, where the wooden bowl is the material part whose style is to be replaced. The material style transfer task converts the bowl from wood into the target style material, metal, while everything else stays unchanged, i.e., the stone spoon is untouched. Step 2 selects from the metal class several metal style material pictures with high NIMA index and many selections, and first performs whole-picture style transfer on all the content of the original picture I_org: the wooden bowl and the stone spoon are both converted to metal, giving several pictures I_gen. Step 3 segments the location region target of the wooden bowl in the original picture I_org to form the mask matrix I_mask. Step 4 uses the mask matrix I_mask to paste the metal-bowl part of each I_gen over the wooden-bowl part of the original picture I_org, obtaining several outputs I_out. Step 5 selects the most beautiful I_out from these outputs and increments the selected-times of the corresponding style material picture by 1, raising the probability that this picture is selected next time.

Claims (4)

1. A material neural style transfer method for improving the aesthetic quality of content, characterized by comprising the following steps:
step 1, preprocessing a style material dataset Dataset1: screening out the style material pictures whose material part occupies more than a threshold T1 of the total picture area, forming dataset Dataset3;
further screening from Dataset3 the style material pictures whose aesthetic NIMA index score is greater than a threshold T2, forming dataset Dataset4;
adding a selected-times label Pic4_selected to all style material pictures in Dataset4;
step 2, selecting from Dataset4 the style material pictures with higher aesthetic NIMA index score and larger selected-times Pic4_selected, performing whole-picture neural style transfer on the original image I_org, and generating several transferred pictures I_gen, specifically:
step 2.1, sorting each class of pictures in Dataset4 from high to low by aesthetic NIMA index score; the style material class to be transferred containing K pictures in total, using the top N1 style pictures to perform whole-picture neural style transfer on the original picture I_org, obtaining N1 transferred output pictures I_gen(n1), where n1 ranges from 1 to N1;
step 2.2, within the style material class to be transferred: when the Pic4_selected values of all pictures in the class are 0, setting N3 = 0 and going to step 2.4;
otherwise, sorting the pictures of the class from high to low by Pic4_selected, selecting the style material pictures whose Pic4_selected is not 0, and taking the top N2 of them;
step 2.3, removing from these N2 style material pictures any duplicates of the N1 style pictures of step 2.1, obtaining N3 style material pictures;
step 2.4, using the N3 style material pictures to perform whole-picture neural style transfer on the original image I_org, obtaining N3 transferred pictures I_gen(n3), where n3 ranges from 1 to N3;
step 2.5, finally obtaining a collection of N1 + N3 transferred pictures I_gen;
step 3, finely segmenting the material of the original image I_org to obtain the region target of the material to be converted; defining a matrix I_mask of exactly the same size as the image I_org with all values 0, and setting the part of I_mask corresponding to the position of the region target of the material to be converted to 1;
step 4, compositing the images into an output picture I_out, expressed as:
I_out = I_gen * I_mask + I_org * (1 - I_mask)
wherein I_gen is a transferred picture, and each composite image I_out corresponds to one transferred picture I_gen;
step 5, computing the NIMA values of all composite images I_out; the composite image I_out with the largest NIMA value is the final image, and the Pic4_selected label value of the corresponding style material picture is incremented by 1.
2. The material neural style transfer method for improving the aesthetic quality of content according to claim 1, wherein in step 1 the style material pictures whose material part occupies more than threshold T1 of the total picture area are screened out to form dataset Dataset3, specifically:
step 1.1, acquiring the primary material region label dataset SegDateset2:
randomly extracting from all image-level label picture samples in FMD and EFMD to obtain dataset SegDateset1;
using a weakly supervised scheme, learning pixel-level semantic affinity PSA from the class activation map CAM of a multi-label CNN network, obtaining the primary material region label dataset SegDateset2 from the image-level label dataset SegDateset1;
step 1.2, training a harmonic densely connected network HarDNet, named HarDNet1, with the primary material region label dataset SegDateset2, and fine-tuning the network HarDNet1 with another pixel-level annotation dataset SegDateset3, obtaining the trained harmonic densely connected network HarDNet_seg;
the dataset SegDateset3 being randomly extracted from all pixel-level label picture samples in FMD and EFMD;
step 1.3, segmenting pixel-level semantics of all images in dataset Dataset1 with the trained HarDNet1 network, and labelling the result Dataset2;
step 1.4, counting the total number of pixels of each semantic class in every Dataset2 image and computing its proportion of the whole image; sorting from high to low and, for each image in Dataset2, outputting only the area ratio Ph2(i) of the top-ranked semantic class, where i is the image number, running from 1 to itotal, the total number of images in Dataset2;
defining a threshold T1 in (0, 1), deleting every picture in Dataset2 with Ph2(i) <= T1, and outputting the remaining pictures as Dataset3, a pixel-level semantic dataset in which the area of a single material exceeds T1 of the whole picture.
3. The material neural style transfer method for improving the aesthetic quality of content according to claim 1, wherein in step 2 the material is transferred with the classical neural style transfer method, with the loss function modified as:
L_total = alpha * (L_content / NIMA(k, clas)) + beta * L_style
wherein alpha and beta are weight coefficients, L_content denotes the content loss index, L_style denotes the style loss index, and NIMA(k, clas) represents the NIMA value of the k-th style material picture in class clas.
4. The material neural style transfer method for improving the aesthetic quality of content according to claim 2, wherein step 3 uses the harmonic densely connected network HarDNet_seg trained in step 1 to finely segment the original image I_org.
Priority application CN202211182280.3A, filed 2022-09-27 by Nanjing Baituo Vision Technology Co., Ltd.

Publications (2)

CN115641253A (application publication): 2023-01-24
CN115641253B (granted patent): 2024-02-20

Family ID: 84941974

Patent Citations (5)

* Cited by examiner, † Cited by third party

CN110458750A * (Beijing Institute of Technology, 北京理工大学; priority 2019-05-31, published 2019-11-15): Unsupervised image style transfer method based on paired-associate learning
CN111242841A * (Hangzhou Dianzi University, 杭州电子科技大学; priority 2020-01-15, published 2020-06-05): Image background style transfer method based on semantic segmentation and deep learning
CN111950655A * (Fuzhou University, 福州大学; priority 2020-08-25, published 2020-11-17): Image aesthetic quality evaluation method based on multi-domain knowledge driving
WO2022090483A1 * (TomTom Global Content B.V.; priority 2020-11-02, published 2022-05-05): Neural network models for semantic image segmentation
CN114581356A * (Nanjing University of Science and Technology, 南京理工大学; priority 2022-05-09, published 2022-06-03): Image enhancement model generalization method based on style transfer data augmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

NIMA: Neural Image Assessment; Hossein Talebi; arXiv; pp. 1-15 *
Local image style transfer rendering algorithm (局部图像风格迁移绘制算法); Sun Dong (孙冬); Journal of Anhui University (Natural Science Edition); pp. 72-78 *



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant