CN113191365B - Cultural semantic image reconstruction effect evaluation method - Google Patents
- Publication number
- CN113191365B (application CN202110515388.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- semantic
- cultural
- mask
- evaluation method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a cultural semantic image reconstruction effect evaluation method, which comprises the following steps: collecting traditional pattern images, marking the semantic objects in each pattern image with an image semantic annotation tool, generating semantic mask images from the annotations to form a data set, and dividing the data set into a training set and a verification set; training a generator on the training set, stopping when the loss function converges, and inputting the semantic mask images of the verification set into the generator to obtain semantically generated images; comparing the images output by the generator with the real images in the verification set, and measuring accuracy, average intersection ratio, edge consistency, PSNR, SSIM and the like to obtain objective indexes; and combining these objective indexes through preset index weights into a single index that comprehensively evaluates the cultural semantic image reconstruction effect. The invention realizes evaluation of cultural semantic image reconstruction and obtains effective evaluation results.
Description
Technical Field
The invention relates to the technical field of computer image processing, and in particular to a method for quantitatively evaluating the reconstruction effect of cultural semantic images.
Background
China is an ancient, multi-ethnic civilization with a long history. Over the course of that history, people have used a wide variety of symbols and patterns to represent objects and their symbolism, and patterns representing the same content changed across eras along with their specific historical meaning. The result is a body of Chinese traditional patterns that is rich in content and shows a periodic evolution. Building a library of traditional pattern materials, and combining traditional patterns with modern fashion, is an important way to protect and promote traditional culture.
Two difficulties arise in constructing a traditional pattern material library. On the one hand, traditional patterns have generally been passed down orally, with masters transmitting the craft to apprentices as guarded knowledge, so an ordinary person cannot easily judge whether a reconstructed image belongs to the same class as the original pattern, or whether it is distorted or otherwise non-conforming. On the other hand, the quality of reconstructed images varies with the reconstruction method used. Therefore, images must be screened before being entered into the pattern material library, which consumes a large amount of manpower and material resources.
Disclosure of Invention
The invention aims to solve the problems encountered in traditional pattern semantic reconstruction, and provides a cultural semantic image reconstruction effect evaluation method that assesses the quality of reconstructed images in terms of structure, signal-to-noise ratio, edge consistency and the like, helping users rapidly screen out high-quality reconstructed images.
The cultural semantic image reconstruction effect evaluation method provided by the invention comprises the following steps:
step 1, collecting a plurality of traditional pattern images, manually marking example objects in the traditional pattern images and obtaining corresponding artificial semantic mask images, thereby obtaining a data set consisting of the traditional pattern images and corresponding artificial semantic mask images; randomly dividing the data set into a training set and a verification set;
step 2, training the image generator by taking the artificial semantic mask image and the corresponding traditional pattern image in the training set as inputs until the energy function converges, and obtaining the trained image generator after the training is finished;
step 3, verifying the trained image generator by using the image in the verification set:
3.1, calculating the accuracy of the generated image
Training a semantic segmentation model with the training set; passing each artificial semantic mask image in the verification set through the image generator to obtain a composite image G, and applying the trained semantic segmentation model to G to obtain a semantic segmentation mask image I_GS; then calculating the pixel consistency between the semantic segmentation mask image I_GS and the corresponding artificial semantic mask image I_S, i.e. the accuracy of the generated image:
Accuracy = (Σ_{i=1}^{k} p_ii) / (m × n)

wherein p_ii is the number of pixels whose true class is i and which are predicted as i, k is the number of semantic classes, and I_{m×n} denotes an image of size m × n;
3.2, calculating the average intersection ratio MIoU of the mask images
According to the semantic segmentation mask image and the corresponding artificial semantic mask image, the average intersection ratio MIoU is calculated as

MIoU = (1/k) Σ_{i=1}^{k} p_ii / (Σ_{j=1}^{k} p_ij + Σ_{j=1}^{k} p_ji − p_ii)

wherein p_ij is the number of pixels whose true class is i and which are predicted as class j (so p_ii counts correctly predicted pixels), k is the number of semantic classes, and I_{m×n} denotes an image of size m × n;
3.3, calculating edge accuracy EA of the composite image
Acquiring the edge contours of the composite image G and of the traditional pattern image, and calculating the edge accuracy EA of the composite image G as

EA = (p_00 + p_11) / (m × n)

wherein p_ii (i = 0, 1; j = 0, 1) counts the pixels that receive the same edge label in the traditional pattern image and the composite image G: p_00 counts pixels that are edge points in both, and p_11 counts pixels that are non-edge points in both;
3.4 calculating the peak signal to noise ratio PSNR of the composite image
PSNR = 10 · log10(MAX_I² / MSE)

wherein MAX_I represents the maximum possible pixel value of the composite image (255 for 8-bit sample points), and MSE represents the mean square error between the pixel values of the composite image and the traditional pattern image;
3.5, calculating the structural similarity SSIM of the generated image
SSIM(X,Y)=l(X,Y)*c(X,Y)*s(X,Y)
wherein SSIM(X, Y) represents the structural similarity of image X and image Y; l(X, Y), c(X, Y) and s(X, Y) represent the luminance, contrast and structure comparisons of X and Y respectively; H represents the image height and W the image width; X(i, j) and Y(i, j) represent the pixel values of X and Y at coordinate (i, j); μ_X and μ_Y represent the pixel means of X and Y; σ_X² and σ_Y² represent the pixel variances of X and Y; σ_XY represents the pixel covariance of X and Y; and C_1, C_2, C_3 are preset constants;
3.6, normalizing the five evaluation indexes and then taking their weighted average to obtain the final evaluation index
Score = (1/m) Σ_{I∈G} ( α·Accuracy(I) + β·MIoU(I) + γ·EA(I) + δ·PSNR(I) + ε·SSIM(I) )

wherein G represents the composite image set, m represents the number of images in the set G, I represents an image, and α, β, γ, δ and ε are the weights of the different evaluation indexes (MIoU and PSNR being normalized first).
The invention realizes evaluation of cultural semantic image reconstruction and obtains effective evaluation results.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for evaluating a reconstruction effect of a cultural semantic image according to an embodiment of the invention.
Detailed Description
As shown in Fig. 1, the cultural semantic image reconstruction effect evaluation method comprises the following steps:
s110, collecting a plurality of traditional pattern images, manually marking example objects in the traditional pattern images and obtaining corresponding artificial semantic mask images, thereby obtaining a data set consisting of the traditional pattern images and corresponding artificial semantic mask images; the data set is randomly divided into a training set and a verification set according to the proportion of 7:3. In this embodiment, an image semantic annotation tool (for example, labelme) is used to manually annotate a traditional pattern image, so as to obtain a semantic mask image.
S120, training the image generator with the artificial semantic mask images and the corresponding traditional pattern images in the training set as inputs until the energy function converges; the trained image generator is obtained when training finishes. In this step, a deep-learning-based generation method is used, and the generator is continuously optimized on the training set until the loss function converges. The artificial semantic mask images in the verification set are then input into the image generator to obtain composite images (semantically generated images).
S130, verifying the trained image generator by using the image in the verification set:
3.1, calculating the accuracy of the generated image
Training a semantic segmentation model with the training set; passing each artificial semantic mask image in the verification set through the image generator to obtain a composite image G, and applying the trained semantic segmentation model to G to obtain a semantic segmentation mask image I_GS; then calculating the pixel consistency between the semantic segmentation mask image I_GS and the corresponding artificial semantic mask image I_S, i.e. the accuracy of the generated image:
Accuracy = (Σ_{i=1}^{k} p_ii) / (m × n)

wherein p_ii is the number of pixels whose true class is i and which are predicted as i, k is the number of semantic classes, and I_{m×n} denotes an image of size m × n.
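The accuracy of step 3.1 — the fraction of pixels on which the segmentation of the generated image agrees with the artificial mask — can be sketched as follows (the `pixel_accuracy` helper is illustrative, assuming both masks are integer class-label arrays of equal shape):

```python
import numpy as np

def pixel_accuracy(pred_mask, gt_mask):
    """Sum of p_ii over classes divided by m*n: the share of correctly labelled pixels."""
    pred = np.asarray(pred_mask)
    gt = np.asarray(gt_mask)
    return float((pred == gt).sum()) / gt.size

gt = np.array([[0, 0, 1],
               [1, 2, 2]])        # artificial semantic mask I_S
pred = np.array([[0, 1, 1],
                 [1, 2, 2]])      # segmentation of the generated image I_GS
print(pixel_accuracy(pred, gt))   # 5 of 6 pixels agree -> 0.8333...
```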
3.2, calculating the average intersection ratio MIoU of the mask images
The intersection-over-union is the ratio of the intersection to the union of the ground-truth set and the prediction set; for one class it can be computed from the counts TP, FP and FN as TP / (TP + FP + FN), where TP is the intersection and TP + FP + FN is the union. MIoU is typically computed per class: the IoU of each class is calculated, accumulated and averaged. The larger the value, the more pixels are predicted correctly, and the fewer pixels of this class are predicted as other classes or pixels of other classes predicted as this class.
According to the semantic segmentation mask image and the corresponding artificial semantic mask image, calculating the average intersection ratio MIoU
MIoU = (1/k) Σ_{i=1}^{k} p_ii / (Σ_{j=1}^{k} p_ij + Σ_{j=1}^{k} p_ji − p_ii)

wherein p_ij is the number of pixels whose true class is i and which are predicted as class j (so p_ii counts correctly predicted pixels), k is the number of semantic classes, and I_{m×n} denotes an image of size m × n.
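The per-class IoU and its average from step 3.2 can be sketched as below (a minimal `mean_iou` helper, assuming integer label masks; classes absent from both masks are skipped):

```python
import numpy as np

def mean_iou(pred_mask, gt_mask, num_classes):
    """Average over classes of p_ii / (sum_j p_ij + sum_j p_ji - p_ii) = TP / (TP + FP + FN)."""
    pred = np.asarray(pred_mask).ravel()
    gt = np.asarray(gt_mask).ravel()
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()   # TP
        union = np.logical_or(pred == c, gt == c).sum()    # TP + FP + FN
        if union > 0:                                      # skip classes absent everywhere
            ious.append(inter / union)
    return float(np.mean(ious))

gt = np.array([[0, 0, 1], [1, 1, 1]])
pred = np.array([[0, 1, 1], [1, 1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # class 0: 1/2, class 1: 4/5 -> 0.65
```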
3.3, calculating edge accuracy EA of the composite image
Similar to the accuracy, the edge accuracy describes the pixel consistency of the generated image with the real image.
Acquiring the edge contours of the composite image G and of the traditional pattern image, and calculating the edge accuracy EA of the composite image G as

EA = (p_00 + p_11) / (m × n)

wherein p_ii (i = 0, 1; j = 0, 1) counts the pixels that receive the same edge label in the traditional pattern image and the composite image G: p_00 counts pixels that are edge points in both, and p_11 counts pixels that are non-edge points in both. In this step, the Canny algorithm with a convolution kernel of size 3 is used to extract the edge contours.
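The edge-accuracy comparison of step 3.3 can be sketched as below. The patent uses the Canny algorithm; to keep this sketch dependency-free, a crude gradient-magnitude threshold stands in for Canny (the `simple_edges` helper is an assumption, not the patented edge extractor):

```python
import numpy as np

def simple_edges(gray, rel_thresh=0.2):
    """Crude edge map: threshold the gradient magnitude.
    A dependency-free stand-in for cv2.Canny; not the patent's extractor."""
    gy, gx = np.gradient(np.asarray(gray, dtype=np.float64))
    mag = np.hypot(gx, gy)
    return mag > rel_thresh * (mag.max() + 1e-12)

def edge_accuracy(img_a, img_b):
    """EA = (p_00 + p_11) / (m*n): share of pixels with the same edge/non-edge label."""
    ea, eb = simple_edges(img_a), simple_edges(img_b)
    return float((ea == eb).sum()) / ea.size

img = np.zeros((6, 6))
img[2:4, 2:4] = 255.0                 # a small bright square produces edges
print(edge_accuracy(img, img))        # 1.0: identical edge maps
```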
3.4 calculating the peak signal to noise ratio PSNR of the composite image
The peak signal-to-noise ratio is the ratio between the maximum possible power of a signal and the power of the noise corrupting it; it is usually expressed in logarithmic decibels and is an objective measure of image distortion or noise level. It is calculated as

PSNR = 10 · log10(MAX_I² / MSE)

wherein MAX_I represents the maximum possible pixel value of the composite image (255 for 8-bit sample points), and MSE represents the mean square error between the pixel values of the composite image and the traditional pattern image.
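The PSNR formula of step 3.4 maps directly to code (a minimal sketch; the `psnr` helper is illustrative):

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """PSNR = 10 * log10(MAX_I^2 / MSE), in decibels."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")            # identical images: no noise
    return 10.0 * np.log10(max_val ** 2 / mse)

# Worst case for 8-bit images: every pixel differs by the full range.
print(psnr(np.zeros((2, 2)), np.full((2, 2), 255.0)))  # 0.0
```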
3.5, calculating the structural similarity SSIM of the generated image
SSIM measures the structural similarity between images by comparing three components: luminance, contrast and structure. It yields a value between 0 and 1; the larger the value, the smaller the difference between the output image and the undistorted image, i.e. the better the image quality.
SSIM(X,Y)=l(X,Y)*c(X,Y)*s(X,Y)
wherein SSIM(X, Y) represents the structural similarity of image X and image Y; l(X, Y), c(X, Y) and s(X, Y) represent the luminance, contrast and structure comparisons of X and Y; H and W represent the image height and width; X(i, j) and Y(i, j) represent the pixel values at coordinate (i, j); μ_X and μ_Y represent the pixel means, σ_X² and σ_Y² the pixel variances, and σ_XY the pixel covariance of X and Y. C_1, C_2, C_3 are preset constants introduced to avoid division by zero, usually taken as C_1 = (K_1·L)², C_2 = (K_2·L)², C_3 = C_2/2. In this step, K_1 = 0.01, K_2 = 0.03, L = 255.
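A simplified, single-window sketch of the SSIM of step 3.5 with the stated constants (production SSIM is usually computed over sliding windows; this global version is an illustrative assumption). With C_3 = C_2/2, the product l·c·s collapses into the closed form used below:

```python
import numpy as np

def global_ssim(x, y, K1=0.01, K2=0.03, L=255.0):
    """SSIM(X, Y) = l*c*s computed once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2          # with C3 = C2/2, c*s fold together
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

x = np.arange(16, dtype=np.float64).reshape(4, 4)
print(global_ssim(x, x))        # ~1.0 for identical images
```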
S140, normalizing the five evaluation indexes and then taking their weighted average to obtain the final evaluation index
Score = (1/m) Σ_{I∈G} ( α·Accuracy(I) + β·MIoU(I) + γ·EA(I) + δ·PSNR(I) + ε·SSIM(I) )

wherein G represents the composite image set, m represents the number of images in the set G, I represents an image, and α, β, γ, δ and ε are the weights of the different evaluation indexes (MIoU and PSNR being normalized first). In this embodiment, α = 0.2, β = 0.2, γ = 0.2, δ = 0.2, ε = 0.2.
The accuracy, edge accuracy and structural consistency all take values in [0, 1], and a higher value indicates a higher quality of the generated image. The average intersection ratio and the peak signal-to-noise ratio take values in (0, +∞) and are normalized into [0, 1] by a normalization function.
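The weighted aggregation of S140 can be sketched as follows. The patent's exact normalization function for the unbounded indexes is not reproduced here; x/(1+x) is an assumed stand-in that maps (0, +∞) into [0, 1), and the equal 0.2 weights follow the embodiment:

```python
def squash(x):
    # Assumed normalization into [0, 1); NOT the patent's exact function.
    return x / (1.0 + x)

def overall_score(acc, miou, ea, psnr_db, ssim,
                  weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    # Weighted average of the five indices; unbounded ones are squashed first.
    a, b, g, d, e = weights
    return (a * acc + b * squash(miou) + g * ea
            + d * squash(psnr_db) + e * ssim)

score = overall_score(0.95, 0.80, 0.90, 32.0, 0.88)
print(round(score, 4))  # 0.8288
```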
In addition to the embodiments described above, other embodiments of the invention are possible. All technical solutions formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.
Claims (9)
1. A cultural semantic image reconstruction effect evaluation method, characterized in that the method comprises the following steps:
step 1, collecting a plurality of traditional pattern images, manually marking example objects in the traditional pattern images and obtaining corresponding artificial semantic mask images, thereby obtaining a data set consisting of the traditional pattern images and corresponding artificial semantic mask images; randomly dividing the data set into a training set and a verification set;
step 2, training the image generator by taking the artificial semantic mask image and the corresponding traditional pattern image in the training set as inputs until the energy function converges, and obtaining the trained image generator after the training is finished;
step 3, verifying the trained image generator by using the image in the verification set:
3.1, calculating the accuracy of the generated image
training a semantic segmentation model with the training set; passing each artificial semantic mask image in the verification set through the image generator to obtain a composite image, and applying the trained semantic segmentation model to the composite image to obtain a semantic segmentation mask image I_GS; then calculating the pixel consistency between the semantic segmentation mask image I_GS and the corresponding artificial semantic mask image I_S, i.e. the accuracy of the generated image:
Accuracy = (Σ_{i=1}^{k} p_ii) / (m × n)

wherein p_ii is the number of pixels whose true class is i and which are predicted as i, k is the number of semantic classes, and I_{m×n} denotes an image of size m × n;

3.2, calculating the average intersection ratio MIoU of the mask images
According to the semantic segmentation mask image and the corresponding artificial semantic mask image, calculating the average intersection ratio MIoU
MIoU = (1/k) Σ_{i=1}^{k} p_ii / (Σ_{j=1}^{k} p_ij + Σ_{j=1}^{k} p_ji − p_ii)

wherein p_ij is the number of pixels whose true class is i and which are predicted as class j (so p_ii counts correctly predicted pixels), k is the number of semantic classes, and I_{m×n} denotes an image of size m × n;
3.3, calculating edge accuracy EA of the composite image
acquiring the edge contours of the composite image and of the traditional pattern image, and calculating the edge accuracy EA of the composite image as

EA = (p_00 + p_11) / (m × n)

wherein p_ii (i = 0, 1; j = 0, 1) counts the pixels that receive the same edge label in the traditional pattern image and the composite image: p_00 counts pixels that are edge points in both, and p_11 counts pixels that are non-edge points in both;
3.4 calculating the peak signal to noise ratio PSNR of the composite image
PSNR = 10 · log10(MAX_I² / MSE)

wherein MAX_I represents the maximum possible pixel value of the composite image (255 for 8-bit sample points), and MSE represents the mean square error between the pixel values of the composite image and the traditional pattern image;
3.5, calculating the structural similarity SSIM of the generated image
SSIM(X,Y)=l(X,Y)*c(X,Y)*s(X,Y)
wherein SSIM(X, Y) represents the structural similarity of image X and image Y; l(X, Y), c(X, Y) and s(X, Y) represent the luminance, contrast and structure comparisons of X and Y respectively; H and W represent the image height and width; X(i, j) and Y(i, j) represent the pixel values of X and Y at coordinate (i, j); μ_X and μ_Y represent the pixel means, σ_X² and σ_Y² the pixel variances, and σ_XY the pixel covariance of X and Y; C_1, C_2, C_3 are preset constants;
3.6, normalizing the five evaluation indexes and then taking their weighted average to obtain the final evaluation index
Score = (1/m) Σ_{I∈G} ( α·Accuracy(I) + β·MIoU(I) + γ·EA(I) + δ·PSNR(I) + ε·SSIM(I) )

wherein G represents the composite image set, m represents the number of images in the composite image set G, I represents an image, and α, β, γ, δ and ε are the weights of the different evaluation indexes (MIoU and PSNR being normalized first).
2. The cultural semantic image reconstruction effect evaluation method according to claim 1, wherein: in the step 1, the traditional pattern image is manually marked by using an image semantic marking tool, and a semantic mask image is obtained.
3. The cultural semantic image reconstruction effect evaluation method according to claim 1, wherein: in the step 1, the data set is randomly divided into a training set and a verification set according to the proportion of 7:3.
4. The cultural semantic image reconstruction effect evaluation method according to claim 1, wherein: in the step 2, a deep-learning-based generation method is used, and the generator is continuously optimized on the training set until the loss function converges.
5. The cultural semantic image reconstruction effect evaluation method according to claim 1, wherein: in the step 3.3, the Canny algorithm is used for extracting the edge contour, and the convolution kernel size used by the Canny algorithm is 3.
7. The cultural semantic image reconstruction effect evaluation method according to claim 1, wherein: in step 3.6, α=0.2, β=0.2, γ=0.2, δ=0.2, and ε=0.2.
8. The cultural semantic image reconstruction effect evaluation method according to claim 1, wherein: the range of values of the accuracy, the edge accuracy and the structural consistency are all between 0 and 1, and the higher the value is, the higher the quality of the generated image is.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110515388.9A CN113191365B (en) | 2021-05-12 | 2021-05-12 | Cultural semantic image reconstruction effect evaluation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113191365A CN113191365A (en) | 2021-07-30 |
CN113191365B true CN113191365B (en) | 2023-04-21 |
Family
ID=76981338
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110515388.9A Active CN113191365B (en) | 2021-05-12 | 2021-05-12 | Cultural semantic image reconstruction effect evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113191365B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112750129A (en) * | 2021-03-11 | 2021-05-04 | 湘潭大学 | Image semantic segmentation model based on feature enhancement position attention mechanism |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3742346A3 (en) * | 2019-05-23 | 2021-06-16 | HTC Corporation | Method for training generative adversarial network (gan), method for generating images by using gan, and computer readable storage medium |
CN112016618A (en) * | 2020-08-28 | 2020-12-01 | 方玉明 | Measurement method for generalization capability of image semantic segmentation model |
CN112308860B (en) * | 2020-10-28 | 2024-01-12 | 西北工业大学 | Earth observation image semantic segmentation method based on self-supervision learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||