CN117788906A - Large model generation image identification method and system - Google Patents

Large model generation image identification method and system

Info

Publication number
CN117788906A
CN117788906A
Authority
CN
China
Prior art keywords
residual
features
inputting
classification
processing module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311804911.5A
Other languages
Chinese (zh)
Other versions
CN117788906B (en)
Inventor
郑威
云剑
郑晓玲
凌霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Information and Communications Technology CAICT
Original Assignee
China Academy of Information and Communications Technology CAICT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Information and Communications Technology CAICT filed Critical China Academy of Information and Communications Technology CAICT
Priority to CN202311804911.5A
Publication of CN117788906A
Application granted
Publication of CN117788906B
Legal status: Active


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a large model generation image identification method and system. The method comprises the following steps: inputting the generated image into a first processing module based on residual filtering to obtain original features; inputting the original features into a second processing module based on a self-attention mechanism and a residual structure to obtain classification features; and inputting the classification features into a classification network, and outputting a result that is only "true" or "false". The scheme provided by the invention solves the problems that the prior art cannot utilize shallow texture information of the input picture, and that its loss function is too simple to change dynamically with the input data.

Description

Large model generation image identification method and system
Technical Field
The invention belongs to the field of image identification, and particularly relates to a large model generation image identification method and system.
Background
With the development of artificial intelligence technology, large models have gradually matured and now play a variety of roles in everyday life. Among them, AI-generated content (AIGC) is a popular large model direction. A large number of image-generation large models using diffusion models have entered the public's field of view. For example, Stable Diffusion, DreamBooth and Midjourney only require the user to input a prompt for the large model to automatically generate a corresponding image; such large model services also enable people who are not skilled at drawing to map the ideas in their heads onto images.
However, image-generation large models are a double-edged sword and bring a number of drawbacks. For example, in ordinary commercial activities, investors or purchasers often wish to buy images personally designed and drawn by a painter rather than images generated by a large model, and the indiscriminate use of large models to generate images can also lead to copyright disputes. Moreover, since an image-generation large model can render great detail from a prompt, it may be misused by malicious actors to fabricate rumors, with consequences that are hard to measure.
Society therefore has an urgent need for technology capable of distinguishing large model generated images from real images.
Although prior researchers have made some progress on distinguishing real pictures from fake ones, existing work in the field of separating computer-generated pictures from real pictures tends to focus on images produced by conventional neural networks such as generative adversarial networks (GANs) or variational autoencoders (VAEs), and methods for distinguishing such images from real pictures have been proposed from various angles, including the spatial domain and the frequency domain.
However, image-generation large models that generate images with a diffusion model differ greatly in principle from conventional image-generation networks, and it is difficult to apply existing techniques directly to the forgery-discrimination task for pictures generated by large models. Some researchers have applied existing image identification techniques to distinguish large model images from real images, and the resulting models performed very poorly, failing to meet current demands and expectations. With the rapid development of diffusion-based image-generation large models, existing identification technology will find it ever harder to distinguish images generated by large models from real images.
Prior Art
DIRE, proposed in the paper "DIRE for Diffusion-Generated Image Detection", is an abbreviation of DIffusion Reconstruction Error. DIRE measures the error between an input image and its reconstruction by a pre-trained diffusion model. The authors of the paper found that images generated by diffusion models are more easily reconstructed by a pre-trained diffusion model than real images, which are difficult to reconstruct because of the many complexities of reality. DIRE reconstructs the input image through DDIM, computes the difference between the reconstructed image and the original image, and finally performs binary classification with that difference as the feature, judging whether the image is a large-model-forged image.
Shortcomings of the prior art
The first shortcoming of the existing DIRE method is that taking the difference between the original image and the reconstructed image loses the shallow texture characteristics of the original image, so sufficient information cannot be extracted from it.
A second shortcoming of the existing DIRE method is that it pays no attention to the relationships between individual pixels within a large-model-generated image.
A third shortcoming of the existing DIRE method is that its loss function is too simple to dynamically adjust the learning stride according to different input data.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a large model generation image identification method.
The first aspect of the invention discloses a large model generation image identification method, which comprises the following steps:
S1, inputting a generated image into a first processing module based on residual filtering to obtain original features;
S2, inputting the original features into a second processing module based on a self-attention mechanism and a residual structure to obtain classification features;
and S3, inputting the classification features into a classification network, and outputting a result that is only "true" or "false".
According to the method of the first aspect of the present invention, in the step S1, the method for inputting the generated image into the first processing module based on residual filtering to obtain the original features comprises:
respectively inputting the generated image into residual filters and convolution kernels, combining the processing results of the residual filters and the convolution kernels, and finally inputting the combined result into a first convolution pooling layer to obtain the original features.
According to the method of the first aspect of the present invention, in the step S1, there are seventeen residual filters and eight convolution kernels; the values of the seventeen residual filters are fixed and do not change during learning, whereas the parameters of the eight convolution kernels are learned during training.
According to the method of the first aspect of the present invention, in the step S1, the first convolution pooling layer is a convolution pooling layer comprising a convolution layer and a pooling layer, with a residual mechanism applied to the convolution layer.
According to the method of the first aspect of the present invention, in the step S2, the method for inputting the original features into a second processing module based on a self-attention mechanism and a residual structure to obtain classification features comprises:
inputting the original features into a second convolution pooling layer to obtain processing features, inputting the processing features into a self-attention operation, and numerically adding the result of the self-attention operation to the processing features to obtain the classification features.
According to the method of the first aspect of the present invention, in said step S2, the value V of said self-attention operation is said processing feature; the weights assigned to V are obtained from Q and K through a softmax layer, and the weighted average then yields the shallow texture features captured by the self-attention mechanism.
According to the method of the first aspect of the present invention, in said step S3, classification is optimized using a high-dimensional spherical boundary objective function in said classification network.
In a second aspect, the present invention discloses a large model generation image identification system, the system comprising:
a first processing module configured to input the generated image into the first processing module based on residual filtering to obtain original features;
a second processing module configured to input the original features into the second processing module based on a self-attention mechanism and a residual structure to obtain classification features;
and a third processing module configured to input the classification features into a classification network and output a result that is only "true" or "false".
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, it implements the steps of the large model generation image identification method of any one of the first aspects of the present disclosure.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the large model generation image identification method of any one of the first aspects of the present disclosure.
In summary, the scheme provided by the invention combines a self-attention mechanism with a residual network structure: the self-attention mechanism enhances the network's ability to extract and analyze shallow texture features invisible to the naked eye, while the residual connection supplements lost feature information, further enriching the basis for identification. The training mode based on the high-dimensional spherical boundary objective function, which increases the intra-group similarity training stride, helps the model pay more attention to the intra-group similarity threshold. The scheme solves the problems that the prior art cannot utilize shallow texture information of the input picture and that its loss function is too simple to change dynamically with the input data.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a large model generation image identification method according to an embodiment of the present invention;
FIG. 2 is a block flow diagram according to an embodiment of the invention;
FIG. 3 is a block diagram of residual filtering according to an embodiment of the present invention;
FIG. 4 is an exemplary value of the initialization of seventeen residual filters according to an embodiment of the present invention;
FIG. 5 is a self-attention module diagram according to an embodiment of the present invention;
FIG. 6 is a block diagram of a large model generation image identification system according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The first aspect of the invention discloses a large model generation image identification method. Fig. 1 is a flowchart of a large model generation image identification method according to an embodiment of the present invention; as shown in fig. 1, the method includes:
S1, inputting a generated image into a first processing module based on residual filtering to obtain original features;
S2, inputting the original features into a second processing module based on a self-attention mechanism and a residual structure to obtain classification features;
and S3, inputting the classification features into a classification network, and outputting a result that is only "true" or "false".
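The three steps above can be sketched end to end as follows. The wrapper function and its parameter names are illustrative assumptions, not the patent's own code; each processing module is passed in as an opaque callable:

```python
def identify_image(image, s1_module, s2_module, classifier):
    """Sketch of the overall flow of FIG. 1. The three modules are supplied
    as callables (names are assumptions): s1_module is the residual
    filtering module, s2_module the self-attention module with residual
    structure, and classifier the classification network, which returns a
    boolean so that the output is only "true" or "false"."""
    original_features = s1_module(image)                     # S1
    classification_features = s2_module(original_features)   # S2
    is_real = classifier(classification_features)            # S3
    return "true" if is_real else "false"
```

With identity stand-ins for the modules, the wrapper simply relays the classifier's verdict as the two-valued result the method requires.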
In step S1, as shown in fig. 2, the generated image is input into the first processing module (the residual filtering module in fig. 2) based on residual filtering to obtain the original features.
In some embodiments, in the step S1, the method for inputting the generated image into the first processing module based on residual filtering to obtain the original features includes:
as shown in fig. 3, the generated image is input into the residual filters and the convolution kernels respectively, then the processing results of the residual filters and the convolution kernels are combined, and finally the combined result is input into a first convolution pooling layer to obtain the original features.
There are seventeen residual filters and eight convolution kernels; the values of the seventeen residual filters are fixed and do not change during learning, whereas the parameters of the eight convolution kernels are learned during training.
The first convolution pooling layer is a convolution pooling layer comprising a convolution layer and a pooling layer, with a residual mechanism applied to the convolution layer.
Specifically, the convolution pooling layer here comprises a 3×3 convolution layer, a regularization layer, a ReLU layer and a max pooling layer.
As shown in fig. 4, the seventeen residual filters are initialized with the specific values given there; these filters can efficiently extract the residual information of an image.
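A minimal sketch of this residual filtering module follows, with one fixed filter and one learnable kernel standing in for the seventeen and eight of the embodiment. The particular high-pass kernel shown is an assumption for illustration, since the actual initial values are those given in FIG. 4:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# One example of a fixed residual (high-pass) filter; its entries sum to
# zero, so flat regions produce zero response and only residual detail
# survives. This specific 3x3 kernel is an illustrative assumption.
FIXED_FILTER = np.array([[-1.,  2., -1.],
                         [ 2., -4.,  2.],
                         [-1.,  2., -1.]]) / 4.0

def residual_filter_module(img, learned_kernels):
    """Sketch of the first processing module: the image passes through the
    fixed residual filters (non-trainable; seventeen in the embodiment) and
    the learnable convolution kernels (eight in the embodiment) in parallel,
    and the two groups of feature maps are combined along the channel axis
    before entering the first convolution pooling layer."""
    fixed_maps = [conv2d(img, FIXED_FILTER)]                  # 17 in full
    learned_maps = [conv2d(img, k) for k in learned_kernels]  # 8 in full
    return np.stack(fixed_maps + learned_maps, axis=0)        # (C, H', W')
```

On a flat image the fixed filter's output is identically zero, which illustrates why these filters isolate residual texture rather than image content.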
In step S2, as shown in fig. 2, the original features are input into the second processing module (the self-attention module in fig. 2) based on a self-attention mechanism and a residual structure to obtain the classification features. The self-attention mechanism enhances the network's ability to extract and analyze shallow texture features invisible to the naked eye, while the residual connection supplements lost feature information, further enriching the basis for identification.
In some embodiments, in the step S2, the method for inputting the original features into the second processing module based on a self-attention mechanism and a residual structure to obtain the classification features includes:
as shown in fig. 5, the original features are input into a second convolution pooling layer to obtain processing features, the processing features are input into a self-attention operation, and the result of the self-attention operation is numerically added to the processing features to obtain the classification features. The information lost in the operation is supplemented through the residual mechanism, which enhances the identification effect.
The value V of the self-attention operation is the processing feature; the weights assigned to V are obtained from Q and K through a softmax layer, and the weighted average then yields the shallow texture features captured by the self-attention mechanism.
Specifically, this convolution pooling layer comprises, in order, a 3×3 convolution layer, a regularization layer, a ReLU layer, a 3×3 convolution layer, a regularization layer, a ReLU layer and a max pooling layer.
The processing features then pass through the self-attention mechanism, which captures the correlation of shallow texture features at the pixel level.
Before self-attention is performed, the data must be reshaped into the form (number of pixels, number of channels); the self-attention operation is then performed on the data.
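The self-attention operation with the residual addition described above can be sketched as follows. The learned projections Wq and Wk producing Q and K are assumptions, since the embodiment does not specify how Q and K are computed; V is the processing feature itself, as stated:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_residual(feat, Wq, Wk):
    """Sketch of the second processing module. `feat` has the reshaped form
    (number of pixels, number of channels). V is the processing feature; the
    weights on V come from Q and K through a softmax layer, and the weighted
    average captures pixel-level shallow texture correlations. The result is
    numerically added to the input (the residual connection that restores
    information lost in the operation)."""
    Q = feat @ Wq                       # (P, d), projection is an assumption
    K = feat @ Wk                       # (P, d), projection is an assumption
    V = feat                            # the processing feature itself
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (P, P)
    attended = weights @ V              # weighted average over pixels
    return attended + feat              # numerical (residual) addition
```

With zero projections the attention weights become uniform, so the output is the input plus its per-channel mean over all pixels, which makes the residual addition easy to verify by hand.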
In step S3, the classification features are input into the classification network (the spherical loss classification module in fig. 2), and only a "true" or "false" result is output.
In some embodiments, in said step S3, classification is optimized using a high-dimensional spherical boundary objective function in said classification network.
Specifically, the image forgery-discrimination problem can be regarded as a binary classification problem, outputting only a "true" or "false" result. To improve the accuracy and effectiveness of classification, the invention introduces a high-dimensional spherical boundary objective function in the classification layer. The high-dimensional spherical boundary objective function is built mainly around an intra-group similarity stride and an inter-group similarity stride. The invention emphasizes intra-group similarity and designs a special weight w for it.
The high-dimensional spherical boundary objective function adaptively changes the optimization stride during training. The stride update rule is as follows:
At the start, separate thresholds are set for the intra-group similarity and the inter-group similarity.
When a sample is input, the intra-group similarity and inter-group similarity of the current sample are computed first, and then the intra-group similarity stride and the inter-group similarity stride are calculated. The intra-group similarity stride is the weight w multiplied by the difference obtained by subtracting the computed intra-group similarity from the intra-group similarity threshold. The inter-group similarity stride is the computed inter-group similarity minus the inter-group similarity threshold. If either stride is less than 0, it is set to 0, so both strides always remain non-negative.
When the metrics are computed per group, the intra-group similarity loss and the inter-group similarity loss are each multiplied by the stride of their group and then summed. Thus, during gradient descent, different data are updated with different strides.
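The stride update rule above can be sketched as follows. The function and variable names are ours; the two thresholds and the intra-group weight w are hyperparameters set at the start of training, as described:

```python
def spherical_boundary_strides(intra_sim, inter_sim,
                               intra_thresh, inter_thresh, w):
    """Sketch of the stride update rule of the high-dimensional spherical
    boundary objective. Both strides are clamped at zero so they remain
    non-negative."""
    # Intra-group stride: w times how far the intra-group similarity still
    # falls short of its threshold.
    intra_stride = max(0.0, w * (intra_thresh - intra_sim))
    # Inter-group stride: how far the inter-group similarity exceeds its
    # threshold.
    inter_stride = max(0.0, inter_sim - inter_thresh)
    return intra_stride, inter_stride

def spherical_boundary_loss(intra_loss, inter_loss,
                            intra_stride, inter_stride):
    """The per-group losses are scaled by their strides and summed, so
    gradient descent effectively takes a different step per sample."""
    return intra_stride * intra_loss + inter_stride * inter_loss
```

Once the intra-group similarity exceeds its threshold (and the inter-group similarity stays below its own), both strides fall to zero and that sample no longer drives the update, which is how the objective focuses attention on the intra-group similarity threshold.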
After the model is trained with the high-dimensional spherical boundary objective function, it can perform forgery discrimination on an input picture, identifying whether the picture was generated by a large model.
In summary, the scheme provided by the invention combines a self-attention mechanism with a residual network structure, enhancing the network's ability to extract and analyze shallow texture features invisible to the naked eye, and further enriches the basis for identification by supplementing lost feature information through the residual connection.
The training mode based on the high-dimensional spherical boundary objective function, which increases the intra-group similarity training stride, helps the model pay more attention to the intra-group similarity threshold. This solves the problems that the prior art cannot utilize shallow texture information of the input picture and that its loss function is too simple to change dynamically with the input data.
A second aspect of the invention discloses a large model generation image identification system. FIG. 6 is a block diagram of a large model generation image identification system according to an embodiment of the present invention; as shown in fig. 6, the system 100 includes:
a first processing module 101 configured to input the generated image into the first processing module based on residual filtering to obtain original features;
a second processing module 102 configured to input the original features into the second processing module based on a self-attention mechanism and a residual structure to obtain classification features;
and a third processing module 103 configured to input the classification features into a classification network and output a result that is only "true" or "false".
According to the system of the second aspect of the present invention, the first processing module 101 is specifically configured so that the method of inputting the generated image into the first processing module based on residual filtering to obtain the original features comprises:
as shown in fig. 3, the generated image is input into the residual filters and the convolution kernels respectively, then the processing results of the residual filters and the convolution kernels are combined, and finally the combined result is input into a first convolution pooling layer to obtain the original features.
There are seventeen residual filters and eight convolution kernels; the values of the seventeen residual filters are fixed and do not change during learning, whereas the parameters of the eight convolution kernels are learned during training.
The first convolution pooling layer is a convolution pooling layer comprising a convolution layer and a pooling layer, with a residual mechanism applied to the convolution layer.
Specifically, the convolution pooling layer here comprises a 3×3 convolution layer, a regularization layer, a ReLU layer and a max pooling layer.
As shown in fig. 4, the seventeen residual filters are initialized with the specific values given there; these filters can efficiently extract the residual information of an image.
According to the system of the second aspect of the present invention, the second processing module 102 is specifically configured so that the method of inputting the original features into the second processing module based on a self-attention mechanism and a residual structure to obtain the classification features comprises:
as shown in fig. 5, the original features are input into a second convolution pooling layer to obtain processing features, the processing features are input into a self-attention operation, and the result of the self-attention operation is numerically added to the processing features to obtain the classification features. The information lost in the operation is supplemented through the residual mechanism, which enhances the identification effect.
The value V of the self-attention operation is the processing feature; the weights assigned to V are obtained from Q and K through a softmax layer, and the weighted average then yields the shallow texture features captured by the self-attention mechanism.
Specifically, this convolution pooling layer comprises, in order, a 3×3 convolution layer, a regularization layer, a ReLU layer, a 3×3 convolution layer, a regularization layer, a ReLU layer and a max pooling layer.
The processing features then pass through the self-attention mechanism, which captures the correlation of shallow texture features at the pixel level.
Before self-attention is performed, the data must be reshaped into the form (number of pixels, number of channels); the self-attention operation is then performed on the data.
According to the system of the second aspect of the present invention, the third processing module 103 is specifically configured to optimize classification using a high-dimensional spherical boundary objective function in the classification network.
Specifically, the image forgery-discrimination problem can be regarded as a binary classification problem, outputting only a "true" or "false" result. To improve the accuracy and effectiveness of classification, the invention introduces a high-dimensional spherical boundary objective function in the classification layer. The high-dimensional spherical boundary objective function is built mainly around an intra-group similarity stride and an inter-group similarity stride. The invention emphasizes intra-group similarity and designs a special weight w for it.
The high-dimensional spherical boundary objective function adaptively changes the optimization stride during training. The stride update rule is as follows:
At the start, separate thresholds are set for the intra-group similarity and the inter-group similarity.
When a sample is input, the intra-group similarity and inter-group similarity of the current sample are computed first, and then the intra-group similarity stride and the inter-group similarity stride are calculated. The intra-group similarity stride is the weight w multiplied by the difference obtained by subtracting the computed intra-group similarity from the intra-group similarity threshold. The inter-group similarity stride is the computed inter-group similarity minus the inter-group similarity threshold. If either stride is less than 0, it is set to 0, so both strides always remain non-negative.
When the metrics are computed per group, the intra-group similarity loss and the inter-group similarity loss are each multiplied by the stride of their group and then summed. Thus, during gradient descent, different data are updated with different strides.
After the model is trained with the high-dimensional spherical boundary objective function, it can perform forgery discrimination on an input picture, identifying whether the picture was generated by a large model.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, it implements the steps of the large model generation image identification method of any one of the first aspects of the present disclosure.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device includes a processor, a memory, a communication interface, a display screen and an input device connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the electronic device is used for wired or wireless communication with an external terminal; wireless communication can be achieved through WIFI, an operator network, near-field communication (NFC) or other technologies. The display screen of the electronic device can be a liquid crystal display or an electronic ink display, and the input device can be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing of the electronic device, or an external keyboard, touchpad or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a diagram of the portion relevant to the technical solution of the present disclosure and does not limit the electronic devices to which the solution of the present application applies; a specific electronic device may include more or fewer components than shown, combine certain components, or arrange components differently.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the large model generation image identification method of any one of the first aspects of the present disclosure.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described, but any combination of them that contains no contradiction should be regarded as within the scope of this description. The foregoing examples express only a few embodiments of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and all of these fall within the protection scope of the present application. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (10)

1. A method for large model generation image authentication, the method comprising:
s1, inputting a generated image into a first processing module based on residual filtering to obtain original characteristics;
s2, inputting the original features into a second processing module based on a self-attention mechanism and a residual error structure to obtain classification features;
and S3, inputting the classification features into a binary classification network, and outputting a result that is either true or false.
2. The large model generated image identification method according to claim 1, wherein in step S1, inputting the generated image into the first processing module based on residual filtering to obtain the original features comprises:
inputting the generated image into residual filters and convolution kernels respectively, merging the processing results of the residual filters and the convolution kernels, and finally inputting the merged result into a first convolution pooling layer to obtain the original features.
3. The large model generated image identification method according to claim 2, wherein in step S1 there are seventeen of the residual filters and eight of the convolution kernels; the values of the seventeen residual filters are fixed and do not change during learning, whereas the parameters of the eight convolution kernels are learned during training.
4. The large model generated image identification method according to claim 2, wherein in step S1 the first convolution pooling layer is a convolution layer and a pooling layer connected by a residual mechanism.
5. The large model generated image identification method according to claim 1, wherein in step S2, inputting the original features into the second processing module based on a self-attention mechanism and a residual structure to obtain the classification features comprises:
inputting the original features into a second convolution pooling layer to obtain processing features, feeding the processing features into a self-attention operation, and numerically adding the self-attention result to the processing features to obtain the classification features.
6. The large model generated image identification method according to claim 5, wherein in step S2, V of the self-attention operation is the processing feature; the weights assigned to V are computed from Q and K through a softmax layer, and the weighted average then yields the shallow texture features captured by the self-attention mechanism.
7. The large model generated image identification method according to claim 1, wherein in step S3, classification in the binary classification network is optimized using a high-dimensional spherical boundary objective function.
8. A large model generated image identification system, the system comprising:
a first processing module configured to input the generated image into the first processing module based on residual filtering to obtain original features;
a second processing module configured to input the original features into the second processing module based on a self-attention mechanism and a residual structure to obtain classification features;
and a third processing module configured to input the classification features into a binary classification network and output a result that is either true or false.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the large model generated image identification method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the large model generated image identification method according to any one of claims 1 to 7.
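The pipeline recited in claims 1 to 6 (fixed residual filters alongside learnable convolution kernels, then a self-attention operation whose output is numerically added back to the processing features) can be sketched outside the claims as follows. This is a minimal NumPy illustration, not the patent's implementation: the high-pass kernel value, feature shapes, and projection matrices `Wq`/`Wk` are illustrative assumptions. It follows claim 6 in taking V to be the processing feature itself and claim 5 in adding the attention result back to that feature.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_filter(img, kernel):
    """'Valid' 2-D correlation with a fixed high-pass kernel.
    Illustrates one of the fixed (non-learned) residual filters of claim 3."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention_residual(feat, Wq, Wk):
    """Claims 5-6: V is the processing feature; the weights for V come from
    Q and K via a softmax; the attention output is added back to feat."""
    Q = feat @ Wq
    K = feat @ Wk
    V = feat                                            # claim 6: V is the processing feature
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # each row sums to 1
    attended = weights @ V                              # weighted average: shallow texture features
    return attended + feat                              # claim 5: numerical (residual) addition

# One illustrative fixed high-pass residual kernel (an assumed value,
# not one of the patent's seventeen filters).
hp = np.array([[0., -1., 0.],
               [-1., 4., -1.],
               [0., -1., 0.]])
```

A flat region passed through the high-pass kernel produces an all-zero residual, which is why such filters suppress image content and keep generation artifacts; the residual addition keeps the processing features flowing past the attention stage unchanged in shape.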
CN202311804911.5A 2023-12-26 2023-12-26 Large model generation image identification method and system Active CN117788906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311804911.5A CN117788906B (en) 2023-12-26 2023-12-26 Large model generation image identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311804911.5A CN117788906B (en) 2023-12-26 2023-12-26 Large model generation image identification method and system

Publications (2)

Publication Number Publication Date
CN117788906A true CN117788906A (en) 2024-03-29
CN117788906B CN117788906B (en) 2024-07-05

Family

ID=90388665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311804911.5A Active CN117788906B (en) 2023-12-26 2023-12-26 Large model generation image identification method and system

Country Status (1)

Country Link
CN (1) CN117788906B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830407A (en) * 2018-05-30 2018-11-16 华东交通大学 Sensor distribution optimization method under the conditions of multi-state in monitoring structural health conditions
US10803387B1 (en) * 2019-09-27 2020-10-13 The University Of Stavanger Deep neural architectures for detecting false claims
CN113094871A (en) * 2021-03-08 2021-07-09 国网湖北省电力有限公司电力科学研究院 Wind power area boundary accurate modeling method based on diamond convex hull set theory
CN114973364A (en) * 2022-05-23 2022-08-30 北京影数科技有限公司 Depth image false distinguishing method and system based on face region attention mechanism
CN115082322A (en) * 2022-07-26 2022-09-20 腾讯科技(深圳)有限公司 Image processing method and device, and training method and device of image reconstruction model
CN115100516A (en) * 2022-06-07 2022-09-23 北京科技大学 Relation learning-based remote sensing image target detection method
CN115116092A (en) * 2022-06-30 2022-09-27 中原工学院 Intelligent true and false pedestrian identification method based on human eye stereoscopic vision and bionic model
CN115170933A (en) * 2022-08-18 2022-10-11 西安电子科技大学 Digital image forged area positioning method based on double-current deep neural network
CN116704580A (en) * 2023-06-09 2023-09-05 成都信息工程大学 Face counterfeiting detection method based on depth information decoupling
CN116739071A (en) * 2023-05-16 2023-09-12 华为技术有限公司 Model training method and related device
CN116935253A (en) * 2022-03-29 2023-10-24 上海电力大学 Human face tampering detection method based on residual error network combined with space-time attention mechanism
CN117079355A (en) * 2023-08-29 2023-11-17 中国信息通信研究院 Object image fake identifying method and device and electronic equipment

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
IYYAKUTTI IYAPPAN GANAPATHI et al.: "Learning to localize image forgery using end-to-end attention network", Neurocomputing, vol. 512, 1 November 2022 (2022-11-01), pages 25-39 *
MD. TANVIR HASSAN et al.: "Regular Splitting Graph Network for 3D Human Pose Estimation", IEEE Transactions on Image Processing, vol. 32, 11 July 2023 (2023-07-11), pages 4212-4222 *
ZIYI XI et al.: "AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network", 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 20 November 2023 (2023-11-20), pages 1463-1470 *
ZHANG YI: "Research on Key Technologies for Multi-Pose Face Recognition", China Doctoral Dissertations Full-text Database (Information Science and Technology), no. 04, 15 April 2022 (2022-04-15), pages 138-72 *
DENG JIANGUO et al.: "Loss Functions in Supervised Learning and Their Applications", Big Data, vol. 6, no. 01, 15 January 2020 (2020-01-15), pages 60-80 *
QIU HONG: "Video Forgery Identification Technology and Supervision System for Specific Objects", Radio & Television Information, vol. 30, no. 07, 11 July 2023 (2023-07-11), pages 19-22 *

Also Published As

Publication number Publication date
CN117788906B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
Rahmouni et al. Distinguishing computer graphics from natural images using convolution neural networks
Yang et al. MTD-Net: Learning to detect deepfakes images by multi-scale texture difference
Singh et al. Image classification: a survey
Che et al. How is gaze influenced by image transformations? dataset and model
CN109146856A (en) Picture quality assessment method, device, computer equipment and storage medium
Han et al. Two-stage learning to predict human eye fixations via SDAEs
CN111738243B (en) Method, device and equipment for selecting face image and storage medium
CN111448581A (en) System and method for image processing using deep neural networks
Do et al. Deep neural network-based fusion model for emotion recognition using visual data
CN111310705A (en) Image recognition method and device, computer equipment and storage medium
US11809519B2 (en) Semantic input sampling for explanation (SISE) of convolutional neural networks
CN115620384B (en) Model training method, fundus image prediction method and fundus image prediction device
CN111814682A (en) Face living body detection method and device
CN112529040A (en) Model generation method and device, electronic equipment and medium
CN112818774A (en) Living body detection method and device
Zaman et al. A novel driver emotion recognition system based on deep ensemble classification
CN116958637A (en) Training method, device, equipment and storage medium of image detection model
CN117636131A (en) Yolo-I model-based small target identification method and related device
Liang et al. Fixation prediction for advertising images: Dataset and benchmark
Willoughby et al. DrunkSelfie: intoxication detection from smartphone facial images
Chao et al. Instance-aware image dehazing
CN116469177A (en) Living body target detection method with mixed precision and training method of living body detection model
Hepburn et al. Enforcing perceptual consistency on generative adversarial networks by using the normalised laplacian pyramid distance
CN117788906B (en) Large model generation image identification method and system
CN116823983A (en) One-to-many style handwriting picture generation method based on style collection mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant