CN117523323B - Detection method and device for generated image

Publication number: CN117523323B
Application number: CN202410010345.9A
Authority: CN (China)
Prior art keywords: image, reconstructed, generated, real, partial
Legal status: Active (granted)
Other versions: CN117523323A (Chinese-language publication)
Inventors: 洪燕, 兰钧, 祝慧佳, 王维强
Original and current assignee: Alipay Hangzhou Information Technology Co Ltd
Application CN202410010345.9A filed by Alipay Hangzhou Information Technology Co Ltd; published as CN117523323A, granted and published as CN117523323B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/764: Arrangements using classification, e.g. of video objects
    • G06V 10/74: Image or video pattern matching; proximity measures in feature spaces
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06V 10/82: Arrangements using neural networks


Abstract

One or more embodiments of the present disclosure disclose a method and an apparatus for detecting a generated image. The method first acquires a target image; next, it selects a partial image from the target image, reconstructs the partial image based on the residual image (the part of the target image other than the partial image), and obtains a reconstructed image composed of the residual image and the reconstructed partial image; it then inputs the reconstructed image into a pre-trained classification model to obtain the reconstruction effect category of the reconstructed image; finally, it determines whether the target image is a real image or a generated image according to that category.

Description

Detection method and device for generated image
Technical Field
The present document relates to the field of image recognition technologies, and in particular, to a method and an apparatus for detecting a generated image.
Background
With the development of artificial intelligence technology, generative models are increasingly widely used. For example, with a text-to-image large model, an image can be generated from a piece of text, and generative models based on diffusion models have improved the quality of generated images. As a result, various open-source communities support users in creating content with different open-source generative models.
However, the wide application of generative models also has side effects. On the one hand, images generated by text-to-image technology carry infringement risks: for example, celebrity photos and famous paintings generated with a text-to-image large model may involve the portrait rights of individuals or the copyright of art creators. On the other hand, images generated by image-to-image technology may cause data security problems and interfere with the spread of genuine, valid information: for example, image-to-image techniques can stylize or edit a real image, and the resulting image may be used in illegal operations involving picture-based authentication. With the growing importance of private data and the compliance requirements on generated images, a detection method for generated images is urgently needed so that generated images can be identified in time.
Disclosure of Invention
In one aspect, one or more embodiments of the present disclosure provide a method for detecting a generated image, including: acquiring a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by an image acquisition device, and the generated image is an image generated based on preset conditions; selecting a partial image from the target image, and carrying out reconstruction processing on the partial image based on the residual image in the target image other than the partial image, to obtain a reconstructed image formed by the residual image and the reconstructed partial image; inputting the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image; and determining whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
In another aspect, one or more embodiments of the present specification provide a detection apparatus for generating an image, including: the image acquisition module acquires a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by the image acquisition equipment, and the generated image is an image generated based on preset conditions; the image filling processing module is used for selecting a partial image from the target image, reconstructing the partial image based on the residual image except the partial image in the target image, and acquiring a reconstructed image formed by the residual image and the reconstructed partial image; the classification module is used for inputting the reconstructed image into a pre-trained classification model to obtain the reconstruction effect category of the reconstructed image, and the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image; and the generated image determining module is used for determining whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
In yet another aspect, one or more embodiments of the present specification provide an electronic device comprising: a processor; and a memory arranged to store computer-executable instructions that, when executed, enable the processor to: acquire a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by an image acquisition device, and the generated image is an image generated based on preset conditions; select a partial image from the target image, and carry out reconstruction processing on the partial image based on the residual image in the target image other than the partial image, to obtain a reconstructed image formed by the residual image and the reconstructed partial image; input the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image; and determine whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
In yet another aspect, one or more embodiments of the present description provide a storage medium storing a computer program executable by a processor to implement the following flow: acquiring a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by an image acquisition device, and the generated image is an image generated based on preset conditions; selecting a partial image from the target image, and carrying out reconstruction processing on the partial image based on the residual image in the target image other than the partial image, to obtain a reconstructed image formed by the residual image and the reconstructed partial image; inputting the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image; and determining whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present specification or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some of the embodiments described in one or more embodiments of the present specification, and a person of ordinary skill in the art may further obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a detection method for generating an image according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the implementation principle of a detection method for generating an image according to an embodiment of the present disclosure;
FIG. 3 is a schematic block diagram of a detection apparatus for generating an image according to an embodiment of the present disclosure;
FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
One or more embodiments of the present disclosure provide a method and apparatus for detecting a generated image.
In order to enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, those technical solutions will be described clearly and completely below with reference to the drawings in one or more embodiments of the present specification. Apparently, the described embodiments are only some rather than all of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present specification without inventive effort shall fall within the protection scope of the present disclosure.
As shown in FIG. 1, an embodiment of the present disclosure provides a method for detecting a generated image. The execution subject of the method may be a terminal device or a server. The terminal device may be a mobile phone, a tablet computer, a computer device such as a notebook or desktop computer, or an IoT device (for example, a smart watch or an in-vehicle device). The server may be a single server or a server cluster including a plurality of servers, and may be a background server of a service such as a financial service or an online shopping service, or a background server of an application program. In this embodiment, a server is taken as the example for the detailed description; for execution by a terminal device, reference may be made to the related contents below, which are not repeated here. The method specifically includes the following steps:
In step S102, a target image is acquired, the target image including a real image and/or a generated image.
The target image in the embodiments of the present specification, that is, the image currently to be detected, may be a single image (specifically, either a real image or a generated image), or a set of multiple images to be detected. Such a set may consist of real images, of generated images, or of a mixture of the two; the detection method in the embodiments of the present specification can identify all the generated images in the set.
It should be noted that the real images and the generated images may have the same or different contents. Taking a target image in the form of a set of multiple images to be detected as an example, the set may include real images of, for example, a duckling, a running puppy, a kitten, and a certificate, together with generated images; the generated images among them can be identified by the detection method in the embodiments of the present specification.
Further, the real image in the embodiments of the present specification is an image captured by an image acquisition device, i.e., an unprocessed image, for example an image captured by a camera. The generated image is an image generated or synthesized based on preset conditions, typically an image produced by an image processing algorithm or an image processing tool, for example: AIGC (AI-Generated Content) images, images generated by a text-to-image large model, images generated by a generative model, and the like.
In step S104, a partial image is selected from the target image, and the partial image is subjected to reconstruction processing based on the remaining images except the partial image in the target image, and a reconstructed image composed of the remaining images and the reconstructed partial image is acquired.
The reconstructed image is composed of the residual image and the reconstructed partial image. That is, to obtain a reconstructed image corresponding to a target image, the embodiments of the present disclosure first select a partial image from the target image, and then reconstruct the partial image based on the residual image in the target image other than the partial image.
In implementation, the partial image may be reconstructed by covering it with another image, for example, covering the partial image with a preset mask and then performing image filling processing on the covered area. Alternatively, the selected partial image may be matted out of the target image, and the matted-out region then filled. It is also possible to recognize the picture content, select content with specified semantics, cover or matte out the selected content, and then perform image filling processing on the covered or matted-out region.
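A minimal sketch of the first of these approaches (covering a selected partial image so that it can later be filled), assuming numpy and PIL; the function name, the rectangular region, and the file name are illustrative, not taken from the patent:

```python
import numpy as np
from PIL import Image

def cover_partial_image(image: np.ndarray, top: int, left: int, h: int, w: int):
    """Cover a selected partial image and return the image to be filled
    together with a binary mask marking the region to reconstruct."""
    covered = image.copy()
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[top:top + h, left:left + w] = 255   # region occupied by the partial image
    covered[top:top + h, left:left + w] = 0  # partial image covered by the mask
    return covered, mask

target = np.asarray(Image.open("target.jpg").convert("RGB"))  # assumed input file
to_fill, mask = cover_partial_image(target, top=64, left=64, h=128, w=128)
```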
In step S106, the reconstructed image is input into a pre-trained classification model to obtain the reconstruction effect category of the reconstructed image, where the classification model is used to classify the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to the first reconstruction error between the real image and its reconstructed image and the second reconstruction error between the generated image and its reconstructed image.
In implementation, the classification model may be obtained by training on a plurality of real image samples and generated image samples with a preset loss function. The classification model may be built on a deep neural network; its input is the reconstructed image corresponding to the target image, and its output is the reconstruction effect category of that reconstructed image, which is either good or poor. The larger the reconstruction error, the worse the reconstruction effect; the smaller the reconstruction error, the better the reconstruction effect. When the reconstruction error exceeds a preset reconstruction error threshold, the reconstruction effect is judged to be poor; when it is below the threshold, the reconstruction effect is judged to be good. Because the reconstruction effects of generated images and real images differ greatly, with generated images reconstructing well and real images reconstructing poorly, the two can be distinguished.
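As a sketch only, such a classification model might look like the following PyTorch module; the patent does not fix a network architecture, so the backbone, layer sizes, and class encoding (0 = poor reconstruction effect, 1 = good) are assumptions:

```python
import torch
import torch.nn as nn

class ReconstructionEffectClassifier(nn.Module):
    """Binary classifier over reconstructed images:
    class 0 = poor reconstruction effect, class 1 = good (assumed encoding)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling to a 64-d vector
        )
        self.head = nn.Linear(64, 2)          # two reconstruction effect categories

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))
```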
In step S108, it is determined whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
The smoothness of a real image differs considerably from that of a generated image, and this difference leads to a large difference in the reconstruction effect of their respective reconstructed images. Specifically, the pixels of a generated image are smoother; when image filling processing is performed on the region where the partial image is located, filling based on the surrounding image information (that is, the image information of the residual image in the target image other than the partial image) yields a smooth reconstruction, so the reconstruction error between the generated image and its reconstructed image is small and the reconstruction effect is good. The pixels of a real image are less smooth; filling the region where the partial image is located from the surrounding image information still yields a smooth reconstruction, so the reconstruction error between the real image and its reconstructed image is large and the reconstruction effect is poor. Therefore, by comparing the reconstruction effects of the reconstructed images of real and generated images, the embodiments of the present specification can effectively identify generated images, improving both the accuracy of the detection result and the detection efficiency.
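A toy numerical illustration of this principle (not from the patent): fill a masked patch with the mean of its surroundings and compare the resulting reconstruction error for a smooth signal versus a high-detail one.

```python
import numpy as np

rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # smooth gradient image
sharp = rng.random((64, 64))                           # high-frequency detail

def fill_error(img: np.ndarray) -> float:
    patch = img[24:40, 24:40]                          # the "partial image"
    surround = np.concatenate([img[:24].ravel(), img[40:].ravel()])
    filled = np.full_like(patch, surround.mean())      # naive smooth fill
    return float(np.mean((filled - patch) ** 2))       # reconstruction error

print(fill_error(smooth))  # small error: smooth content refills well
print(fill_error(sharp))   # much larger error for the detailed image
```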
In implementation, if the reconstruction effect category of the reconstructed image corresponding to the target image is good, the current target image is determined to be a generated image; if the category is poor, the current target image is determined to be a real image.
The embodiments of the present specification provide a detection method for generated images: first acquire a target image; next select a partial image from the target image, reconstruct it based on the residual image in the target image other than the partial image, and obtain a reconstructed image composed of the residual image and the reconstructed partial image; then input the reconstructed image into a pre-trained classification model to obtain its reconstruction effect category; and finally determine whether the target image is a real image or a generated image according to that category. The embodiments exploit the principle that the smoothness of real images and generated images differs greatly, and that this difference produces a large difference in the reconstruction effect of their respective reconstructed images: by reconstructing a partial region of the target image and classifying the reconstruction effect of the resulting reconstructed image, the method quickly and effectively determines whether the current target image is real or generated, which improves both the detection efficiency and the accuracy of the detection result. In addition, in the embodiments of the present specification the input of the classification model is a reconstructed image, and the classification is based on a real image together with its corresponding reconstructed image and on a generated image together with its corresponding reconstructed image, so each pair of images underlying the classification has the same content.
In the embodiments of the present disclosure, step S104 may be implemented in various ways. One alternative is given below; see steps S1042 to S1044.
In step S1042, mask processing is performed on the selected partial image according to a preset mask, so as to obtain a target image to be filled, which is composed of the remaining image and the mask processed partial image.
The target image to be filled consists of the residual image and the mask-processed partial image; that is, the area to be filled of the target image to be filled is the area where the mask is located.
A preset mask is used to mask the region where the partial image is located, covering the selected partial image in the target image and thereby yielding the target image to be filled, composed of the residual image and the mask-processed partial image.
In one implementation, the ratio of the size of the preset mask to that of the target image is less than or equal to 1/4. In practice, this ratio may be chosen as 1/4, 1/8, and so on, and the preset mask may have an arbitrary shape. If the target image is a set of multiple images to be detected, the ratio of the preset mask to the image ranked last when the images in the set are sorted by area is less than or equal to 1/4. A mask of this size improves the efficiency of the image filling processing while preserving its accuracy, and helps keep the filling operation tractable.
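A sketch of drawing such a random mask, assuming numpy; the rectangular shape and the lower bound on the area are arbitrary choices for illustration:

```python
import numpy as np

def random_mask(h: int, w: int, max_ratio: float = 0.25, seed=None) -> np.ndarray:
    """Binary mask whose area is at most max_ratio (e.g. 1/4 or 1/8) of h*w."""
    rng = np.random.default_rng(seed)
    area = rng.uniform(0.05, max_ratio) * h * w   # 0.05 lower bound is arbitrary
    mh = min(h, int(np.sqrt(area)))               # roughly square region
    mw = min(w, max(1, int(area / mh)))
    top = rng.integers(0, h - mh + 1)
    left = rng.integers(0, w - mw + 1)
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[top:top + mh, left:left + mw] = 255      # 255 marks the area to fill
    return mask
```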
In step S1044, image filling processing is performed on the target image to be filled based on the image filling model, so as to reconstruct the partial image in the region where the mask is located based on the residual image, and obtain a reconstructed image corresponding to the target image.
In practice, the image filling model may be an existing image filling model, for example an image filling model constructed based on a diffusion model.
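For instance, a diffusion-based filling step could be sketched with the Hugging Face diffusers library as below; the specific pipeline class and checkpoint name are assumptions, since the patent only requires some diffusion-based image filling model:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Assumed checkpoint; any diffusion inpainting model would play the same role.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("target.jpg").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))  # white = fill
reconstructed = pipe(prompt="", image=image, mask_image=mask).images[0]
reconstructed.save("reconstructed.jpg")
```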
In the embodiments of the present disclosure, another alternative implementation of step S104 is given below; see steps S1046 to S1048.
In step S1046, a partial image is matted out of the target image, and the target image to be filled is obtained based on the residual image and the region where the partial image was located.
The target image to be filled consists of the residual image and the region where the partial image was located (now a blank region); the area to be filled is the region of the target image from which the partial image was matted out.
In implementation, an existing matting tool may be used to matte the selected partial image directly out of the target image while keeping the region it occupied, so that the target image to be filled is obtained from the residual image and that region. This produces the target image to be filled quickly and improves the efficiency of the image filling processing.
In step S1048, image filling processing is performed on the target image to be filled based on a preset image filling algorithm, so as to perform reconstruction processing on the partial image in the region where the partial image is located based on the remaining image, and obtain a reconstructed image corresponding to the target image.
In implementation, the preset image filling algorithm may be a flood filling algorithm, a seed filling algorithm, a scan-line filling algorithm, a boundary filling algorithm, or the like; the embodiments of the present disclosure do not limit this. The image filling model may also be used to perform the image filling processing on the target image to be filled, to obtain the reconstructed image corresponding to the target image.
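As one concrete, non-learned stand-in for such a preset filling algorithm, OpenCV's classical inpainting can fill the matted-out region; this is an illustrative substitution, not an algorithm named by the patent:

```python
import cv2

image = cv2.imread("target.jpg")                     # BGR image to be filled
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # nonzero = region to fill
reconstructed = cv2.inpaint(image, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("reconstructed.jpg", reconstructed)
```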
In the embodiments of the present disclosure, the classification model in step S106 may be trained in various ways. One alternative is given below; see steps A1 to A3.
Step A1: acquire a plurality of image samples with label information, where the image samples include real image samples and generated image samples, and the label information marks each image sample as a real image sample or a generated image sample.
The acquired image samples may have the same or different contents. Each image sample carries label information and is used for supervised training, which improves the accuracy of the classification results of the classification model. The image samples may be acquired by taking both real image samples and generated image samples directly from an open-source data set, or by taking only real image samples from an open-source data set and generating the corresponding generated image samples from them.
Step A2: select a partial image from each image sample, and reconstruct it based on the residual image in that image sample other than the partial image, obtaining a reconstructed image composed of the residual image corresponding to each image sample and the reconstructed partial image.
Step A3: taking the plurality of image samples as input data and the reconstruction effect categories of the reconstructed images corresponding to the plurality of image samples as output results, train the classification model based on a preset loss function to obtain the trained classification model.
In implementation, the preset loss function is a classification-model loss function; a cross-entropy loss function may be used.
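A minimal supervised training loop matching steps A1 to A3, assuming PyTorch; the backbone (a torchvision ResNet-18 with two output classes), the random placeholder tensors, and the hyperparameters are all illustrative:

```python
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader, TensorDataset

# Placeholder batch: replace with reconstructed images of real/generated samples,
# labeled 0 = real image sample, 1 = generated image sample (assumed encoding).
images = torch.randn(32, 3, 224, 224)
labels = torch.randint(0, 2, (32,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

model = torchvision.models.resnet18(num_classes=2)
criterion = nn.CrossEntropyLoss()                 # the preset loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    for x, y in loader:
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```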
In the embodiments of the present disclosure, step A1 may be implemented in various ways. One alternative is given below; see steps A11 and A12.
Step A11: acquire a plurality of real image samples from an open-source data set.
Step A12: generate a plurality of generated image samples from the plurality of real image samples using a generation model.
In implementation, the generation model may be a diffusion model, a VAE (Variational Autoencoder) generation model, a flow-based generation model, or the like.
By acquiring only real image samples from an open-source data set and generating the corresponding generated image samples from them, model training can be performed with a small number of samples. Moreover, in this way each generated image matches its real counterpart in content, which helps the model learn, in a targeted manner, the actual differences between real images and generated images rather than differences in content, further improving the efficiency of model training and the accuracy of the classification results of the classification model.
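One way to realize steps A11 and A12 is sketched below with an image-to-image diffusion pipeline, which keeps each generated sample close in content to its real counterpart; the pipeline class, checkpoint name, and strength value are assumptions, since the patent allows any generation model:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed checkpoint
).to("cuda")

real = Image.open("real_sample.jpg").convert("RGB").resize((512, 512))
# A low strength regenerates the image while preserving its content.
generated = pipe(prompt="", image=real, strength=0.4).images[0]
generated.save("generated_sample.jpg")
```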
In the embodiments of the present disclosure, the classification model in step S106 classifies the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to the first and second reconstruction errors. These errors may be implemented in various ways. In one implementation, the first reconstruction error is determined from a preset similarity index between the real image and its reconstructed image, the second reconstruction error is determined from a preset similarity index between the generated image and its reconstructed image, and both errors are inversely related to the smoothness of the target image: the larger the first (or second) reconstruction error, the lower the smoothness of the target image; the smaller the error, the higher the smoothness.
In one implementation, the preset similarity index includes one or more of: a structural similarity parameter, a peak signal-to-noise ratio parameter, and a learned perceptual image patch similarity parameter. That is, a single similarity index may be used, or two or three may be combined to determine the first or second reconstruction error. The structural similarity parameter is inversely related to the first and second reconstruction errors and directly related to the reconstruction effect; the peak signal-to-noise ratio parameter is inversely related to the reconstruction errors and directly related to the reconstruction effect; the learned perceptual image patch similarity parameter is directly related to the reconstruction errors and inversely related to the reconstruction effect. In other words, the larger the structural similarity parameter or the peak signal-to-noise ratio parameter, the smaller the first (or second) reconstruction error and the better the reconstruction effect; the larger the learned perceptual image patch similarity parameter, the larger the first (or second) reconstruction error and the worse the reconstruction effect.
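The three indexes can be computed as sketched below, assuming the scikit-image and lpips packages; input images are HWC uint8 arrays of identical shape:

```python
import lpips
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def to_lpips_tensor(img: np.ndarray) -> torch.Tensor:
    # HWC uint8 -> NCHW float in [-1, 1], the range lpips expects
    return torch.from_numpy(img).permute(2, 0, 1)[None].float() / 127.5 - 1.0

def similarity_indexes(original: np.ndarray, reconstructed: np.ndarray) -> dict:
    lpips_fn = lpips.LPIPS(net="alex")
    return {
        # Higher SSIM/PSNR: smaller reconstruction error, better effect.
        "SSIM": structural_similarity(original, reconstructed, channel_axis=2),
        "PSNR": peak_signal_noise_ratio(original, reconstructed),
        # Higher LPIPS: larger reconstruction error, worse effect.
        "LPIPS": lpips_fn(to_lpips_tensor(original),
                          to_lpips_tensor(reconstructed)).item(),
    }
```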
The implementation principle of the detection method for generated images in the embodiments of the present specification is shown in FIG. 2. Taking the generated image output by a diffusion-model-based generation model as an example: the diffusion model generates images through multi-step iterative denoising, and during this denoising it is guided by the input conditions so that noise unrelated to the given conditional text is gradually removed, producing an image that matches the conditional text. The pixels of an image generated this way are therefore overly smooth globally (high smoothness). A real image captured by an image acquisition device, by contrast, has comparatively sharp pixels (low smoothness) because of the natural scene and the camera parameters at the time of shooting, so the smoothness of generated and real images differs greatly. The embodiments of the present specification turn this smoothness difference into a difference between the reconstruction effects of the reconstructed images corresponding to the real image and to the generated image, so generated and real images can be distinguished quickly and effectively.
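Putting the pieces together, the flow of FIG. 2 might be composed as below; `random_mask` is the earlier sketch, and `inpaint_fn`/`classifier` stand in for the trained filling and classification models, with class 1 assumed to mean a good reconstruction effect:

```python
import numpy as np
import torch

GOOD_EFFECT = 1  # assumed class encoding: 1 = good reconstruction, 0 = poor

def detect_generated(image: np.ndarray, inpaint_fn, classifier) -> str:
    """image: HWC uint8 target image; returns the detection result."""
    mask = random_mask(*image.shape[:2], max_ratio=0.25)   # select partial image
    reconstructed = inpaint_fn(image, mask)                # reconstruction step
    x = torch.from_numpy(reconstructed).permute(2, 0, 1)[None].float() / 255.0
    effect = classifier(x).argmax(dim=1).item()            # reconstruction effect
    return "generated image" if effect == GOOD_EFFECT else "real image"
```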
As an example, take 2500 generated images produced with Stable Diffusion (an AI drawing tool built on a diffusion-model-based generation model) and 2500 real images shot with a camera. The generated images and the real images were masked with randomly generated masks smaller than 1/8 and smaller than 1/4 of the image area, respectively, to obtain the areas to be filled (i.e., the mask areas); a Stable-Diffusion-based image filling model was used to perform image filling processing on those areas, yielding the reconstructed images corresponding to the generated images and to the real images; and the similarity indexes between each image and its filled counterpart were computed, giving the results shown in Tables 1 and 2 below.
Table 1. Similarity indexes when the mask area is 1/8 of the original image area
Table 2. Similarity indexes when the mask area is 1/4 of the original image area
As Tables 1 and 2 show, the SSIM (structural similarity) and PSNR (peak signal-to-noise ratio) of the real images are lower than those of the generated images, and the LPIPS (Learned Perceptual Image Patch Similarity) of the real images is higher than that of the generated images. Specifically, because the pixels of a generated image are smoother, filling the region of the partial image from the surrounding image information yields a smooth reconstruction, so the reconstruction error between the generated image and its reconstructed image is small and the reconstruction effect is good; because the pixels of a real image are less smooth, the same filling yields a reconstruction whose error relative to the real image is large and whose reconstruction effect is poor. Tables 1 and 2 together show that the reconstruction effect of real images is far below that of generated images. On this basis, in FIG. 2, after the classification model has been trained on a plurality of image samples, it classifies the reconstructed image corresponding to the target image into one of two categories, good reconstruction effect or poor reconstruction effect, and the reconstruction effect finally determines whether the current target image is a generated image or a real image.
In summary, particular embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may be advantageous.
Based on the same concept as the detection method for generated images provided above, one or more embodiments of the present disclosure further provide a detection apparatus for generated images, as shown in FIG. 3.
The detection device for generating an image includes: an image acquisition module 210, an image population processing module 220, a classification module 230, and a generated image determination module 240, wherein:
The image acquisition module 210 acquires a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by the image acquisition device, and the generated image is an image generated based on preset conditions;
The image filling processing module 220 selects a partial image from the target image, and performs reconstruction processing on the partial image based on the residual image except the partial image in the target image to obtain a reconstructed image composed of the residual image and the reconstructed partial image;
The classification module 230 inputs the reconstructed image into a pre-trained classification model to obtain a reconstruction effect class of the reconstructed image, wherein the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image;
the generated image determining module 240 determines whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
In the embodiment of the present specification, the image filling processing module 220 may include:
The mask processing unit is used for performing mask processing on the selected partial images according to a preset mask to obtain target images to be filled, wherein the target images comprise residual images and the partial images after mask processing;
the first image filling processing unit is used for performing image filling processing on the target image to be filled based on the image filling model so as to perform reconstruction processing on partial images in the area where the mask is located based on the residual images and obtain a reconstructed image corresponding to the target image.
In one embodiment, the size ratio of the mask preset in the mask processing unit to the target image is less than or equal to 1/4.
In the embodiment of the present specification, the image filling processing module 220 may further include:
the matting unit is used for matting the partial image out of the target image and obtaining the target image to be filled based on the residual image and the region where the partial image was located;
and the second image filling processing unit is used for performing image filling processing on the target image to be filled based on a preset image filling algorithm so as to perform reconstruction processing on the partial image in the area where the partial image is based on the residual image and acquire a reconstructed image corresponding to the target image.
In the embodiment of the present disclosure, in the classification model of the classification module 230, the first reconstruction error is determined according to a preset similarity index between the real image and the corresponding reconstructed image, the second reconstruction error is determined according to a preset similarity index between the generated image and the corresponding reconstructed image, and the first reconstruction error and the second reconstruction error are both in inverse relation with the smoothness of the target image.
In one embodiment, the similarity index preset in the classification model of the classification module 230 includes one or more of: a structural similarity parameter, a peak signal-to-noise ratio parameter, and a learned perceptual image patch similarity parameter, where the structural similarity parameter and the peak signal-to-noise ratio parameter are inversely related to the first and second reconstruction errors, and the learned perceptual image patch similarity parameter is directly related to the first and second reconstruction errors.
In this embodiment of the present disclosure, the detection apparatus for generated images further includes a classification model training module, which trains a classification model based on a plurality of image samples with label information and a preset loss function to obtain a trained classification model, where the classification model training module includes:
the image sample acquisition unit, which acquires a plurality of image samples with label information, where the image samples include real image samples and generated image samples, and the label information marks each image sample as a real image sample or a generated image sample;
a reconstructed image acquisition unit for selecting a partial image from each image sample, and performing reconstruction processing on the partial image based on the residual images except the partial image in each image sample to acquire a reconstructed image formed by the residual image corresponding to each image sample and the reconstructed partial image;
The model training unit takes a plurality of image samples as input data, takes the reconstruction effect types of the reconstructed images corresponding to the plurality of image samples as output results, and trains the classification model based on a preset loss function to obtain the trained classification model.
In the embodiment of the present specification, the image sample acquisition unit includes:
a real image sample acquisition subunit, configured to acquire a plurality of real image samples based on the open source data set;
And a generated image sample generation subunit that generates a plurality of generated image samples based on the plurality of real image samples using the generation model.
It should be understood by those skilled in the art that the above detection apparatus for generated images can implement the detection method for generated images described above; the detailed implementation is similar to that of the method and, to avoid repetition, is not described here again.
The embodiments of the present specification provide a detection apparatus for generated images. The image acquisition module first acquires a target image; the image filling processing module then selects a partial image from the target image, reconstructs it based on the residual image in the target image other than the partial image, and obtains a reconstructed image composed of the residual image and the reconstructed partial image; the classification module inputs the reconstructed image into a pre-trained classification model to obtain its reconstruction effect category; and the generated image determining module finally determines whether the target image is a real image or a generated image according to that category. The apparatus exploits the principle that the smoothness of real and generated images differs greatly, and that this difference produces a large difference in the reconstruction effect of the corresponding reconstructed images: by reconstructing a partial region of the target image and classifying the reconstruction effect of the resulting reconstructed image, it quickly and effectively determines whether the current target image is real or generated, improving both the detection efficiency and the accuracy of the detection result. In addition, the input of the classification model is a reconstructed image, and the classification is based on a real image together with its corresponding reconstructed image and on a generated image together with its corresponding reconstructed image, so each pair of images underlying the classification has the same content.
Based on the same idea, one or more embodiments of the present disclosure also provide an electronic device, as shown in FIG. 4. Electronic devices may differ considerably in configuration or performance; the device may include one or more processors 301 and a memory 302, and one or more applications or data may be stored in the memory 302. The memory 302 may provide transient or persistent storage. An application stored in the memory 302 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the electronic device. Further, the processor 301 may be arranged to communicate with the memory 302 and execute, on the electronic device, the series of computer-executable instructions in the memory 302. The electronic device may also include one or more power supplies 303, one or more wired or wireless network interfaces 304, one or more input/output interfaces 305, and one or more keyboards 306.
In particular, in this embodiment, an electronic device includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, each of which may include a series of computer-executable instructions for the electronic device; the one or more programs, configured to be executed by one or more processors, include instructions for:
Acquiring a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by an image acquisition device, and the generated image is an image generated on the basis of preset conditions;
Selecting a partial image from the target image, and carrying out reconstruction processing on the partial image based on the residual image except the partial image in the target image to obtain a reconstructed image formed by the residual image and the reconstructed partial image;
Inputting the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image;
and determining whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
One or more embodiments of the present description provide a storage medium for storing computer-executable instructions that, when executed by a processor, implement the following:
Acquiring a target image, wherein the target image comprises a real image and/or a generated image, the real image is an image shot by an image acquisition device, and the generated image is an image generated on the basis of preset conditions;
Selecting a partial image from the target image, and carrying out reconstruction processing on the partial image based on the residual image except the partial image in the target image to obtain a reconstructed image formed by the residual image and the reconstructed partial image;
Inputting the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for classifying the reconstructed image corresponding to the real image and the reconstructed image corresponding to the generated image according to a first reconstruction error between the real image and the reconstructed image corresponding to the real image and a second reconstruction error between the generated image and the reconstructed image corresponding to the generated image;
and determining whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers nearly always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development: the original code before compilation has to be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can readily be obtained by merely doing a little programming of the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component. Indeed, means for performing the various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function. Of course, when one or more embodiments of the present specification are implemented, the functionality of the units may be implemented in one or more pieces of software and/or hardware.
One skilled in the art will appreciate that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, Phase-change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, and any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing description of one or more embodiments is merely illustrative of the application and is not intended to be limiting. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of one or more embodiments of the present disclosure, are intended to be included within the scope of the claims of one or more embodiments of the present disclosure.

Claims (9)

1. A method for detecting a generated image, comprising:
Acquiring a target image, wherein the target image comprises a real image and/or a generated image, the real image being an image captured by an image acquisition device and the generated image being an image generated based on preset conditions;
Selecting a partial image from the target image, and reconstructing the partial image based on the remaining image, namely the target image excluding the partial image, to obtain a reconstructed image composed of the remaining image and the reconstructed partial image;
Inputting the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for distinguishing reconstructed images corresponding to real images from reconstructed images corresponding to generated images according to a first reconstruction error between a real image and its corresponding reconstructed image and a second reconstruction error between a generated image and its corresponding reconstructed image, the first reconstruction error is determined according to a preset similarity index between the real image and its corresponding reconstructed image, the second reconstruction error is determined according to a preset similarity index between the generated image and its corresponding reconstructed image, and the first and second reconstruction errors are inversely related to the smoothness of the target image;
And determining whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
2. The method according to claim 1, wherein reconstructing the partial image based on the remaining image to obtain a reconstructed image composed of the remaining image and the reconstructed partial image comprises:
masking the selected partial image with a preset mask to obtain a target image to be filled, composed of the remaining image and the masked partial image;
and performing image filling on the target image to be filled using an image filling model, so as to reconstruct the partial image in the region covered by the mask based on the remaining image, thereby obtaining the reconstructed image corresponding to the target image.
3. The method of claim 2, wherein a size ratio of the preset mask to the target image is less than or equal to 1/4.
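As a rough illustration of the mask processing in claims 2 and 3, the sketch below constructs a preset binary mask whose area is at most one quarter of the target image and applies it to obtain the image to be filled. Reading the size ratio of claim 3 as an area ratio, along with the square mask shape and the zero fill value, are assumptions made for illustration.

import numpy as np

def apply_preset_mask(target_image: np.ndarray, ratio: float = 0.25):
    """Mask a square region whose area is `ratio` of the image (ratio <= 1/4)."""
    assert ratio <= 0.25, "claim 3: mask-to-image size ratio <= 1/4"
    h, w = target_image.shape[:2]
    side = int((h * w * ratio) ** 0.5)  # square mask with the requested area
    side = min(side, h, w)              # guard against very elongated images
    top, left = (h - side) // 2, (w - side) // 2
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[top:top + side, left:left + side] = 1
    to_fill = target_image.copy()
    to_fill[mask == 1] = 0              # masked partial image; remaining image unchanged
    return to_fill, mask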
4. The method according to claim 1, wherein reconstructing the partial image based on the remaining image to obtain a reconstructed image composed of the remaining image and the reconstructed partial image comprises:
cutting the partial image out of the target image, and obtaining a target image to be filled based on the remaining image and the region where the partial image was located;
and performing image filling on the target image to be filled using a preset image filling algorithm, so as to reconstruct the partial image in the region where it was located based on the remaining image, thereby obtaining the reconstructed image corresponding to the target image.
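Claim 4 allows a conventional, non-learned algorithm to take the place of the image filling model. One possible instantiation, shown below, uses OpenCV's inpainting as the preset image filling algorithm; the choice of cv2.inpaint and the Telea method is an assumption for illustration, not something mandated by the claim.

import cv2
import numpy as np

def reconstruct_with_classical_inpainting(target_image: np.ndarray,
                                          mask: np.ndarray) -> np.ndarray:
    """Fill the cut-out region (mask == 1) from the remaining image."""
    # cv2.inpaint expects an 8-bit image and an 8-bit single-channel mask
    # whose non-zero pixels mark the region to be reconstructed.
    return cv2.inpaint(target_image, mask.astype(np.uint8), 3, cv2.INPAINT_TELEA)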
5. The method of claim 1, wherein the similarity index comprises one or more of: a structural similarity (SSIM) parameter, a peak signal-to-noise ratio (PSNR) parameter, and a learned perceptual image patch similarity (LPIPS) parameter, wherein the structural similarity parameter is inversely proportional to the first and second reconstruction errors, the peak signal-to-noise ratio parameter is inversely proportional to the first and second reconstruction errors, and the learned perceptual image patch similarity parameter is directly proportional to the first and second reconstruction errors.
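The sign conventions of claim 5 translate directly into a scalar error. Below is one way such a reconstruction error could be assembled from the similarity indices, using scikit-image for SSIM and PSNR; the combination weights and the PSNR normalization are assumptions, and an LPIPS term (omitted here to avoid a deep-learning dependency) would enter with a positive sign.

import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def reconstruction_error(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Combine similarity indices into a reconstruction error (lower = more similar).

    Assumes H x W x 3 color images of the same shape and dtype.
    """
    # SSIM and PSNR are inversely proportional to the error (claim 5),
    # so higher similarity must drive the error down.
    ssim = structural_similarity(original, reconstructed, channel_axis=-1)
    psnr = peak_signal_noise_ratio(original, reconstructed)
    # Clip PSNR to a rough [0, 50] dB range before normalizing.
    return (1.0 - ssim) + (1.0 - min(psnr, 50.0) / 50.0)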
6. The method of claim 1, wherein the training of the classification model comprises:
Acquiring a plurality of image samples with label information, wherein the image samples comprise real image samples and generated image samples, and the label information marks each image sample as a real image sample or a generated image sample;
Selecting a partial image from each image sample, and reconstructing the partial image based on the remaining image of that image sample to obtain, for each image sample, a reconstructed image composed of the remaining image and the reconstructed partial image;
and training the classification model based on a preset loss function, taking the plurality of image samples as input data and the reconstruction effect categories of their corresponding reconstructed images as output results, to obtain the trained classification model.
7. The method of claim 6, wherein acquiring the plurality of image samples comprises:
Acquiring a plurality of real image samples based on an open-source dataset;
And generating a plurality of generated image samples from the plurality of real image samples using a generative model.
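A compact sketch of the training procedure of claims 6 and 7 follows, in PyTorch form. Everything named here is a placeholder: the classifier architecture, the Adam optimizer, the cross-entropy loss standing in for the preset loss function, and reconstructed_batches, an iterable (e.g. a DataLoader) yielding reconstructed image samples with their real/generated labels.

import torch
import torch.nn as nn

def train_classifier(classifier: nn.Module, reconstructed_batches,
                     epochs: int = 10) -> nn.Module:
    """Train the classification model on reconstructed, labelled image samples."""
    loss_fn = nn.CrossEntropyLoss()  # stands in for the preset loss function
    optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
    classifier.train()
    for _ in range(epochs):
        for images, labels in reconstructed_batches:
            logits = classifier(images)    # reconstruction effect categories
            loss = loss_fn(logits, labels) # labels: 0 = real, 1 = generated
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier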
8. A detection apparatus for a generated image, comprising:
an image acquisition module configured to acquire a target image, wherein the target image comprises a real image and/or a generated image, the real image being an image captured by an image acquisition device and the generated image being an image generated based on preset conditions;
an image filling processing module configured to select a partial image from the target image and reconstruct the partial image based on the remaining image, namely the target image excluding the partial image, to obtain a reconstructed image composed of the remaining image and the reconstructed partial image;
a classification module configured to input the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for distinguishing reconstructed images corresponding to real images from reconstructed images corresponding to generated images according to a first reconstruction error between a real image and its corresponding reconstructed image and a second reconstruction error between a generated image and its corresponding reconstructed image, the first reconstruction error is determined according to a preset similarity index between the real image and its corresponding reconstructed image, the second reconstruction error is determined according to a preset similarity index between the generated image and its corresponding reconstructed image, and the first and second reconstruction errors are inversely related to the smoothness of the target image;
and a generated image determining module configured to determine whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
9. An electronic device, comprising:
A processor; and
A memory arranged to store computer-executable instructions that, when executed, cause the processor to:
Acquire a target image, wherein the target image comprises a real image and/or a generated image, the real image being an image captured by an image acquisition device and the generated image being an image generated based on preset conditions;
Select a partial image from the target image, and reconstruct the partial image based on the remaining image, namely the target image excluding the partial image, to obtain a reconstructed image composed of the remaining image and the reconstructed partial image;
Input the reconstructed image into a pre-trained classification model to obtain a reconstruction effect category of the reconstructed image, wherein the classification model is used for distinguishing reconstructed images corresponding to real images from reconstructed images corresponding to generated images according to a first reconstruction error between a real image and its corresponding reconstructed image and a second reconstruction error between a generated image and its corresponding reconstructed image, the first reconstruction error is determined according to a preset similarity index between the real image and its corresponding reconstructed image, the second reconstruction error is determined according to a preset similarity index between the generated image and its corresponding reconstructed image, and the first and second reconstruction errors are inversely related to the smoothness of the target image;
And determine whether the target image is a real image or a generated image according to the reconstruction effect category of the reconstructed image.
CN202410010345.9A 2024-01-03 2024-01-03 Detection method and device for generated image Active CN117523323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410010345.9A CN117523323B (en) 2024-01-03 2024-01-03 Detection method and device for generated image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410010345.9A CN117523323B (en) 2024-01-03 2024-01-03 Detection method and device for generated image

Publications (2)

Publication Number Publication Date
CN117523323A (en) 2024-02-06
CN117523323B (en) 2024-05-14

Family

ID=89749795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410010345.9A Active CN117523323B (en) 2024-01-03 2024-01-03 Detection method and device for generated image

Country Status (1)

Country Link
CN (1) CN117523323B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753595A (en) * 2019-03-29 2020-10-09 北京市商汤科技开发有限公司 Living body detection method and apparatus, device, and storage medium
CN114663335A (en) * 2020-12-22 2022-06-24 富泰华工业(深圳)有限公司 Image defect detection method, device, electronic equipment and medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949642A (en) * 2021-02-23 2021-06-11 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN117173075A (en) * 2022-05-24 2023-12-05 鸿海精密工业股份有限公司 Medical image detection method and related equipment
CN116543264A (en) * 2023-06-02 2023-08-04 支付宝(杭州)信息技术有限公司 Training method of image classification model, image classification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Method for text detection in image and video based on wavelet reconstruction; Huang Jian-Hua et al.; Journal of the Harbin Institute of Technology; 2006-09-30; full text *
Infrared target classification based on reconstruction transfer learning; Mao Yuanhong, He Zhanzhuang, Ma Zhong; Journal of University of Electronic Science and Technology of China; 2020-07-30, (04); full text *

Also Published As

Publication number Publication date
CN117523323A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN109034183B (en) Target detection method, device and equipment
CN112784857B (en) Model training and image processing method and device
CN112347512A (en) Image processing method, device, equipment and storage medium
CN115828162B (en) Classification model training method and device, storage medium and electronic equipment
CN112308113A (en) Target identification method, device and medium based on semi-supervision
CN111368902A (en) Data labeling method and device
CN116630480B (en) Interactive text-driven image editing method and device and electronic equipment
CN118053165A (en) Certificate type recognition template generation method, certificate recognition method and device
CN116402165B (en) Operator detection method and device, storage medium and electronic equipment
CN113674374A (en) Chinese text image generation method and device based on generation type countermeasure network
CN117523323B (en) Detection method and device for generated image
CN117173002A (en) Model training, image generation and information extraction methods and devices and electronic equipment
CN117113174A (en) Model training method and device, storage medium and electronic equipment
CN116824331A (en) Model training and image recognition method, device, equipment and storage medium
CN115578796A (en) Training method, device, equipment and medium for living body detection model
CN116152933A (en) Training method, device, equipment and storage medium of anomaly detection model
CN112560530B (en) Two-dimensional code processing method, device, medium and electronic device
CN115204318A (en) Event automatic hierarchical classification method and electronic equipment
CN115358777A (en) Advertisement putting processing method and device of virtual world
CN113191401A (en) Method and device for three-dimensional model recognition based on visual saliency sharing
CN114973426B (en) Living body detection method, device and equipment
CN111260757A (en) Image processing method and device and terminal equipment
CN114693996B (en) Certificate authenticity uncertainty measurement method and device, equipment and storage medium
CN115905913B (en) Method and device for detecting digital collection
CN112115952B (en) Image classification method, device and medium based on full convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant