CN116958637A - Training method, device, equipment and storage medium of image detection model - Google Patents

Training method, device, equipment and storage medium of image detection model

Info

Publication number
CN116958637A
Authority
CN
China
Prior art keywords
image
counterfeit
sample
detection model
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310479792.4A
Other languages
Chinese (zh)
Inventor
晏志远
张勇
樊艳波
吴保元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310479792.4A
Publication of CN116958637A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a training method, device, and equipment of an image detection model and a storage medium, relating to the technical field of computer vision. The image detection model includes a first encoder, a first classifier, and a second classifier. The method comprises the following steps: acquiring a training sample set of the image detection model; extracting, by the first encoder, specific counterfeit features and general counterfeit features of a sample image; obtaining a first classification result through the first classifier according to the specific counterfeit features; obtaining a second classification result through the second classifier according to the general counterfeit features; and adjusting parameters of the image detection model according to the classification results to obtain the trained image detection model. Training the image detection model in this way improves its ability to extract features common to different forgery methods in an image, so that the model can better detect various counterfeit images generated by different forgery methods, improving the generalization of the image detection model.

Description

Training method, device, equipment and storage medium of image detection model
Technical Field
The present application relates to the field of computer vision, and in particular, to a training method, apparatus, device, and storage medium for an image detection model.
Background
Deep forgery refers to a class of methods that use deep learning techniques to generate images and videos that do not exist in reality. Using deep forgery techniques, the behavior of a real person can be created and simulated, or the behavior of one person in a video can be replaced by that of another, posing a significant security risk. There is therefore a need for deep forgery detection of the corresponding images.
Currently common deep forgery detection is based on deep learning: a trained image detection model (i.e., a model built on a neural network) extracts features usable for forgery detection from an image in order to detect it. This approach performs well when the detected image was produced by the same forgery method as the model's training samples. However, when it is used to detect an image produced by a forgery method different from that of the training samples, its detection performance drops significantly.
It can be seen that such models for detecting counterfeit images generalize poorly.
Disclosure of Invention
The embodiment of the application provides a training method, device and equipment of an image detection model and a storage medium. The technical scheme provided by the embodiment of the application is as follows:
according to an aspect of an embodiment of the present application, there is provided a training method of an image detection model including a first encoder, a first classifier, and a second classifier; the method comprises the following steps:
acquiring a training sample set of the image detection model, wherein the training sample set comprises a plurality of sample images, and the plurality of sample images comprise at least one fake image and at least one real image;
extracting, by the first encoder, a specific counterfeit feature and a general counterfeit feature of the sample image, the specific counterfeit feature being a feature in the image for distinguishing between different counterfeit methods, the general counterfeit feature being a common feature in the image for distinguishing between the different counterfeit methods;
obtaining, through the first classifier, a first classification result of the sample image according to the specific counterfeit features of the sample image, wherein the first classification result is used for indicating the probabilities that the sample image corresponds to each of n categories, the n categories comprising a real image category and categories corresponding respectively to n-1 forgery methods, n being an integer greater than 2;
obtaining, through the second classifier, a second classification result of the sample image according to the general counterfeit features of the sample image, wherein the second classification result is used for indicating the probabilities that the sample image corresponds to the counterfeit image category and the real image category, respectively;
and adjusting parameters of the image detection model according to the first classification result and the second classification result to obtain the trained image detection model.
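The training steps above can be sketched as follows. This is a minimal illustration only: the encoder and the two classifier heads of the patent are neural networks, whereas here they are placeholder stubs with hypothetical names, used solely to show how the specific/general feature split feeds two different classification heads.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def first_encoder(image_features, split):
    """Stub for the first encoder: splits a feature vector into a
    specific-forgery part (distinguishes forgery methods) and a
    general-forgery part (shared across forgery methods)."""
    return image_features[:split], image_features[split:]

def first_classifier(specific_features, n):
    """Stub n-way head: real-image category plus n-1 forgery methods."""
    logits = [sum(specific_features) * (i + 1) for i in range(n)]
    return softmax(logits)

def second_classifier(general_features):
    """Stub binary head: forged vs. real."""
    s = sum(general_features)
    return softmax([s, -s])

# One training sample passes through both heads:
specific, general = first_encoder([0.4, -0.1, 0.3, 0.2], split=2)
p_method = first_classifier(specific, n=4)   # probabilities over 4 categories
p_forged = second_classifier(general)        # [P(forged), P(real)]
```

Both heads output proper probability distributions; in training, their results would then drive the parameter adjustment described in the last step.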
According to an aspect of an embodiment of the present application, there is provided an image detection method including:
acquiring an image to be detected;
extracting general counterfeit features of the image to be detected through a first encoder of an image detection model, wherein the general counterfeit features refer to features common to different forgery methods in an image;
obtaining an identification result of the image to be detected according to the universal counterfeit characteristic of the image to be detected by a second classifier of the image detection model, wherein the identification result is used for indicating whether the image to be detected is a counterfeit image or not;
the image detection model is obtained by adopting the training method of the image detection model.
According to an aspect of an embodiment of the present application, there is provided a training apparatus of an image detection model including a first encoder, a first classifier, and a second classifier; the device comprises:
a sample acquisition module for acquiring a training sample set of the image detection model, the training sample set comprising a plurality of sample images, the plurality of sample images comprising at least one counterfeit image and at least one real image;
the characteristic extraction module is used for extracting specific counterfeiting characteristics and general counterfeiting characteristics of the sample image through the first encoder, wherein the specific counterfeiting characteristics are characteristics used for distinguishing different counterfeiting methods in the image, and the general counterfeiting characteristics are common characteristics of different counterfeiting methods in the image;
the first classification module is used for obtaining a first classification result of the sample image according to the specific forging characteristics of the sample image through the first classifier, wherein the first classification result is used for indicating the probability that the sample image corresponds to n categories respectively, the n categories comprise categories corresponding to a real image category and n-1 forging methods respectively, and n is an integer larger than 2;
the second classification module is used for obtaining a second classification result of the sample image according to the universal counterfeiting characteristics of the sample image through the second classifier, and the second classification result is used for indicating the probability that the sample image corresponds to the counterfeiting image category and the real image category respectively;
and a parameter adjustment module for adjusting the parameters of the image detection model according to the first classification result and the second classification result to obtain the trained image detection model.
According to an aspect of an embodiment of the present application, there is provided an image detection apparatus including:
the image acquisition module is used for acquiring an image to be detected;
the common feature extraction module is used for extracting the common counterfeit features of the image to be detected through a first encoder of the image detection model, wherein the common counterfeit features refer to common features of different counterfeit methods in the image;
the identification module is used for obtaining an identification result of the image to be detected according to the universal counterfeit characteristics of the image to be detected through a second classifier of the image detection model, and the identification result is used for indicating whether the image to be detected is a counterfeit image or not;
the image detection model is obtained by adopting the training method of the image detection model.
According to an aspect of the embodiment of the present application, there is provided a computer device including a processor and a memory, in which a computer program is stored, the computer program being loaded and executed by the processor to implement the training method of the image detection model or to implement the image detection method.
According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the training method of the image detection model or the image detection method described above.
According to an aspect of the embodiments of the present application, there is provided a computer program product including a computer program stored in a computer-readable storage medium, from which a processor reads and executes the computer program to implement the training method of the image detection model described above, or to implement the image detection method described above.
The technical scheme provided by the embodiment of the application at least comprises the following beneficial effects:
Specific counterfeit features and general counterfeit features of the sample image are extracted through the first encoder of the image detection model; a first classification result of the sample image is obtained according to the specific counterfeit features, and a second classification result according to the general counterfeit features; parameters of the image detection model are then adjusted according to the first and second classification results to obtain the trained image detection model. Training the image detection model in this way improves its ability to decouple the features common to different forgery methods from the full feature information of the image, so that it can better detect various counterfeit images generated by different forgery methods, improving the generalization of the image detection model.
Drawings
FIG. 1 is a schematic illustration of an implementation environment for an embodiment of the present application;
FIG. 2 is a schematic diagram of an image detection method based on an image detection model according to an embodiment of the present application;
FIG. 3 is a flow chart of a training method for an image detection model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an encoder module of an image detection model provided by one embodiment of the present application;
FIG. 5 is a schematic diagram of a decoder module of an image detection model provided by one embodiment of the present application;
FIG. 6 is a flow chart of an image detection method provided by an embodiment of the present application;
FIG. 7 is a block diagram of a training apparatus for an image detection model provided by one embodiment of the present application;
fig. 8 is a block diagram of an image detection apparatus provided in one embodiment of the present application;
fig. 9 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Artificial intelligence (Artificial Intelligence, AI for short) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence is thus the study of the design principles and implementation methods of various intelligent machines, giving machines the capabilities of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to identify and measure targets and perform other machine vision tasks, and further carrying out image processing so that the result is better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
The technical scheme provided by the embodiment of the application relates to the technology of artificial intelligence such as machine learning and computer vision, and is specifically described by the following embodiment.
Before describing the technical solution of the present application, some terms related to the application are explained. The following explanations may optionally be combined with the technical solutions of the embodiments of the present application, all of which fall within the protection scope of the embodiments of the present application. Embodiments of the present application include at least some of the following.
Deep forgery (deepfake): a class of methods for generating images and videos that do not exist in reality using deep learning techniques. Deep forgery techniques can be used to create or simulate the behavior of a real person, or to replace the behavior and utterances of one person in a video with those of another.
Content features: features in an image that are unrelated to forgery, such as the background or a person's identity. In the embodiments of the present application, content features may be denoted Content.
Counterfeit features: features in an image used for forgery detection. In the embodiments of the present application, counterfeit features may be denoted Fingerprint.
Specific counterfeit features: features in an image used to distinguish between different forgery methods. In the embodiments of the present application, specific counterfeit features may be denoted Specific Fingerprint. A specific counterfeit feature is a forgery feature unique to a given forgery method. Since each forgery method may differ, specific counterfeit features learned for one method are often difficult to generalize to other methods.
General counterfeit features: features common to different forgery methods in an image. In the embodiments of the present application, general counterfeit features may be denoted Common Fingerprint. A general counterfeit feature is a forgery feature shared by a number of different forgery methods; learning such common features therefore yields stronger generalization.
Generalization problem (Generalization Problem): in the present application, this generally refers to the generalization problem in deep forgery detection. The detection methods mentioned in the Background generally perform well when the training and test data use the same forgery techniques. In practical applications, however, test data may be created by an unknown process, producing differences between training and test data and thus poor detection performance. That is, when an image produced by a forgery method different from that of the training samples is detected by an image detection model, detection performance degrades.
Decoupled learning (Disentanglement Learning): a method of decomposing complex features into simpler, more narrowly defined variables and encoding them into independent dimensions with high discriminative power. In the embodiments of the present application, the image detection model simultaneously learns specific counterfeit features, general counterfeit features, and content features, and separates the three when extracting image features.
Within-Domain: refers to training and test data with similar distributions, either coming from the same dataset or generated by the same forgery method.
Cross-Domain: refers to training and test data with different distributions, coming from different datasets or generated by different forgery methods.
Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown, where the implementation environment of the embodiment may include: model training apparatus 10 and model using apparatus 20.
Model training device 10 may be an electronic device such as a personal computer, tablet, server, intelligent robot, or some other electronic device with relatively high computing power. Model training apparatus 10 is used to train image detection model 15. In some embodiments, model training apparatus 10 may train the image detection model in a machine learning manner to provide better performance. In the model training process, after the sample image is input into the image detection model, specific fake characteristics, general fake characteristics and content characteristics of the sample image can be extracted, and a total loss is obtained, wherein the total loss is used for adjusting parameters of the image detection model. The specific modules of the image detection model and the calculation method of the total loss will be described in detail in the following embodiments, which are not repeated here.
The image detection model 15 trained as described above can be deployed in the model using apparatus 20 for detecting whether or not an image to be detected in the model using apparatus 20 has been falsified. The model using device 20 may be a terminal device such as a mobile phone, a computer, a smart tv, a multimedia playing device, or a server, which is not limited in the present application. For example, the trained image detection model may extract general counterfeit features of an image to be detected and determine whether the image to be detected is a counterfeit image based on the general counterfeit features.
In the methods provided by the embodiments of the present application, each step may be performed by a computer device, i.e., an electronic device with data computation, processing, and storage capabilities. The computer device may be a terminal such as a PC (Personal Computer), tablet, smart phone, wearable device, or intelligent robot, or it may be a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The computer device may be the model training device 10 in fig. 1, used to perform the training method of the image detection model provided by the embodiments of the present application, or the model using device 20, used to perform the image detection method provided by the embodiments of the present application.
Referring to fig. 2, a schematic diagram of an image detection method based on an image detection model according to an embodiment of the application is shown.
In the model training process, a sample image is input into the encoder 52 of the image detection model to obtain the content features, specific counterfeit features, and general counterfeit features of the sample image. In some embodiments, the encoder 52 includes a first encoder and a second encoder, where the first encoder is used to extract the specific and general counterfeit features and the second encoder is used to extract the content features. There are typically multiple sample images, each being either a counterfeit image 50 or a real image 51. When the sample image is a counterfeit image 50, the encoder 52 yields its content feature c^0, specific counterfeit feature f_s^0, and general counterfeit feature f_c^0. When the sample image is a real image 51, the encoder 52 yields its content feature c^1, specific counterfeit feature f_s^1, and general counterfeit feature f_c^1. The specific counterfeit features of the sample image are input into the first classifier 53 of the image detection model to obtain a first classification result 54, which indicates the probabilities that the sample image corresponds to each of n categories; the n categories comprise a real image category and categories corresponding respectively to n-1 forgery methods, n being an integer greater than 2. The general counterfeit features of the sample image are input into the second classifier 55 of the image detection model to obtain a second classification result 56, which indicates the probabilities that the sample image corresponds to the counterfeit image category and the real image category, respectively.
Based on the first classification result and the second classification result, a classification task loss is obtained, which reflects the performance of the image detection model in executing the classification task.
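The classification task loss can plausibly be sketched as the sum of the cross-entropies of the two heads. The exact combination is not specified in this excerpt, so the form below is an assumption for illustration.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def classification_task_loss(first_result, first_label, second_result, second_label):
    """Assumed form: cross-entropy of the n-way head (first classification
    result) plus cross-entropy of the binary head (second classification
    result). The patent excerpt does not spell out the combination."""
    return (cross_entropy(first_result, first_label)
            + cross_entropy(second_result, second_label))

loss = classification_task_loss(
    [0.1, 0.6, 0.2, 0.1], 1,   # n-way probabilities, true forgery-method index
    [0.8, 0.2], 0,             # binary probabilities, 0 = counterfeit
)
```

The loss is small when both heads assign high probability to the correct categories, which is what "reflecting the performance of the classification task" requires.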
In some embodiments, the content features, specific counterfeit features, and general counterfeit features of the sample images (including the counterfeit image 50 and the real image 51) may also be combined and input into the decoder 57 of the image detection model to obtain a first reconstructed image 58, a second reconstructed image 59, a third reconstructed image 60, and a fourth reconstructed image 61. A reconstruction task loss is obtained based on the reconstructed images and the sample images, and reflects the performance of the image detection model in executing the reconstruction task.
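One way the four reconstructions can arise is by pairing each image's content features with either its own or the other image's forgery features. The decoder below is a stub (the patent's decoder is a neural network), and this particular pairing scheme is an assumption, since the excerpt does not enumerate the four combinations.

```python
def decoder(content, specific, common):
    """Stub decoder: a real decoder is a neural network; here we just
    bundle the parts to stand in for a reconstructed image."""
    return ("img", tuple(content), tuple(specific), tuple(common))

# Features of one counterfeit image (superscript 0) and one real image (superscript 1).
c0, fs0, fc0 = [0.1], [0.5], [0.9]
c1, fs1, fc1 = [0.2], [0.0], [0.1]

# Assumed pairing: two self-reconstructions plus two cross-reconstructions
# that swap the forgery features between the pair.
recon_1 = decoder(c0, fs0, fc0)  # counterfeit image from its own features
recon_2 = decoder(c1, fs1, fc1)  # real image from its own features
recon_3 = decoder(c0, fs1, fc1)  # counterfeit content + real forgery features
recon_4 = decoder(c1, fs0, fc0)  # real content + counterfeit forgery features
```

Comparing such reconstructions against the original sample images is what yields the reconstruction task loss; cross-reconstructions in particular only work if the encoder has cleanly separated content from forgery information.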
In some embodiments, a contrastive regularization loss may further be obtained from the specific counterfeit features and general counterfeit features of the sample image, combined with the specific and general counterfeit features of other sample images in the training sample set. The contrastive regularization loss reflects how well the specific and general counterfeit features extracted by the image detection model distinguish similar images from dissimilar ones.
Then, based on the classification task loss, the reconstruction task loss, and the contrastive regularization loss, the total loss of the image detection model is determined, and the parameters of the image detection model are adjusted with the aim of minimizing the total loss to obtain the trained image detection model. The model training process is in fact a decoupled-learning process whose aim is to strengthen the model's decoupling capability, i.e., its ability to separate a given type of feature from the total feature information of an image during feature extraction.
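The total loss is determined from the three terms above; a weighted sum is the usual way to combine such terms, though the excerpt only says the total loss is "determined" from them, so the weights and the sum itself are assumptions here.

```python
def total_loss(cls_loss, recon_loss, contrastive_loss, w_recon=1.0, w_con=1.0):
    """Assumed combination: weighted sum of the classification task loss,
    reconstruction task loss, and contrastive (comparison) regularization
    loss. The weights are hypothetical hyperparameters."""
    return cls_loss + w_recon * recon_loss + w_con * contrastive_loss

# Example values for the three component losses:
loss = total_loss(0.73, 0.10, 0.05)
```

Minimizing this scalar with respect to the model parameters (e.g., by gradient descent) is what "adjusting parameters with the aim of minimizing the total loss" amounts to in practice.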
During use of the model, the image to be detected 70 is input into the encoder 52 (specifically, the first encoder) of the trained image detection model to obtain the general counterfeit feature f_c^t of the image to be detected. The feature f_c^t is then input into the second classifier 55 to obtain the detection result 71 of the image to be detected. The detection result 71 indicates whether the image to be detected is a real image or a counterfeit image.
In the related art, an image is detected by extracting all of its counterfeit features. In the technical solution provided by the embodiments of the present application, by contrast, only the general counterfeit features of the image are extracted in the model use stage, and detection is based on those features alone. The embodiments of the present application thus provide a novel image detection framework based on decoupled learning.
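The use-stage path described above can be sketched end to end. The encoder and classifier below are hypothetical stubs standing in for the trained networks, and the 0.5 decision threshold is an assumption; only the shape of the pipeline (first encoder → general features → second classifier → verdict) comes from the text.

```python
import math

def softmax2(a, b):
    """Two-way softmax returning (p_a, p_b)."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def detect(image_features, first_encoder, second_classifier, threshold=0.5):
    """Inference path: extract only the general counterfeit features and
    classify them as counterfeit vs. real. The specific features are
    discarded at this stage."""
    _specific, general = first_encoder(image_features)
    p_forged, _p_real = second_classifier(general)
    return "counterfeit" if p_forged >= threshold else "real"

# Hypothetical stubs standing in for the trained networks:
enc = lambda feats: (feats[:2], feats[2:])
clf = lambda general: softmax2(sum(general), -sum(general))

result = detect([0.3, -0.2, 1.5, 0.7], enc, clf)
```

Because only the general (method-agnostic) features reach the classifier, the same pipeline applies unchanged to images forged by methods never seen in training, which is the generalization benefit the framework targets.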
Referring to fig. 3, a flowchart of a training method of an image detection model according to an embodiment of the application is shown. The subject of execution of the steps of the method may be a computer device, such as a model training device as described above. The method may include at least one of the following steps 210-250.
Step 210, a training sample set of the image detection model is obtained.
In an embodiment of the application, the image detection model comprises at least a first encoder, a first classifier and a second classifier. Wherein the first encoder may be constructed based on a neural network for extracting specific forgery features and general forgery features of the image. The first classifier and the second classifier are used to perform two different classification tasks, see in particular the description of the embodiments below.
The training sample set includes a plurality of sample images, including at least one counterfeit image and at least one authentic image. The sample image may be a picture or a video, which is not limited in this application. When the sample image is a picture, it may come from a public dataset such as ImageNet that is annotated with label information such as image authenticity and forgery method, or from a categorized image database built by technicians who collect images themselves. A counterfeit image refers to an image known to have undergone forgery processing, and an authentic image refers to an image known not to have undergone forgery processing. In the embodiments of the application, forgery processing refers to changing the original content of an image by technical means. The technical means may be simple enhancement, blurring, displacement and deletion, or an image forgery technology based on deep learning; the technical means used in forgery processing are not limited by the application.
In the application, when the embodiment of the application is applied to specific products or technologies, the related data collection, use and processing processes should comply with national legal regulations, the information processing rules should be informed and independent consent (or legal basis) of the target object should be solicited before the image information is collected, the image is processed in strict compliance with legal regulations and personal information processing rules, and technical measures are taken to ensure the safety of the related data.
Step 220, extracting, by the first encoder, the specific counterfeit feature and the generic counterfeit feature of the sample image.
Image feature extraction refers to extracting a numerical representation from an image, which can represent image features, and the image features can include features such as local textures, colors, shapes, edges and the like in the image, which are not limited in the application.
A specific counterfeit feature refers to a feature in an image used to distinguish between different forgery methods. The specific counterfeit feature is a counterfeit feature unique to each forgery method, so the specific counterfeit features corresponding to different forgery methods also differ. Consequently, it is often difficult to generalize the specific counterfeit features of one method to other methods.
A generic counterfeit feature refers to a feature in an image common to different forgery methods. A generic counterfeit feature is a counterfeit feature shared by many different forgery methods. Because these common counterfeit features are learned, generic counterfeit features have stronger generalization performance.
Referring to fig. 4, in some embodiments, the first encoder 30 may be constructed based on an Xception algorithm, and the first encoder 30 may include a backbone network 31, a first branch network 32, and a second branch network 33. Step 220 may include the following sub-steps:
1. inputting a sample image into a backbone network, and extracting fake characteristics of the sample image through the backbone network, wherein the fake characteristics are characteristics for fake detection in the image;
the backbone network is used to extract counterfeit features of the image. In some embodiments, the backbone network may include a plurality of convolution-normalization-activation modules and a plurality of modules that apply the Xception algorithm to perform convolution calculations. Referring to fig. 4, after a sample image is input into the first encoder, it first passes through two convolution-normalization-activation modules, and then through Xception modules 1-3, Xception modules 4-7 and Xception modules 8-12, so as to obtain the counterfeit features of the sample image.
2. Inputting the fake characteristics of the sample image into a first branch network, and obtaining the specific fake characteristics of the sample image through the first branch network;
the first branch network is used to extract a particular counterfeit feature of the image from the counterfeit features of the image. The first branch network may illustratively include several fully connected layers, or the first branch network may be of other structures, as the application is not limited in this regard.
3. The fake characteristics of the sample image are input into a second branch network, and the universal fake characteristics of the sample image are obtained through the second branch network.
The second branch network is used for extracting general fake features of the image from fake features of the image. The second branch network may illustratively include several fully connected layers, or the second branch network may be of other structures, as the application is not limited in this regard.
Of course, fig. 4 is only exemplary for the structure of the first encoder, and any encoder structure constructed based on a neural network may be applied to the present application. For example, a Transformer may also be employed as the backbone network of the first encoder, with attention mechanisms used to extract the specific and generic counterfeit features of the sample image. The first branch network and the second branch network may have the same structure or different structures. In the case where the first branch network and the second branch network have the same structure, the parameters of the two networks are not shared, and thus the parameters of the two networks differ.
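As an illustration only, the decoupled feature extraction performed by the first encoder can be sketched in a framework-agnostic way with NumPy. The dimensions, the random linear projections standing in for the backbone output and the two branch networks, and the function name `first_encoder` are all assumptions for this sketch, not details from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
FEAT_DIM, SPEC_DIM, GEN_DIM = 8, 4, 4

# Stand-ins for the two branch networks: same structure, unshared parameters.
W_specific = rng.normal(size=(SPEC_DIM, FEAT_DIM))
W_generic = rng.normal(size=(GEN_DIM, FEAT_DIM))

def first_encoder(backbone_feat):
    """Split the backbone's counterfeit features into a specific part
    (unique to one forgery method) and a generic part (shared by methods)."""
    f_s = np.tanh(W_specific @ backbone_feat)  # specific counterfeit feature
    f_c = np.tanh(W_generic @ backbone_feat)   # generic counterfeit feature
    return f_s, f_c

backbone_feat = rng.normal(size=FEAT_DIM)  # stand-in for the Xception output
f_s, f_c = first_encoder(backbone_feat)
print(f_s.shape, f_c.shape)  # (4,) (4,)
```

Because the two projection matrices are independent, the two outputs differ even for the same backbone feature, mirroring the unshared-parameter design described above.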
Step 230, obtaining, by a first classifier, a first classification result of the sample image according to the specific counterfeit characteristics of the sample image, where the first classification result is used to indicate probabilities that the sample image corresponds to n categories, and the n categories include categories corresponding to a real image category and n-1 counterfeit methods, respectively, and n is an integer greater than 2.
The true image category refers to the category to which an image which is not forged should belong, and the category to which a certain forging method corresponds refers to the category to which a forged image obtained by forging by the forging method should belong.
The n categories are determined by the categories of all sample images in the training sample set. For example, if the training sample set contains three categories of images, one is a real image, one is an image processed by the first forging method, and one is an image processed by the second forging method, there are 3 categories corresponding to the real image, the first forging method, and the second forging method, respectively.
The probabilities that the sample image corresponds to the n categories respectively refer to the probabilities that the sample image belongs to each of these categories. For example, if there are 3 categories, corresponding to the real image, the first forgery method, and the second forgery method respectively, and the output of the first classifier is (0.1, 0.9, 0), this indicates that the probability that the sample image is a real image is 0.1, the probability that it was processed by the first forgery method is 0.9, and the probability that it was processed by the second forgery method is 0. Optionally, the probabilities that the sample image corresponds to the n categories sum to 1, and each probability is a value in the interval [0, 1].
In addition, in some embodiments, for face images, the above-mentioned forgery method may be a method of face reenactment based on expression driving, such as Face2Face or NeuralTextures, or a method of replacing the entire face area in the image, such as DeepFakes or FaceSwap; for other types of images, the forgery method may be a method of tampering with any area in the image, which is not limited by the present application.
In addition, in the embodiment of the application, n is an integer greater than 2, that is, the value of n is greater than or equal to 3, so that n categories corresponding to the first classifier exist, besides one real image category, at least two categories corresponding to different forging methods respectively, and the first classifier can distinguish different forging methods, thereby improving the extraction capability of the model for specific forging features.
Step 240, obtaining, by a second classifier, a second classification result of the sample image according to the universal counterfeit feature of the sample image, where the second classification result is used to indicate probabilities that the sample image corresponds to the counterfeit image category and the real image category respectively.
The true image category refers to a category to which an image that has not been counterfeited should belong, and the counterfeit image category refers to a category to which an image that has been counterfeited should belong.
The probabilities that the sample image corresponds to the counterfeit image category and the real image category respectively refer to the probabilities that the sample image is a counterfeit image and a real image respectively. For example, for the counterfeit image category and the real image category, if the output of the second classifier is (0.1, 0.9), it indicates that the probability that the sample image is a counterfeit image is 0.1 and the probability that it is a real image is 0.9. Optionally, the two probabilities sum to 1, and each is a value in the interval [0, 1].
In some embodiments, a Multi-Layer perceptron (MLP) with different parameters may be used as the first classifier and the second classifier, where the first classifier may be used to learn a specific counterfeit feature of the sample image, and the second classifier may be used to learn a general counterfeit feature of the sample image.
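As a sketch of the two classification heads, single-layer stand-ins for the multi-layer perceptrons can be written with assumed dimensions and randomly initialized, unshared parameters (the value n = 3 and all weights here are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
n = 3                     # assumed: real image category + 2 forgery methods
spec_dim = gen_dim = 4

W1 = rng.normal(size=(n, spec_dim))  # first classifier head, applied to f_s
W2 = rng.normal(size=(2, gen_dim))   # second classifier head: fake vs. real

f_s = rng.normal(size=spec_dim)      # specific counterfeit feature
f_c = rng.normal(size=gen_dim)       # generic counterfeit feature

first_result = softmax(W1 @ f_s)     # probabilities over the n categories
second_result = softmax(W2 @ f_c)    # probabilities over {fake, real}
```

The softmax ensures each result is a probability vector summing to 1, matching the optional normalization described for both classifiers.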
And step 250, adjusting parameters of the image detection model according to the first classification result and the second classification result to obtain the trained image detection model.
Because the first classification result and the second classification result, which indicate the probabilities that the sample image belongs to each category, have been obtained, the loss function value of the model can be calculated by combining them with the known label information indicating the category of the sample image. The parameters of the model are then adjusted by a gradient descent method with the aim of minimizing the loss function value, so as to obtain the trained image detection model.
In some embodiments, this step may include substeps 252-256.
And step 252, calculating a classification task loss according to the first classification result and the second classification result, wherein the classification task loss is used for reflecting the performance of the image detection model for executing the classification task.
In some embodiments, this step may include the following sub-steps:
1. and calculating to obtain a first classification loss according to the first classification result and first label information of the sample image, wherein the first label information is used for indicating the category to which the sample image belongs in the n categories.
In some embodiments, the first classification loss may be obtained by calculating a cross entropy between the first classification result and the first label information of the sample image.
That is, the first classification loss L_ce^s = L_ce(H_s(f_i^s), y_i), where H_s(f_i^s) is the first classification result, y_i is the first label information of the sample image, and L_ce denotes the cross-entropy operation.
The first classification result is obtained according to the specific counterfeit characteristics of the sample image, and the first classification loss is obtained according to the first classification result, so that the first classification loss can be used for learning the specific counterfeit characteristics of different counterfeit methods by the supervision model.
2. And calculating a second classification loss according to a second classification result and second label information of the sample image, wherein the second label information is used for indicating the category of the sample image in the fake image category and the real image category.
In some embodiments, the second classification loss may be obtained by calculating a cross entropy between the second classification result and the second label information of the sample image.
That is, the second classification loss L_ce^c = L_ce(H_c(f_i^c), y_i'), where H_c(f_i^c) is the second classification result, y_i' is the second label information of the sample image, and L_ce denotes the cross-entropy operation.
The second classification result is obtained according to the universal forging characteristics of the sample image, and the second classification loss is obtained according to the second classification result, so that the second classification loss can be used for learning the universal forging characteristics of different forging methods by the supervision model.
3. And calculating to obtain the classification task loss according to the first classification loss and the second classification loss.
Therefore, the classification task loss is obtained through calculation by the method, so that the model can learn the specific counterfeit characteristics and the general counterfeit characteristics at the same time, the capability of distinguishing the specific counterfeit characteristics and the general counterfeit characteristics of the model is enhanced, and the generalization of the model is improved.
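The classification task loss above can be illustrated with plain cross-entropy. The example probabilities and labels are assumed, and the unweighted sum is only one plausible way to combine the two losses:

```python
import numpy as np

def cross_entropy(probs, label):
    """L_ce = -log p(label), for one sample's probability vector."""
    return -np.log(probs[label])

# Assumed classifier outputs for one sample forged by the first method.
first_result = np.array([0.1, 0.9, 0.0]) + 1e-12   # n = 3 categories
second_result = np.array([0.9, 0.1])               # {fake, real}

y = 1    # first label: category of the first forgery method
y2 = 0   # second label: counterfeit image category

L_s_ce = cross_entropy(first_result, y)    # first classification loss
L_c_ce = cross_entropy(second_result, y2)  # second classification loss
L_cls = L_s_ce + L_c_ce                    # classification task loss
```

The tiny 1e-12 offset guards against taking the logarithm of an exact zero probability.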
Step 254, determining a total loss according to the classification task loss, wherein the total loss is used for reflecting the overall performance of the image detection model.
The total loss in the embodiments of the present application may be obtained from the classification task loss alone, or by a weighted summation of the classification task loss and other losses (such as the reconstruction task loss and the contrast regularization loss), which will be described in detail in the embodiments below.
And step 256, adjusting parameters of the image detection model according to the total loss to obtain the trained image detection model.
In the embodiment of the present application, the parameters of the image detection model refer to parameters of all modules including the first encoder, the first classifier and the second classifier. In some embodiments, parameters of the above modules may be adjusted based on total loss to continuously optimize model performance.
According to the technical scheme provided by the embodiment of the application, the specific counterfeit characteristics and the general counterfeit characteristics of the sample image are extracted through the first encoder of the image detection model, the first classification result of the sample image is obtained according to the specific counterfeit characteristics, the second classification result of the sample image is obtained according to the general counterfeit characteristics, and the parameters of the image detection model are adjusted according to the first classification result and the second classification result, so that the trained image detection model is obtained. By training the image detection model by the method, the capability of decoupling common features of different forging methods from all feature information of the image is improved, so that the image detection model can better detect various forged images generated by different forging methods, and the generalization of the image detection model is improved.
In the above embodiments, only one type of loss function is calculated for adjusting the model parameters. In some other embodiments, more types of loss functions may also be calculated to further optimize model performance.
In some embodiments, the image detection model further comprises a second encoder and decoder, and the reconstruction task loss of the image detection model may be calculated, and determining the reconstruction task loss may comprise the steps of:
1. the content features of the sample image, which are features of the image that are not related to forgery, are extracted by the second encoder.
Referring to fig. 4, in some embodiments, the second encoder 40 may also be constructed based on an Xception algorithm, with a backbone network similar in structure to the first encoder but different in parameters.
In some embodiments, a Transformer or other type of backbone network may be used as the second encoder, with attention mechanisms used to extract the content features of the sample image, which is not limited in this application.
2. The reconstructed image is generated by a decoder from the content characteristics of the sample image, the specific counterfeit characteristics of the sample image, and the generic counterfeit characteristics of the sample image.
Referring to fig. 5, in some embodiments, the reconstructed image is generated by a decoder based on the AdaIN (Adaptive Instance Normalization) algorithm from the content features of the sample image, the specific counterfeit features of the sample image, and the generic counterfeit features of the sample image.
For example, the specific counterfeit feature and the general counterfeit feature of the sample image may be processed by the multi-layer sensor and then used as AdaIN parameters, and then AdaIN calculation is performed based on the AdaIN parameters and the content features to obtain an intermediate feature. After the intermediate feature is obtained, the specific counterfeit feature and the general counterfeit feature processed by the multi-layer perceptron are continuously input into a new convolution block and the multi-layer perceptron for processing to obtain a new AdaIN parameter, and the intermediate feature processed by the convolution is combined based on the parameter to obtain the new intermediate feature. Repeating the steps after the intermediate features are obtained, and convolving the finally obtained intermediate features to obtain the reconstructed image.
Put simply, the AdaIN algorithm aligns the mean and variance of the content features with those of the counterfeit features to obtain intermediate features; a series of upsampling and convolution operations, together with multiple AdaIN operations, is then performed on these intermediate features to obtain the reconstructed image. The AdaIN algorithm realizes image reconstruction by changing the data distribution of the features at the feature-map level; it has small calculation and storage costs and is easy to implement.
The intermediate feature is AdaIN(c, f) = σ(f) × [(c − μ(c)) / σ(c)] + μ(f), where c denotes the content feature, f denotes the counterfeit feature, the function σ denotes the standard-deviation calculation, and the function μ denotes the mean calculation.
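A minimal sketch of the AdaIN formula above, applied to flattened feature vectors (the sizes and distributions are illustrative only, and the eps term is a common numerical-stability addition rather than part of the formula):

```python
import numpy as np

def adain(c, f, eps=1e-5):
    """AdaIN(c, f) = sigma(f) * (c - mu(c)) / sigma(c) + mu(f)."""
    return f.std() * (c - c.mean()) / (c.std() + eps) + f.mean()

rng = np.random.default_rng(2)
c = rng.normal(loc=0.0, scale=2.0, size=256)  # content feature (flattened)
f = rng.normal(loc=5.0, scale=0.5, size=256)  # counterfeit feature

out = adain(c, f)
# The intermediate feature keeps the content's spatial pattern but
# takes on the counterfeit feature's mean and standard deviation.
```

This is exactly the "align mean and variance" behavior described above: after the operation, out has (up to the eps term) the statistics of f, while its per-element variation still comes from c.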
In some embodiments, ProGAN may also be used as the decoder to improve the quality of the reconstructed image.
In some embodiments, for each sample image pair, including a counterfeit image and a genuine image, reconstructing the image includes:
a first reconstructed image generated from the content characteristics of the counterfeit image, the specific counterfeit characteristics of the counterfeit image, and the generic counterfeit characteristics of the counterfeit image;
a second reconstructed image generated from the content characteristics of the counterfeit image, the specific counterfeit characteristics of the authentic image, and the generic counterfeit characteristics of the authentic image;
a third reconstructed image generated from the content features of the real image, the specific counterfeit features of the real image, and the generic counterfeit features of the real image;
a fourth reconstructed image generated from the content characteristics of the real image, the specific counterfeit characteristics of the counterfeit image, and the generic counterfeit characteristics of the counterfeit image.
3. And calculating to obtain a reconstruction task loss according to the sample image and the reconstruction image, wherein the reconstruction task loss is used for reflecting the performance of the image detection model for executing the reconstruction task, and the reconstruction task loss is used for determining the total loss by combining the classification task loss.
In some embodiments, a self-reconstruction loss is calculated from the counterfeit image and the first reconstructed image, and from the real image and the third reconstructed image; the self-reconstruction loss reflects the performance of the image detection model when performing the reconstruction task based on counterfeit features and content features from the same image (the same real image or the same counterfeit image). A cross-reconstruction loss is calculated from the counterfeit image and the second reconstructed image, and from the real image and the fourth reconstructed image; the cross-reconstruction loss reflects the performance of the image detection model when performing the reconstruction task based on counterfeit features and content features from different images (one counterfeit image and one real image). Then, the reconstruction task loss is calculated from the self-reconstruction loss and the cross-reconstruction loss.
Self-reconstruction loss L_rec^s = ||x_0 − D(f_0, c_0)||_1 + ||x_1 − D(f_1, c_1)||_1, where x_0 denotes the counterfeit image, x_1 denotes the real image, f_0 denotes the counterfeit features of the counterfeit image (including its specific counterfeit feature and its generic counterfeit feature), f_1 denotes the counterfeit features of the real image (including its specific counterfeit feature and its generic counterfeit feature), c_0 denotes the content feature of the counterfeit image, c_1 denotes the content feature of the real image, D(f_0, c_0) denotes the first reconstructed image, and D(f_1, c_1) denotes the third reconstructed image.
Cross-reconstruction loss L_rec^c = ||x_0 − D(f_1, c_0)||_1 + ||x_1 − D(f_0, c_1)||_1, where D(f_1, c_0) denotes the second reconstructed image and D(f_0, c_1) denotes the fourth reconstructed image.
Reconstruction task loss L_rec = L_rec^s + L_rec^c. Calculating this loss strengthens the decoupling of content features from counterfeit features and improves the model's ability to distinguish between them.
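With the formulas above, the reconstruction task loss can be sketched on toy data. The decoder outputs here are stand-ins (constant-offset perturbations), not a real AdaIN decoder, and the image size is an assumption:

```python
import numpy as np

def l1(a, b):
    """||a - b||_1 over flattened arrays."""
    return np.abs(a - b).sum()

rng = np.random.default_rng(3)
x0 = rng.random(16)  # counterfeit image, flattened (illustrative)
x1 = rng.random(16)  # real image

# Stand-in decoder outputs D(f, c).
D_f0c0, D_f1c1 = x0 + 0.01, x1 - 0.01  # self-reconstructions
D_f1c0, D_f0c1 = x0 + 0.05, x1 + 0.05  # cross-reconstructions

L_s_rec = l1(x0, D_f0c0) + l1(x1, D_f1c1)  # self-reconstruction loss
L_c_rec = l1(x0, D_f1c0) + l1(x1, D_f0c1)  # cross-reconstruction loss
L_rec = L_s_rec + L_c_rec                  # reconstruction task loss
print(round(L_rec, 2))  # 1.92
```

Each 0.01 per-pixel offset contributes 16 × 0.01 to the L1 term, so L_rec^s = 0.32 and L_rec^c = 1.60 in this toy setup.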
In some embodiments, a contrast regularization loss of the image detection model may also be calculated, and determining the contrast regularization loss may include the steps of:
1. Calculating a first contrast loss according to the specific counterfeit feature of the sample image, the specific counterfeit feature of the positive-example image corresponding to the sample image, and the specific counterfeit feature of the negative-example image corresponding to the sample image, where the first contrast loss is used to reflect the ability of the specific counterfeit features extracted by the image detection model to distinguish similar from dissimilar images;
First contrast loss L_con^s = max(||f_s^A − f_s^P||_2 − ||f_s^A − f_s^N||_2 + α, 0), where f_s^A denotes the specific counterfeit feature of the sample image, f_s^P denotes the specific counterfeit feature of the positive-example image corresponding to the sample image, f_s^N denotes the specific counterfeit feature of the negative-example image corresponding to the sample image, and α is a margin hyperparameter, set to 3, for example.
By calculating the first contrast loss, the model can be made to learn specific forgery characteristics of different forgery methods.
2. Calculating to obtain a second contrast loss according to the universal forging characteristics of the sample image, the universal forging characteristics of the positive example image corresponding to the sample image and the universal forging characteristics of the negative example image corresponding to the sample image, wherein the second contrast loss is used for reflecting the distinguishing performance of the universal forging characteristics extracted by the image detection model on similar and dissimilar images;
Second contrast loss L_con^c = max(||f_c^A − f_c^P||_2 − ||f_c^A − f_c^N||_2 + α, 0), where f_c^A denotes the generic counterfeit feature of the sample image, f_c^P denotes the generic counterfeit feature of the positive-example image corresponding to the sample image, and f_c^N denotes the generic counterfeit feature of the negative-example image corresponding to the sample image.
By calculating the second contrast loss, the model can be made to learn generic forgery characteristics of different forgery methods.
The positive example image of the sample image refers to an image belonging to the same true or false category as the sample image in the sample data set, and the negative example image refers to an image belonging to a different true or false category as the sample image in the sample data set. That is, the positive example image of the real image is other real images in the sample data set, and the negative example image is a fake image in the sample data set; the positive example image of the fake image is other fake image in the sample data set, and the negative example image is the real image in the sample data set.
3. Calculating a contrast regularization loss according to the first contrast loss and the second contrast loss, where the contrast regularization loss is used, in combination with the classification task loss, to determine the total loss.
Calculating the contrast regularization loss further enhances the model's discrimination between real and fake samples in the specific counterfeit feature space and the generic counterfeit feature space, thereby improving the detection performance of the model.
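Both contrast losses share the triplet form given above. A sketch with assumed 4-dimensional features and the example margin α = 3 (the random features and the unweighted combination are illustrative assumptions):

```python
import numpy as np

def triplet_loss(a, p, n, alpha=3.0):
    """max(||a - p||_2 - ||a - n||_2 + alpha, 0)."""
    return max(np.linalg.norm(a - p) - np.linalg.norm(a - n) + alpha, 0.0)

rng = np.random.default_rng(4)
# Anchor / positive-example / negative-example features (illustrative).
f_sA, f_sP, f_sN = rng.normal(size=(3, 4))  # specific counterfeit features
f_cA, f_cP, f_cN = rng.normal(size=(3, 4))  # generic counterfeit features

L_scon = triplet_loss(f_sA, f_sP, f_sN)  # first contrast loss
L_ccon = triplet_loss(f_cA, f_cP, f_cN)  # second contrast loss
L_con = L_scon + L_ccon  # one plausible contrast regularization loss
```

The loss is zero only when the anchor is at least α closer to its positive example than to its negative example, which is what pushes same-category features together and different-category features apart.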
In some embodiments, the total loss may be obtained by a weighted summation of the reconstruction task loss, the contrast regularization loss, and the classification task loss calculated by the methods described above. Adjusting the parameters of the first encoder, the second encoder, the first classifier, the second classifier, and the decoder based on this total loss can further optimize model performance and improve the generalization of the model.
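The weighted summation might look as follows; the weight values and the example per-batch loss values are assumptions for illustration, since the application does not fix specific numbers:

```python
# Assumed loss weights and example per-batch loss values (hypothetical).
w_cls, w_rec, w_con = 1.0, 0.5, 0.1
L_cls, L_rec, L_con = 1.2, 0.8, 0.3

# Total loss as a weighted sum of the three task losses.
L_total = w_cls * L_cls + w_rec * L_rec + w_con * L_con
print(round(L_total, 2))  # 1.63
```

In practice the weights trade off the three objectives; minimizing L_total by gradient descent updates all five modules jointly.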
In the technical solution provided in the present application, the total loss may be determined based on the classification task loss alone, based on the classification task loss and the reconstruction task loss, based on the classification task loss and the contrast regularization loss, or based on all three types of losses, which is not limited by the present application.
In some embodiments, acquiring a training sample set of the image detection model may further comprise the following two sub-steps:
1. acquiring a plurality of real images and a plurality of fake images obtained based on m fake methods, wherein m is an integer larger than 1;
2. Generating counterfeit images based on the real images by at least one generator, each generator corresponding to a newly added forgery method other than the m forgery methods described above.
For example, if the training sample set includes real images and counterfeit images obtained by 3 forgery methods, the first classifier should have 4 categories, corresponding to the real image and the three forgery methods respectively. If one of the above generators is added, counterfeit images obtained by a 4th forgery method will appear in the training sample set, and a 5th category appears in the first classifier, corresponding to the 4th forgery method.
In some embodiments, at least one detector may also be trained to detect counterfeit images generated by the at least one generator and adjust parameters of the generator based on the detection results to generate a higher quality, more variety of counterfeit images.
By expanding the training sample set through the method, the richness of the forging method for forging the image in the training sample set can be improved, and the generalization of the image detection model is further improved.
In the following, the flow of image detection using the above image detection model is described by way of example. The content related to the use of the image detection model corresponds to the content related to the training process; where one side gives a detailed description, reference may be made to the description of the other side.
Referring to fig. 6, a flowchart of an image detection method according to an embodiment of the application is shown. The subject of execution of the steps of the method may be a computer device, such as a model-using device as described above. The method may include at least one of the following steps 410-430.
In step 410, an image to be detected is acquired.
The image to be detected may be a picture or a video, which is not limited in the present application. For example, when the training sample set for training the model is a picture, the image to be detected may also be a picture. When the training sample set for training the model is video, the image to be detected may also be video.
Step 420, extracting general counterfeit features of the image to be detected by the first encoder of the image detection model, wherein the general counterfeit features refer to common features of different counterfeit methods in the image.
During the use of the model, only the first encoder of the image detection model is required to extract the generic counterfeit features of the image.
Step 430, obtaining, by a second classifier of the image detection model, a recognition result of the image to be detected according to the universal counterfeit feature of the image to be detected, where the recognition result is used to indicate whether the image to be detected is a counterfeit image.
In some embodiments, a second classifier of the image detection model may obtain a second classification result of the image to be detected, where the second classification result is used to indicate probabilities that the image to be detected corresponds to a fake image class and a real image class respectively, and a technician may preset a corresponding fake probability threshold. If the probability that the image to be detected belongs to the fake image is larger than or equal to the fake probability threshold value, determining that the image to be detected is a fake image; if the probability that the image to be detected belongs to the fake image is smaller than the fake probability threshold value, determining that the image to be detected is not the fake image (such as a real image). Or if the probability that the image to be detected belongs to the fake image is larger than the fake probability threshold value, determining that the image to be detected is the fake image; if the probability that the image to be detected belongs to the fake image is smaller than or equal to the fake probability threshold value, determining that the image to be detected is not the fake image (such as a real image).
In some embodiments, the technician may instead preset a corresponding real probability threshold. If the probability that the image to be detected belongs to the real image class is greater than or equal to the real probability threshold, the image to be detected is determined to be a real image; if that probability is less than the real probability threshold, the image to be detected is determined not to be a real image (i.e., it is a fake image). Alternatively, with a strict comparison: if the probability is greater than the real probability threshold, the image is determined to be a real image; if it is less than or equal to the real probability threshold, it is determined not to be a real image (i.e., it is a fake image).
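The thresholding logic above can be sketched in a few lines of Python; the default threshold of 0.5 and the function name are illustrative assumptions, not part of the patent:

```python
def decide(fake_prob: float, fake_threshold: float = 0.5,
           inclusive: bool = True) -> str:
    """Map the second classifier's fake-class probability to a label.

    The threshold value and whether the comparison is inclusive (>=)
    or strict (>) are choices preset by the technician.
    """
    if inclusive:
        return "fake" if fake_prob >= fake_threshold else "real"
    return "fake" if fake_prob > fake_threshold else "real"

# The two class probabilities sum to 1, so thresholding the real-class
# probability (the variant just described) is the mirrored test.
label_a = decide(0.82)   # a confident fake prediction
label_b = decide(0.18)   # a confident real prediction
```

The `inclusive` flag captures the two alternative comparison conventions described in the text.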
In addition, the image detection model is obtained by training with the training method of the image detection model described above. The specific training process has been described in the foregoing embodiments and is not repeated here.
In this embodiment, by extracting the generic counterfeit features of the image to be detected and classifying the image based on them, counterfeit images generated by a variety of different counterfeit methods can be detected, so the model has strong generalization ability.
The following are apparatus embodiments of the present application, which may be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Referring to fig. 7, a block diagram of a training apparatus for an image detection model according to an embodiment of the present application is shown. The device has the function of realizing the training method of the image detection model, and the function can be realized by hardware or by executing corresponding software by the hardware. The device may be a computer device or may be provided in a computer device. The apparatus 600 may include: a sample acquisition module 610, a feature extraction module 620, a first classification module 630, a second classification module 640, and a parameter adjustment module 650.
In an embodiment of the application, the image detection model includes a first encoder, a first classifier, and a second classifier.
A sample acquisition module 610 is configured to acquire a training sample set of the image detection model, where the training sample set includes a plurality of sample images, and the plurality of sample images includes at least one counterfeit image and at least one authentic image.
The feature extraction module 620 is configured to extract, by the first encoder, a specific counterfeit feature and a general counterfeit feature of the sample image, where the specific counterfeit feature is a feature in the image for distinguishing different counterfeit methods, and the general counterfeit feature is a common feature of different counterfeit methods in the image.
The first classification module 630 is configured to obtain, by using the first classifier, a first classification result of the sample image according to the specific counterfeit feature of the sample image, where the first classification result is used to indicate probabilities that the sample image corresponds to n categories respectively, and the n categories include categories corresponding to a real image category and n-1 counterfeit methods respectively, and n is an integer greater than 1.
And the second classification module 640 is configured to obtain, by using the second classifier, a second classification result of the sample image according to the universal counterfeit feature of the sample image, where the second classification result is used to indicate probabilities that the sample image corresponds to a counterfeit image class and a true image class respectively.
And the parameter adjustment module 650 is configured to adjust parameters of the image detection model according to the first classification result and the second classification result, so as to obtain a trained image detection model.
In some embodiments, the first encoder includes a backbone network, a first branch network, and a second branch network; the feature extraction module 620 includes: the device comprises a fake feature extraction sub-module, a specific fake feature extraction sub-module and a general fake feature extraction sub-module.
And the fake feature extraction submodule is used for inputting the sample image into the backbone network, and extracting fake features of the sample image through the backbone network, wherein the fake features are features for fake detection in the image.
And the specific fake feature extraction sub-module is used for inputting fake features of the sample image into the first branch network, and obtaining the specific fake features of the sample image through the first branch network.
And the general fake feature extraction sub-module is used for inputting the fake features of the sample image into the second branch network and obtaining the general fake features of the sample image through the second branch network.
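The backbone-plus-two-branches structure described above can be sketched as follows. The toy fully connected layers, the 32x32 input size, and the feature dimensions are all illustrative assumptions; a real implementation would use a convolutional backbone:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(dim_in, dim_out):
    """A toy fully connected layer with ReLU, standing in for a real (sub)network."""
    w = rng.standard_normal((dim_in, dim_out)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

# Backbone maps the image to shared counterfeit features; the two branch
# networks then split them into specific vs. general parts.
backbone        = linear(3 * 32 * 32, 128)   # toy flattened 32x32 RGB input
branch_specific = linear(128, 64)
branch_general  = linear(128, 64)

image = rng.standard_normal((1, 3 * 32 * 32))  # one flattened sample image
counterfeit_feat = backbone(image)
specific_feat = branch_specific(counterfeit_feat)  # distinguishes forgery methods
general_feat  = branch_general(counterfeit_feat)   # shared across forgery methods
```

The point of the split is that one shared forward pass through the backbone feeds both branches, matching the sub-module description above.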
In some embodiments, the parameter adjustment module 650 includes: the system comprises a classification task loss calculation sub-module, a total loss calculation sub-module and a parameter adjustment sub-module.
And the classification task loss calculation sub-module is used for calculating to obtain classification task loss according to the first classification result and the second classification result, wherein the classification task loss is used for reflecting the performance of the image detection model for executing classification tasks.
And the total loss calculation sub-module is used for determining total loss according to the classification task loss, and the total loss is used for reflecting the overall performance of the image detection model.
And the parameter adjusting sub-module is used for adjusting the parameters of the image detection model according to the total loss to obtain the trained image detection model.
In some embodiments, the classification task loss calculation submodule includes: the system comprises a first classification loss calculation unit, a second classification loss calculation unit and a classification task loss calculation unit.
The first classification loss calculation unit is used for calculating and obtaining first classification loss according to the first classification result and first label information of the sample image, wherein the first label information is used for indicating the category of the sample image in the n categories.
The second classification loss calculation unit is used for calculating and obtaining second classification loss according to the second classification result and second label information of the sample image, wherein the second label information is used for indicating the category of the sample image in the forged image category and the real image category.
And the classification task loss calculation unit is used for calculating the classification task loss according to the first classification loss and the second classification loss.
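A toy version of this loss combination follows. The cross-entropy form and the equal-weight sum are assumptions; the patent only states that the classification task loss is computed from the two classification losses:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class."""
    return float(-np.log(softmax(logits)[label]))

# First classifier: n = 4 classes (real + 3 forgery methods); the sample
# was forged with method index 2 (first label information).
first_logits, first_label = np.array([0.1, 0.3, 2.0, -0.5]), 2
# Second classifier: 2 classes (fake, real); the same sample is fake,
# class index 0 (second label information).
second_logits, second_label = np.array([1.5, -0.2]), 0

first_loss  = cross_entropy(first_logits, first_label)
second_loss = cross_entropy(second_logits, second_label)
loss_cls = first_loss + second_loss   # equal weighting is an assumption
```

Each label information term simply selects the true class whose predicted probability the loss penalizes.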
In some embodiments, the image detection model further comprises a second encoder and decoder; the parameter adjustment module 650 further includes: the device comprises a content characteristic extraction sub-module, an image reconstruction sub-module and a reconstruction task loss calculation sub-module.
And the content characteristic extraction submodule is used for extracting the content characteristics of the sample image through the second encoder, wherein the content characteristics refer to characteristics which are irrelevant to counterfeiting in the image.
An image reconstruction sub-module for generating, by the decoder, a reconstructed image from the content characteristics of the sample image, the specific counterfeit characteristics of the sample image, and the generic counterfeit characteristics of the sample image.
And the reconstruction task loss calculation sub-module is used for calculating the reconstruction task loss according to the sample image and the reconstruction image, wherein the reconstruction task loss is used for reflecting the performance of the image detection model for executing the reconstruction task, and the reconstruction task loss is used for determining the total loss by combining the classification task loss.
In some embodiments, each sample image pair includes one counterfeit image and one real image. The reconstructed images include: a first reconstructed image generated from the content features of the counterfeit image, the specific counterfeit features of the counterfeit image, and the generic counterfeit features of the counterfeit image; a second reconstructed image generated from the content features of the counterfeit image, the specific counterfeit features of the real image, and the generic counterfeit features of the real image; a third reconstructed image generated from the content features of the real image, the specific counterfeit features of the real image, and the generic counterfeit features of the real image; and a fourth reconstructed image generated from the content features of the real image, the specific counterfeit features of the counterfeit image, and the generic counterfeit features of the counterfeit image.
The reconstruction task loss calculation submodule comprises: a self-reconstruction loss calculation unit, a cross-reconstruction loss calculation unit and a reconstruction task loss calculation unit.
And the self-reconstruction loss calculation unit is used for calculating the self-reconstruction loss according to the fake image, the first reconstructed image, the real image, and the third reconstructed image.
And the cross-reconstruction loss calculation unit is used for calculating the cross-reconstruction loss according to the fake image, the second reconstructed image, the real image, and the fourth reconstructed image.
And the reconstruction task loss calculation unit is used for calculating the reconstruction task loss according to the self-reconstruction loss and the cross-reconstruction loss.
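Under the assumptions of an L1 pixel loss, equal weighting, and the original images as reconstruction targets (none of which is specified in the text), the combination of self- and cross-reconstruction terms can be sketched with stand-in decoder outputs:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between an original and a reconstruction."""
    return float(np.abs(a - b).mean())

rng = np.random.default_rng(1)
fake, real = rng.standard_normal((2, 8, 8))   # a toy sample image pair

# Stand-ins for the four decoder outputs described above; the small
# constant offsets mimic imperfect reconstructions.
recon_1 = fake + 0.01   # fake content + fake image's own counterfeit features
recon_2 = fake - 0.02   # fake content + real image's counterfeit features
recon_3 = real + 0.01   # real content + real image's own counterfeit features
recon_4 = real - 0.02   # real content + fake image's counterfeit features

# Self-reconstruction: each image rebuilt from its own features.
loss_self = l1(fake, recon_1) + l1(real, recon_3)
# Cross-reconstruction: each image rebuilt with the paired image's
# counterfeit features; comparing against the original images is an
# assumption about the target of this loss.
loss_cross = l1(fake, recon_2) + l1(real, recon_4)
loss_recon = loss_self + loss_cross
```

In a real model the four reconstructions come from the decoder, and the gradients of `loss_recon` drive the disentanglement of content from counterfeit features.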
In some embodiments, the image reconstruction sub-module is configured to generate, by the decoder, the reconstructed image based on an adaptive instance normalization AdaIN algorithm from the content features of the sample image, the specific counterfeit features of the sample image, and the generic counterfeit features of the sample image.
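AdaIN itself is a standard operation: normalize the content features per channel, then re-scale them with statistics derived from the style input (here, the counterfeit features). A NumPy sketch follows; the style mean/std that the model would derive from the specific and generic counterfeit features are replaced by random stand-ins:

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization: per channel, whiten the content
    features and re-scale them with the style statistics."""
    mu  = content.mean(axis=(1, 2), keepdims=True)
    std = content.std(axis=(1, 2), keepdims=True)
    return style_std * (content - mu) / (std + eps) + style_mean

rng = np.random.default_rng(2)
content = rng.standard_normal((4, 8, 8))      # C x H x W content features
# Stand-ins for statistics derived from the counterfeit features.
style_mean = rng.standard_normal((4, 1, 1))
style_std  = np.abs(rng.standard_normal((4, 1, 1)))
out = adain(content, style_mean, style_std)
```

After the operation, each output channel carries the style statistics while keeping the spatial layout of the content features, which is why AdaIN suits injecting counterfeit "style" into content during reconstruction.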
In some embodiments, the parameter adjustment module 650 further comprises: the system comprises a first contrast loss calculation sub-module, a second contrast loss calculation sub-module and a contrast regularization loss calculation sub-module.
The first contrast loss calculation sub-module is used for calculating a first contrast loss according to the specific counterfeit features of the sample image, the specific counterfeit features of the positive example image corresponding to the sample image, and the specific counterfeit features of the negative example image corresponding to the sample image, wherein the positive example image corresponding to the sample image is similar to the sample image, the negative example image corresponding to the sample image is not similar to the sample image, and the first contrast loss is used for reflecting the distinguishing performance of the specific counterfeit features extracted by the image detection model on similar and dissimilar images.
And the second contrast loss calculation sub-module is used for calculating a second contrast loss according to the general counterfeit features of the sample image, the general counterfeit features of the positive example image corresponding to the sample image, and the general counterfeit features of the negative example image corresponding to the sample image, the second contrast loss being used for reflecting the distinguishing performance of the general counterfeit features extracted by the image detection model on similar and dissimilar images.
And the contrast regularization loss calculation sub-module is used for calculating a contrast regularization loss according to the first contrast loss and the second contrast loss, the contrast regularization loss being used for determining the total loss in combination with the classification task loss.
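One common realization of such contrast losses is a triplet-style margin loss; everything below (cosine similarity, the margin value, the equal-weight sum) is an assumption used for illustration, not the patent's prescribed form:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrast_loss(anchor, positive, negative, margin=0.2):
    """Triplet-style stand-in: pull the positive example's features toward
    the anchor and push the negative example's features away."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

rng = np.random.default_rng(3)
specific = rng.standard_normal(16)   # sample's specific counterfeit features
general  = rng.standard_normal(16)   # sample's general counterfeit features
# Positive examples: features of a similar image (anchor plus small noise).
pos_s = specific + 0.1 * rng.standard_normal(16)
neg_s = rng.standard_normal(16)      # features of a dissimilar image
pos_g = general + 0.1 * rng.standard_normal(16)
neg_g = rng.standard_normal(16)

loss_first  = contrast_loss(specific, pos_s, neg_s)
loss_second = contrast_loss(general, pos_g, neg_g)
loss_contrast = loss_first + loss_second   # combination rule is an assumption
```

The anchor/positive/negative roles map directly onto the sample image, its positive example image, and its negative example image in the sub-modules above.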
In some embodiments, the sample acquisition module 610 includes: an acquisition sub-module and a generation sub-module.
The acquisition sub-module is used for acquiring a plurality of real images and acquiring a plurality of fake images obtained based on m fake methods, wherein m is an integer larger than 1.
A generation sub-module for generating a forgery image based on the real image by at least one generator, each generator corresponding to one newly added forgery method other than the m forgery methods.
Referring to fig. 8, a block diagram of an image detection apparatus according to an embodiment of the present application is shown. The device has the function of realizing the image detection method, and the function can be realized by hardware or by executing corresponding software by the hardware. The device may be a computer device or may be provided in a computer device. The apparatus 700 may include: an image acquisition module 710, a commonality feature extraction module 720, and an identification module 730.
An image acquisition module 710, configured to acquire an image to be detected.
The common feature extraction module 720 is configured to extract, by using a first encoder of the image detection model, a common counterfeit feature of the image to be detected, where the common counterfeit feature is a common feature of different counterfeit methods in the image.
The identification module 730 is configured to obtain, according to the universal counterfeit feature of the image to be detected, an identification result of the image to be detected, where the identification result is used to indicate whether the image to be detected is a counterfeit image.
The image detection model is obtained by training with the training method of the image detection model described above.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Referring to FIG. 9, a block diagram of a computer device according to one embodiment of the present application is shown.
In general, the computer device 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, a 14-core processor, and the like. The processor 901 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), is used for processing data in the awake state, while the coprocessor is a low-power processor used for processing data in the standby state. In some embodiments, the processor 901 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 901 may also include an AI processor for handling computing operations related to machine learning.
The memory 902 may include one or more computer-readable storage media, which may be tangible and non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 stores a computer program that is loaded and executed by processor 901 to implement the training method of the image detection model or the image detection method described above.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is not limiting of the computer device 900, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In some embodiments, there is also provided a computer readable storage medium having stored therein a computer program loaded and executed by a processor to implement the training method or the image detection method of the image detection model described above.
Alternatively, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), SSD (Solid State Drive), an optical disc, or the like. The random access memory may include ReRAM (Resistive Random Access Memory) and DRAM (Dynamic Random Access Memory), among others.
In some embodiments, there is also provided a computer program product comprising a computer program stored in a computer readable storage medium, from which a processor reads and executes the computer program to implement the training method or the image detection method of the image detection model described above.
It should be noted that, before and during the collection of user-related data, the present application may display a prompt interface or popup window, or output a voice prompt, informing the user that the relevant data is about to be collected. The steps for obtaining the user-related data are executed only after the user's confirmation of the prompt interface or popup window is obtained; otherwise (i.e., when no such confirmation is obtained), those steps are terminated and the user-related data is not obtained. In other words, all user data collected by the present application (including the training sample set and the image to be detected processed by the image detection model) is handled in strict accordance with the requirements of relevant national laws and regulations, with the informed consent or separate consent of the personal information subject, and subsequent data use and processing are carried out within the scope authorized by the laws, regulations, and the personal information subject.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the step numbers described herein merely exemplify one possible execution order among the steps; in some other embodiments, the steps may be executed out of the numbered order, for example two differently numbered steps executed simultaneously, or in the reverse of the illustrated order, which is not limited herein.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof, but rather, any modification, equivalent replacement, improvement or the like which comes within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (15)

1. A training method of an image detection model, characterized in that the image detection model comprises a first encoder, a first classifier and a second classifier; the method comprises the following steps:
Acquiring a training sample set of the image detection model, wherein the training sample set comprises a plurality of sample images, and the plurality of sample images comprise at least one fake image and at least one real image;
extracting, by the first encoder, a specific counterfeit feature and a general counterfeit feature of the sample image, the specific counterfeit feature being a feature in the image for distinguishing between different counterfeit methods, and the general counterfeit feature being a common feature of the different counterfeit methods in the image;
obtaining a first classification result of the sample image according to the specific forging characteristics of the sample image through the first classifier, wherein the first classification result is used for indicating the probability that the sample image corresponds to n categories respectively, the n categories comprise categories respectively corresponding to a real image category and n-1 forging methods, and n is an integer larger than 2;
obtaining a second classification result of the sample image according to the universal forging characteristics of the sample image through the second classifier, wherein the second classification result is used for indicating the probability that the sample image corresponds to the forged image category and the true image category respectively;
and adjusting parameters of the image detection model according to the first classification result and the second classification result to obtain the trained image detection model.
2. The method of claim 1, wherein the first encoder comprises a backbone network, a first branch network, and a second branch network;
the extracting, by the first encoder, the specific counterfeit feature and the generic counterfeit feature of the sample image, comprising:
inputting the sample image into the backbone network, and extracting fake characteristics of the sample image through the backbone network, wherein the fake characteristics are characteristics for fake detection in the image;
inputting the fake features of the sample image into the first branch network, and obtaining the specific fake features of the sample image through the first branch network;
and inputting the counterfeit characteristics of the sample image into the second branch network, and obtaining the universal counterfeit characteristics of the sample image through the second branch network.
3. The method according to claim 1, wherein adjusting parameters of the image detection model according to the first classification result and the second classification result to obtain a trained image detection model comprises:
according to the first classification result and the second classification result, calculating to obtain classification task loss, wherein the classification task loss is used for reflecting the performance of the image detection model for executing classification tasks;
Determining a total loss according to the classification task loss, wherein the total loss is used for reflecting the overall performance of the image detection model;
and adjusting parameters of the image detection model according to the total loss to obtain the trained image detection model.
4. A method according to claim 3, wherein said calculating a classification task loss based on said first classification result and said second classification result comprises:
according to the first classification result and first label information of the sample image, calculating to obtain first classification loss, wherein the first label information is used for indicating the class of the sample image in the n classes;
calculating a second classification loss according to the second classification result and second label information of the sample image, wherein the second label information is used for indicating the class of the sample image in the forged image class and the real image class;
and calculating the classification task loss according to the first classification loss and the second classification loss.
5. A method according to claim 3, wherein the image detection model further comprises a second encoder and decoder; the method further comprises the steps of:
Extracting content features of the sample image by the second encoder, wherein the content features refer to features irrelevant to counterfeiting in the image;
generating, by the decoder, a reconstructed image from the content features of the sample image, the specific counterfeit features of the sample image, and the generic counterfeit features of the sample image;
and calculating a reconstruction task loss according to the sample image and the reconstruction image, wherein the reconstruction task loss is used for reflecting the performance of the image detection model for executing reconstruction tasks, and the reconstruction task loss is used for determining the total loss in combination with the classification task loss.
6. The method according to claim 5, wherein each sample image pair comprises one counterfeit image and one authentic image;
the reconstructed image comprises:
a first reconstructed image generated from the content characteristics of the counterfeit image, the specific counterfeit characteristics of the counterfeit image, and the generic counterfeit characteristics of the counterfeit image;
a second reconstructed image generated from the content characteristics of the counterfeit image, the specific counterfeit characteristics of the authentic image, and the generic counterfeit characteristics of the authentic image;
A third reconstructed image generated from the content features of the real image, the specific counterfeit features of the real image, and the generic counterfeit features of the real image;
a fourth reconstructed image generated from the content characteristics of the real image, the specific counterfeit characteristics of the counterfeit image, and the general counterfeit characteristics of the counterfeit image;
the calculating to obtain the reconstruction task loss according to the sample image and the reconstruction image comprises the following steps:
calculating to obtain a reconstruction loss according to the fake image, the first reconstruction image, the real image and the third reconstruction image;
calculating to obtain cross reconstruction loss according to the fake image, the second reconstruction image, the real image and the fourth reconstruction image;
and calculating the reconstruction task loss according to the reconstruction loss and the cross reconstruction loss.
7. The method of claim 5, wherein generating, by the decoder, a reconstructed image from the content characteristic of the sample image, the particular counterfeit characteristic of the sample image, and the generic counterfeit characteristic of the sample image, comprises:
the reconstructed image is generated by the decoder based on an adaptive instance normalization AdaIN algorithm from the content features of the sample image, the specific counterfeit features of the sample image and the generic counterfeit features of the sample image.
8. A method according to claim 3, characterized in that the method further comprises:
calculating to obtain a first contrast loss according to the specific forging characteristics of the sample image, the specific forging characteristics of the positive example image corresponding to the sample image, and the specific forging characteristics of the negative example image corresponding to the sample image, wherein the positive example image corresponding to the sample image is similar to the sample image, the negative example image corresponding to the sample image is not similar to the sample image, and the first contrast loss is used for reflecting the distinguishing performance of the specific forging characteristics extracted by the image detection model on similar and dissimilar images;
calculating a second contrast loss according to the general forging characteristics of the sample image, the general forging characteristics of the positive example image corresponding to the sample image and the general forging characteristics of the negative example image corresponding to the sample image, wherein the second contrast loss is used for reflecting the distinguishing performance of the general forging characteristics extracted by the image detection model on similar and dissimilar images; and calculating a comparison regularization loss according to the first comparison loss and the second comparison loss, wherein the comparison regularization loss is used for determining the total loss in combination with the classification task loss.
9. The method according to any one of claims 1 to 8, wherein said obtaining a training sample set of said image detection model comprises:
acquiring a plurality of real images and a plurality of fake images obtained based on m fake methods, wherein m is an integer larger than 1;
generating a forgery image based on the real image by at least one generator, each generator corresponding to one newly added forgery method other than the m forgery methods.
10. An image detection method, the method comprising:
acquiring an image to be detected;
extracting general fake features of the image to be detected through a first encoder of an image detection model, wherein the general fake features refer to common features of different fake methods in the image;
obtaining an identification result of the image to be detected according to the universal counterfeit characteristic of the image to be detected by a second classifier of the image detection model, wherein the identification result is used for indicating whether the image to be detected is a counterfeit image or not;
wherein the image detection model is trained using the method of any one of claims 1 to 9.
11. A training device for an image detection model, wherein the image detection model comprises a first encoder, a first classifier and a second classifier, the device comprising:
a sample acquisition module, configured to acquire a training sample set of the image detection model, the training sample set comprising a plurality of sample images, the plurality of sample images comprising at least one forged image and at least one real image;
a feature extraction module, configured to extract, through the first encoder, a specific forgery feature and a generic forgery feature of a sample image, wherein the specific forgery feature is a feature used for distinguishing between different forgery methods in an image, and the generic forgery feature is a feature shared by different forgery methods in an image;
a first classification module, configured to obtain, through the first classifier, a first classification result of the sample image according to the specific forgery feature of the sample image, wherein the first classification result indicates probabilities that the sample image belongs to each of n categories, the n categories comprising a real-image category and categories respectively corresponding to n-1 forgery methods, n being an integer greater than 2;
a second classification module, configured to obtain, through the second classifier, a second classification result of the sample image according to the generic forgery feature of the sample image, wherein the second classification result indicates probabilities that the sample image belongs to the forged-image category and to the real-image category respectively; and
a parameter adjustment module, configured to adjust parameters of the image detection model according to the first classification result and the second classification result, to obtain a trained image detection model.
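For illustration only (not part of the claims), one forward pass of the training scheme described by the modules above can be sketched as follows. All arrays, the linear encoder branches, and the combined cross-entropy loss are hypothetical assumptions standing in for the claimed components:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the correct class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

rng = np.random.default_rng(0)
n = 4                                 # real-image category + n-1 = 3 forgery methods
batch, d_in, d_feat = 8, 32, 16

# first encoder: one hypothetical linear branch per feature type
w_specific = rng.standard_normal((d_in, d_feat)) * 0.1
w_generic = rng.standard_normal((d_in, d_feat)) * 0.1
w_classifier1 = rng.standard_normal((d_feat, n)) * 0.1  # first classifier: n-way
w_classifier2 = rng.standard_normal((d_feat, 2)) * 0.1  # second classifier: real/fake

x = rng.standard_normal((batch, d_in))       # sample images (flattened)
y_method = rng.integers(0, n, size=batch)    # 0 = real, 1..n-1 = forgery method
y_binary = (y_method > 0).astype(int)        # 1 = forged image, 0 = real image

specific_feat = x @ w_specific               # specific forgery features
generic_feat = x @ w_generic                 # generic forgery features
p1 = softmax(specific_feat @ w_classifier1)  # first classification result
p2 = softmax(generic_feat @ w_classifier2)   # second classification result

# the parameter adjustment module would minimize this combined loss
loss = cross_entropy(p1, y_method) + cross_entropy(p2, y_binary)
```

Supervising the specific branch with the n-way labels and the generic branch with only the binary real/fake labels is what pushes the generic forgery feature toward properties shared across forgery methods.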
12. An image detection device, the device comprising:
an image acquisition module, configured to acquire an image to be detected;
a common feature extraction module, configured to extract, through a first encoder of an image detection model, a generic forgery feature of the image to be detected, wherein the generic forgery feature refers to a feature shared by different forgery methods in an image;
an identification module, configured to obtain, through a second classifier of the image detection model, an identification result of the image to be detected according to the generic forgery feature of the image to be detected, wherein the identification result indicates whether the image to be detected is a forged image;
wherein the image detection model is trained using the method of any one of claims 1 to 9.
13. A computer device, comprising a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the training method of the image detection model according to any one of claims 1 to 9, or to implement the image detection method according to claim 10.
14. A computer-readable storage medium, having stored therein a computer program that is loaded and executed by a processor to implement the training method of the image detection model according to any one of claims 1 to 9, or to implement the image detection method according to claim 10.
15. A computer program product, comprising a computer program stored in a computer-readable storage medium, wherein a processor reads and executes the computer program to implement the training method of the image detection model according to any one of claims 1 to 9, or to implement the image detection method according to claim 10.
CN202310479792.4A 2023-04-26 2023-04-26 Training method, device, equipment and storage medium of image detection model Pending CN116958637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310479792.4A CN116958637A (en) 2023-04-26 2023-04-26 Training method, device, equipment and storage medium of image detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310479792.4A CN116958637A (en) 2023-04-26 2023-04-26 Training method, device, equipment and storage medium of image detection model

Publications (1)

Publication Number Publication Date
CN116958637A true CN116958637A (en) 2023-10-27

Family

ID=88443360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310479792.4A Pending CN116958637A (en) 2023-04-26 2023-04-26 Training method, device, equipment and storage medium of image detection model

Country Status (1)

Country Link
CN (1) CN116958637A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253262A (en) * 2023-11-15 2023-12-19 Nanjing University of Information Science and Technology Fake fingerprint detection method and device based on commonality feature learning
CN117253262B (en) * 2023-11-15 2024-01-30 Nanjing University of Information Science and Technology Fake fingerprint detection method and device based on commonality feature learning
CN117351294A (en) * 2023-12-06 2024-01-05 Wuhan University Image detection method and device based on dual-function discriminator
CN117351294B (en) * 2023-12-06 2024-02-20 Wuhan University Image detection method and device based on dual-function discriminator

Similar Documents

Publication Publication Date Title
Yang et al. MTD-Net: Learning to detect deepfakes images by multi-scale texture difference
CN116958637A (en) Training method, device, equipment and storage medium of image detection model
CN112801057B (en) Image processing method, image processing device, computer equipment and storage medium
CN113537027B (en) Face depth counterfeiting detection method and system based on face division
CN111291863B (en) Training method of face changing identification model, face changing identification method, device and equipment
CN110619347A (en) Image generation method based on machine learning and method thereof
Hu et al. Deep learning for distinguishing computer generated images and natural images: A survey
Yu et al. SegNet: a network for detecting deepfake facial videos
CN114842524A (en) Face false distinguishing method based on irregular significant pixel cluster
CN113821822A (en) Image processing method, system, device, equipment and storage medium
CN112818774A (en) Living body detection method and device
CN114973107B (en) Unsupervised cross-domain video action identification method based on multi-discriminator cooperation and strong and weak sharing mechanism
CN113205044B (en) Deep fake video detection method based on characterization contrast prediction learning
CN114373098A (en) Image classification method and device, computer equipment and storage medium
Shrivastava et al. Bridging the semantic gap with human perception based features for scene categorization
CN114596609A (en) Audio-visual counterfeit detection method and device
CN112966569B (en) Image processing method and device, computer equipment and storage medium
CN117351579B (en) Iris living body detection method and device based on multi-source information fusion
Adnan et al. Deepfake video detection based on convolutional neural networks
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
WO2024066927A1 (en) Training method and apparatus for image classification model, and device
CN116958754A (en) Training method, device, equipment and storage medium of unknown attack detection model
Richards et al. Deep Fake Face Detection using Convolutional Neural Networks
CN116152565A (en) Image verification detection method and system based on noise fingerprint
Megahed et al. Exposing deepfake using fusion of deep-learned and hand-crafted features

Legal Events

Date Code Title Description
PB01 Publication