CN117591815A

CN117591815A - Comprehensive quality evaluation method and device for multi-mode forgery generated data

Info

Publication number: CN117591815A
Application number: CN202311426027.2A
Authority: CN
Inventors: 孙显; 郝凌翔; 邓楚博; 于泓峰; 卢宛萱; 刘小煜
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-02-23
Anticipated expiration: 2043-10-31
Also published as: CN117591815B

Abstract

The invention provides a comprehensive quality evaluation method and device for multi-mode forgery generated data, which can be applied to the technical field of deep forgery. The method comprises the following steps: determining an evaluation index corresponding to the modality of the forgery generated data and an index value corresponding to the evaluation index; constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index, wherein the index dimension graph comprises a first area and a second area, and the first area contains the second area; and calculating the comprehensive quality score of the generated data according to the areas of the first area and the second area. Screening training data by this score can effectively reduce the sample size of the training set and lower the model training cost.

Description

Comprehensive quality evaluation method and device for multi-mode forgery generated data

Technical Field

The invention relates to the field of computer vision, in particular to a comprehensive quality evaluation method and device for multi-mode counterfeiting generated data.

Background

With the development of deep forgery and related technologies, such methods have become increasingly valuable in fields such as picture generation and audio and video synthesis, but the quality of the generated data is often difficult to control. Evaluation of such data still relies on subjective assessment or on a single index. Meanwhile, forged data comes in multiple modalities, such as video, audio, and pictures, and the principles and properties of the corresponding evaluation indexes differ. It is therefore important to achieve comprehensive and effective quality evaluation of data generated in different modalities. In addition, if the data used to train a deep forgery detection model is screened by quality, the training cost is lower.

Disclosure of Invention

In view of the above problems, the invention provides a comprehensive quality evaluation method and device for multi-mode forgery generated data, which integrates multi-dimensional evaluation indexes of different mode forgery generated data to realize comprehensive and quantitative data quality evaluation, and lays a foundation for generating data quality classification and further training a deep forgery detection model.

According to a first aspect of the present invention, there is provided a comprehensive quality assessment method for multimodal forgery generated data, comprising:

determining an evaluation index corresponding to a modality that generates data, and an index value corresponding to the evaluation index;

constructing an index dimension graph according to the evaluation index and index values corresponding to the evaluation index, wherein the index dimension graph comprises a first area and a second area, and the first area comprises the second area;

calculating the comprehensive quality score of the generated data according to the areas of the first area and the second area;

and screening training data for training the deep forgery detection model based on the comprehensive quality scores of the generated data.

Optionally, the modes for generating data include a picture mode, an audio mode and a video mode;

the evaluation indexes corresponding to the video mode and the picture mode comprise at least one of Fréchet inception distance (FID), structural similarity (SSIM), peak signal-to-noise ratio (PSNR), kernel inception distance (KID), learned perceptual image patch similarity (LPIPS), and mean opinion score (MOS);

the evaluation index corresponding to the audio mode comprises at least one of mel-cepstral distortion (MCD), mean opinion score (MOS), and perceptual evaluation of audio quality (PEAQ).

Optionally, the constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index includes:

selecting n evaluation indexes from the evaluation indexes corresponding to the modes of generating data, wherein n is a positive integer;

if n=1, directly calculating the related index value and performing normalization processing;

if n=2, the corresponding index is normalized, and then an average value is calculated;

if n is greater than or equal to 3, the calculation is performed in the following manner:

setting a central origin for the n evaluation indexes;

constructing a regular n-sided polygon in which the radius, i.e. the distance from the center origin to each vertex, is 1;

drawing index points on each radius according to the index values of the evaluation indexes, and calculating coordinates of the index points;

and enclosing an irregular n-polygon based on all the index points.

Optionally, before constructing the index dimension graph according to the evaluation index and the index value corresponding to the evaluation index, the method includes:

and normalizing the index value.

Optionally, the normalizing the index value includes:

when the evaluation index includes FID, using 1 - FID as the index value corresponding to FID;

when the evaluation index includes LPIPS, using 1 - LPIPS as the index value corresponding to LPIPS;

when the evaluation index includes SSIM, using (SSIM + 1)/2 as the index value corresponding to SSIM;

when the evaluation index includes PSNR, using PSNR/PSNR_max as the index value corresponding to PSNR, wherein PSNR_max represents the maximum value of PSNR.

Optionally, the calculating the composite quality score of the generated data according to the areas of the first area and the second area includes:

and multiplying the ratio of the area of the second area to the area of the first area by a preset constant to obtain the comprehensive quality score of the generated data.

Optionally, the screening training data for training the deep forgery detection model based on the comprehensive quality score of the generated data includes:

and screening the generated data with the comprehensive quality score in a preset range as the training data.

A second aspect of the present invention provides an integrated quality assessment device for multimodal forgery generated data, comprising:

a determining module, configured to determine an evaluation index corresponding to a modality that generates data, and an index value corresponding to the evaluation index;

the construction module is used for constructing an index dimension graph according to the evaluation index and index values corresponding to the evaluation index, wherein the index dimension graph comprises a first area and a second area, and the first area comprises the second area;

the calculation module is used for calculating the comprehensive quality score of the generated data according to the areas of the first area and the second area;

and the screening module is used for screening the training data for training the deep forgery detection model based on the comprehensive quality score of the generated data.

A third aspect of the present invention provides an electronic device comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.

A fourth aspect of the invention also provides a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the above method.

The comprehensive quality evaluation method and device for multi-mode forgery generated data provided by the invention have the following advantages. First, they improve on the past situation in which forgery generated data was evaluated mainly in a subjective and qualitative way. Second, the method introduces objective evaluation indexes of the forged data in multiple dimensions and performs comprehensive quantitative evaluation, so that the quality evaluation is more comprehensive and reliable and can be extended to any modality, such as audio, video, and pictures. Third, training the forgery detection model on the high-quality data obtained by screening after quality classification can effectively reduce the training cost. Fourth, the method lends itself to visualization: when implemented in software, the data quality can be displayed intuitively through a front-end interface, which lays a methodological foundation for the research and development of software plug-ins for data classification and the like.

Drawings

The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a flow chart of a method for comprehensive quality assessment of multimodal counterfeit generation data in accordance with an embodiment of the invention;

FIG. 2 schematically illustrates a dimension graph established from audio data evaluation metrics in accordance with an embodiment of the present invention;

FIG. 3 schematically illustrates a dimension graph established from picture data evaluation metrics in accordance with an embodiment of the present invention;

FIG. 4 schematically shows a block diagram of a comprehensive quality assessment device for multi-mode forgery generated data in accordance with an embodiment of the present invention;

FIG. 5 schematically shows a block diagram of an electronic device adapted to implement a method of comprehensive quality assessment for multi-mode forgery generated data, in accordance with an embodiment of the invention.

Detailed Description

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning as commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

In the technical scheme of the invention, the collection, storage, use, processing, transmission, provision, disclosure, application, and similar handling of users' personal information all comply with the relevant laws and regulations, necessary security measures are adopted, and public order and good customs are not violated.

In the technical scheme of the invention, the acquisition, collection, storage, use, processing, transmission, provision, disclosure, application, and similar handling of data all comply with the relevant laws and regulations, necessary security measures are adopted, and public order and good customs are not violated.

Fig. 1 schematically shows a flow chart of a method for integrated quality assessment of multimodal forgery generated data according to an embodiment of the invention.

As shown in fig. 1, the comprehensive quality assessment method for multi-mode forgery generation data of this embodiment includes operations S110 to S140.

In operation S110, an evaluation index corresponding to a modality that generates data, and an index value corresponding to the evaluation index are determined.

In operation S120, an index dimension map is constructed according to the evaluation index and an index value corresponding to the evaluation index, the index dimension map including a first region and a second region, the first region including the second region.

In operation S130, a composite quality score of the generated data is calculated from the areas of the first region and the second region.

In operation S140, training data for training the deep forgery detection model is screened based on the composite quality score of the generated data.

First, typical subjective and objective evaluation indexes for the three modalities of pictures, audio, and video are sorted and classified. Since a video can be regarded as a sequence of picture frames, video quality evaluation can be treated as the picture modality: pictures of video key frames are usually selected for evaluation, and the evaluation indexes are basically the same as for pictures.

In an embodiment, the modalities of generating data include a picture modality and an audio modality. The evaluation index corresponding to the picture modality comprises at least one of Fréchet inception distance (FID), structural similarity (SSIM), peak signal-to-noise ratio (PSNR), kernel inception distance (KID), learned perceptual image patch similarity (LPIPS), and mean opinion score (MOS). The evaluation index corresponding to the audio modality comprises at least one of mel-cepstral distortion (MCD), MOS, and perceptual evaluation of audio quality (PEAQ). The evaluation indexes of each modality are shown in Table 1 below.

TABLE 1

Data modality	Evaluation index
Picture data	FID, SSIM, PSNR, KID, LPIPS, MOS
Audio data	MCD, MOS, PEAQ
Video data	FID, SSIM, PSNR, KID, LPIPS, MOS

FID extracts feature vectors of the target pictures and the synthesized pictures through the Inception v3 network and calculates the Fréchet distance between the two feature distributions. The smaller the FID value, the closer the two distributions and the better the generation quality.

SSIM is a picture similarity measurement method based on human visual perception. The target picture and the generated picture are each divided into a number of small blocks, the luminance, contrast, and structural similarity of corresponding blocks are compared, and the similarities of all blocks are finally averaged to obtain a similarity value for the whole picture. The larger the value, the closer the two pictures and the better the quality. The SSIM(x, y) calculation formula is as follows
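For reference, the standard form of SSIM, assumed here to be the formula the text refers to and written with the symbols it defines, is:

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x\mu_y + c_1)\,(2\sigma_{xy} + c_2)}
       {(\mu_x^2 + \mu_y^2 + c_1)\,(\sigma_x^2 + \sigma_y^2 + c_2)}
```

Here c_1 and c_2 are small positive constants that stabilize the division. They are not named in the text and are included as an assumption from the standard definition.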

Wherein x and y respectively represent the generated picture and the target picture, and μ_x, μ_y, σ_x, σ_y, and σ_xy respectively represent the mean of x, the mean of y, the standard deviation of x, the standard deviation of y, and the covariance of x and y.

PSNR is an indicator of picture or video quality. It is generally used to compare the similarity of two pictures or video sequences, which is evaluated by calculating the mean square error (MSE) between the two pictures; a smaller MSE represents a higher similarity between the two pictures. The calculation formula is

Wherein MAX_I is the maximum possible pixel value of the input picture (usually 2^n - 1, where n is the number of bits per sample; MAX_I = 255 for an 8-bit image), and MSE is the mean square error of the two input pictures, i.e. the average of the squared point-by-point differences, as follows:
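In their standard form, which is assumed here to match the formulas the text refers to, PSNR and MSE for two single-channel pictures I and K of size H × W can be written as below. The symbols I, K, H, and W are notation introduced for this illustration only.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W}
               \bigl[\, I(i,j) - K(i,j) \,\bigr]^{2}
```

As a quick check of the scale, two 8-bit pictures with MSE = 65.025 give PSNR = 10 log10(255² / 65.025) = 10 log10(1000) = 30 dB.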

KID (Kernel Inception Distance) is an index for evaluating the quality of a generative model, in particular the images produced by a generative adversarial network (GAN). KID measures the difference between generated images and real images based on feature representations from the Inception model. The method uses the Inception model as a feature extractor to obtain representations of the generated images and the real images in a high-level feature space. The KID score is then obtained by computing a kernel-based distance between the feature representations of the generated images and those of the real images. The lower the KID score, the more similar the feature representations of the generated and real images, and the higher the quality of the generative model. Conversely, a higher KID score indicates a larger difference between the feature representations and a lower quality of the generative model.
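The kernel-based distance can be illustrated with a minimal sketch. It assumes the commonly used definition of KID as an unbiased squared maximum mean discrepancy with a cubic polynomial kernel, which the text does not spell out, and it assumes the Inception features have already been extracted into plain lists of floats. The function names are hypothetical.

```python
def polynomial_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, d = feature size."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kernel_inception_distance(real_feats, gen_feats):
    """Unbiased squared-MMD estimate between two sets of feature vectors.

    Assumption: the inputs are pre-extracted feature vectors (for example
    from an Inception network); feature extraction is out of scope here.
    """
    m, n = len(real_feats), len(gen_feats)
    if m < 2 or n < 2:
        raise ValueError("each set needs at least two feature vectors")
    # Within-set terms skip the diagonal (i == j) to keep the estimate unbiased.
    k_rr = sum(polynomial_kernel(real_feats[i], real_feats[j])
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_gg = sum(polynomial_kernel(gen_feats[i], gen_feats[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross term averages the kernel over every real/generated pair.
    k_rg = sum(polynomial_kernel(r, g)
               for r in real_feats for g in gen_feats) / (m * n)
    return k_rr + k_gg - 2.0 * k_rg
```

Because the estimator is unbiased, it can return slightly negative values when the two feature sets are very similar; that is expected behavior rather than an error.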

MOS (Mean Opinion Score) is a commonly used subjective assessment for measuring the average score of a human being for a certain perceived quality. MOS is commonly used to evaluate perceived quality in the fields of audio, video, voice communications, etc., including sound quality, image quality, voice clarity, etc. MOS evaluation is performed by collecting subjective scores of a set of reviewers or subjects. A reviewer or subject is required to score a given perceived experience, typically using a predefined scoring range, such as 1 to 5 or 1 to 10. Each reviewer gives its own score for the perceived experience, which is aggregated and an average score, MOS, is calculated.

LPIPS (Learned Perceptual Image Patch Similarity) is an evaluation index based on perceptual distance, and its principle is similar to that of FID. Feature vectors of the target picture and the synthesized picture are extracted through a pretrained convolutional neural network (such as VGG or AlexNet), and the Euclidean distance or cosine similarity between the feature vectors is calculated; this distance can be regarded as the perceptual distance between the two pictures. The value of LPIPS ranges from 0 to 1. The smaller the value, the smaller the perceptual distance between the two pictures and the higher their similarity.

PEAQ (Perceptual Evaluation of Audio Quality) is an algorithm for objectively evaluating audio quality, used to estimate the subjective perceptual quality of an audio signal. It is an evaluation method defined in the international standard ITU-R BS.1387. PEAQ calculates a quality score for audio by comparing the original audio with the compressed or otherwise processed audio. The algorithm is based on the principles of human auditory perception and considers a series of auditory factors, including time-domain features, frequency-domain features, and sound coding features of the audio. The PEAQ algorithm divides the audio signal into a series of frames, calculates the feature differences for each frame, and calculates the quality score of the audio based on these differences. The score typically ranges from 1 (worst) to 5 (best), representing the quality level of the audio.

In the audio field, MCD (Mel Cepstral Distortion) is a commonly used objective evaluation index for measuring the spectral distortion between two audio signals. MCD is mainly used in speech synthesis and voice conversion tasks to evaluate the difference between the generated audio and the target audio. MCD is calculated by first converting the original audio and the generated audio into mel cepstral coefficients and then calculating the Euclidean distance between the two coefficient sequences. The lower the MCD, the less spectral distortion between the generated audio and the target audio, and the better the quality.
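As a rough illustration of the distance step, the sketch below uses the widely used per-frame MCD formula, (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²), averaged over frames. The scaling constant comes from that common definition and is an assumption here, since the text only mentions a Euclidean distance. The sketch also assumes the mel cepstral coefficients are already extracted and time-aligned frame by frame; the function name is hypothetical.

```python
import math


def mel_cepstral_distortion(target_frames, generated_frames):
    """Average MCD in dB over aligned frames of mel cepstral coefficients.

    Assumptions: both inputs are equal-length lists of frames, each frame a
    list of coefficients, already time-aligned; the caller decides whether
    the 0th (energy) coefficient is included.
    """
    if not target_frames or len(target_frames) != len(generated_frames):
        raise ValueError("inputs must be non-empty and have the same length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for target, generated in zip(target_frames, generated_frames):
        # Squared Euclidean distance between the two coefficient vectors.
        squared = sum((a - b) ** 2 for a, b in zip(target, generated))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(target_frames)
```

Identical coefficient sequences give an MCD of 0, and the value grows as the generated spectrum drifts from the target.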

Appropriate indexes are selected according to the generated data. Taking picture data as an example, paired target pictures and generated pictures are selected, and the four evaluation indexes FID, LPIPS, SSIM, and PSNR are selected and calculated respectively. The corresponding index values can be obtained through the above process.

In an embodiment, before constructing the index dimension map according to the evaluation index and the index value corresponding to the evaluation index, the method includes normalizing the index value so that it lies in [0, 1], where 1 represents the highest value of the current index and 0 represents the lowest. Indexes such as FID and LPIPS take values in [0, 1], and for them a smaller value indicates better quality. In the present invention, 1 - FID is used as the index value corresponding to FID and 1 - LPIPS is used as the index value corresponding to LPIPS, as follows: FID_normalized = 1 - FID, LPIPS_normalized = 1 - LPIPS.

For SSIM, whose value range is [-1, 1], the mapping (SSIM + 1)/2 is applied to obtain the index value, as follows: SSIM_normalized = (SSIM + 1)/2.

For PSNR, whose value range is [0, PSNR_max], a mapping is likewise applied, as follows: PSNR_normalized = PSNR/PSNR_max.
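The normalization rules above can be sketched as a single function. The function name, the default `psnr_max` of 50.0, and the final clamping to [0, 1] are assumptions made for illustration and are not part of the described method.

```python
def normalize_index(name: str, value: float, psnr_max: float = 50.0) -> float:
    """Map a raw index value into [0, 1], where 1 is best.

    psnr_max is a hypothetical default; the text only says it is the
    maximum value of PSNR.
    """
    key = name.upper()
    if key in ("FID", "LPIPS"):
        # Lower is better; the text treats these values as lying in [0, 1].
        score = 1.0 - value
    elif key == "SSIM":
        # SSIM lies in [-1, 1], so shift and scale it.
        score = (value + 1.0) / 2.0
    elif key == "PSNR":
        score = value / psnr_max
    else:
        raise ValueError(f"no normalization rule given for {name!r}")
    # Clamping is an added safeguard (assumption), e.g. for raw FID above 1.
    return min(1.0, max(0.0, score))
```

For example, `normalize_index("SSIM", 0.8)` yields about 0.9 and `normalize_index("PSNR", 25.0)` yields 0.5 with the assumed maximum of 50.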

In an embodiment, the constructing the index dimension map according to the evaluation index and the index value corresponding to the evaluation index includes: selecting n evaluation indexes from the evaluation indexes corresponding to the modality of the generated data, wherein n is a positive integer; if n=1, directly calculating the related index value and performing normalization processing; if n=2, normalizing the corresponding indexes and then calculating their average value; if n is greater than or equal to 3, performing the calculation in the following manner: setting a central origin for the n evaluation indexes; constructing a regular n-sided polygon in which the radius, i.e. the distance from the center origin to each vertex, is 1; drawing an index point on each radius according to the index value of the corresponding evaluation index, and calculating the coordinates of the index points; and enclosing an irregular n-sided polygon based on all the index points.
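For the case n ≥ 3, the construction can be sketched as follows. The sketch assumes the n radii are evenly spaced in angle starting from the positive x axis, which the text implies by using a regular polygon but does not state explicitly; the function name is hypothetical.

```python
import math


def index_points(values):
    """Return (outer, inner) vertex lists for n >= 3 normalized index values.

    outer: vertices of the regular n-sided polygon with radius 1.
    inner: index points, each at distance values[k] along the k-th radius.
    """
    n = len(values)
    if n < 3:
        raise ValueError("a polygon needs at least three indexes")
    outer, inner = [], []
    for k, v in enumerate(values):
        # Assumption: axes are evenly spaced around the central origin.
        angle = 2.0 * math.pi * k / n
        ux, uy = math.cos(angle), math.sin(angle)
        outer.append((ux, uy))
        inner.append((v * ux, v * uy))
    return outer, inner
```

With four indexes this yields a unit-radius square as the regular polygon and a quadrilateral of index points inside it.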

Fig. 2 schematically shows a dimension graph built from audio data evaluation indicators according to an embodiment of the invention. Fig. 3 schematically shows a dimension graph established from picture data evaluation indicators according to an embodiment of the present invention.

As shown in FIG. 2 and FIG. 3, the area enclosed by the outer polygon is the first area, which is a regular polygon; the shaded part is the second area, which is a polygon. After the dimension graph is established, the data quality displayed in it is further calculated: the area of the second area is calculated according to the vertex coordinates of the second area (four vertices in the four-index example), and the area of the first area is calculated as well. The area of an irregular polygon may be calculated according to S1-S3 as follows:

S1: The calculation is based on the coordinates of the polygon vertices; the vertex coordinates (x1, y1), (x2, y2), …, (xn, yn) are first solved from the distances between the vertices and the origin.

S2: The area of the first area and the area of the second area are calculated from the vertex coordinates, with the operation code as follows:

S3: The ratio of the area of the second area to the area of the first area is multiplied by a preset constant A (the constant A represents the set maximum score) to obtain the comprehensive quality score Q_d of the generated data, i.e. Q_d = A × (area of the second area) / (area of the first area). The larger the value of Q_d, the better the data quality; conversely, the smaller the value, the lower the data quality.
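Steps S1 to S3 can be sketched as below. The sketch assumes the polygon area is computed with the shoelace formula, a standard choice that the text does not name, and uses a hypothetical default of 100 for the constant A; the function names are also hypothetical.

```python
import math


def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in order around the polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def quality_score(values, max_score=100.0):
    """Q_d = A * (second area) / (first area) for n >= 3 normalized values.

    max_score plays the role of the preset constant A (assumed default).
    """
    n = len(values)
    if n < 3:
        raise ValueError("the polygon construction needs at least three indexes")
    # S1: vertex coordinates from the distance of each point to the origin.
    angles = [2.0 * math.pi * k / n for k in range(n)]
    first = [(math.cos(a), math.sin(a)) for a in angles]
    second = [(v * math.cos(a), v * math.sin(a)) for v, a in zip(values, angles)]
    # S2 and S3: areas of both regions, then the scaled ratio.
    return max_score * polygon_area(second) / polygon_area(first)
```

For four indexes all equal to 0.5, the inner polygon has one quarter of the outer area, so the score is 25 out of the assumed maximum of 100. The n = 1 and n = 2 cases are handled separately in the text and are not covered by this sketch.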

In an embodiment, the screening the training data for training the deep forgery detection model based on the composite quality score of the generated data includes: screening the generated data whose comprehensive quality score lies in a preset range as the training data. The data is classified according to the comprehensive quality scores, and when deep forgery detection models such as XceptionNet and EfficientNet are trained, forged data of higher quality is used. In this way the sample size of the training set can be effectively reduced, the training cost of the model is lowered, and a better practical application effect is achieved.
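A minimal sketch of the screening step follows. The function name, the data layout (pairs of sample identifier and score), and the example thresholds are assumptions for illustration; the text only specifies that the score must fall within a preset range.

```python
def screen_training_data(scored_samples, low, high):
    """Keep samples whose composite quality score lies in [low, high].

    scored_samples: iterable of (sample, score) pairs (assumed layout).
    low, high: bounds of the preset range, chosen by the practitioner.
    """
    return [sample for sample, score in scored_samples if low <= score <= high]


# Example with hypothetical scores on a 0-100 scale.
candidates = [("img_001", 91.5), ("img_002", 42.0), ("img_003", 77.3)]
selected = screen_training_data(candidates, low=70.0, high=100.0)
```

Only the samples scoring within the range are passed on as training data for the detection model, which is how the training set shrinks.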

Based on the comprehensive quality evaluation method for the multi-mode forgery generated data, the invention also provides a comprehensive quality evaluation device for the multi-mode forgery generated data. The device will be described in detail below in connection with FIG. 4.

Fig. 4 schematically shows a block diagram of a comprehensive quality assessment device for multi-mode forgery-generated data according to an embodiment of the present invention.

As shown in FIG. 4, the comprehensive quality assessment device 400 of this embodiment includes a determining module 410, a constructing module 420, a calculating module 430, and a screening module 440.

A determining module 410 is configured to determine an evaluation index corresponding to a modality that generates data, and an index value corresponding to the evaluation index. In an embodiment, the determining module 410 may be configured to perform the operation S110 described above, which is not described herein.

The construction module 420 is configured to construct an index dimension graph according to the evaluation index and an index value corresponding to the evaluation index, where the index dimension graph includes a first area and a second area, and the first area includes the second area. In an embodiment, the construction module 420 may be configured to perform the operation S120 described above, which is not described herein.

A calculation module 430, configured to calculate a composite quality score of the generated data according to the areas of the first region and the second region. In an embodiment, the computing module 430 may be configured to perform the operation S130 described above, which is not described herein.

A screening module 440 for screening training data for training the deep forgery detection model based on the composite quality score of the generated data. In an embodiment, the screening module 440 may be configured to perform the operation S140 described above, which is not described herein.

According to an embodiment of the present invention, the modes of generating data include a picture mode, an audio mode, and a video mode;

the evaluation index corresponding to the audio mode comprises at least one of mel-cepstral distortion (MCD), mean opinion score (MOS), and perceptual evaluation of audio quality (PEAQ).

According to an embodiment of the present invention, the constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index includes:

setting a central origin for the n evaluation indexes;

based on all the index points, an irregular n-polygon is enclosed.

According to an embodiment of the present invention, before constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index, the method includes:

and normalizing the index value.

According to an embodiment of the present invention, the normalizing the index value includes:

when the evaluation index includes FID, using 1 - FID as the index value corresponding to FID;

when the evaluation index includes PSNR, using PSNR/PSNR_max as the index value corresponding to PSNR, wherein PSNR_max represents the maximum value of PSNR.

According to an embodiment of the present invention, the calculating the composite quality score of the generated data based on the areas of the first region and the second region includes:

According to an embodiment of the present invention, the screening training data for training the deep forgery detection model based on the composite quality score of the generated data includes:

Any of the determining module 410, the constructing module 420, the calculating module 430, and the screening module 440 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules according to an embodiment of the present invention. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the invention, at least one of the determination module 410, the construction module 420, the calculation module 430, and the screening module 440 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging the circuits, or in any one of or a suitable combination of any of the three implementations of software, hardware, and firmware. Alternatively, at least one of the determination module 410, the construction module 420, the calculation module 430, and the screening module 440 may be at least partially implemented as a computer program module, which when executed, may perform the corresponding functions.

As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flow according to an embodiment of the invention.

In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.

According to an embodiment of the invention, the electronic device 500 may further comprise an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom is installed into the storage section 508 as needed.

The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.

According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.

Embodiments of the present invention also include a computer program product comprising a computer program, the computer program containing program code for performing the method shown in the flowcharts. When the computer program product is run on a computer system, the program code causes the computer system to carry out the methods provided by embodiments of the present invention.

The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 501. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.

In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 509, and/or installed from the removable medium 511. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 501. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.

According to embodiments of the present invention, program code for carrying out the computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the invention can be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the present invention. In particular, the features recited in the various embodiments of the invention can be combined and/or integrated in various ways without departing from the spirit and teachings of the invention. All such combinations and/or integrations fall within the scope of the invention.

The embodiments of the present invention are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims

1. A comprehensive quality evaluation method for multi-mode forgery generated data, characterized by comprising the following steps:

and calculating the comprehensive quality score of the generated data according to the areas of the first area and the second area, wherein the comprehensive quality score of the generated data is used for screening training data for training a deep forgery detection model.

2. The comprehensive quality evaluation method for multi-mode forgery generated data according to claim 1, wherein the modalities of the generated data include a picture modality, an audio modality, and a video modality;

3. The comprehensive quality evaluation method for multi-mode forgery generated data according to claim 1, wherein the constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index comprises:

setting a central origin for the n evaluation indexes;

and enclosing an irregular n-polygon based on all the index points.

4. The comprehensive quality evaluation method for multi-mode forgery generated data according to claim 1 or 3, wherein before the constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index, the method further comprises:

and normalizing the index value.

5. The comprehensive quality evaluation method for multi-mode forgery generated data according to claim 4, wherein the normalizing the index value comprises:

in a case where the evaluation index includes PSNR, mapping PSNR/PSNR_max as the index value corresponding to PSNR, wherein PSNR_max represents the maximum value of PSNR.

6. The comprehensive quality evaluation method for multi-mode forgery generated data according to claim 1, wherein the calculating the comprehensive quality score of the generated data according to the areas of the first area and the second area comprises:

7. The comprehensive quality evaluation method for multi-mode forgery generated data according to claim 1, wherein screening the training data for training the deep forgery detection model according to the comprehensive quality score of the generated data comprises:

8. A comprehensive quality evaluation device for multi-mode forgery generated data, characterized by comprising:

a determining module, configured to determine an evaluation index corresponding to a modality of the generated data, and an index value corresponding to the evaluation index;

and a calculating module, configured to calculate the comprehensive quality score of the generated data according to the areas of the first area and the second area, wherein the comprehensive quality score of the generated data is used for screening training data for training a deep forgery detection model.

9. The comprehensive quality evaluation device for multi-mode forgery generated data according to claim 8, wherein the modalities of the generated data include a picture modality, an audio modality, and a video modality;

10. The comprehensive quality evaluation device for multi-mode forgery generated data according to claim 8, wherein the constructing an index dimension graph according to the evaluation index and the index value corresponding to the evaluation index comprises:

setting a central origin for the n evaluation indexes;

and enclosing an irregular n-polygon based on all the index points.